NexTrain: Training Custom AI Models Without NVLink-Class Clusters

Modern AI training has a hardware problem.

Most distributed training stacks assume that serious model training requires expensive GPU clusters with high-bandwidth GPU-GPU communication. In practice, that often means NVLink, NVSwitch, InfiniBand, or other premium interconnects.

That assumption makes custom model training expensive and inaccessible for many companies.

At PocketBrains, we are building NexTrain to challenge that assumption.

NexTrain is training infrastructure for custom AI models. It is designed to reduce dependence on GPU-GPU communication and make training possible on a broader range of GPU configurations, including commodity multi-GPU machines and rented cloud GPUs that lack NVLink-class interconnects.

Our goal is simple: train custom AI models on the GPUs customers can actually access.

The problem with modern training infrastructure

Most companies do not start with a perfect training environment. They may have:

A few RTX 4090 or 5090 machines
L40S or A6000 GPUs
A100 or H100 instances without ideal interconnects
Rented GPUs from multiple providers
On-prem GPUs not designed as a frontier-model cluster
Valuable data, but no clean training pipeline

Traditional distributed training often struggles in this environment because it assumes high-bandwidth communication between GPUs.

When GPU-GPU communication becomes the bottleneck, hardware choice becomes restrictive. The training stack starts dictating the infrastructure.

That is backwards.

The customer should be able to choose the GPU configuration based on cost, availability, and business needs. The infrastructure should adapt to the hardware — not the other way around.
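To make the bottleneck concrete, here is a rough back-of-envelope sketch in Python. It uses the standard bandwidth-only cost model for a ring all-reduce, in which each of N GPUs moves 2(N-1)/N of the gradient payload per step. The 7-billion-parameter model size and both bandwidth figures are illustrative assumptions, not NexTrain measurements, and latency is ignored.

```python
def ring_allreduce_seconds(payload_bytes, num_gpus, link_bytes_per_s):
    """Bandwidth-only time for one ring all-reduce (latency ignored)."""
    if num_gpus < 2:
        return 0.0
    # In a ring all-reduce, each GPU sends and receives 2 * (N - 1) / N of the payload.
    traffic_bytes = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    return traffic_bytes / link_bytes_per_s


# Hypothetical workload: 7e9 parameters with 2-byte (bf16) gradients.
GRADIENT_BYTES = 7e9 * 2

# Assumed effective per-GPU bandwidths in bytes per second, for illustration only.
LINKS = {
    "pcie_gen4_x16": 25e9,
    "nvlink_class": 300e9,
}

estimates = {
    name: ring_allreduce_seconds(GRADIENT_BYTES, 8, bandwidth)
    for name, bandwidth in LINKS.items()
}
for name, seconds in estimates.items():
    print(f"{name}: {seconds:.2f} s per gradient all-reduce")
```

Under these assumed numbers, the same gradient exchange takes roughly twelve times longer over the slower link, and that cost is paid on every optimizer step unless it can be hidden behind compute.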

The NexTrain approach

NexTrain is designed around a different principle:

Minimize GPU-GPU communication and move training toward CPU/RAM-mediated execution.

Instead of depending on direct GPU-GPU collective communication as the primary path, NexTrain uses a training architecture based on:

CPU/RAM-mediated parameter movement
Parameter streaming
Memory-aware scheduling
Prefetching and double buffering
Compute and communication overlap
Flexible GPU assignment
Hardware-aware training plans
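As a rough illustration of prefetching and double buffering from the list above, the sketch below loads the next block of parameters on a background thread while the current block is being used. It is a generic pure-Python sketch, not NexTrain's implementation: stream_layers, fake_load, and fake_compute are hypothetical names, and the sleep calls merely stand in for a host-to-GPU copy and a layer's compute.

```python
import queue
import threading
import time

_DONE = object()  # sentinel marking the end of the stream


def stream_layers(layer_ids, load, compute, num_buffers=2):
    """Run compute(load(layer_id)) in order, loading ahead on a background thread.

    At most num_buffers loaded blocks exist at any time; with the default
    of 2, one block is being computed on while the next is being loaded.
    """
    free_slots = threading.Semaphore(num_buffers)
    ready = queue.Queue()

    def producer():
        try:
            for layer_id in layer_ids:
                free_slots.acquire()        # wait until a buffer is free
                ready.put(load(layer_id))   # fill it and hand it over
            ready.put(_DONE)
        except Exception as exc:            # surface loader errors to the caller
            ready.put(exc)

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while True:
        block = ready.get()
        if block is _DONE:
            return results
        if isinstance(block, Exception):
            raise block
        results.append(compute(block))
        free_slots.release()                # this buffer can now be reused


def fake_load(layer_id):
    time.sleep(0.01)                        # stands in for a host-to-GPU copy
    return {"layer": layer_id, "weights": [layer_id] * 4}


def fake_compute(block):
    time.sleep(0.01)                        # stands in for the layer's compute
    return sum(block["weights"])


outputs = stream_layers(range(6), fake_load, fake_compute)
print(outputs)
```

The semaphore caps how many loaded blocks are alive at once, which is the property that matters when the buffers live in limited GPU memory. In a real system the loader would typically be an asynchronous copy from pinned host memory rather than a Python thread.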

None of this means that every GPU can train every model. GPU memory, CUDA support, host RAM, PCIe bandwidth, storage throughput, and kernel compatibility still matter.

But it does mean that NexTrain is designed to reduce the need for premium GPU-GPU interconnects. That changes the economics of custom model training.
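One way to see why PCIe bandwidth still matters is to ask how much work a GPU must do per step before a streamed weight transfer fits underneath the compute time. The sketch below is generic arithmetic, not NexTrain's planner. It assumes 2-byte (bf16) weights, the common rule of thumb of about 2 FLOPs per parameter per token for a dense forward pass and about 4 for the backward pass, a hypothetical effective host-to-GPU bandwidth of 25 GB/s, and the 226 TFLOPS single-GPU figure reported later in this post as the compute rate.

```python
def min_tokens_to_hide_transfer(
    compute_flops_per_s,
    link_bytes_per_s,
    bytes_per_param=2.0,             # bf16 weights (assumed)
    flops_per_param_per_token=2.0,   # dense forward-pass rule of thumb (assumed)
):
    """Tokens per step at which weight transfer time equals compute time.

    transfer time per parameter = bytes_per_param / link_bytes_per_s
    compute time per parameter  = tokens * flops_per_param_per_token / compute_flops_per_s
    Setting the two equal and solving for tokens gives the break-even point.
    """
    return (compute_flops_per_s * bytes_per_param) / (
        link_bytes_per_s * flops_per_param_per_token
    )


GPU_FLOPS = 226e12         # single-GPU throughput quoted in this post
PCIE_BYTES_PER_S = 25e9    # assumed effective host-to-GPU bandwidth

forward_tokens = min_tokens_to_hide_transfer(GPU_FLOPS, PCIE_BYTES_PER_S)
backward_tokens = min_tokens_to_hide_transfer(
    GPU_FLOPS, PCIE_BYTES_PER_S, flops_per_param_per_token=4.0
)
print(f"forward pass:  about {forward_tokens:,.0f} tokens per GPU per step")
print(f"backward pass: about {backward_tokens:,.0f} tokens per GPU per step")
```

Under these assumptions the forward pass is the tighter constraint: below roughly 9,000 tokens per GPU per step, the GPU would sit idle waiting for weights. Real workloads differ (attention cost, optimizer traffic, lower achieved bandwidth), so this is only a first-order check.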

Why removing NVLink from the critical path matters

NVLink is powerful. For many workloads, it is extremely useful.

But requiring NVLink-class infrastructure for custom model training creates a major barrier. Many companies have access to GPUs, but not the right kind of GPU cluster. They may have compute, but not the premium interconnect.

NexTrain is designed to make that compute usable.

The key idea is not to replace NVLink as hardware. The key idea is to avoid making NVLink mandatory.

If training can be structured so that GPUs do not need to constantly communicate with each other, then customers can train models on more accessible hardware. This enables:

Lower training cost
Broader GPU compatibility
Better use of existing hardware
More flexible GPU sourcing
Faster experimentation
Reduced dependency on a single cloud or cluster type

Early benchmark results

In early experiments, we observed near-linear scaling from a single GPU to eight GPUs.

Configuration	Throughput	Scaling efficiency vs. linear
1 GPU (baseline)	226 TFLOPS	—
8 GPUs (measured)	1.72 PFLOPS	95% efficiency
8 GPUs (theoretical linear)	1.808 PFLOPS	100% (target)

A single GPU reached approximately 226 TFLOPS. An eight-GPU configuration reached over 1.72 PFLOPS against a theoretical linear target of 1.808 PFLOPS (8 × 226 TFLOPS), which corresponds to approximately 95% scaling efficiency.

This result suggests that a training architecture with minimized GPU-GPU communication can preserve strong multi-GPU efficiency.
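The efficiency figure above is simply measured throughput divided by the ideal linear target. A minimal Python check using the numbers from the table (scaling_efficiency is a hypothetical helper name):

```python
def scaling_efficiency(single_gpu_tflops, num_gpus, measured_tflops):
    """Return (ideal linear throughput in TFLOPS, measured / ideal)."""
    ideal_tflops = single_gpu_tflops * num_gpus
    return ideal_tflops, measured_tflops / ideal_tflops


# 226 TFLOPS on one GPU, 1.72 PFLOPS (1720 TFLOPS) measured on eight.
ideal_tflops, efficiency = scaling_efficiency(226.0, 8, 1720.0)
print(f"linear target: {ideal_tflops / 1000:.3f} PFLOPS")   # 1.808 PFLOPS
print(f"scaling efficiency: {efficiency:.1%}")              # 95.1%
```

Because the measured figure is quoted as "over 1.72 PFLOPS", 95.1% is best read as a floor on the observed efficiency rather than an exact value.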

We are continuing to benchmark NexTrain across different GPU types, model sizes, sequence lengths, and training workloads.

What NexTrain is not

To be clear about scope:

NexTrain is not a GPU cloud.
NexTrain is not an inference platform.
NexTrain is not a model hosting service.
NexTrain is not trying to replace NVIDIA hardware.

NexTrain is training infrastructure that helps companies train custom AI models on flexible GPU configurations. The goal is not to sell GPUs. The goal is to make more GPUs usable for training.

How PocketBrains uses NexTrain

PocketBrains combines NexTrain with PocketAgentic, our data preparation layer.

PocketAgentic turns messy enterprise data into training-ready datasets. NexTrain trains custom models from those datasets. The workflow looks like this:

      Messy Enterprise Data

          ↓ PocketAgentic

      Training-Ready Dataset

          ↓ NexTrain

      Custom AI Model

This lets customers go from raw internal data to a trained model without needing to build the entire data and training stack themselves.

What customers can train

NexTrain supports supervised fine-tuning (SFT) and reinforcement learning (RL). Customers can bring their own model definition as a Python file, or use a standard base model.

Use cases include:

Domain-specific language understanding
Structured output generation
Formatting and extraction models
Reasoning and planning models
Enterprise knowledge models
Customer support workflows
Internal automation
Vertical AI applications

Why this matters now

The next wave of AI adoption will not be powered only by general-purpose frontier models. Enterprises want smaller, cheaper, specialized models trained on their own data.

But custom training is still too hard. The dataset is messy. The training stack is complicated. The GPU requirements are expensive.

NexTrain is built for this gap.

We believe the future of custom AI model training will be more flexible, more hardware-aware, and less dependent on premium interconnect clusters.

Our vision

PocketBrains is building the data-to-model training stack for custom AI.

PocketAgentic prepares the data. NexTrain trains the model. Customers choose the GPU path that works for them.

If your company has valuable data and wants to turn it into a custom AI model, we would love to talk. We are currently working with early design partners on NexTrain.

Bring your data. Choose your GPUs. Train your model.

Start a training project → Talk to the founder