Pluggable Memory Pooling for Distributed AI Workloads
CXL-style memory aggregation across heterogeneous nodes, with allocation policies tuned for the activation-tensor traffic that AI workloads actually generate.
Mechanism
The CXL pool primitive is referenced in Infinity (US Prov. 64/055,093) as an apparatus-layer element of the gradient consensus substrate, where the temporally coherent CXL.mem pool is indexed by a 64-bit regime identifier. This research line extends that substrate primitive to a pluggable memory pool whose allocation policy is workload-aware: activation tensors with high re-read frequency are pinned to lower-latency tiers, while gradient checkpoints with low re-read frequency are eligible for far-tier pooling. The textbook CXL allocator, by contrast, is workload-agnostic. The research question is what results when allocation is informed by the activation-tensor reuse profile rather than by a generic LRU or interleave heuristic.
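The placement rule described above can be illustrated with a minimal sketch. It is not the policy under study, only a greedy stand-in under stated assumptions: each tensor carries a single measured re-read frequency, each tier has a fixed capacity and latency, and the names `Tier`, `Tensor`, `place`, and all tier sizes and latencies are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class Tier:
    name: str
    capacity_bytes: int
    latency_ns: int  # illustrative figure, not a measurement


@dataclass(frozen=True)
class Tensor:
    name: str
    size_bytes: int
    rereads_per_step: float  # assumed to come from a reuse profile


def place(tensors, tiers):
    """Greedy placement: most frequently re-read tensors go first,
    each into the lowest-latency tier that still has room."""
    tiers_by_latency = sorted(tiers, key=lambda t: t.latency_ns)
    free = {t.name: t.capacity_bytes for t in tiers_by_latency}
    placement = {}
    by_reuse = sorted(tensors, key=lambda x: x.rereads_per_step, reverse=True)
    for tensor in by_reuse:
        for tier in tiers_by_latency:
            if free[tier.name] >= tensor.size_bytes:
                free[tier.name] -= tensor.size_bytes
                placement[tensor.name] = tier.name
                break
        else:
            # No tier can hold this tensor; a real policy would evict or spill.
            raise MemoryError(f"no tier has room for {tensor.name}")
    return placement


# Hypothetical three-tier pool and workload.
tiers = [
    Tier("cxl_far", 64 * GIB, 600),
    Tier("local_dram", 8 * GIB, 100),
    Tier("cxl_near", 16 * GIB, 250),
]
tensors = [
    Tensor("grad_ckpt", 20 * GIB, 0.1),
    Tensor("attn_act", 6 * GIB, 4.0),
    Tensor("mlp_act", 4 * GIB, 2.0),
]

placement = place(tensors, tiers)
for name, tier in placement.items():
    print(f"{name} -> {tier}")
```

In this toy run the hottest activation lands in local DRAM, the next one spills to the near CXL tier because local DRAM is full, and the rarely re-read gradient checkpoint ends up in the far tier. A generic LRU or interleave allocator would make the same decisions without looking at `rereads_per_step` at all, which is the gap the research question targets.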
Why this matters
- Activation-tensor traffic dominates training memory pressure at sub-datacenter scale. Allocator decisions made without modeling reuse frequency leave significant headroom unused, particularly for adaptation passes where checkpoint placement and activation recomputation trade against each other.
- The release of the CXL 3.0 and 3.1 specifications has opened up the pluggable-pool design point. The policy layer remains under-explored relative to the protocol layer, and it is the surface where there is room for technical differentiation that does not depend on winning a protocol-stack race.
- Status is Research. A filing decision is deferred until the allocation-policy work is empirically validated against a heterogeneous test bed. We do not want to anchor a claim on policy performance numbers we have not yet measured.
Status and what's next
Active research. Allocation-policy characterization is in progress against representative activation-tensor reuse profiles. Honest disclosure: there is no published benchmark on this surface at the time of writing. Expected output is a workshop preprint within the next 12 to 18 months, at which point the filing question will be re-opened with measured policy curves in hand rather than projections.