TsugiAI / Foundation Model Adaptation

Adaptation Methods for Foundation Models on Constrained Hardware

Parameter-efficient fine-tuning targeted at hardware below datacenter scale: phones, edge boxes, and single-GPU workstations. The constraint set is real; the regulated-regime constraint set is harder.

Status: Patent pending
Filings: 1 (K-Pool LoRA)
Pillar

Mechanism

This research line develops parameter-efficient fine-tuning (PEFT) methods for hardware that is not a hyperscaler training cluster. The constraint set is concrete: 16 to 32 GB of DRAM on a consumer single-GPU workstation, 8 to 16 GB on an edge box, 4 to 12 GB on a phone, and a sub-watt active power budget at the bottom of that range. In regulated regimes (HIPAA, FINRA, ITAR), no replay buffer is admissible because past examples cannot be retained for future training, which structurally rules out replay-based and most regularization-based continual learning techniques. K-Pool LoRA (US Prov. 64/060,315, filed 2026-05-07) is the filed component of this research line. It contributes two pieces: frozen-encoder Gaussian-mixture routing over a fixed pool of K LoRA snapshots, and a Signum (sign-of-momentum) optimizer applied only to the active-slot parameters. Active research covers three open questions: K-sweep operability, routing-stage compute and memory cost at production batch sizes, and cross-hardware reproduction at 7B to 13B scale.
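The routing idea can be illustrated with a minimal sketch. It is not the filed implementation: it assumes diagonal-covariance Gaussian components, one component per LoRA slot, and hard (arg-max) slot selection, none of which the text above specifies. The names `diag_gaussian_logpdf`, `route`, and `pool` are hypothetical, and the numbers are toy values.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def route(embedding, components):
    """Return the index of the slot whose mixture component best explains the embedding.

    `components` is a list of K (log_weight, mean, var) tuples, one per slot
    (an assumption of this sketch). The embedding is taken to come from a
    frozen encoder, so nothing in this function receives language-modeling
    gradients.
    """
    scores = [
        log_w + diag_gaussian_logpdf(embedding, mean, var)
        for log_w, mean, var in components
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy pool: K = 2 slots, 2-dimensional embeddings, made-up parameters.
pool = [
    (math.log(0.5), [0.0, 0.0], [1.0, 1.0]),
    (math.log(0.5), [4.0, 4.0], [1.0, 1.0]),
]
slot_a = route([0.2, -0.1], pool)  # embedding near the first component
slot_b = route([3.7, 4.2], pool)   # embedding near the second component
```

Because the mixture is fitted on frozen-encoder embeddings, the router in this sketch has no trainable gate that the language-modeling loss could push toward a single slot.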

Why this matters

  • The replay-buffer prohibition under HIPAA, FINRA, and ITAR makes replay-based and most regularization-based continual learning methods structurally non-viable in regulated regimes. K-Pool LoRA with fragility-aware-eviction is, to our knowledge, the only realizable mechanism in that constraint set whose mean forgetting comes close to a capacity-matched, task-id-routed baseline. That baseline is not a true per-task oracle: the five-domain benchmark runs in a four-slot pool, which forces two domains to share a slot, so the baseline itself forgets. Whether the method is equivalent to a true per-task oracle (one slot per domain, which forgets near zero) is an open question we are actively testing; current diagnostics attribute the residual gap to routing quality, not slot capacity. We do not claim to have eliminated catastrophic forgetting.
  • Sign-quantized updates to the active LoRA slot apply only the sign of the momentum, so the optimizer keeps one momentum buffer per parameter where Adam-class optimizers keep two. This halves the optimizer-state memory footprint, without measurable mean-quality regression across the empirically anchored 1.6-decade learning-rate plateau (lr in [5e-6, 2e-4]).
  • Frozen-encoder routing decouples the routing module from the language-modeling gradient. The gate-collapse failure mode that LoRAMoE-class trainable softmax routers exhibit at small K (confirmed empirically across five seeds at both 1.5B and 7B) does not arise.
  • The filing is scoped for counsel review before non-provisional conversion. The PCT conversion deadline is 2027-05-07.
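The optimizer-state and plateau-width figures in the list above can be checked with a short sketch. It shows a generic Signum step, not the filed method: the momentum coefficient `beta=0.9`, the default learning rate, and the helper names `signum_step` and `optimizer_state_floats` are assumptions made for illustration.

```python
import math

def signum_step(params, grads, momentum, lr=1e-4, beta=0.9):
    """One Signum update: refresh the momentum buffer, then step by its sign.

    `momentum` is the only per-parameter optimizer state in this sketch.
    Adam-class optimizers keep two buffers (first and second moment).
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        m = beta * m + (1.0 - beta) * g
        sign = (m > 0) - (m < 0)  # -1, 0, or +1
        new_params.append(p - lr * sign)
        new_momentum.append(m)
    return new_params, new_momentum

def optimizer_state_floats(n_params, optimizer):
    """Per-parameter state values held by the optimizer, ignoring scalars such as a step count."""
    buffers_per_param = {"signum": 1, "adam": 2}[optimizer]
    return buffers_per_param * n_params

# Toy update on three hypothetical active-slot parameters.
params, momentum = signum_step(
    [1.0, -2.0, 0.5], [0.3, -0.7, 0.0], [0.0, 0.0, 0.0], lr=0.1
)

# Width of the plateau lr in [5e-6, 2e-4], measured in decades (log10 of the ratio).
plateau_decades = math.log10(2e-4 / 5e-6)
```

Every parameter moves by exactly `lr` regardless of gradient magnitude, which is why the width of the usable learning-rate range is the quantity worth measuring for this optimizer.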

Status and what is next

A K-sweep with K in {2, 3, 4, 5, 6}, crossed with fragility-aware-eviction on and off, is pre-registered for the non-provisional. Cross-hardware reproduction at 7B to 13B on Llama-3 8B and Mistral Small 7B (TRACE benchmark) gates any sell-side artifact. Measurement of routing-stage compute and memory cost at production batch sizes is queued. The empirical anchor is the two-scale variance table: open-weights instruction models at 1.5B and 7B scale, five seeds, and five sequential domains (medical, legal, code, math, multilingual). Mean forgetting lands close to a capacity-matched, task-id-routed baseline at both scales, but a pre-registered equivalence test does not establish equivalence to a true per-task oracle, so the earlier statistical non-rejection is not read as evidence of equivalence. A 9-point learning-rate sweep in the (2e-4, 5e-4) interval is pre-registered to localize the learning-rate cliff and refine the enablement bracket for dependent-narrowing claim 7.3.6A.
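The two pre-registered sweeps can be enumerated as follows. This is only a sketch of the experiment grid: the text does not say how the nine learning rates are spaced, so logarithmic spacing strictly inside the open interval is an assumption, and the names `k_sweep_grid` and `interior_log_points` are hypothetical.

```python
import itertools
import math

def k_sweep_grid(ks=(2, 3, 4, 5, 6), eviction=(True, False)):
    """Cross every pool size K with fragility-aware-eviction on and off."""
    return [
        {"K": k, "fragility_aware_eviction": on}
        for k, on in itertools.product(ks, eviction)
    ]

def interior_log_points(lo, hi, n):
    """n log-spaced points strictly inside the open interval (lo, hi).

    Log spacing is an assumption of this sketch, not something the
    pre-registration is known to specify.
    """
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / (n + 1)
    return [10.0 ** (log_lo + step * i) for i in range(1, n + 1)]

grid = k_sweep_grid()                             # 5 pool sizes x 2 eviction settings
cliff_lrs = interior_log_points(2e-4, 5e-4, 9)    # candidate learning rates for the cliff sweep
```

Under these assumptions the K-sweep has ten configurations per seed and scale, and each of the nine learning rates lies above the top of the reported plateau (2e-4) and below 5e-4.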