TsugiLabs AI Research

Cross-domain research at the boundary between algorithms, silicon, and system architecture.

TsugiAI

Algorithms and adaptation. Continual learning, signal-quantized optimization, model behavior under hardware constraints.

The algorithms lab. Continual learning, signal-quantized optimization, and the parts of model behavior that survive contact with constrained hardware. Output is methods, ablations, and write-ups.

Active work · 6 entries

K-Pool LoRA selects an active adapter slot via a frozen-encoder Gaussian-mixture router that is never updated by the language-modeling gradient. Composition with a sign-quantized active-slot optimizer and a fragility-aware eviction policy keeps a fixed-size K-pool stable across five sequential domains at both 1.5B and 7B scales. Replay buffers are not required, which matters under HIPAA, FINRA, and ITAR regimes.
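A minimal sketch of the routing idea, assuming each adapter slot owns one diagonal-covariance Gaussian component fitted on frozen-encoder embeddings. The names `diag_gaussian_logpdf`, `route`, and the `(log_weight, mean, var)` slot layout are hypothetical and are not taken from the K-Pool LoRA implementation. The point it illustrates is that slot selection depends only on frozen features and fitted mixture statistics, so nothing in the router needs a language-modeling gradient.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def route(embedding, slots):
    """Pick the adapter slot whose mixture component best explains the embedding.

    `embedding` is a frozen-encoder feature vector (assumed precomputed).
    `slots` is a list of (log_weight, mean, var) tuples, one per adapter slot
    (hypothetical layout). Returns the index of the highest-scoring slot.
    """
    scores = [
        log_w + diag_gaussian_logpdf(embedding, mean, var)
        for log_w, mean, var in slots
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Two toy slots in a 2-D embedding space, equal prior weight.
slots = [
    (math.log(0.5), [0.0, 0.0], [1.0, 1.0]),
    (math.log(0.5), [5.0, 5.0], [1.0, 1.0]),
]
active = route([4.8, 5.3], slots)  # lands on the second slot
```

Because the component statistics are fitted separately from the adapters, updating an adapter never moves the routing boundary in this sketch.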

Two orthogonal lines. The Signum-class sign-quantized optimizer keeps a single momentum buffer per parameter where Adam-class baselines keep two moment buffers, which halves optimizer-state memory without measurable mean-quality regression at the empirical learning-rate plateau. The renormalization-derived family covers multi-resolution refinement (MRRO) and coupled-map-lattice gradient smoothing (CMLGS) at O(D) per step. That cost matters because the CMA-ES family, with its O(D^2) covariance state, becomes infeasible above roughly D=5000 dimensions.
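For orientation, here is a sketch of the textbook Signum update (the sign of a momentum average), not the lab's specific optimizer, whose update rule is not given here. The function name `signum_step` and the default hyperparameters are illustrative assumptions. The only optimizer state is `momentum`, one value per parameter, which is where the memory saving against a two-buffer Adam-style state comes from.

```python
def signum_step(params, grads, momentum, lr=1e-3, beta=0.9):
    """Apply one in-place Signum update.

    params, grads, momentum are equal-length lists of floats.
    `momentum` is the only optimizer state (one buffer per parameter).
    """
    for i, g in enumerate(grads):
        # Exponential moving average of the gradient.
        momentum[i] = beta * momentum[i] + (1.0 - beta) * g
        # Move by a fixed step in the direction opposite the momentum sign.
        if momentum[i] > 0:
            params[i] -= lr
        elif momentum[i] < 0:
            params[i] += lr
        # Zero momentum leaves the parameter unchanged.


params = [1.0, -1.0, 0.5]
momentum = [0.0, 0.0, 0.0]
signum_step(params, [0.3, -2.0, 0.0], momentum, lr=0.1)
```

Every parameter moves by exactly `lr` regardless of gradient magnitude, so the step size is set entirely by the learning rate.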

Diffusion-class video generators trained on web-scraped datasets exhibit color-space drift, motion-vector inconsistency, and frame-rate quantization artifacts that pass through consumer encode chains but fail under cinema-grade evaluation (PQ HDR color volume, sub-pixel motion accuracy, 24p and 48p frame-rate fidelity). The line characterizes those drift modes and proposes calibration objectives aligned to cinema-grade encode targets.

TsugiFabric

System architecture. The substrate connecting algorithms to silicon. Distributed training, compute fabric, cross-domain co-design.

The system-architecture lab. The substrate connecting algorithms to silicon: distributed training synchronization, compute fabric for adaptation workloads, and cross-domain co-design from decode to display to inference.

Active work · 5 entries

The substrate combines a White Rabbit (IEEE 1588-2019 HA) endpoint, an FPGA elastic gradient-tensor buffer, and a hysteretic phase-correction sideband with a CXL.mem pool indexed by a 64-bit regime identifier and a bounded weight-snapshot retention pool with deterministic eviction. Together these form a plesiochronous gradient consensus mechanism for distributed training across heterogeneous accelerators. On an internal continual-learning benchmark, the mechanism delivered a substantial reduction in catastrophic forgetting against the unsynchronized baseline, reproduced cross-vendor within tight tolerance.
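The retention pool can be pictured with a small software model. This is a sketch under stated assumptions: the class `SnapshotPool`, its methods, and the oldest-first eviction order are hypothetical, since the text above says only that eviction is deterministic and that entries are keyed by a 64-bit regime identifier.

```python
from collections import OrderedDict

MASK_64 = 0xFFFFFFFFFFFFFFFF


class SnapshotPool:
    """Bounded weight-snapshot pool keyed by a 64-bit regime identifier."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._slots = OrderedDict()

    def put(self, regime_id, snapshot):
        """Store a snapshot; return the evicted regime id, or None."""
        regime_id &= MASK_64  # keep the key within 64 bits
        evicted = None
        if regime_id in self._slots:
            # Refreshing an existing regime moves it to the newest position.
            self._slots.move_to_end(regime_id)
        elif len(self._slots) >= self.capacity:
            # Assumed policy: evict the oldest entry, so the same sequence
            # of puts always yields the same pool contents.
            evicted, _ = self._slots.popitem(last=False)
        self._slots[regime_id] = snapshot
        return evicted

    def get(self, regime_id):
        """Return the stored snapshot, or None if the regime is absent."""
        return self._slots.get(regime_id & MASK_64)


pool = SnapshotPool(capacity=2)
first = pool.put(1, "weights-a")
second = pool.put(2, "weights-b")
third = pool.put(3, "weights-c")  # pool is full, so regime 1 is evicted
```

Given the same sequence of `put` calls, every node holding such a pool ends up with the same contents, which is the property a deterministic policy is meant to provide.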

The thesis lens for the portfolio. The four filed surfaces (Trinity decode, DLC compression, Infinity gradient consensus, K-Pool LoRA inference adaptation) span decode, training, and inference. The bundle premium in the portfolio’s internal commercial assessment is the financial expression of the conjecture that the substrate underneath those four can be unified at the system-architecture layer.

The captive-rack case (single-vendor NVLink fabric) does not need a vendor-neutral protocol; the rented-pool case (multi-tenant cloud-edge or DePIN-style fabric) does. The research line characterizes which sideband-signaling decisions stay vendor-neutral and which collapse into vendor-specific PHY assumptions. Architecturally adjacent to UALink Common Specification 2.0; engagement target 2026-07-30.

CXL 3.0 and 3.1 ratification opened up the pluggable-pool design point at the protocol layer; the policy layer remains under-explored. Activation tensors with high re-read frequency get pinned to lower-latency tiers; gradient checkpoints with low re-read frequency are eligible for far-tier pooling. The textbook CXL allocator is workload-agnostic; the research question is what falls out when allocation is informed by the activation-tensor reuse profile.
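To make the policy question concrete, the sketch below places tensors into a near or far tier by re-read frequency. It is an illustrative greedy heuristic, not a CXL API and not the lab's allocator; the function `place_tensors`, the two tier labels, and the tuple layout are assumptions.

```python
def place_tensors(tensors, near_capacity_bytes):
    """Assign each tensor to the "near" or "far" memory tier.

    tensors: list of (name, size_bytes, rereads_per_step) tuples.
    Tensors are considered in order of descending re-read frequency
    (ties broken by name) and pinned near while capacity remains.
    """
    placement = {}
    free = near_capacity_bytes
    ranked = sorted(tensors, key=lambda t: (-t[2], t[0]))
    for name, size, _rereads in ranked:
        if size <= free:
            placement[name] = "near"
            free -= size
        else:
            placement[name] = "far"
    return placement


profile = [
    ("act.attn", 4, 8),   # activation, re-read often
    ("ckpt.grad", 4, 1),  # gradient checkpoint, re-read rarely
    ("act.mlp", 4, 5),
]
placement = place_tensors(profile, near_capacity_bytes=8)
```

A workload-agnostic allocator would ignore the third tuple field entirely; here it is the only signal that drives placement.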

TsugiSilicon

Hardware engineering. Multi-stream video pipelines, hardware-aware compression, the silicon that ships in front of the viewer.

The hardware lab. Multi-stream video pipelines, hardware-aware compression, and the silicon that ships in front of the viewer. Output is reference designs, FPGA bring-up notes, and provisional patent disclosures.

Active work · 5 entries

Discrete decoder ICs run on independent slave clocks. Per-decoder async dual-port FIFOs feed a hysteretic occupancy comparator that asserts a sideband phase-correction signal (electrically distinct from the video data path) when fill exceeds a high watermark. The combination phase-locks plesiochronous decoders for scanline-accurate additive composition across base, color-delta, and gray-delta streams with saturating arithmetic.
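A behavioral sketch of the two mechanisms in that paragraph, written in Python rather than HDL. Several details are assumptions: the low watermark at which the comparator releases (the text names only the high watermark), the 10-bit output range, signed delta samples, and the names `HystereticComparator` and `compose_saturating`.

```python
class HystereticComparator:
    """Assert above the high watermark; release only below the low one."""

    def __init__(self, low, high):
        if not low < high:
            raise ValueError("low watermark must be below high watermark")
        self.low = low
        self.high = high
        self.asserted = False

    def update(self, fill):
        """Feed the current FIFO occupancy; return the sideband state."""
        if fill > self.high:
            self.asserted = True
        elif fill < self.low:
            self.asserted = False
        # Between the watermarks the previous state is held (hysteresis).
        return self.asserted


def compose_saturating(base, color_delta, gray_delta, max_code=1023):
    """Add three aligned scanlines sample by sample, clamping to [0, max_code]."""
    return [
        min(max(b + c + g, 0), max_code)
        for b, c, g in zip(base, color_delta, gray_delta)
    ]


comparator = HystereticComparator(low=4, high=12)
trace = [comparator.update(fill) for fill in [3, 10, 13, 10, 5, 3, 10]]
line = compose_saturating([1000, 10, 500], [50, -20, 3], [0, 0, -3])
```

The gap between the two watermarks keeps the sideband from chattering when occupancy hovers near a single threshold.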

A keyed split: a pre-encode pixel-domain transform divides a high-precision master into two complementary layers such that no single layer is viewable on its own; full-fidelity output is reconstructable only in combination, given the key. This inverts the base-layer-viewability constraint of conventional scalable-coding profiles. AOMedia AV2 12-bit professional tier is the standardization track; engagement target 2026-08-01.
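One way to picture complementary layers is additive sharing over 12-bit samples with a key-derived offset. This is an illustration of the general idea only, not the filed transform, and it makes no claim about how such layers behave under a real encoder. The functions `keystream`, `split`, and `combine`, and the use of SHAKE-256 to derive the offset, are assumptions introduced for the example.

```python
import hashlib
import secrets

BITS = 12
MOD = 1 << BITS  # 4096 code values for 12-bit samples


def keystream(key: bytes, n: int) -> list[int]:
    """Derive n key-dependent offsets in [0, MOD) from SHAKE-256."""
    raw = hashlib.shake_256(key).digest(2 * n)
    return [int.from_bytes(raw[2 * i:2 * i + 2], "big") % MOD for i in range(n)]


def split(pixels, key):
    """Split 12-bit samples into two layers that each look like noise."""
    offsets = keystream(key, len(pixels))
    layer_a = [secrets.randbelow(MOD) for _ in pixels]  # uniform random share
    layer_b = [(p - a + k) % MOD for p, a, k in zip(pixels, layer_a, offsets)]
    return layer_a, layer_b


def combine(layer_a, layer_b, key):
    """Reconstruct the samples; requires both layers and the key."""
    offsets = keystream(key, len(layer_a))
    return [(a + b - k) % MOD for a, b, k in zip(layer_a, layer_b, offsets)]


master = [0, 1, 2048, 4095, 1234, 77, 3000, 512]
layer_a, layer_b = split(master, b"example-key")
restored = combine(layer_a, layer_b, b"example-key")
```

In this toy construction, `layer_a` is uniform noise and `layer_b` is the master shifted by that noise and the keyed offset, so neither layer resembles the picture on its own, and combining them with the wrong key yields a wrongly shifted result.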

Reference SystemVerilog plus characterization notes (timing-closure constraints, watermark calibration sweeps, sideband electrical layout) for the Trinity synchronization fabric on commodity FPGA targets such as Lattice CrossLink-NX, Xilinx Artix-7, and Intel Cyclone V class devices. The output lowers integration cost for hardware partners. The defensible IP lives in the Trinity provisional, not in this reference.

Patent portfolio

Seven filed US provisional patents.

The portfolio spans distributed training synchronization, edge video compression, plesiochronous multi-decoder coordination, continual fine-tuning of frozen language models, renormalization-derived numerical optimization, and probe-routing meta-optimization.

US Prov. 63/987,139

Sole

Trinity

Plesiochronous multi-decoder video synchronization.

Dual-Layer Compression

Keyed dual-layer video compression with non-viewable layers.

Infinity

Plesiochronous gradient consensus for distributed AI training.

K-Pool LoRA

Continual fine-tuning of frozen LLMs via a K-snapshot adapter pool.

MRRO

Multi-resolution renormalization optimizer with deterministic inverse-RG refinement.

CMLGS

Coupled-map-lattice gradient smoothing at O(D) per step.

HAMO

Probe-routing meta-optimizer for high-dimensional black-box optimization.

Filed

2026-05-27

Team

Cross-domain expertise.
Full-stack AI bench under NDA.

Cofounder & CEO

Tong Liu

Cross-domain systems engineer with 7 years in big tech as a Software Engineer at Google Search, Amazon, TripAdvisor, and Rivian Automotive. Now leads hardware architecture, FPGA and ROM engineering, and dual-layer compression at TsugiCinema. Inventor on all seven filed US provisional patents (sole inventor on five; joint inventor on two). Dual-degree B.S. in Computer and Systems Engineering and Computer Science from Rensselaer Polytechnic Institute.

Chief Science Advisor & Lead Investor

Shaheen Hoque

Three decades of guidance, navigation, and control engineering at Lockheed Martin, Raytheon, Orbital Sciences, MBDA, and Draper. Currently Director of GNC Engineering at Atropos Group and Adjunct Professor at California Polytechnic State University, San Luis Obispo. Engaged part-time with TsugiCinema today; full-time path under discussion.

Advisors

Senior AI researchers and ML systems engineers from frontier-lab backgrounds, engaged under NDA.

TsugiAI

Continual Learning Without Catastrophic Forgetting

Signal-Quantized Optimization for Edge Inference

Adaptation Methods for Foundation Models on Constrained Hardware

Routing Strategies for Mixture-of-Experts on Edge Silicon

Calibration of Generative Models for Cinema-Grade Output

Open-source SDKs: pip install tsugi

TsugiFabric

Infinity: Synchronization Substrate for Distributed Training

Compute Fabric for Edge AI Workloads

Cross-Domain Co-Design: Decode-to-Display-to-Inference

Sideband Synchronization Protocol for Multi-Vendor Pools

Pluggable Memory Pooling for Distributed AI Workloads

TsugiSilicon

Trinity: Synchronization for Multi-Decoder Video Pipelines

Dual-Layer Compression for HDR Distribution

TsugiNode: Edge Hardware for Cinema-Grade Playback

FPGA Reference Designs for Low-Latency Decode

Hardware-Aware Quantization for On-Device Decoders

