Hardware-Aware Quantization for On-Device Decoders

Bit-depth schedules and dequantization layouts tuned for the actual memory hierarchies on shipping consumer silicon, not the textbook abstraction.

Status

Research

Patent

None filed

Workload

HDR decode

SoC tier

S922X-J class

Mechanism

The textbook treatment of quantization assumes uniform memory bandwidth. Consumer silicon instead has a tiered hierarchy in which the bit-depth choice determines whether a tensor stays in on-die SRAM or spills to LPDDR, with order-of-magnitude latency implications. This research line characterizes the bit-depth-versus-fidelity Pareto frontier specifically for HDR decode workloads on the SoC tier targeted by TsugiNode (Amlogic S922X-J class).
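To make the spill boundary concrete, here is a minimal sketch of how bit depth changes the tier a tensor lands in. The Tier type, the placement helper, and every capacity figure below are hypothetical placeholders chosen for illustration; they are not S922X-J specifications and not the lab's scheduling method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capacity_bytes: int


# Hypothetical capacities, ordered fastest to slowest (NOT real S922X-J numbers).
TIERS = [
    Tier("SRAM", 1 * 1024 * 1024),           # assumed 1 MiB on-die budget
    Tier("LPDDR", 2 * 1024 * 1024 * 1024),   # assumed 2 GiB DRAM
]


def footprint_bytes(num_elements: int, bits: int) -> int:
    """Packed size of a tensor at the given bit depth, rounded up to whole bytes."""
    return (num_elements * bits + 7) // 8


def placement(num_elements: int, bits: int, tiers=TIERS) -> str:
    """Return the name of the fastest tier whose capacity holds the packed tensor."""
    size = footprint_bytes(num_elements, bits)
    for tier in tiers:
        if size <= tier.capacity_bytes:
            return tier.name
    raise ValueError("tensor does not fit in any modeled tier")


if __name__ == "__main__":
    # The same 1.5M-element working set at several candidate bit depths.
    for bits in (4, 6, 8, 10):
        print(bits, footprint_bytes(1_500_000, bits), placement(1_500_000, bits))
```

Under these assumed capacities, the 1.5M-element working set fits in SRAM at 4 bits (750,000 bytes) but spills to LPDDR at 8 bits (1,500,000 bytes). A real schedule would also have to account for alignment, double buffering, and whatever else shares the SRAM budget, none of which this sketch models.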

Why this matters

Quantization decisions made without modeling the actual memory hierarchy regularly leave 30 to 50 percent of available silicon performance on the table. The gap is structural, not a matter of lazy implementation. A schedule that ignores the SRAM-to-LPDDR boundary will spill working sets into the slowest tier of the hierarchy no matter how well it is coded.

The work is co-located with TsugiCinema's filed dual-layer compression decoder (US Prov. 64/054,446), which means quantization decisions are validated end-to-end against a real decode pipeline rather than a synthetic benchmark. The fidelity targets are HDR encode-decode parity on the Amlogic S922X-J integration testbed, not generic tensor-recovery error.

Status: Research. The filing decision is deferred until empirical work clears the prior-art landscape on bit-depth scheduling and HDR-specific quantization techniques. The lab's posture is to develop the empirical fortress first and decide on filing only when the differentiation against existing literature is measurable.

Status and what's next

Active research. The near-term work is characterization of the bit-depth-versus-fidelity Pareto frontier on the TsugiNode integration target, instrumented across the SRAM, LPDDR, and NAND tiers of the hierarchy. Honest disclosure: the lab has no published benchmark suite in this area at the time of writing. The artifact that will close the loop is a measured Pareto curve on a real HDR decode workload running on the bring-up SoC, not a simulator.
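For readers unfamiliar with the term, a Pareto curve here is the set of configurations that no other configuration beats on both bit depth and fidelity. The sketch below extracts that set from (bits, fidelity) pairs. The pareto_front function name, the choice of a higher-is-better fidelity score (for example PSNR in dB), and the sample values are all assumptions for illustration; the numbers are placeholders, not measurements from the lab.

```python
def pareto_front(points):
    """Return the non-dominated (bits, fidelity) pairs.

    Lower bits and higher fidelity are both treated as better. A point is
    dropped when another point uses no more bits and reaches at least the
    same fidelity.
    """
    front = []
    best_fidelity = float("-inf")
    # Sort by bits ascending; within one bit depth, best fidelity first.
    for bits, fidelity in sorted(points, key=lambda p: (p[0], -p[1])):
        if fidelity > best_fidelity:
            front.append((bits, fidelity))
            best_fidelity = fidelity
    return front


if __name__ == "__main__":
    # Placeholder (bits, fidelity) pairs, NOT real measurements.
    samples = [(4, 30.0), (6, 38.0), (8, 37.5), (8, 41.0), (10, 41.0)]
    print(pareto_front(samples))
```

With these placeholder values, the 8-bit configuration scoring 37.5 is dominated by the 6-bit one scoring 38.0, and the 10-bit configuration adds bits without improving on the 8-bit score of 41.0, so both are dropped from the front.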