8Z-Audio Master Plan — Two Tracks to SOTA

01 Where We Stand: Current Results

−31 KB: aFLAC/avFLAC vs FLAC-12 (Ethereal Arc)

−134 KB: aFLAC/avFLAC vs FLAC-12 (Abyssal)

+6%: Gap to OptimFROG

10/13: avFLAC Benchmark Wins

Working Encoders

Planned Phases

The 8Z-Audio project has two codec families: a custom .8za format (full architectural freedom) and three complementary FLAC encoders (vFLAC, aFLAC, avFLAC) producing spec-compliant .flac files. Both families share the same MDL+DCC philosophy.

Encoder	Format	Ethereal Arc (bytes)	Abyssal (bytes)	vs FLAC-12 (Ethereal Arc)	vs OFR (Ethereal Arc)
OptimFROG (best)	.ofr	7,530,383	16,292,401	−5.8%	baseline
WavPack -hhx6	.wv	7,934,862	—	−0.8%	+5.4%
ffmpeg cholesky	.flac	7,966,029	—	−0.4%	+5.8%
FLAC -12	.flac	7,996,121	17,320,224	baseline	+6.2%
aFLAC/avFLAC	.flac	7,965,119	17,186,224	−0.39%	+5.8%
8Z-AC v1.7H (GPT)	.8za	8,050,982	—	+0.7%	+6.9%

The honest assessment: On the FLAC side, aFLAC/avFLAC already beat FLAC-12 on both test songs and win 10/13 benchmark clips (net −87,740 bytes). On the AC side, v1.7H trails FLAC by 0.7% on Ethereal Arc but beats it on other content. The gap to OptimFROG is ~6% on both codec families — that gap is the target.

Two Key Innovations — What Makes 8Z Different

MDL Frame Battles: Multiple predictors compete per frame, Minimum Description Length picks the winner. No heuristics — pure information-theoretic optimality. Every frame independently selects predictor + parameters + stereo mode + entropy coder.
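
A frame battle can be sketched in a few lines. This is a minimal illustration, not the project's arena: the three fixed polynomial predictors and the side-information bit counts passed to `mdl_battle` are hypothetical stand-ins for the real competitors, and Rice coding is the only entropy coder considered.

```python
def rice_bits(residuals, k):
    """Exact Rice code length in bits for signed residuals at parameter k."""
    total = 0
    for r in residuals:
        u = 2 * r if r >= 0 else -2 * r - 1  # zigzag: signed -> unsigned
        total += (u >> k) + 1 + k            # unary quotient, stop bit, k low bits
    return total


def best_rice_bits(residuals, max_k=14):
    """Cheapest Rice cost over all parameters 0..max_k."""
    return min(rice_bits(residuals, k) for k in range(max_k + 1))


def fixed_residuals(samples, order):
    """Residuals of a fixed polynomial predictor (warm-up samples kept verbatim)."""
    if order == 0:
        return list(samples)
    if order == 1:
        return samples[:1] + [samples[i] - samples[i - 1]
                              for i in range(1, len(samples))]
    return samples[:2] + [samples[i] - 2 * samples[i - 1] + samples[i - 2]
                          for i in range(2, len(samples))]


def mdl_battle(samples, side_info_bits):
    """Return (winner, total_bits): the candidate with the shortest description."""
    totals = {}
    for order, header_bits in side_info_bits.items():
        residuals = fixed_residuals(samples, order)
        totals[order] = header_bits + best_rice_bits(residuals)
    winner = min(totals, key=totals.get)
    return winner, totals[winner]


# A linear ramp: the second-order predictor leaves almost nothing to code.
ramp = [10 * i for i in range(256)]
winner, bits = mdl_battle(ramp, {0: 8, 1: 10, 2: 12})
print(winner, bits)
```

The point of the sketch is that side information and residual payload are added into one number before comparison, so a richer predictor only wins when its extra header bits pay for themselves.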

DCC (Digital Claustrum Controller): Tracks Lempel-Ziv complexity of the decision stream to adapt search budget. Same winner repeating → narrow search (signal is simple). Winners jumping chaotically → widen search (needs deep exploration). Maps coupling parameter u ∈ [0,1] to search depth per frame.
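
One way to turn "complexity of the decision stream" into a number is sketched below. The source does not say which Lempel-Ziv variant DCC uses, so the LZ78 phrase count, the normalization against a constant stream, and the depth range 1 to 8 are all assumptions made for illustration.

```python
def lz78_phrase_count(symbols):
    """Number of phrases in an LZ78 parse (a simple LZ complexity proxy)."""
    dictionary = set()
    phrase = ()
    count = 0
    for s in symbols:
        phrase += (s,)
        if phrase not in dictionary:
            dictionary.add(phrase)
            count += 1
            phrase = ()
    if phrase:  # trailing phrase that was already in the dictionary
        count += 1
    return count


def coupling(decisions):
    """Map decision-stream complexity to u in [0, 1] (0 = same winner repeating)."""
    n = len(decisions)
    if n < 2:
        return 0.0
    c = lz78_phrase_count(decisions)
    c_min = lz78_phrase_count([decisions[0]] * n)  # constant stream, same length
    if n == c_min:
        return 0.0
    return max(0.0, min(1.0, (c - c_min) / (n - c_min)))


def search_depth(u, min_depth=1, max_depth=8):
    """Hypothetical linear map from coupling u to a per-frame search depth."""
    return min_depth + round(u * (max_depth - min_depth))


calm = ["lpc32"] * 64            # same winner every frame
restless = list(range(8)) * 8    # winners keep changing
print(search_depth(coupling(calm)), search_depth(coupling(restless)))
```

The upper bound `n` is loose for small alphabets, so `u` rarely approaches 1 in this sketch; a real controller would likely calibrate the scale on observed streams.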

02 Two Codec Families, Two Improvement Tracks

Each family has its own improvement track with independent phases. The tracks can run in parallel — xFLAC improvements don't depend on AC changes and vice versa.

Track 1: xFLAC Sprint

Enhanced FLAC encoders — constrained by FLAC spec but targeting 100% standard-compliant .flac output

X1 Transient-Aware Block Splitting
X2 LPC Search Expansion + Tonal Boost
X3 Stitching Overhead Elimination

Files: vFLAC, aFLAC, avFLAC

Track 2: AC Classical Max

Native .8za format — unconstrained, full freedom to implement any technique

C1 rANS Entropy Coder
C2 OLS + NLMS Cascade Predictor
C3 Joint Stereo Prediction
C4 Bitplane Coder with SSE
C5 DDS Meta-Parameter Optimization

Files: 8Z-AC.py, 8Z_Encode.py

Sequencing: Each phase within a track must be validated before the next begins. But phases across tracks are independent — X1 and C1 can run in parallel sessions. All phases share a non-regression principle: MDL ensures new competitors can only improve or tie, never regress.

03 Track 1: xFLAC Optimization Sprint

xFLAC · FLAC-Constrained · 3 Phases

The xFLAC track attacks three specific weaknesses exposed by the benchmark and CSV forensics. These are engineering improvements within the FLAC spec — no format changes required.

X1 — Transient-Aware Block Splitting

The #1 vFLAC Weakness: Mixed Content

vFLAC's 3-tier block sizing (tonal→16384, mixed→8192, transient→4096) catastrophically fails on mixed content: +11.2% on EOTS, +11.9% on LRR versus FLAC-12. Root cause: 8192-sample blocks containing drum hits force LPC to model transient attacks as predictable, wasting coefficient bits.

Solution: Sample-level onset detection using a two-stage energy derivative detector. Place short 1024–2048 sample blocks precisely around transient attacks, then immediately return to 16384-sample blocks for sustain/decay. The FLAC spec allows any block size 16–65535 — no other FLAC encoder exploits this at sample-level precision.

Algorithm: (1) Compute short-term energy in 64-sample windows, (2) detect rapid energy increases (derivative threshold), (3) refine onset position to ±32 samples, (4) place short block straddling onset, (5) resume large blocks after energy stabilizes.
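
Steps (1) to (3) of the algorithm can be sketched as follows. The ratio threshold, the energy floor, and the hold-off between detections are hypothetical tuning values, and block placement (steps 4 and 5) is left out.

```python
def window_energies(samples, win=64):
    """Stage 1: short-term energy in non-overlapping windows."""
    return [sum(s * s for s in samples[i:i + win])
            for i in range(0, len(samples) - win + 1, win)]


def refine_onset(samples, start, win=64, hop=32):
    """Stage 2: among start-hop, start, start+hop pick the largest energy jump."""
    best_pos, best_jump = start, float("-inf")
    for pos in (start - hop, start, start + hop):
        if pos - win < 0 or pos + win > len(samples):
            continue
        before = sum(s * s for s in samples[pos - win:pos])
        after = sum(s * s for s in samples[pos:pos + win])
        if after - before > best_jump:
            best_pos, best_jump = pos, after - before
    return best_pos


def detect_onsets(samples, win=64, ratio=8.0, floor=1.0, holdoff=4):
    """Return refined onset positions where window energy rises sharply."""
    energies = window_energies(samples, win)
    onsets, last = [], -holdoff
    for w in range(1, len(energies)):
        rising = energies[w] > ratio * (energies[w - 1] + floor)
        if rising and w - last >= holdoff:  # skip windows right after a hit
            onsets.append(refine_onset(samples, w * win, win))
            last = w
    return onsets


# Silence followed by a burst that starts at sample 1000.
signal = [0] * 1000 + [1000] * 1048
print(detect_onsets(signal))
```

The refinement works on a 32-sample grid, which matches the ±32 sample target; the MDL check on the resulting block plan remains the final arbiter of whether a short block is worth its header.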

Expected gain: Recover 50–80% of mixed-content losses → vFLAC competitive on EOTS, LRR, WDIG clips where it currently loses badly.

Risk: Low — onset detection is well-studied DSP, and MDL validates each block plan.

X2 — LPC Search Expansion & DCC-Gated Tonal Boost

The GPT Advantage: Higher Orders Win

CSV forensics prove GPT v1.7H beats GEM by 80 KB and CLA by 17 KB on Ethereal Arc because it finds better LPC order/window/qlevel combinations. On tonal frames, blackman/o=32/ql=8 consistently outperforms more conservative choices. GPT wins 262 of 263 frames.

Solution: Expand vFLAC's LPC search: TOP_K 16→24, add tukey_half window, widen qlevel range. Add DCC-gated "tonal boost" mode for HIGH-budget frames — exhaustive search over orders 20–32 with all windows when the scanner classifies a frame as pure tonal (difficulty < 0.05, pred_gain > 35 dB).

Expected gain: 0.3–0.8% on tonal-dominant content (orchestral, sustained synth, clean guitar).

Risk: Very low — wider search can only find equal or better configurations.

X3 — Stitching Overhead Elimination

The #1 aFLAC/avFLAC Bottleneck: Headers Eat Raw Wins

aFLAC's raw-optimal segment sums beat whole-file lax_t6 by 7 KB on Ethereal Arc and 31 KB on Abyssal. But segment headers eat ~29 KB and ~45 KB respectively, turning compression wins into file-size losses. The raw compression is already superior — the problem is purely structural.

Ethereal Arc:
  aFLAC raw-optimal sum:  7,936,170B  (beats lax_t6 raw by 7,133B)
  aFLAC stitched output:  7,965,119B  (headers ~28,949B for 16 segments)
  Net result: +13,530B loss after stitching

Abyssal:
  avFLAC raw-optimal:    17,280,558B  (beats lax_t6 raw by 31,038B!)
  avFLAC stitched:       17,325,102B  (headers ~44,544B for 22 segments)
  Net result: headers ate the 31 KB raw win

Solution — three complementary approaches:

(1) Frame-level competition: Compete per FLAC frame, not per segment, which eliminates segment boundaries entirely.

(2) Zero-overhead stitching: Extract raw subframe payloads from winners, rebuild minimal FLAC headers with correct sample offsets.

(3) Adaptive segment merging: Post-arena, merge adjacent segments that chose the same winner encoder to eliminate redundant boundaries.
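
Adaptive segment merging is the simplest of the three to illustrate. In this sketch a segment is a hypothetical `(start_sample, n_samples, winner)` tuple, the encoder names are placeholders, and the per-segment overhead is only a rough average derived from the Ethereal Arc figures quoted earlier (about 28,949 bytes over 16 segments).

```python
def merge_segments(segments):
    """Merge adjacent segments that chose the same winner encoder.

    segments: list of (start_sample, n_samples, winner), sorted by start.
    """
    merged = []
    for start, n, winner in segments:
        if merged:
            prev_start, prev_n, prev_winner = merged[-1]
            contiguous = prev_start + prev_n == start
            if contiguous and prev_winner == winner:
                merged[-1] = (prev_start, prev_n + n, winner)
                continue
        merged.append((start, n, winner))
    return merged


def estimated_savings(before, after, bytes_per_segment):
    """Rough header bytes saved by removing segment boundaries."""
    return (len(before) - len(after)) * bytes_per_segment


plan = [
    (0, 4096, "enc_a"),
    (4096, 4096, "enc_a"),
    (8192, 4096, "enc_b"),
    (12288, 4096, "enc_a"),
]
merged = merge_segments(plan)
print(merged)
print(round(estimated_savings(plan, merged, 28_949 / 16)))
```

A merged span would still need to be re-encoded (or re-stitched) as one stream by its winner; the sketch only decides which boundaries are redundant.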

Expected gain: Eliminate 80–100% of stitching overhead → aFLAC/avFLAC fully realizes its raw compression advantage. On Ethereal Arc, converts the −7 KB raw win into a −7 KB file win (currently a +13 KB file loss).

Risk: Medium — frame-level extraction requires precise FLAC binary parsing, and CRC recalculation must be exact for spec compliance.
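
For the CRC part of that risk, the two checksums a rebuilt frame needs are small enough to sketch directly. The FLAC format uses a CRC-8 with polynomial 0x07 over the frame header and a CRC-16 with polynomial 0x8005 over the whole frame, both initialized to zero and processed most significant bit first. The function names are ours, and a production encoder would use table-driven versions.

```python
def crc8_flac(data: bytes) -> int:
    """CRC-8, polynomial x^8 + x^2 + x + 1 (0x07), init 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def crc16_flac(data: bytes) -> int:
    """CRC-16, polynomial x^16 + x^15 + x^2 + 1 (0x8005), init 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Standard check string used by CRC catalogues.
print(hex(crc8_flac(b"123456789")), hex(crc16_flac(b"123456789")))
```

After rewriting a frame header (for example to change the frame or sample number), the header CRC-8 must be recomputed first, because the frame CRC-16 covers the header bytes including that CRC-8.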

04 Track 2: AC Classical Max (C1–C5)

AC · Unconstrained · 5 Phases · Goal: Close 6% Gap to OFR

The AC track upgrades the native .8za codec with classical compression techniques that exceed what any FLAC-format encoder can achieve. Each phase adds a new competitor to the MDL arena — if it doesn't help, it simply never wins, and compression stays at current levels.

The encode.su consensus: "Most of the compression of OptimFROG and SAC comes from low-order OLS" — the predictor matters more than the entropy coder. Research suggests ~60% of the 6% gap to OFR is prediction (3.6%) and ~40% is entropy coding (2.4%). C1 addresses entropy; C2 addresses the bigger predictor side.

C1 — rANS Entropy Coder (MDL Arena Addition)

Phase C1: rANS Replaces Rice on Hard Frames

Rice coding assumes a Laplacian (two-sided geometric) residual distribution. Real LPC residuals have heavier tails, slight asymmetry, and context-dependent clustering. rANS with adaptive probability models can follow the observed distribution to fractional-bit precision.

Key principle: We do NOT replace Rice — we ADD rANS alongside existing Rice and LZMA. Per subframe partition, all three compete and MDL picks the cheapest. Rice stays whenever it's genuinely optimal (clean tonal frames with near-Laplace distribution, zero overhead for probability tables).

Implementation: Port _RansEnc/_RansDec/_RansAdaptive from HYB4 (battle-tested on 50+ genomes). M=4096 precision, per-partition selection. rANS probability table cost explicitly included in MDL calculation.
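
For readers who have not seen rANS, here is a minimal static byte-wise coder with M=4096. It is a generic sketch of the technique, not the HYB4 `_RansEnc`/`_RansDec` code (which is not shown in this plan); the frequency normalization is deliberately naive and assumes a small alphabet, and there is no adaptive model.

```python
from collections import Counter

RANS_L = 1 << 23   # lower bound of the normalized state interval
SCALE_BITS = 12    # M = 4096
M = 1 << SCALE_BITS


def normalize_freqs(counts):
    """Scale counts to integer frequencies summing to M (every symbol >= 1)."""
    total = sum(counts.values())
    freqs = {s: max(1, c * M // total) for s, c in counts.items()}
    top = max(freqs, key=freqs.get)
    freqs[top] += M - sum(freqs.values())  # rounding error goes to the top symbol
    assert freqs[top] > 0, "alphabet too large for this naive normalization"
    return freqs


def cumulative(freqs):
    cum, acc = {}, 0
    for s in sorted(freqs):
        cum[s] = acc
        acc += freqs[s]
    return cum


def rans_encode(symbols, freqs):
    cum = cumulative(freqs)
    x = RANS_L
    out = bytearray()
    for s in reversed(symbols):          # rANS encodes in reverse order
        f, c = freqs[s], cum[s]
        x_max = ((RANS_L >> SCALE_BITS) << 8) * f
        while x >= x_max:                # renormalize: shift out low bytes
            out.append(x & 0xFF)
            x >>= 8
        x = ((x // f) << SCALE_BITS) + (x % f) + c
    for _ in range(4):                   # flush the 32-bit final state
        out.append(x & 0xFF)
        x >>= 8
    out.reverse()                        # decoder reads front to back
    return bytes(out)


def rans_decode(data, n, freqs):
    cum = cumulative(freqs)
    slot_to_sym = [None] * M
    for s, c in cum.items():
        for slot in range(c, c + freqs[s]):
            slot_to_sym[slot] = s
    x = int.from_bytes(data[:4], "big")
    pos, out = 4, []
    for _ in range(n):
        slot = x & (M - 1)
        s = slot_to_sym[slot]
        out.append(s)
        x = freqs[s] * (x >> SCALE_BITS) + slot - cum[s]
        while x < RANS_L:                # renormalize: pull in bytes
            x = (x << 8) | data[pos]
            pos += 1
    return out


residuals = [0] * 900 + [1] * 60 + [-1] * 30 + [5] * 10
table = normalize_freqs(Counter(residuals))
blob = rans_encode(residuals, table)
assert rans_decode(blob, len(residuals), table) == residuals
print(len(blob), "bytes for", len(residuals), "residuals")
```

In an MDL comparison the bytes needed to transmit `table` would be added to `len(blob)` before the result is compared with the Rice cost for the same partition.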

Where rANS wins: Mixed content (heavier tails, bimodal), transients (spiky, non-Laplace), low-periodicity (noisy, uneven distribution). Expected 3–8% per partition on these frames, ~0% on clean tonal.
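
The Rice-versus-rANS decision for one partition can be approximated without running a coder at all. The sketch below compares the exact Rice length against the ideal code length under the partition's own histogram plus a table charge. The 24 bits per table entry is a made-up figure for illustration, and a real rANS coder with quantized frequencies lands slightly above the ideal length.

```python
import math
from collections import Counter


def zigzag(r):
    """Map signed residual to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * r if r >= 0 else -2 * r - 1


def rice_cost_bits(residuals, max_k=14):
    """Exact cost of the best single Rice parameter for this partition."""
    us = [zigzag(r) for r in residuals]
    return min(sum((u >> k) + 1 + k for u in us) for k in range(max_k + 1))


def modeled_cost_bits(residuals, table_bits_per_symbol=24):
    """Ideal entropy-coded payload plus the cost of sending the histogram."""
    counts = Counter(residuals)
    n = len(residuals)
    payload = -sum(c * math.log2(c / n) for c in counts.values())
    return payload + table_bits_per_symbol * len(counts)


def choose_coder(residuals):
    """MDL pick between Rice and a modeled (rANS-style) coder."""
    rice = rice_cost_bits(residuals)
    modeled = modeled_cost_bits(residuals)
    return ("rice", rice) if rice <= modeled else ("modeled", modeled)


bimodal = [40, -40, 41, -39] * 250   # two clusters, far from Laplacian
tiny = list(range(-8, 8))            # short partition, table cost dominates
print(choose_coder(bimodal)[0], choose_coder(tiny)[0])
```

The second case shows why Rice stays in the arena: on short or well-behaved partitions the table overhead outweighs any modeling gain.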

Expected gain: 2–4% overall compression improvement depending on content mix.

Risk: Low — proven technique, working code exists in HYB4.

C2 — OLS + NLMS Cascade Predictor

Phase C2: The Biggest Single Upgrade

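Replace Levinson-Durbin autocorrelation LPC with OLS (Ordinary Least Squares) as the primary predictor, cascaded with an NLMS (Normalized Least Mean Squares) adaptive filter on the residuals. Levinson-Durbin LPC stays in the MDL arena as the fast path and as the fallback when OLS is ill-conditioned.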

Standard LPC (current):
  1. Compute autocorrelation → assumes stationarity
  2. Solve Toeplitz system via Levinson-Durbin
  3. Coefficients minimize MSE under stationarity assumption

OLS (what we're adding):
  1. Form actual sample matrix X (past samples as rows)
  2. Solve X'X β = X'y via Cholesky decomposition
  3. Coefficients minimize MSE on ACTUAL samples, no stationarity assumption

NLMS cascade (second stage):
  1. Take OLS residuals (already much smaller than LPC residuals)
  2. Run 16–64 tap NLMS adaptive filter
  3. NLMS tracks slowly-changing correlation that OLS missed
  4. Entropy code the double-filtered residuals
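
The OLS stage above can be sketched in pure Python as follows. It accumulates the normal equations directly from the samples and solves them with a Cholesky factorization. The small ridge term is our assumption (the risk matrix mentions regularization without giving a value), and a real encoder would quantize the coefficients before use.

```python
import math


def ols_coefficients(x, order, ridge=1e-9):
    """Solve (X'X + lam*I) beta = X'y, where row t of X is x[t-1], ..., x[t-order]."""
    n = len(x)
    A = [[0.0] * order for _ in range(order)]  # lower triangle of X'X
    b = [0.0] * order                          # X'y
    for t in range(order, n):
        past = [x[t - 1 - i] for i in range(order)]
        for i in range(order):
            b[i] += past[i] * x[t]
            for j in range(i + 1):
                A[i][j] += past[i] * past[j]
    lam = ridge * sum(A[i][i] for i in range(order)) / order
    for i in range(order):
        A[i][i] += lam

    # Cholesky factorization A = L L'
    L = [[0.0] * order for _ in range(order)]
    for i in range(order):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("not positive definite; fall back to LPC")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]

    # Forward solve L z = b, then back solve L' beta = z
    z = [0.0] * order
    for i in range(order):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    beta = [0.0] * order
    for i in reversed(range(order)):
        tail = sum(L[k][i] * beta[k] for k in range(i + 1, order))
        beta[i] = (z[i] - tail) / L[i][i]
    return beta


# A decaying resonance that obeys x[n] = 1.5 x[n-1] - 0.7 x[n-2] exactly.
x = [1000.0, 1500.0]
for _ in range(62):
    x.append(1.5 * x[-1] - 0.7 * x[-2])
print([round(c, 4) for c in ols_coefficients(x, 2)])
```

The `ValueError` path is where the plan's fallback to Levinson-Durbin would hook in.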

Why this matters: SAC proves this architecture works — it achieves 8–12% better compression than FLAC using exactly this OLS+NLMS approach. This is the single largest identified improvement opportunity.
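
The NLMS stage must stay lossless, which means the decoder has to reproduce the encoder's filter state exactly. A minimal sketch of that symmetry is shown below. It uses floating-point weights and rounds the prediction to an integer, which is bit-exact only when encoder and decoder run the same arithmetic; a portable codec would more likely use fixed-point weights. For brevity the filter runs on a raw tone rather than on OLS residuals, and `eps` is a hypothetical regularizer.

```python
import math


def _predict(w, hist):
    """Integer prediction from the current weights and sample history."""
    return round(sum(wi * hi for wi, hi in zip(w, hist)))


def _adapt(w, hist, err, mu, eps):
    """NLMS update: step size normalized by the energy of the history."""
    norm = eps + sum(h * h for h in hist)
    gain = mu * err / norm
    for i, h in enumerate(hist):
        w[i] += gain * h


def nlms_encode(samples, taps=16, mu=0.5, eps=1.0):
    w, hist, out = [0.0] * taps, [0] * taps, []
    for s in samples:
        err = s - _predict(w, hist)
        out.append(err)
        _adapt(w, hist, err, mu, eps)
        hist = [s] + hist[:-1]           # newest sample first
    return out


def nlms_decode(residuals, taps=16, mu=0.5, eps=1.0):
    w, hist, out = [0.0] * taps, [0] * taps, []
    for err in residuals:
        s = err + _predict(w, hist)      # same prediction the encoder made
        out.append(s)
        _adapt(w, hist, err, mu, eps)
        hist = [s] + hist[:-1]
    return out


tone = [round(1000 * math.sin(0.05 * n)) for n in range(2000)]
res = nlms_encode(tone)
assert nlms_decode(res) == tone
print(sum(e * e for e in res[1000:]), "vs", sum(s * s for s in tone[1000:]))
```

Both functions share `_predict` and `_adapt`, so any change to the update rule stays mirrored automatically.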

DCC integration: LOW budget → LPC only (fast path). MEDIUM → OLS only. HIGH → OLS + NLMS cascade. MDL always compares: LPC, OLS, OLS+NLMS.

Expected gain: 3–5% compression improvement → pushes into WavPack/OFR territory.

Risk: Medium — new code, but algorithm is well-understood. DCC gating ensures no regression on easy content.

C3 — Joint Stereo Prediction (Generalized Decorrelation)

Phase C3: Cross-Channel Prediction

Current stereo handling transforms L/R into one of 4 modes (Independent, Mid/Side, Left-Side, Right-Side) then predicts each channel independently. This misses three critical correlation types:

Delayed cross-channel correlation: Left channel at time n correlates with right at time n−k (room acoustics, mic spacing). M/S only captures k=0.

Frequency-dependent correlation: Bass may be mono while treble is stereo, yet M/S is all-or-nothing per frame.

Asymmetric correlation: One channel may predict the other better than vice versa.

Solution: Generalized stereo decorrelation (Ghido, IEEE 2003). Joint predictor uses past samples from BOTH channels as inputs — OLS naturally handles the cross-channel matrix. Add cross-channel orders {0, 4, 8, 12} alongside same-channel orders 1–32.
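
The delayed-correlation case is easy to demonstrate with a single cross-channel tap. This is a deliberately reduced illustration, not the joint OLS predictor: it only searches for the lag k at which the left channel best predicts the right, and shows that the k=0 difference used by side coding misses it. The lag range 0 to 12 follows the cross-channel orders listed above.

```python
def residual_energy(src, dst, lag):
    """Energy of dst[n] - src[n - lag] over the valid range (one-tap predictor)."""
    return sum((dst[n] - src[n - lag]) ** 2 for n in range(lag, len(dst)))


def best_cross_lag(left, right, max_lag=12):
    """Lag k in 0..max_lag with the lowest mean residual energy."""
    return min(range(max_lag + 1),
               key=lambda k: residual_energy(left, right, k) / (len(right) - k))


# Synthetic stereo pair: the right channel is the left delayed by 4 samples.
left = [(n * 37) % 101 - 50 for n in range(500)]
right = [0] * 4 + left[:-4]

k = best_cross_lag(left, right)
print("best lag:", k)
print("lag-0 residual energy:", residual_energy(left, right, 0))
print("lag-k residual energy:", residual_energy(left, right, k))
```

A joint OLS predictor generalizes this by fitting several cross-channel taps together with the same-channel taps, so it can also capture partial and asymmetric correlation.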

Expected gain: ~1.5% on stereo content with significant cross-channel correlation.

Risk: Low — well-published technique, clean integration with OLS from C2.

C4 — Bitplane Coder with SSE

Phase C4: Context-Dependent Probability Models

Rice and rANS both treat each symbol independently. A bitplane coder exploits context — the magnitude of residual[n] correlates with residual[n−1] (clustering), the other channel, position within frame, and signal class.

How it works: (1) Separate residuals into bitplanes (MSB to LSB), (2) for each bit position, build a probability model conditioned on context (upper bitplanes, neighboring samples, other channel), (3) encode each bit with rANS using the conditional probability, (4) SSE (Secondary Symbol Estimation) adaptively corrects the primary model using recent coding history.
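
The four steps can be sketched as a cost estimator that sums ideal code lengths instead of driving a real rANS coder. Everything specific here is an assumption for illustration: the context (bit plane, whether a higher bit was already set, magnitude class of the previous value), the adaptation rates, and the simplified SSE stage, which indexes a second table by the quantized primary probability and averages the two predictions without interpolation.

```python
import math


class BitModel:
    """Adaptive 12-bit probability that the next bit is 1."""

    def __init__(self, p=2048):
        self.p = p

    def update(self, bit, rate=4):
        if bit:
            self.p += (4096 - self.p) >> rate
        else:
            self.p -= self.p >> rate


def bit_cost(p12, bit):
    """Ideal code length in bits for `bit` under probability p12 / 4096."""
    p = min(max(p12, 1), 4095) / 4096.0
    return -math.log2(p if bit else 1.0 - p)


def bitplane_cost(values, planes=8, use_context=True):
    """Estimated bits to code unsigned values < 2**planes, MSB plane first."""
    primary, sse = {}, {}
    prev, total = 0, 0.0
    for u in values:
        seen_one = 0
        for plane in range(planes - 1, -1, -1):
            bit = (u >> plane) & 1
            if use_context:
                ctx = (plane, seen_one, prev.bit_length())
            else:
                ctx = (plane,)
            model = primary.setdefault(ctx, BitModel())
            bucket = model.p >> 7                      # 32 probability buckets
            stage2 = sse.setdefault((plane, bucket),
                                    BitModel((bucket << 7) + 64))
            p = (model.p + stage2.p) // 2              # blend primary and SSE
            total += bit_cost(p, bit)
            model.update(bit)
            stage2.update(bit, rate=5)
            seen_one |= bit
        prev = u
    return total


# Mostly small magnitudes with occasional loud bursts (zigzag-mapped residuals).
quiet = [1, 0, 2, 1, 0, 3, 0, 1] * 20
burst = [200, 180, 220, 150, 190, 210, 170, 230]
values = (quiet + burst) * 4
print(round(bitplane_cost(values)), "bits vs", 8 * len(values), "raw")
```

Setting `use_context=False` gives a per-plane model without neighbor context, which is a convenient baseline when measuring what the context actually buys on real residuals.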

This is fundamentally different from Rice/rANS which code entire symbols. Bitplane coding decomposes the problem into binary decisions, each with its own optimized probability. SAC and PAQ both use this approach for their strongest compression.

Expected gain: 1–3% on residuals with strong context dependencies (orchestral, dynamic content).

Risk: Medium-high — most complex phase, but enters MDL arena so zero regression risk.

C5 — DDS Meta-Parameter Optimization

Phase C5: The Exhaustive Last Mile

After C1–C4, the per-frame parameter space explodes:

Predictor:      {LPC, OLS, OLS_NLMS, PERIODIC, DELTA}     5 choices
Stereo mode:    {INDEP, M/S, L-S, R-S, JOINT, JOINT_M/S}   6 choices
Same-ch order:  1–32                                        32 choices
Cross-ch order: {0, 4, 8, 12}                                4 choices
Window:         {hann, blackman, tukey, none}                 4 choices
Qlevel:         8–16                                          9 choices
NLMS order:     {0, 8, 16}                                    3 choices
NLMS mu:        {0.3, 0.5, 1.0}                               3 choices
Entropy:        {Rice, rANS, Bitplane, LZMA}                  4 choices

Total: ~5.0 million combinations per frame (5 × 6 × 32 × 4 × 4 × 9 × 3 × 3 × 4 = 4,976,640)

DDS (Dynamically Dimensioned Search): A black-box optimizer that finds near-optimal configurations in ~100–500 evaluations, not millions. DDS starts by exploring all dimensions, then gradually narrows to fine-tune the most sensitive parameters. Each evaluation = one complete encode-and-measure cycle for that frame.

DCC integration: DDS budget scales with DCC difficulty classification — LOW frames get 50 evaluations (fast), HIGH frames get 500 (exhaustive). MDL is the objective function.
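
A compact version of DDS for integer-coded parameters looks like this. The perturbation scale `r=0.2`, the reflection at the bounds, and the greedy acceptance follow the published algorithm; the three-parameter toy cost function is purely hypothetical and stands in for a full encode-and-measure cycle.

```python
import math
import random


def dds_search(cost, lows, highs, start, budget, r=0.2, seed=0):
    """Dynamically Dimensioned Search over integer parameters within bounds."""
    rng = random.Random(seed)
    dims = len(start)
    best = list(start)
    best_cost = cost(best)
    for i in range(1, budget + 1):
        # Probability of perturbing each dimension shrinks as the budget is spent.
        p_select = 1.0 - math.log(i) / math.log(budget) if budget > 1 else 1.0
        chosen = [d for d in range(dims) if rng.random() < p_select]
        if not chosen:
            chosen = [rng.randrange(dims)]
        cand = list(best)
        for d in chosen:
            span = highs[d] - lows[d]
            v = cand[d] + rng.gauss(0.0, r * span)
            if v < lows[d]:                      # reflect at the lower bound
                v = lows[d] + (lows[d] - v)
                if v > highs[d]:
                    v = lows[d]
            elif v > highs[d]:                   # reflect at the upper bound
                v = highs[d] - (v - highs[d])
                if v < lows[d]:
                    v = highs[d]
            cand[d] = int(round(v))
        c = cost(cand)
        if c <= best_cost:                       # greedy: keep only improvements
            best, best_cost = cand, c
    return best, best_cost


def toy_cost(params):
    """Hypothetical frame cost with an optimum at order=24, qlevel=12, window=1."""
    order, qlevel, window = params
    return 3 * abs(order - 24) + 5 * abs(qlevel - 12) + (0 if window == 1 else 7)


BUDGETS = {"LOW": 50, "HIGH": 500}   # evaluations per frame, from the plan
lows, highs, start = [1, 8, 0], [32, 16, 3], [1, 8, 0]
for level, budget in BUDGETS.items():
    print(level, dds_search(toy_cost, lows, highs, start, budget))
```

Because a candidate is accepted only when it is no worse than the current best, seeding `start` with the grid-search winner gives the "worst case: matches grid search" behavior the plan relies on.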

Expected gain: 0.5–1.5% — picks up the interactions between parameters that grid search misses.

Risk: Low — DDS is well-studied optimization, and the worst case is matching grid search results.

05 Beyond Classical: Novel Predictors (Horizon)

The C1–C5 phases bring 8Z to competitive parity with existing SOTA codecs using known techniques. The following predictors are where 8Z-Audio becomes something no other codec has attempted — they are research-grade additions planned after Classical Max validation.

Chebyshev Harmonic Predictor

Physics-Based Prediction for Distorted Content

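Distorted audio (guitar amp, synth saturation, industrial) can be modeled as a signal passed through a waveshaper built from Chebyshev polynomials, a known mathematical function: because T_n(cos θ) = cos(nθ), the n-th polynomial turns a pure tone into exactly its n-th harmonic. The LPC residual on distorted content is dominated by harmonics created by this function. In the simplest model, a single parameter (distortion level) determines the entire harmonic series.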

Implementation: (1) Detect frames with harmonic structure in residual, (2) estimate fundamental frequency, (3) fit Chebyshev coefficients, (4) predict residual using Chebyshev model, (5) encode parameters + fine residual. MDL decides if cheaper than raw residual.
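
The property that makes this model attractive is the identity T_n(cos θ) = cos(nθ): feeding a unit cosine through the n-th Chebyshev polynomial yields exactly the n-th harmonic. The sketch below evaluates the polynomials by their recurrence and builds a waveshaper as a weighted sum; the weights in the example are arbitrary illustrative values, not fitted coefficients.

```python
import math


def chebyshev_t(n, x):
    """T_n(x) via the recurrence T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}."""
    if n == 0:
        return 1.0
    t_prev, t = 1.0, x
    for _ in range(n - 1):
        t_prev, t = t, 2.0 * x * t - t_prev
    return t


def waveshape(sample, weights):
    """Weighted Chebyshev sum: weights[k] scales harmonic k+1 of a unit cosine."""
    return sum(w * chebyshev_t(k + 1, sample) for k, w in enumerate(weights))


# Check the harmonic identity for the third polynomial.
theta = 0.7
print(chebyshev_t(3, math.cos(theta)), math.cos(3 * theta))

# Fundamental plus some 3rd and 5th harmonic, as a distortion stage might add.
weights = [1.0, 0.0, 0.3, 0.0, 0.1]
shaped = [waveshape(math.cos(2 * math.pi * n / 64), weights) for n in range(64)]
```

For a full-scale pure tone the harmonic amplitudes are simply the weights, which is why a handful of coefficients can describe the whole series; real program material is not a single unit cosine, so the fit in step (3) is where the modeling effort would go.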

Expected gain: Up to 22% on heavily distorted content. This is exactly the content where OFR collapses (+26–39% regression vs FLAC). 8Z would exploit OFR's weakness.

Enhanced PERIODIC + Physical Audio Models

Signal-Specific Generator Library

Current PERIODIC never fires on Ethereal Arc (0.85 correlation threshold too strict for orchestral music). Lower threshold to 0.70 with MDL gating — let the cost function decide, not a hardcoded threshold. Run PERIODIC on LPC/OLS residuals, not just raw samples.
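
Letting the cost function decide can be sketched as a two-step check: find the lag with the highest normalized correlation, then accept PERIODIC only if its total description is shorter. The 32-bit parameter charge and the use of a plain Rice cost for both alternatives are assumptions for illustration.

```python
import math


def normalized_correlation(x, lag):
    """Correlation between x[n] and x[n - lag], in [-1, 1]."""
    a, b = x[lag:], x[:-lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den else 0.0


def best_lag(x, min_lag, max_lag):
    return max(range(min_lag, max_lag + 1),
               key=lambda k: normalized_correlation(x, k))


def rice_bits(values, max_k=14):
    """Cost of the best single Rice parameter (zigzag-mapped)."""
    us = [2 * v if v >= 0 else -2 * v - 1 for v in values]
    return min(sum((u >> k) + 1 + k for u in us) for k in range(max_k + 1))


def periodic_wins(x, lag, param_bits=32):
    """MDL gate: is 'first period + lagged differences' cheaper than coding x?"""
    baseline = rice_bits(x)
    diffs = [x[n] - x[n - lag] for n in range(lag, len(x))]
    periodic = param_bits + rice_bits(x[:lag]) + rice_bits(diffs)
    return periodic < baseline


# An exactly periodic test signal with period 50.
period = [round(1000 * math.sin(2 * math.pi * n / 50)) for n in range(50)]
x = period * 20
lag = best_lag(x, 20, 120)
print(lag, round(normalized_correlation(x, lag), 3), periodic_wins(x, lag))
```

With the gate in place the correlation threshold becomes a cheap pre-filter rather than the decision itself, so lowering it from 0.85 to 0.70 costs encode time but cannot cost bytes.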

Beyond PERIODIC: a library of physical audio models entering the MDL arena — vibrato (FM sinusoid, 3 params), reverb tail (exponential decay, 2 params), drum transient (damped oscillator, 4 params), tremolo (AM signal, 3 params). Each model is 4–12 bytes of parameters, deterministic, physically grounded. DCC gates which to try.

06 Implementation Phases

Phases are listed in recommended execution order. Within each track, phases are sequential (each validated before the next). Across tracks, phases can run in parallel.

xFLAC Track (X1 → X2 → X3)

X1 — Transient Detection & Adaptive Block Splitting

vFLAC v1.3→v1.4 · avFLAC v1.2→v1.3 · Est. 8–12 hours

Two-stage onset detector, sample-level block boundaries, recover 50–80% mixed-content losses. Test: vFLAC must not regress on tonal clips, must improve ≥5% on EOTS/LRR.

X2 — LPC Search Expansion & DCC Tonal Boost

vFLAC v1.4→v1.5 · Est. 6–8 hours

TOP_K 16→24, tukey_half window, DCC-gated exhaustive search on HIGH-budget tonal frames (orders 20–32, all windows). Test: match or beat GPT v1.7H frame-level choices on Ethereal Arc.

X3 — Stitching Overhead Elimination

aFLAC v1.3→v1.4 · avFLAC v1.2→v1.3 · Est. 10–14 hours

Frame-level competition, zero-overhead header rebuild, adaptive segment merging. Test: stitched output must be ≤ raw-optimal sum + 1 KB overhead. FLAC decoder validation (ffmpeg, flac -t) mandatory.

AC Classical Max Track (C1 → C2 → C3 → C4 → C5)

C1 — rANS Entropy Coder

8Z-AC v1.7H→v1.8 · Est. 6–8 hours

Port rANS from HYB4, M=4096 precision, per-partition MDL battle vs Rice and LZMA. Test: must improve at least 30% of frames on Ethereal Arc; total file size must decrease.

C2 — OLS + NLMS Cascade Predictor

8Z-AC v1.8→v1.9 · Est. 10–14 hours

Replace Levinson-Durbin with OLS (Cholesky), add 16–64 tap NLMS on residuals, DCC-gated depth. Test: must beat v1.8 on at least 50% of frames; total file must shrink by ≥2%.

C3 — Joint Stereo Prediction

8Z-AC v1.9→v2.0 · Est. 6–8 hours

Cross-channel OLS with {0,4,8,12} cross-orders, 6 stereo modes. Test: must improve stereo content with significant L/R correlation; no regression on mono-like content.

C4 — Bitplane Coder with SSE

8Z-AC v2.0→v2.1 · Est. 12–16 hours

MSB-to-LSB processing, context-dependent probability models, SSE adaptive correction. MDL battle vs Rice, rANS, LZMA per partition. Test: must win ≥20% of partitions on mixed content.

C5 — DDS Meta-Parameter Optimization

8Z-AC v2.1→v2.2 · Est. 8–10 hours

Black-box optimization over all C1–C4 parameters, 100–500 evaluations per frame (DCC-gated). Test: must match or beat grid search on every frame; net file size must decrease.

07 Expected Performance Outcomes

AC Track (Ethereal Arc, 48 kHz 16-bit, 17,197,814 bytes PCM)

Configuration	Est. Size (bytes)	Ratio	vs Current	vs FLAC-12	vs OFR
AC v1.7H (current)	8,050,982	46.81%	baseline	+0.7%	+6.9%
+ C1 rANS	~7,890,000	~45.88%	−2.0%	−1.3%	+4.8%
+ C2 OLS+NLMS	~7,550,000	~43.90%	−6.2%	−5.6%	+0.3%
+ C3 Joint Stereo	~7,440,000	~43.26%	−7.6%	−6.9%	−1.2%
+ C4 Bitplane SSE	~7,330,000	~42.62%	−9.0%	−8.3%	−2.7%
+ C5 DDS (full stack)	~7,250,000	~42.16%	−9.9%	−9.3%	−3.7%
Reference: OFR	7,530,383	43.79%	—	—	0%

xFLAC Track (Ethereal Arc)

Configuration	Est. Size (bytes)	vs FLAC-12	Notes
avFLAC v1.2 (current)	7,965,119	−0.39%	Already beats FLAC-12
+ X1 Transient Blocks	~7,940,000	−0.70%	Minor gain on Ethereal (mainly helps mixed clips)
+ X2 LPC Tonal Boost	~7,910,000	−1.08%	Tonal frames benefit most
+ X3 Stitch Elimination	~7,870,000	−1.58%	Converts raw wins to file wins
Reference: FLAC-12	7,996,121	0%

The trajectory: The AC track targets closing and surpassing OptimFROG — C2 (OLS+NLMS) alone brings us to near-parity, and C3–C5 push beyond. The xFLAC track aims to produce the world's best FLAC files, period — beating FLAC-12 by 1.5%+ while maintaining 100% spec compliance. Both tracks share the MDL+DCC foundation.

08 Risk Matrix

Risk	Prob.	Impact	Mitigation
X1: Onset detector false positives → excess blocks	Med	Low	MDL validates each block plan; fallback to 4096
X3: Frame-level FLAC parsing errors	Med	High	ffmpeg + flac -t validation mandatory per output
C1: rANS overhead exceeds Rice on most frames	Low	Low	MDL gating — Rice always available as fallback
C2: OLS Cholesky fails on ill-conditioned matrices	Med	Med	Regularization + fallback to Levinson-Durbin LPC
C2: NLMS convergence too slow for short frames	Med	Med	DCC gates: skip cascade on LOW budget frames
C3: Cross-channel orders over-fit noise	Low	Low	MDL penalizes excess model bits
C4: Bitplane SSE too slow in Python	High	Med	DCC gates for HIGH-budget frames only; C-level inner loop
C5: DDS doesn't beat grid search	Med	Low	Worst case: matches existing results, no regression
Format incompatibility with v1.7H	Certain	Low	R&D phase — no backward compatibility required
Combined speed regression > 10×	Med	Med	Compression levels: fast path skips new features

09 Success Criteria

xFLAC Track Ships If And Only If

✅ Bit-perfect lossless at all configurations (decoded WAV matches original)

✅ All outputs pass flac -t and ffmpeg -v error -i validation

✅ avFLAC beats FLAC-12 on ≥11/13 benchmark clips (currently 10/13)

✅ Mixed-content clips (EOTS, LRR) regression reduced by ≥50%

✅ Stitched output within 1 KB of raw-optimal segment sums

✅ Forensic CSV logging for all frames and all configurations

AC Classical Max Ships If And Only If

✅ Bit-perfect lossless at all compression levels (SHA3 round-trip verified)

✅ Each phase independently improves or ties total compression (no regression)

✅ Full C1–C5 stack beats FLAC-12 by ≥5% on Ethereal Arc

✅ Full C1–C5 stack within 5% of or beats OptimFROG on Ethereal Arc

✅ 15-clip benchmark with comparative analysis vs FLAC-12, OFR, WavPack

✅ Forensic CSV logging for all levels and all predictors
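
The SHA3 round-trip criterion can be checked with a few lines of standard-library code. The plan does not say which SHA3 variant is used, so SHA3-256 is an assumption here, and `zlib` merely stands in for the real encoder and decoder so that the sketch runs on its own.

```python
import hashlib
import zlib


def sha3_digest(pcm_bytes: bytes) -> str:
    """Hex digest of the raw PCM payload (SHA3-256 assumed)."""
    return hashlib.sha3_256(pcm_bytes).hexdigest()


def verify_round_trip(original_pcm: bytes, encode, decode) -> bool:
    """True if decode(encode(x)) reproduces x bit for bit."""
    decoded = decode(encode(original_pcm))
    return sha3_digest(decoded) == sha3_digest(original_pcm)


# Stand-in codec: any lossless pair of callables can be plugged in here.
pcm = bytes(range(256)) * 64
print(verify_round_trip(pcm, zlib.compress, zlib.decompress))
```

Hashing the decoded PCM rather than the container file keeps the check independent of header layout, which matters while the .8za format is still changing between phases.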

Two tracks. Eight phases. One MDL arena.

Same philosophy everywhere — let the math decide.

10 The Argument

Why This Plan, Why Now

1. The LPC+Rice ceiling is real. v1.6.1 → v1.7H gained only 20.6 KB (0.25%) in 2× the time. Diminishing returns are empirically proven.

2. The gap is in the predictor, not the entropy coder. GPT's 16 KB win proves entropy tuning is marginal. OFR's 520 KB lead proves multi-layer prediction is where the bytes are.

3. The components exist. rANS is ported from HYB4. OLS is textbook linear algebra. NLMS is textbook DSP. Bitplane coding is proven in SAC/PAQ. DDS is published optimization. The MDL arena is domain-agnostic. DCC gates everything.

4. The two tracks are complementary. xFLAC produces the world's best FLAC files for interoperability. AC pushes the absolute compression frontier. Both share MDL+DCC. Results from one track inform the other.

5. The philosophy demands it. "Don't accept limits without evidence." The limit isn't FLAC — it's the assumption that LPC is the best predictor for all audio content. Evidence from SAC, OFR, and the Chebyshev insight says otherwise.

Estimated total development: 66–90 hours across 8 phases (24–34 hours xFLAC + 42–56 hours AC). Each phase is independently testable. No phase depends on cross-track success. The MDL arena ensures no regression: if a new technique doesn't help, it simply never wins, and compression stays at current levels.