8Z-Audio v1.8 Master Plan — Beyond the Hybrid

01 Where We Stand: March 2026 (Updated)

8 / 18: avFLAC wins vs best baseline

−14.7 KB: avFLAC beats lax_t6 on Abyssal (full track)

3 clips: avFLAC beats OptimFROG (!)

Tested on 18 files (3 full tracks + 15 clips), max compression. AC v1.9 MDL frame-size probe correctly selects frame size per content (8192 on 24-bit, 4096 on 16-bit orchestral). avFLAC beats lax_t6 on Abyssal and 7 clips, beats OptimFROG on 3 Radiohead clips by 31–40%. All 15 clips beat FLAC-12. avFLAC invariant holds on all 18 files.

Encoder	Ethereal Arc (bytes)	Ratio	vs FLAC	vs OFR
OFR (best)	7,530,383	43.79%	−5.8%	baseline
WavPack	7,934,862	46.14%	−0.8%	+5.4%
avFLAC v1.2	7,958,631	46.28%	−0.5%	+5.7%
FLAC -12	7,996,121	46.49%	baseline	+6.2%
8Z-AC v1.9	~7,993,366	46.48%	−0.03%	+6.1%
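
The "vs FLAC" and "vs OFR" columns are plain relative size differences. A minimal sketch of that arithmetic, using the byte counts from the table; the function names are illustrative and not part of the codebase:

```python
# Sizes in bytes for Ethereal Arc, copied from the table above.
SIZES = {
    "OFR (best)": 7_530_383,
    "WavPack": 7_934_862,
    "avFLAC v1.2": 7_958_631,
    "FLAC -12": 7_996_121,
    "8Z-AC v1.9": 7_993_366,  # marked approximate (~) in the table
}


def relative_delta(size, baseline):
    """Signed size difference versus a baseline, in percent (negative = smaller)."""
    return 100.0 * (size - baseline) / baseline


def comparison_rows(sizes):
    """Return (name, vs_flac_percent, vs_ofr_percent) for every encoder."""
    flac = sizes["FLAC -12"]
    ofr = sizes["OFR (best)"]
    return [
        (name, relative_delta(s, flac), relative_delta(s, ofr))
        for name, s in sizes.items()
    ]


for name, vs_flac, vs_ofr in comparison_rows(SIZES):
    print(f"{name:12s} vs FLAC {vs_flac:+.2f}%   vs OFR {vs_ofr:+.2f}%")
```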

Updated assessment (March 2026): AC v1.9 now matches FLAC-12 on Ethereal Arc (was +0.7% behind). avFLAC beats lax_t6 on Abyssal and beats OFR on Radiohead. The gap to OFR on Ethereal Arc is ~6% — the C1–C5 phases target closing this.

02 The Core Thesis for v1.8

The existing v1.8 roadmap in the project proposes "compression levels 1–10" — essentially the same algorithm at different effort budgets. That's useful engineering, but it's not a compression breakthrough. The level system gives us at most 0.01% compression gain at 3× the time. The forensic data shows why: v1.7H is already near the ceiling for Rice+LPC architecture.

Bojan's Principle #0: "Don't accept limits without evidence." The evidence says LPC+Rice has a ceiling. The evidence also says the ceiling is imposed by the prediction model, not the entropy coder. GPT's 16 KB advantage proves entropy coder tuning yields marginal gains. Real compression gains require new predictors.

I propose v1.8 should do both: the level system (easy engineering win, ~4 hours) plus the first genuinely new compression capability. The question is: which new capability gives the biggest gain for the least implementation risk?

What the Data Tells Us

Where Bytes Are Wasted

On Ethereal Arc: 175/263 frames are LOW budget (easy), 72 are MEDIUM, 16 are HIGH. Zero frames triggered PERIODIC (no frame hit 0.85 correlation). LPC dominates 254/263 frames, with LPC² winning 8 and DELTA2 winning 1. The PERIODIC predictor never fires on orchestral music.

The gap from v1.7H-GPT (8,050,982 bytes on Ethereal Arc) to OFR (520 KB) and to FLAC (55 KB) isn't in the easy frames. It is in the MEDIUM and HIGH difficulty frames, where LPC's linear prediction model can't capture the signal's true structure.

03 The v1.8 Architecture: Three Layers

v1.8 is built in three independent, testable layers. Each layer can be validated in isolation before the next begins. No multi-feature launches.

Layer A: Compression Levels (Engineering)

Layer A — Variable Effort Codec

Unify GPT/CLA/GEM into one codebase. Add --level 1-10 argument controlling search budget, LZMA arena aggressiveness, partition order range, and candidate count. GPT's entropy coding logic becomes the default for all levels.

Implementation: ~4 hours. Start from v1.7H-GPT. Add COMPRESSION_LEVELS dict. Level 10 = current v1.7H-GPT. Level 5 ≈ v1.6.1 speed. Level 1 = fast streaming.

Expected gain: 0% compression (same ceiling, different speed). But critical for usability and benchmarking.
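
A minimal sketch of how the COMPRESSION_LEVELS dict and the --level argument could fit together. The four knob names follow the list above (search budget, LZMA aggressiveness, partition order range, candidate count); every numeric value and the `level_params` helper are placeholders I made up for illustration, not tuned settings:

```python
import argparse


def level_params(level):
    """Map a level in 1..10 to encoder knobs. All values are hypothetical."""
    if not 1 <= level <= 10:
        raise ValueError("level must be in 1..10")
    return {
        "search_budget": level / 10.0,                 # fraction of the full search effort
        "lzma_preset": min(9, level),                  # LZMA arena aggressiveness
        "max_partition_order": 2 + (level * 6) // 10,  # ranges from 2 (level 1) to 8 (level 10)
        "candidate_count": 1 + level // 2,             # predictor candidates tried per frame
    }


COMPRESSION_LEVELS = {lvl: level_params(lvl) for lvl in range(1, 11)}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="8Z-Audio encoder (sketch)")
    parser.add_argument(
        "--level",
        type=int,
        choices=range(1, 11),
        default=10,
        metavar="1-10",
        help="effort level: 1 = fast streaming, 10 = full search",
    )
    return parser.parse_args(argv)


args = parse_args(["--level", "5"])
print(args.level, COMPRESSION_LEVELS[args.level])
```

Keeping the table as data rather than scattered if-statements means the benchmark phase can sweep levels by iterating over the dict.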

Layer B: rANS Entropy Coder (Close the FLAC Gap)

Layer B — rANS Replaces Rice on Hard Frames

Rice coding is near-optimal only when residuals follow a Laplacian (two-sided geometric) distribution. Real LPC residuals have heavier tails and slight asymmetry. rANS with an adaptive probability model captures the true distribution shape. This is the known 0.5–1% gain that separates FLAC-tier from WavPack-tier codecs.

Implementation: Port _RansEnc/_RansDec/_RansAdaptive from HYB4 (already battle-tested on 50 genomes). Adapt 4-symbol alphabet to audio residual distribution. MDL battle: Rice vs rANS per subframe.

Expected gain: +0.5–1.0% compression → closes the FLAC gap on Ethereal Arc, extends lead on content where we already win.

Risk: Low — proven technique, working code exists in HYB4.
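
To make the Rice vs rANS battle concrete, here is a rough sketch that compares the exact Rice cost of a subframe against the order-0 empirical entropy of the same residuals. The entropy figure is only a proxy for what an ideal rANS coder could approach, and `table_bits` (the cost of describing the probability model) is a number the caller has to supply. None of these function names come from the codebase:

```python
import math
from collections import Counter


def zigzag(r):
    """Map signed residuals to non-negative integers: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * r if r >= 0 else -2 * r - 1


def rice_bits(residuals, k):
    """Exact Rice cost: unary quotient (q + 1 bits) plus k remainder bits per sample."""
    return sum((zigzag(r) >> k) + 1 + k for r in residuals)


def best_rice_bits(residuals, max_k=14):
    """Return (bits, k) for the cheapest Rice parameter."""
    return min((rice_bits(residuals, k), k) for k in range(max_k + 1))


def order0_entropy_bits(residuals):
    """Ideal cost under the empirical order-0 distribution, excluding the table."""
    counts = Counter(residuals)
    n = len(residuals)
    return -sum(c * math.log2(c / n) for c in counts.values())


def mdl_pick(residuals, table_bits):
    """Pick the cheaper description, charging the model for its probability table."""
    rice, _k = best_rice_bits(residuals)
    model = order0_entropy_bits(residuals) + table_bits
    return ("rice", rice) if rice <= model else ("rans_estimate", model)


# A heavy-tailed toy subframe: mostly zeros with a few large outliers.
subframe = [0] * 90 + [50] * 10
print(best_rice_bits(subframe), mdl_pick(subframe, table_bits=64))
```

The toy subframe is deliberately far from Laplacian, which is the case where Rice pays for outliers on every sample. On well-behaved residuals the table cost usually tips the decision back to Rice.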

Layer C: CASCADE Prediction (Close the OFR Gap)

Layer C — Multi-Layer Adaptive Prediction

The single biggest insight from competitive analysis: the 5–10% gap between "strong" (FLAC/WavPack) and "SOTA" (OFR/SAC) codecs comes from multi-layer adaptive prediction. OFR's neural predictor and SAC's OLS+NLMS cascade both apply second-stage filters to LPC residuals.

Architecture:

Stage 1: LPC (existing) → residuals_1
Stage 2: NLMS adaptive filter on residuals_1 → residuals_2
         (NLMS = Normalized Least Mean Squares, 16-64 taps)
Stage 3: Entropy code residuals_2 via rANS or Rice (MDL decides)

DCC gates cascade depth:
  - Easy frames (LOW budget): Stage 1 only
  - Medium frames: Stage 1 + Stage 2
  - Hard frames: Stage 1 + Stage 2 + extended search

Why NLMS: It's the simplest adaptive filter — updates weights sample-by-sample, tracks slowly-changing correlation in LPC residuals. SAC proves this architecture works (8–12% better than FLAC). Unlike SAC's bit-level arithmetic coding, we keep rANS (simpler, faster, already ported).

Expected gain: +2–5% compression → pushes into WavPack/OFR territory.

Risk: Medium — new code, but algorithm is well-understood (LMS filters are textbook DSP). DCC gating ensures no regression on easy content.
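
A minimal sketch of Stage 2, assuming a plain floating-point NLMS and rounding the prediction to an integer before subtracting. The decoder rebuilds residuals_1 sample by sample, so both sides see identical filter state. A real codec would likely need fixed-point arithmetic to guarantee bit-identical results across platforms; the class and function names, the step size, and the tap count below are illustrative choices, not decisions from this plan:

```python
import math


class NLMS:
    """Normalized LMS predictor over the most recent `taps` inputs."""

    def __init__(self, taps=16, mu=0.5, eps=1e-3):
        self.w = [0.0] * taps
        self.hist = [0.0] * taps  # most recent sample first
        self.mu = mu
        self.eps = eps

    def predict(self):
        return sum(w * x for w, x in zip(self.w, self.hist))

    def update(self, actual, predicted):
        err = actual - predicted
        norm = self.eps + sum(x * x for x in self.hist)
        gain = self.mu * err / norm
        self.w = [w + gain * x for w, x in zip(self.w, self.hist)]
        self.hist = [float(actual)] + self.hist[:-1]


def cascade_encode(residuals_1, taps=16, mu=0.5):
    """Stage 2 encoder: residuals_2[n] = residuals_1[n] - round(prediction)."""
    f = NLMS(taps, mu)
    out = []
    for r in residuals_1:
        p = f.predict()
        out.append(r - round(p))
        f.update(r, p)
    return out


def cascade_decode(residuals_2, taps=16, mu=0.5):
    """Stage 2 decoder: mirrors the encoder using only already-decoded samples."""
    f = NLMS(taps, mu)
    out = []
    for e in residuals_2:
        p = f.predict()
        r = e + round(p)
        out.append(r)
        f.update(r, p)
    return out


# Toy stage-1 residual with leftover tonal structure.
stage1 = [round(100 * math.sin(0.3 * n)) for n in range(2000)]
stage2 = cascade_encode(stage1)
print(sum(map(abs, stage1)), sum(map(abs, stage2)))
```

Because the filter adapts from decoded history alone, no coefficients are transmitted. The only side information is the per-frame flag saying whether the cascade was used.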

04 The Novel Predictors: Where 8Z Becomes Unique

Layers A–C bring 8Z to competitive parity with existing SOTA codecs. But the project's DNA (literally) is about going beyond conventional approaches. The following predictors are where 8Z-Audio becomes something no other codec has attempted.

Chebyshev Harmonic Predictor (from Dream Team dialogue)

The Physics-Based Predictor

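Distorted audio (guitar amp, synth saturation, industrial) passes through a nonlinear waveshaper, which can be modeled with Chebyshev polynomials: a known, specific family of functions with the property T_n(cos θ) = cos(nθ), so each polynomial term maps a sinusoid to exactly one harmonic. The LPC residual on distorted content is dominated by harmonics created by this nonlinearity. In the simplest model, a single parameter (distortion level) determines the entire harmonic series.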

Implementation: (1) detect frames with harmonic structure in residual, (2) estimate fundamental frequency, (3) fit Chebyshev coefficients, (4) predict residual using Chebyshev model, (5) encode parameters + fine residual. MDL decides if cheaper than raw residual.

Expected gain: Up to 22% on heavily distorted content (Rammstein). This is exactly the content where OFR collapses (+26–39% regression vs FLAC). 8Z would exploit OFR's weakness.

Risk: Medium-high — novel, no precedent in any lossless codec. But the physics is sound.
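
The identity behind steps (2) to (4) can be checked in a few lines. This sketch covers only the idealized case: a pure cosine through a waveshaper f(x) = Σ a_n T_n(x) yields harmonic n with amplitude a_n, so projecting the output onto cos(nθ) recovers the coefficients. Real distorted music will match this only approximately, and all names here are illustrative:

```python
import math


def cheb_T(n, x):
    """Chebyshev polynomial of the first kind via T_{n+1} = 2x*T_n - T_{n-1}."""
    t_prev, t_cur = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_cur = t_cur, 2.0 * x * t_cur - t_prev
    return t_cur


def waveshape(x, coeffs):
    """Apply f(x) = sum_n coeffs[n] * T_n(x)."""
    return sum(a * cheb_T(n, x) for n, a in enumerate(coeffs))


def harmonic_amplitudes(signal, period, n_harmonics):
    """Project onto cos(n*theta). len(signal) must be a whole number of periods."""
    total = len(signal)
    amps = []
    for n in range(n_harmonics):
        s = sum(v * math.cos(2 * math.pi * n * i / period) for i, v in enumerate(signal))
        amps.append(s / total if n == 0 else 2 * s / total)
    return amps


# Hypothetical waveshaper: fundamental plus weaker 2nd and 3rd harmonics.
true_coeffs = [0.0, 1.0, 0.3, 0.1]
period = 64
clean = [math.cos(2 * math.pi * i / period) for i in range(10 * period)]
distorted = [waveshape(x, true_coeffs) for x in clean]
estimated = harmonic_amplitudes(distorted, period, len(true_coeffs))
print([round(a, 6) for a in estimated])
```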

PERIODIC (Enhanced)

Enhanced Periodic Detection

Current PERIODIC never fires on Ethereal Arc (0.85 correlation threshold too strict for orchestral music). Lower threshold to 0.70 with MDL gating — let the cost function decide, not a hardcoded threshold. Also: run PERIODIC on LPC residuals, not just raw samples. If LPC captures the spectral envelope but misses the pitch periodicity, PERIODIC catches the remainder.

Expected gain: +0.5–1.5% on tonal content (sustained notes, synth pads, guitar).
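
A rough sketch of the detection step, assuming the "correlation" in question is the normalized autocorrelation at the best lag. The 0.70 value only nominates a candidate here; the MDL comparison still makes the final call. The same function can be fed raw samples or LPC residuals. Function names and lag bounds are illustrative:

```python
import math
import random


def normalized_autocorr(x, lag):
    """Correlation between x and x delayed by `lag` (requires lag >= 1), in [-1, 1]."""
    a, b = x[:-lag], x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def best_period(x, min_lag, max_lag):
    """Return (correlation, lag) for the strongest lag in the search range."""
    return max((normalized_autocorr(x, lag), lag) for lag in range(min_lag, max_lag + 1))


def periodic_candidate(x, min_lag=2, max_lag=1024, threshold=0.70):
    """Return (lag or None, correlation). None means PERIODIC is not nominated."""
    max_lag = min(max_lag, len(x) // 2)
    if max_lag < min_lag:
        return None, 0.0
    corr, lag = best_period(x, min_lag, max_lag)
    return (lag if corr >= threshold else None), corr


# Toy inputs: a tone with a 50-sample period, and seeded Gaussian noise.
tone = [round(1000 * math.sin(2 * math.pi * i / 50)) for i in range(1000)]
rng = random.Random(0)
noise = [round(rng.gauss(0, 1000)) for _ in range(1000)]

print(periodic_candidate(tone, 20, 80))
print(periodic_candidate(noise, 20, 80))
```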

Physical Audio Model Library (v1.9+ horizon)

Signal-Specific Generator Zoo

Not 79 random DNA generators — a library of physical audio models entering the MDL arena:

Vibrato       → frequency-modulated sinusoid (3 params)
Reverb tail   → exponential decay envelope (2 params)
Drum transient→ damped oscillator (4 params)
Distortion    → Chebyshev waveshaper (2-4 params)
Tremolo       → amplitude-modulated signal (3 params)

Each model is compact (4–12 bytes of parameters), deterministic, and physically grounded. DCC gates which models to try based on scanner classification. This is signal processing done right, with MDL as arbiter.
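
As an illustration of how one such model would compete in the arena, here is a sketch of the damped oscillator with a crude bit-cost proxy standing in for the real entropy coder. The four parameters, the cost function, and the 12-byte parameter budget are assumptions for the example only:

```python
import math


def damped_oscillator(n, amplitude, decay, freq, phase):
    """Generate n integer samples of A * exp(-decay*i) * cos(2*pi*freq*i + phase)."""
    return [
        round(amplitude * math.exp(-decay * i) * math.cos(2 * math.pi * freq * i + phase))
        for i in range(n)
    ]


def proxy_cost_bits(residuals):
    """Crude stand-in for an entropy coder: magnitude bits plus one bit per sample."""
    return sum(math.log2(abs(r) + 1) + 1 for r in residuals)


def model_wins(frame, model, param_bytes):
    """MDL gate: the model must pay for its own parameters and still come out cheaper."""
    plain = proxy_cost_bits(frame)
    with_model = proxy_cost_bits([f - m for f, m in zip(frame, model)]) + 8 * param_bytes
    return with_model < plain


# Toy frame: a decaying tone plus a small deterministic perturbation.
params = (8000, 0.01, 0.05, 0.0)
clean = damped_oscillator(512, *params)
frame = [c + ((i * 7) % 5 - 2) for i, c in enumerate(clean)]

print(model_wins(frame, clean, param_bytes=12))
print(model_wins(frame, [0] * 512, param_bytes=12))
```

The second call shows the no-regression property: a model that explains nothing still costs its parameter bytes, so it loses.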

05 Implementation Phases

Phase 1: Unify + Level System (4 hours)

Merge GPT/CLA/GEM → single v1.8.py with --level 1-10

Start from v1.7H-GPT (proven winner). Add COMPRESSION_LEVELS dict. Implement argparse. Level 10 = v1.7H-GPT verbatim. Test: round-trip all levels, level 10 matches 8,050,982 bytes on Ethereal Arc.

Phase 2: rANS Entropy Coder (6–8 hours)

Port from HYB4, adapt to audio residual distribution, MDL battle vs Rice

Add ENTROPY_RANS = 2 alongside Rice and LZMA. Per-subframe MDL battle: whichever entropy coder produces fewer bytes wins. Focus on MEDIUM and HIGH budget frames where Rice is weakest. Test: compression must improve on Ethereal Arc; no regression on any frame.

Phase 3: NLMS Cascade Predictor (8–12 hours)

Second-stage adaptive filter on LPC residuals, DCC-gated depth

Implement 32-tap NLMS filter on LPC residuals. DCC controls whether cascade fires (LOW budget = skip, HIGH = always). MDL compares: LPC-only residual vs LPC+NLMS residual. Format change: new predictor ID PRED_CASCADE = 8. Test: must beat v1.7H-GPT on at least 50% of frames; total file size must decrease.

Phase 4: Enhanced PERIODIC + Chebyshev (12–16 hours)

Lower PERIODIC threshold, add Chebyshev waveshaper predictor

Two independent features: (a) PERIODIC on LPC residuals with 0.70 threshold, (b) Chebyshev harmonic predictor for distorted content. Both enter MDL arena. Test on expanded corpus including Rammstein (distortion target) and sustained tonal content.

Phase 5: Benchmark + Documentation (4 hours)

Full 15-clip corpus, comparative analysis, forensic CSV

Run all 15 benchmark clips at levels 5 and 10. Compare vs FLAC-12, OFR, WavPack. Generate speed/compression tradeoff curves. Update all documentation.

06 Expected Performance Outcomes

Configuration	Est. Size (bytes)	Ratio	vs v1.7H-GPT	vs FLAC	vs OFR
ACMD v1.9 (current, MDL frame probe)	~7,993,000	~46.48%	−0.7%	≈ 0%	+6.1%
v1.8 + rANS only	~7,990,000	~46.45%	−0.8%	≈ 0%	+6.1%
v1.8 + rANS + CASCADE	~7,750,000	~45.06%	−3.7%	−3.1%	+2.9%
v1.8 + all (incl. Chebyshev)	~7,650,000	~44.49%	−5.0%	−4.3%	+1.6%
Reference target: OFR	7,530,383	43.79%	—	—	0%

The trajectory: With rANS + CASCADE, v1.8 should match or beat FLAC on Ethereal Arc (where we currently lose) while extending the lead on content we already win. With Chebyshev, we potentially close to within ~1.5% of OFR — which took years of development with neural prediction. And on distorted content, we might beat OFR.

07 Questions for the AI Team

I'd like each team member (GPT, Gemini, Grok, and any others) to review this plan and address:

Architecture Questions

Q1: NLMS cascade — should the adaptive filter operate on the full frame (16,384 samples) or on fixed sub-blocks (e.g., 1,024 samples) to allow faster convergence? Trade-off: sub-blocks lose long-range correlation but converge faster on transients.

Q2: rANS alphabet size. HYB4 uses 4 symbols. Audio residuals have a much wider range. Should we use order-0 byte-level rANS (256 symbols), order-1 (256 contexts × 256 symbols = 65,536 table entries), or something in between? The MDL cost of the probability table must be included.

Q3: Chebyshev predictor — should it enter the MDL arena at Phase 4 as a standalone predictor, or as a second stage after LPC (predict the harmonic residual)? The cascaded version has higher ceiling but more complexity.
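
For Q2, the table cost is easy to put numbers on. This sketch assumes a static table with 12 bits per probability entry, which is my assumption for illustration; an adaptive model would trade this explicit cost for a learning cost instead:

```python
def table_cost_bytes(symbols, contexts=1, bits_per_entry=12):
    """Bytes needed to transmit a static probability table (rounded up)."""
    return (symbols * contexts * bits_per_entry + 7) // 8


print("HYB4-style 4 symbols :", table_cost_bytes(4))
print("order-0, 256 symbols :", table_cost_bytes(256))
print("order-1, 256 x 256   :", table_cost_bytes(256, contexts=256))
```

Under that assumption an order-1 byte table costs about 96 KB, far more than any single frame could repay, which suggests order-1 is only viable if the table is adaptive or shared across the whole file.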

Validation Questions

Q4: The PERIODIC predictor fires on 0/263 Ethereal Arc frames. Is the 0.85 threshold too strict, or is orchestral music genuinely non-periodic at the frame level? Should we test on the full 15-clip corpus before changing the threshold?

Q5: The size estimates in Section 06 assume additive gains. In practice, the gains from rANS and CASCADE may overlap rather than add. How should we validate the estimates: sequential A/B testing, or build both and measure?

Strategic Questions

Q6: The existing v1.8 roadmap focuses on levels only (no new predictors). This plan adds significant scope. Should we split into v1.8 (levels + rANS) and v1.9 (CASCADE + Chebyshev)? Or build it all as v1.8 since the layers are independent?

Q7: Priority between closing the FLAC gap (rANS, engineering) vs. opening a new frontier (Chebyshev, research). Which matters more for the publication narrative?

08 Risk Matrix

Risk	Probability	Impact	Mitigation
rANS adds overhead, no net gain	Low	Low	MDL gating — Rice always available as fallback
NLMS convergence too slow for short frames	Medium	Medium	DCC gates: skip cascade on short/easy frames
Chebyshev doesn't help on real music	Medium	Low	MDL ensures no regression; gain is pure upside
Format incompatibility with v1.7H	Certain	Low	R&D phase — no backward compatibility required
Combined speed regression >5×	Medium	Medium	Compression levels: level 5 = fast, level 10 = full

09 Success Criteria

v1.8 Ships If And Only If

✅ Bit-perfect lossless at all compression levels (SHA3 round-trip verified)

✅ Level 10 compressed size ≤ v1.7H-GPT on Ethereal Arc (8,050,982 bytes; no regression)

✅ At least one new component (rANS entropy coder or CASCADE predictor) demonstrably improves compression

✅ Level 5 encoding time ≤ v1.6.1 (37.6 minutes on Ethereal Arc)

✅ Forensic CSV logging for all levels and all predictors

✅ 15-clip benchmark with comparative analysis vs FLAC-12, OFR, WavPack
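
A minimal sketch of the SHA3 round-trip check from the first criterion. The plan does not say which SHA3 variant is used, so `sha3_256` is an assumption, and zlib stands in for the 8Z encoder and decoder purely to make the example runnable:

```python
import hashlib
import zlib


def sha3_hex(data):
    """SHA3-256 digest of a byte string, as hex."""
    return hashlib.sha3_256(data).hexdigest()


def verify_round_trip(original, encode, decode):
    """True only if decode(encode(x)) reproduces x bit for bit."""
    return sha3_hex(decode(encode(original))) == sha3_hex(original)


# Stand-in codec: zlib plays the role of the 8Z encoder and decoder here.
pcm = bytes(range(256)) * 64
print(verify_round_trip(pcm, zlib.compress, zlib.decompress))
```

In the benchmark harness this check would run once per file per level, with the digest written to the forensic CSV alongside the sizes.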

v1.8 = Levels + rANS + CASCADE + Chebyshev

Same MDL arena. More competitors. Let the math decide.

10 Summary: What I'm Proposing

The existing v1.8 roadmap is a speed project — same compression, different effort. I'm proposing v1.8 as a compression project that also includes speed flexibility. The rationale:

The Argument in Four Points

1. The LPC+Rice ceiling is real. v1.6.1 → v1.7H gained only 20.6 KB (0.25%) in 2× the time. Diminishing returns are empirically proven.

2. The gap is in the predictor, not the entropy coder. GPT's 16 KB win proves entropy tuning is marginal. OFR's 520 KB lead proves multi-layer prediction is where the bytes are.

3. The components exist. rANS is ported from HYB4. NLMS is textbook DSP. Chebyshev is physics. The MDL arena is domain-agnostic. DCC gates everything.

4. The philosophy demands it. "Don't accept limits without evidence." The limit isn't FLAC — it's the assumption that LPC is the best predictor for all audio content. Evidence from SAC, OFR, and the Chebyshev insight says otherwise.

Estimated total development: 34–44 hours across 5 phases. Each phase is independently testable. No phase depends on the previous phase's success — they can be reordered or dropped without breaking anything. The MDL arena ensures no regression: if a new predictor doesn't help, it simply never wins, and compression stays at v1.7H levels.

This is the plan. Let's hear what the team thinks.