From three hybrids to one unified encoder — and beyond LPC
Tested on 18 files (3 full tracks + 15 clips), max compression. AC v1.9 MDL frame-size probe correctly selects frame size per content (8192 on 24-bit, 4096 on 16-bit orchestral). avFLAC beats lax_t6 on Abyssal and 7 clips, beats OptimFROG on 3 Radiohead clips by 31–40%. All 15 clips beat FLAC-12. avFLAC invariant holds on all 18 files.
| Encoder | Ethereal Arc (bytes) | Ratio | vs FLAC | vs OFR |
|---|---|---|---|---|
| OFR (best) | 7,530,383 | 43.79% | −5.8% | baseline |
| WavPack | 7,934,862 | 46.14% | −0.8% | +5.4% |
| avFLAC v1.2 | 7,958,631 | 46.28% | −0.5% | +5.7% |
| FLAC -12 | 7,996,121 | 46.49% | baseline | +6.2% |
| 8Z-AC v1.9 | ~7,993,366 | 46.48% | −0.03% | +6.1% |
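The percentage columns follow directly from the byte counts. A minimal sketch of that arithmetic, where `relative_delta` is an illustrative helper name and not part of the project code:

```python
def relative_delta(size: int, baseline: int) -> float:
    """Percent size difference versus a baseline; negative means smaller."""
    return round((size / baseline - 1.0) * 100.0, 1)

# Byte counts taken from the table above.
FLAC_12 = 7_996_121
OFR_BEST = 7_530_383
WAVPACK = 7_934_862

print(relative_delta(OFR_BEST, FLAC_12))   # OFR relative to FLAC -12
print(relative_delta(WAVPACK, OFR_BEST))   # WavPack relative to OFR
```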
The existing v1.8 roadmap in the project proposes "compression levels 1–10" — essentially the same algorithm at different effort budgets. That's useful engineering, but it's not a compression breakthrough. The level system gives us at most 0.01% compression gain at 3× the time. The forensic data shows why: v1.7H is already near the ceiling for Rice+LPC architecture.
I propose v1.8 should do both: the level system (easy engineering win, ~4 hours) plus the first genuinely new compression capability. The question is: which new capability gives the biggest gain for the least implementation risk?
On Ethereal Arc: 175/263 frames are LOW budget (easy), 72 are MEDIUM, 16 are HIGH. Zero frames triggered PERIODIC (no frame hit 0.85 correlation). LPC dominates 254/263 frames, with LPC² winning 8 and DELTA2 winning 1. The PERIODIC predictor never fires on orchestral music.
The gap to OFR (520 KB) and FLAC (55 KB) isn't in the easy frames — it's in the MEDIUM and HIGH difficulty frames where LPC's polynomial model can't capture the signal's true structure.
v1.8 is built in three independent, testable layers. Each layer can be validated in isolation before the next begins. No multi-feature launches.
Unify GPT/CLA/GEM into one codebase. Add --level 1-10 argument controlling search budget, LZMA arena aggressiveness, partition order range, and candidate count. GPT's entropy coding logic becomes the default for all levels.
Implementation: ~4 hours. Start from v1.7H-GPT. Add COMPRESSION_LEVELS dict. Level 10 = current v1.7H-GPT. Level 5 ≈ v1.6.1 speed. Level 1 = fast streaming.
Expected gain: 0% compression (same ceiling, different speed). But critical for usability and benchmarking.
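As a rough illustration of Layer A, the level table and argument parsing could look like the sketch below. The knob names and every numeric value are placeholders chosen for illustration; only the three anchor levels (1, 5, 10) and the `--level` flag come from the plan.

```python
import argparse

# Hypothetical knob names and values; tune against real timings.
COMPRESSION_LEVELS = {
    1:  {"search_budget": 1,  "lzma_arena": False, "max_partition_order": 2, "candidates": 1},
    5:  {"search_budget": 4,  "lzma_arena": True,  "max_partition_order": 5, "candidates": 4},
    10: {"search_budget": 16, "lzma_arena": True,  "max_partition_order": 8, "candidates": 12},
}

def settings_for(level: int) -> dict:
    """Return a copy of the settings of the nearest defined level at or below `level`."""
    if not 1 <= level <= 10:
        raise ValueError("level must be in 1..10")
    anchor = max(k for k in COMPRESSION_LEVELS if k <= level)
    return dict(COMPRESSION_LEVELS[anchor])

def build_parser() -> argparse.ArgumentParser:
    """Command-line parser exposing the --level flag described in the plan."""
    parser = argparse.ArgumentParser(description="8Z-Audio encoder (sketch)")
    parser.add_argument("--level", type=int, choices=range(1, 11), default=10,
                        help="effort level: 1 = fastest, 10 = maximum search")
    return parser
```

For example, `build_parser().parse_args(["--level", "5"]).level` yields `5`, and `settings_for(7)` falls back to the level 5 entry until intermediate levels are defined.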
Rice coding assumes Laplacian residual distribution. Real LPC residuals have heavier tails and slight asymmetry. rANS with an adaptive probability model captures the true distribution shape. This is the known 0.5–1% gain that separates FLAC-tier from WavPack-tier codecs.
Implementation: Port _RansEnc/_RansDec/_RansAdaptive from HYB4 (already battle-tested on 50 genomes). Adapt 4-symbol alphabet to audio residual distribution. MDL battle: Rice vs rANS per subframe.
Expected gain: +0.5–1.0% compression → closes the FLAC gap on Ethereal Arc, extends lead on content where we already win.
Risk: Low — proven technique, working code exists in HYB4.
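The Rice versus rANS battle can be prototyped before any port by comparing exact Rice code lengths with the ideal code length of an adaptive frequency model, which is the figure an rANS coder approaches. This is a sketch under assumptions: the 256-symbol alphabet, the escape cost, and all function names are illustrative and are not the HYB4 `_RansAdaptive` interface.

```python
import math

def zigzag(r: int) -> int:
    """Map a signed residual to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return (r << 1) if r >= 0 else ((-r << 1) - 1)

def rice_bits(residuals, k: int) -> int:
    """Exact Rice length: unary quotient, one stop bit, k remainder bits."""
    return sum((zigzag(r) >> k) + 1 + k for r in residuals)

def best_rice_bits(residuals, max_k: int = 14):
    """Return (bits, k) for the cheapest Rice parameter."""
    return min((rice_bits(residuals, k), k) for k in range(max_k + 1))

def adaptive_model_bits(residuals, alphabet: int = 256, escape_bits: int = 32) -> float:
    """Ideal cost of an adaptive order-0 model (assumed stand-in for rANS).

    Counts start at 1 and are updated after every symbol, so the decoder can
    mirror them and no probability table has to be transmitted.
    Values that do not fit the alphabet use the last symbol as an escape.
    """
    counts = [1] * alphabet
    total = alphabet
    escape = alphabet - 1
    bits = 0.0
    for r in residuals:
        symbol = min(zigzag(r), escape)
        bits += math.log2(total / counts[symbol])
        if symbol == escape:
            bits += escape_bits          # raw payload for out-of-range values
        counts[symbol] += 1
        total += 1
    return bits

def choose_entropy_coder(residuals):
    """MDL battle for one subframe: return (winner, estimated_bits)."""
    rice, _k = best_rice_bits(residuals)
    model = adaptive_model_bits(residuals)
    return ("rice", rice) if rice <= model else ("rans", math.ceil(model))
```

On very short subframes the adaptive model has not learned anything yet and Rice wins; on a sharply non-Laplacian residual such as `[100, -100] * 1000` the adaptive model wins by a wide margin.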
The single biggest insight from competitive analysis: the 5–10% gap between "strong" (FLAC/WavPack) and "SOTA" (OFR/SAC) codecs comes from multi-layer adaptive prediction. OFR's neural predictor and SAC's OLS+NLMS cascade both apply second-stage filters to LPC residuals.
Architecture:
Stage 1: LPC (existing) → residuals_1
Stage 2: NLMS adaptive filter on residuals_1 → residuals_2
(NLMS = Normalized Least Mean Squares, 16-64 taps)
Stage 3: Entropy code residuals_2 via rANS or Rice (MDL decides)
DCC gates cascade depth:
- Easy frames (LOW budget): Stage 1 only
- Medium frames: Stage 1 + Stage 2
- Hard frames: Stage 1 + Stage 2 + extended search
Why NLMS: It's the simplest adaptive filter — updates weights sample-by-sample, tracks slowly-changing correlation in LPC residuals. SAC proves this architecture works (8–12% better than FLAC). Unlike SAC's bit-level arithmetic coding, we keep rANS (simpler, faster, already ported).
Expected gain: +2–5% compression → pushes into WavPack/OFR territory.
Risk: Medium — new code, but algorithm is well-understood (LMS filters are textbook DSP). DCC gating ensures no regression on easy content.
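To make Stage 2 concrete, here is a minimal pure-Python sketch of a backward-adaptive NLMS stage that stays lossless: the encoder and decoder run the same weight updates on the same reconstructed samples, so no filter coefficients are transmitted. The function names, the step size `mu`, and the gating rule are assumptions for illustration. Bit-exact agreement also assumes both sides evaluate the floating-point operations identically; a production version would more likely use fixed-point weights.

```python
import math

def nlms_stage(values, taps=32, mu=0.5, decode=False):
    """Run the NLMS cascade stage.

    Encoding: values are stage-1 (LPC) residuals, output is stage-2 residuals.
    Decoding: values are stage-2 residuals, output is stage-1 residuals.
    """
    weights = [0.0] * taps
    history = [0.0] * taps            # most recent stage-1 sample first
    out = []
    for v in values:
        estimate = sum(w * h for w, h in zip(weights, history))
        pred = round(estimate)        # integer prediction keeps residuals integral
        sample = v + pred if decode else v
        out.append(sample if decode else v - pred)
        # Normalized LMS update, identical on both sides.
        error = sample - estimate
        norm = 1e-6 + sum(h * h for h in history)
        gain = mu * error / norm
        weights = [w + gain * h for w, h in zip(weights, history)]
        history = [float(sample)] + history[:-1]
    return out

def cascade_enabled(budget: str) -> bool:
    """Assumed DCC gate: skip the cascade on LOW-budget (easy) frames."""
    return budget in ("MEDIUM", "HIGH")

# Synthetic stand-in for a tonal stage-1 residual.
stage1 = [round(50 * math.sin(0.3 * n)) for n in range(2000)]
stage2 = nlms_stage(stage1)
restored = nlms_stage(stage2, decode=True)
```

In the encoder, MDL would then compare the coded size of `stage1` against `stage2` for each gated frame and keep the cheaper one.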
Layers A–C bring 8Z to competitive parity with existing SOTA codecs. But the project's DNA (literally) is about going beyond conventional approaches. The following predictors are where 8Z-Audio becomes something no other codec has attempted.
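Distorted audio (guitar amp, synth saturation, industrial) can be modeled as a signal passed through a Chebyshev polynomial waveshaper, a known, specific mathematical function. On distorted content the LPC residual is dominated by the harmonics this function creates. For a sinusoidal input, the n-th Chebyshev coefficient sets the level of the n-th harmonic, so a handful of parameters (in the simplest case a single distortion level) determines the entire harmonic series.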
Implementation: (1) detect frames with harmonic structure in residual, (2) estimate fundamental frequency, (3) fit Chebyshev coefficients, (4) predict residual using Chebyshev model, (5) encode parameters + fine residual. MDL decides if cheaper than raw residual.
Expected gain: Up to 22% on heavily distorted content (Rammstein). This is exactly the content where OFR collapses (+26–39% regression vs FLAC). 8Z would exploit OFR's weakness.
Risk: Medium-high — novel, no precedent in any lossless codec. But the physics is sound.
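Step (3) rests on the identity T_n(cos θ) = cos(nθ): a unit-amplitude sinusoid passed through a waveshaper Σ a_n T_n(x) comes out with harmonic amplitudes equal to the coefficients a_n. The sketch below checks that on synthetic data. It assumes the fundamental is already known from step (2), that the analysis window covers whole periods, and that the input is a pure tone; all names are illustrative.

```python
import math

def chebyshev_T(n: int, x: float) -> float:
    """T_n(x) via the recurrence T_{n+1} = 2x*T_n - T_{n-1}."""
    if n == 0:
        return 1.0
    t_prev, t = 1.0, x
    for _ in range(n - 1):
        t_prev, t = t, 2.0 * x * t - t_prev
    return t

def waveshape(x: float, coeffs) -> float:
    """Memoryless distortion: sum of a_n * T_n(x), with coeffs[n] = a_n."""
    return sum(a * chebyshev_T(n, x) for n, a in enumerate(coeffs))

def fit_chebyshev(shaped, theta, order: int):
    """Recover a_1..a_order by projecting the output onto cos(n*theta)."""
    count = len(shaped)
    return [
        (2.0 / count) * sum(y * math.cos(n * t) for y, t in zip(shaped, theta))
        for n in range(1, order + 1)
    ]

# Synthetic test: 8 whole periods of a unit sinusoid through a known shaper.
true_coeffs = [0.0, 1.0, 0.3, 0.1, 0.05]
theta = [2.0 * math.pi * 8 * m / 1024 for m in range(1024)]
shaped = [waveshape(math.cos(t), true_coeffs) for t in theta]
fitted = fit_chebyshev(shaped, theta, order=4)
```

Real frames will not be pure tones, so the fitted model only predicts part of the residual; the fine residual plus the quantized coefficients is what MDL would weigh against the plain LPC residual.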
Current PERIODIC never fires on Ethereal Arc (0.85 correlation threshold too strict for orchestral music). Lower threshold to 0.70 with MDL gating — let the cost function decide, not a hardcoded threshold. Also: run PERIODIC on LPC residuals, not just raw samples. If LPC captures the spectral envelope but misses the pitch periodicity, PERIODIC catches the remainder.
Expected gain: +0.5–1.5% on tonal content (sustained notes, synth pads, guitar).
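One way to read "threshold 0.70 with MDL gating" is that the correlation threshold only decides whether PERIODIC is tried at all, and the coded size decides whether it wins. A hedged sketch on LPC residuals follows; the lag range, the 16-bit lag cost, the Rice cost proxy, and the function names are assumptions.

```python
import math

def zigzag(r: int) -> int:
    return (r << 1) if r >= 0 else ((-r << 1) - 1)

def rice_bits(residual, max_k: int = 14) -> int:
    """Cost proxy: exact Rice length at the best parameter k."""
    return min(sum((zigzag(r) >> k) + 1 + k for r in residual)
               for k in range(max_k + 1))

def best_lag(residual, min_lag: int, max_lag: int):
    """Return (lag, normalized correlation) of the strongest repetition."""
    found, best_corr = 0, 0.0
    for lag in range(min_lag, min(max_lag, len(residual) - 1) + 1):
        cur, past = residual[lag:], residual[:-lag]
        den = math.sqrt(sum(x * x for x in cur) * sum(y * y for y in past))
        corr = sum(x * y for x, y in zip(cur, past)) / den if den > 0 else 0.0
        if corr > best_corr:
            found, best_corr = lag, corr
    return found, best_corr

def periodic_residual(residual, lag: int):
    """Predict each sample from the one `lag` samples earlier."""
    return residual[:lag] + [residual[n] - residual[n - lag]
                             for n in range(lag, len(residual))]

def choose_periodic(residual, min_lag=20, max_lag=400, threshold=0.70, lag_bits=16):
    """Return (winner, bits, lag); the threshold gates the attempt, MDL decides."""
    lag, corr = best_lag(residual, min_lag, max_lag)
    plain = rice_bits(residual)
    if lag == 0 or corr < threshold:
        return "plain", plain, lag
    periodic = lag_bits + rice_bits(periodic_residual(residual, lag))
    return ("periodic", periodic, lag) if periodic < plain else ("plain", plain, lag)

# Synthetic residual with an exact 50-sample period.
pattern = [(i * 37) % 101 - 50 for i in range(50)]
decision = choose_periodic(pattern * 20)
```

The exhaustive lag search is quadratic in frame length, so a real encoder would restrict it with DCC or use an FFT-based autocorrelation.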
Not 79 random DNA generators — a library of physical audio models entering the MDL arena:
Vibrato → frequency-modulated sinusoid (3 params)
Reverb tail → exponential decay envelope (2 params)
Drum transient → damped oscillator (4 params)
Distortion → Chebyshev waveshaper (2-4 params)
Tremolo → amplitude-modulated signal (3 params)
Each model is compact (4–12 bytes of parameters), deterministic, and physically grounded. DCC gates which models to try based on scanner classification. This is signal processing done right, with MDL as arbiter.
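A sketch of how one physical model could enter the arena, using the damped oscillator as the example. Parameter estimation is out of scope here: the parameters are assumed to be known, each is assumed to cost 16 bits, and the "raw" candidate is a stand-in for the existing predictors. Names are illustrative, and bit-exact synthesis across platforms is an assumption a real codec would have to guarantee.

```python
import math

def zigzag(r: int) -> int:
    return (r << 1) if r >= 0 else ((-r << 1) - 1)

def rice_bits(residual, max_k: int = 14) -> int:
    """Cost proxy: exact Rice length at the best parameter k."""
    return min(sum((zigzag(r) >> k) + 1 + k for r in residual)
               for k in range(max_k + 1))

def damped_oscillator(n_samples, amp, decay, omega, phase):
    """Four-parameter transient model: amp * exp(-decay*n) * cos(omega*n + phase)."""
    return [round(amp * math.exp(-decay * n) * math.cos(omega * n + phase))
            for n in range(n_samples)]

def mdl_arena(frame, candidates):
    """candidates maps name -> (param_bits, prediction); cheapest total wins."""
    best = None
    for name, (param_bits, prediction) in candidates.items():
        residual = [x - p for x, p in zip(frame, prediction)]
        total = param_bits + rice_bits(residual)
        if best is None or total < best[1]:
            best = (name, total)
    return best

# Synthetic frame that the model matches exactly.
frame = damped_oscillator(512, 1000.0, 0.005, 0.2, 0.0)
candidates = {
    "raw": (0, [0] * 512),                                            # no prediction
    "damped_osc": (4 * 16, damped_oscillator(512, 1000.0, 0.005, 0.2, 0.0)),
}
winner = mdl_arena(frame, candidates)
```

Because every candidate is charged for its own parameters, a model that does not fit simply loses to the existing predictors.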
Start from v1.7H-GPT (proven winner). Add COMPRESSION_LEVELS dict. Implement argparse. Level 10 = v1.7H-GPT verbatim. Test: round-trip all levels, level 10 matches 8,050,982 bytes on Ethereal Arc.
Add ENTROPY_RANS = 2 alongside Rice and LZMA. Per-subframe MDL battle: whichever entropy coder produces fewer bytes wins. Focus on MEDIUM and HIGH budget frames where Rice is weakest. Test: compression must improve on Ethereal Arc; no regression on any frame.
Implement 32-tap NLMS filter on LPC residuals. DCC controls whether cascade fires (LOW budget = skip, HIGH = always). MDL compares: LPC-only residual vs LPC+NLMS residual. Format change: new predictor ID PRED_CASCADE = 8. Test: must beat v1.7H-GPT on at least 50% of frames; total file size must decrease.
Two independent features: (a) PERIODIC on LPC residuals with 0.70 threshold, (b) Chebyshev harmonic predictor for distorted content. Both enter MDL arena. Test on expanded corpus including Rammstein (distortion target) and sustained tonal content.
Run all 15 benchmark clips at levels 5 and 10. Compare vs FLAC-12, OFR, WavPack. Generate speed/compression tradeoff curves. Update all documentation.
| Configuration | Est. Size | Ratio | vs v1.7H | vs FLAC | vs OFR |
|---|---|---|---|---|---|
| ACMD v1.9 (current, MDL frame probe) | ~7,993,000 | ~46.48% | −0.7% | ≈ 0% | +6.1% |
| v1.8 + rANS only | ~7,990,000 | ~46.45% | −0.8% | ≈ 0% | +6.1% |
| v1.8 + rANS + CASCADE | ~7,750,000 | ~45.06% | −3.7% | −3.1% | +2.9% |
| v1.8 + all (incl. Chebyshev) | ~7,650,000 | ~44.49% | −5.0% | −4.2% | +1.6% |
| Theoretical limit: OFR | 7,530,383 | 43.79% | — | — | 0% |
I'd like each team member (GPT, Gemini, Grok, and any others) to review this plan and address:
Q1: NLMS cascade — should the adaptive filter operate on the full frame (16,384 samples) or on fixed sub-blocks (e.g., 1,024 samples) to allow faster convergence? Trade-off: sub-blocks lose long-range correlation but converge faster on transients.
Q2: rANS alphabet size — HYB4 uses 4 symbols. Audio residuals have much wider range. Should we use order-0 byte-level rANS (256 symbols), order-1 (65,536 contexts), or something in between? MDL cost of the probability table must be included.
Q3: Chebyshev predictor — should it enter the MDL arena at Phase 4 as a standalone predictor, or as a second stage after LPC (predict the harmonic residual)? The cascaded version has higher ceiling but more complexity.
Q4: The PERIODIC predictor fires on 0/263 Ethereal Arc frames. Is the 0.85 threshold too strict, or is orchestral music genuinely non-periodic at the frame level? Should we test on the full 15-clip corpus before changing the threshold?
Q5: Compression level estimates (Section 6) assume additive gains. In practice, rANS + CASCADE may have diminishing interaction. How should we validate the estimates — sequential A/B testing, or build both and measure?
Q6: The existing v1.8 roadmap focuses on levels only (no new predictors). This plan adds significant scope. Should we split into v1.8 (levels + rANS) and v1.9 (CASCADE + Chebyshev)? Or build it all as v1.8 since the layers are independent?
Q7: Priority between closing the FLAC gap (rANS, engineering) vs. opening a new frontier (Chebyshev, research). Which matters more for the publication narrative?
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| rANS adds overhead, no net gain | Low | Low | MDL gating — Rice always available as fallback |
| NLMS convergence too slow for short frames | Medium | Medium | DCC gates: skip cascade on short/easy frames |
| Chebyshev doesn't help on real music | Medium | Low | MDL ensures no regression; gain is pure upside |
| Format incompatibility with v1.7H | Certain | Low | R&D phase — no backward compatibility required |
| Combined speed regression >5× | Medium | Medium | Compression levels: level 5 = fast, level 10 = full |
✅ Bit-perfect lossless at all compression levels (SHA3 round-trip verified)
✅ Level 10 output size ≤ v1.7H-GPT on Ethereal Arc (8,050,982 bytes; no regression)
✅ At least one new predictor (rANS or CASCADE) demonstrably improves compression
✅ Level 5 encoding time ≤ v1.6.1 (37.6 minutes on Ethereal Arc)
✅ Forensic CSV logging for all levels and all predictors
✅ 15-clip benchmark with comparative analysis vs FLAC-12, OFR, WavPack
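The SHA3 round-trip criterion can be automated with a small harness like the one below. `zlib` stands in for the 8Z encoder and decoder purely so the sketch runs on its own; the helper names are illustrative.

```python
import hashlib
import zlib

def sha3_hex(data: bytes) -> str:
    """SHA3-256 digest of a byte string as hex."""
    return hashlib.sha3_256(data).hexdigest()

def round_trip_ok(encode, decode, pcm: bytes) -> bool:
    """True when decode(encode(pcm)) hashes identically to the original."""
    return sha3_hex(decode(encode(pcm))) == sha3_hex(pcm)

# Stand-in codec; replace with the real encoder/decoder at each level.
sample = bytes(range(256)) * 64
ok = round_trip_ok(zlib.compress, zlib.decompress, sample)
```

Running this per file and per level, and logging the digest alongside the forensic CSV, gives a single pass/fail column for the benchmark.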
The existing v1.8 roadmap is a speed project — same compression, different effort. I'm proposing v1.8 as a compression project that also includes speed flexibility. The rationale:
1. The LPC+Rice ceiling is real. v1.6.1 → v1.7H gained only 20.6 KB (0.25%) in 2× the time. Diminishing returns are empirically proven.
2. The gap is in the predictor, not the entropy coder. GPT's 16 KB win proves entropy tuning is marginal. OFR's 520 KB lead proves multi-layer prediction is where the bytes are.
3. The components exist. rANS is ported from HYB4. NLMS is textbook DSP. Chebyshev is physics. The MDL arena is domain-agnostic. DCC gates everything.
4. The philosophy demands it. "Don't accept limits without evidence." The limit isn't FLAC — it's the assumption that LPC is the best predictor for all audio content. Evidence from SAC, OFR, and the Chebyshev insight says otherwise.
Estimated total development: 34–44 hours across 5 phases. Each phase is independently testable. No phase depends on the previous phase's success — they can be reordered or dropped without breaking anything. The MDL arena ensures no regression: if a new predictor doesn't help, it simply never wins, and compression stays at v1.7H levels.
This is the plan. Let's hear what the team thinks.