The story, in a nutshell
8Z-Audio is a lossless WAV compressor that treats each frame as a model-selection problem. Instead of guessing one predictor (like classic codecs), it runs a small arena of predictors and entropy coders, then keeps the shortest description under the minimum description length (MDL) principle. Every archive is verified byte-identical via SHA3.
What makes this different
MDL is the judge
- Pick the smallest encoding, per frame / subframe.
- Strict tie-breaking → deterministic archives.
- Any “fixed overhead” that gets written unconditionally is treated as a bug.
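A minimal sketch of the per-frame decision, assuming each candidate arrives as its complete encoded bytes. `Candidate`, `TIE_BREAK_ORDER`, and `pick_winner` are illustrative names, not the 8Z-Audio API, and the ranking itself is an assumption.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str       # model label (hypothetical), e.g. "LPC"
    payload: bytes  # full encoding of the frame, headers and parameters included


# Assumed fixed ranking, used only to break ties. Lower index wins.
TIE_BREAK_ORDER = ["RAW", "DELTA", "LPC", "PERIODIC"]


def pick_winner(candidates: List[Candidate]) -> Candidate:
    """Shortest payload wins; equal lengths fall back to the fixed ranking."""
    return min(
        candidates,
        key=lambda c: (len(c.payload), TIE_BREAK_ORDER.index(c.name)),
    )


frame_candidates = [
    Candidate("LPC", b"\x01\x02\x03"),
    Candidate("DELTA", b"\x09\x08\x07"),  # same size as LPC: a tie
    Candidate("RAW", b"\x00" * 8),
]
winner = pick_winner(frame_candidates)
```

Because the sort key never depends on candidate order or timing, the same input always yields the same winner, which is the property that keeps archives deterministic.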
Audio is physics-rich
- Tonal frames are where extra math pays off.
- Hard frames (industrial / transients) expose entropy-coding limits.
- So the plan is: keep LPC as a safety net, add specialized predictors where they win.
“Never accept a limit without evidence. If it’s real, it will survive measurement. If it’s not, MDL will kill it.”
Three concrete wins (already achieved)
- Lossless verified via SHA3-256 matching decoded WAV.
- Competitive compression: v1.5 beats FLAC‑12 on 7 of 15 clips (47%).
- Cross-domain transfer: DCC + scanner pipeline ported from 8Z FASTA / TSP.
Current frontier (v1.7H)
- Two-pass scan→encode (parallel).
- PERIODIC predictor + cascaded LPC² for tonal structure.
- Forensic logs + robustness fixes (24-bit crash, length fields, streaming decode).
Evolution (v0.1 → v1.7H)
This project didn’t grow by “adding features”. It grew by killing weak ideas fast and keeping only what improved the MDL score — with checkpoints, audits, and a benchmark suite.
Day-by-day sprint (from the benchmark report)
| Day | Date | Version | Milestone |
|---|---|---|---|
| 1 | Feb 18 | v0.1 | First encoder. Beat 7‑Zip on Pink Floyd 192 kHz (8.3 MB vs 13.7 MB). |
| 2 | Feb 19 | v1.0–v1.2 | LPC + Rice coding. Approached FLAC‑5 territory. |
| 3 | Feb 20 | v1.3.1 | Multiprocessing + exhaustive search. Beat FLAC‑8 by ~2.1% on Pink Floyd, at the cost of an 11-hour encode. |
| 4 | Feb 21 | v1.4 | DCC port from HYB4 to cut time. Encode ~102 min, but ~1.53% regression. |
| 5 | Feb 22 | v1.5 | Bigger blocksize + restored qlevel search. Encode ~40 min. 15‑clip benchmark: 7/15 wins vs FLAC‑12. |
| 5+ | Feb 22–23 | v1.6 → v1.7H | Two-pass scan→parallel encode; forensic logging; PERIODIC + LPC²; robustness fixes & hybrids. |
Version highlights (compact)
v0.1 “First Light”
- Frame arena: RAW / DELTA(1–3) / LPC (selected orders).
- Channel decorrelation modes (indep, mid‑side, left‑side, right‑side).
- Backend battle (LZMA/zlib + optional zstd/brotli).
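For the decorrelation modes, here is a hedged sketch of an integer mid-side transform that round-trips exactly. It follows the well-known FLAC-style construction; whether v0.1 uses this exact form is an assumption, and the function names are illustrative.

```python
from typing import List, Tuple


def to_mid_side(left: List[int], right: List[int]) -> Tuple[List[int], List[int]]:
    """Integer mid/side: mid drops one bit, side keeps it recoverable."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid: List[int], side: List[int]) -> Tuple[List[int], List[int]]:
    """Exact inverse: the parity of side restores the bit dropped from mid."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)  # equals l + r
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right


left = [5, -3, -3, 1200, -32768]
right = [2, 4, -4, 1198, 32767]
mid, side = to_mid_side(left, right)
restored = from_mid_side(mid, side)
```

The trick is that `l + r` and `l - r` always share parity, so the low bit shifted out of `mid` can be read back from `side`. Left-side and right-side modes are simpler: keep one channel as-is and store the difference.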
v1.1–v1.2 “Rice Engine / Partition Fix”
- Partitioned Rice coding (FLAC-style).
- Exhaustive LPC search + windowing (Tukey/Hann/Blackman trials appear later).
- Binary headers + varints + warmup stored at native bit-depth.
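To make the Rice work concrete, here is a small sketch of exact Rice bit costs with a per-partition parameter search. The 4-bit parameter field and the `max_k` of 14 mirror FLAC's layout and are assumptions about 8Z-Audio; all function names are illustrative.

```python
from typing import List, Sequence, Tuple


def zigzag(r: int) -> int:
    """Map signed residuals to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * r if r >= 0 else -2 * r - 1


def rice_bits(residuals: Sequence[int], k: int) -> int:
    """Exact Rice code length: unary quotient + stop bit + k remainder bits."""
    return sum((zigzag(r) >> k) + 1 + k for r in residuals)


def best_k(residuals: Sequence[int], max_k: int = 14) -> Tuple[int, int]:
    """Exhaustive k-search. Returns (bits, k); ties go to the smaller k."""
    return min((rice_bits(residuals, k), k) for k in range(max_k + 1))


def partitioned_bits(residuals: Sequence[int], order: int,
                     k_field_bits: int = 4) -> Tuple[int, List[int]]:
    """Cost of 2**order equal partitions, each paying for its own k field."""
    parts = 1 << order
    size = len(residuals) // parts  # assumes len(residuals) divides evenly
    total, ks = 0, []
    for i in range(parts):
        bits, k = best_k(residuals[i * size:(i + 1) * size])
        total += k_field_bits + bits
        ks.append(k)
    return total, ks


# Quiet first half, loud second half: one shared k fits neither.
residuals = [0] * 8 + [100, -100] * 4
costs = {order: partitioned_bits(residuals, order)[0] for order in range(3)}
best_order = min(costs, key=lambda o: (costs[o], o))
```

On this toy frame, splitting once pays for itself, while splitting twice starts losing to the extra parameter fields. That is the MDL trade in miniature.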
v1.3.1 “Unleashed + Checkpoint”
- Crash-safe checkpoint/resume per frame.
- Partition-aware screening + exhaustive Rice k-search.
- Expanded candidate set (top-16 survivors).
v1.4–v1.5 “DCC Era”
- DCCMeter allocates search budget: seize vs thrash.
- CodecLearner prunes orders/windows/qlevels once patterns stabilize.
- Blocksize bump (4096→16384) for fewer frames + better compression (v1.5).
v1.6.1 “Two-Pass”
- Pass 1 scanner builds a per-frame budget map (~100 ms/frame).
- Pass 2 encodes frames independently → multiprocessing returns.
- WAVE_FORMAT_EXTENSIBLE support, checkpoint v4.
v1.7 → v1.7H “Forensic / Hybrids”
- Bug fix: length fields changed to u32 (fixes the 24-bit crash).
- PERIODIC predictor + cascaded LPC²; scanner detects periodic strength.
- Hybrid hardening: streaming decode, deterministic PCM hashing, strict tie-breaks.
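The page does not spell out how PERIODIC works internally. One plausible reading is a long-term (pitch-style) predictor that guesses each sample from the one a full period earlier. The sketch below follows that assumption; the names and the sum-of-magnitudes cost proxy are illustrative only.

```python
from typing import List, Tuple


def periodic_encode(x: List[int], lag: int) -> Tuple[List[int], List[int]]:
    """Keep the first `lag` samples verbatim, then code x[n] - x[n - lag]."""
    return x[:lag], [x[n] - x[n - lag] for n in range(lag, len(x))]


def periodic_decode(warmup: List[int], residual: List[int], lag: int) -> List[int]:
    """Exact inverse of periodic_encode (integer-only, so lossless)."""
    x = list(warmup)
    for e in residual:
        x.append(e + x[len(x) - lag])
    return x


def best_lag(x: List[int], min_lag: int, max_lag: int) -> int:
    """Pick the lag with the smallest sum of magnitudes (a crude cost proxy)."""
    def cost(lag: int) -> int:
        warmup, residual = periodic_encode(x, lag)
        return sum(abs(v) for v in warmup) + sum(abs(v) for v in residual)
    return min(range(min_lag, max_lag + 1), key=cost)  # ties: smallest lag


signal = [0, 3, 7, 2, -5] * 20  # toy waveform with period 5
lag = best_lag(signal, 2, 16)
warmup, residual = periodic_encode(signal, lag)
decoded = periodic_decode(warmup, residual, lag)
```

In a real arena the proxy would be replaced by the true coded size, and the PERIODIC candidate would still have to beat LPC on total bytes to be kept.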
Benchmarks (v1.5 snapshot)
This is the first solid, end-to-end benchmark snapshot: 15 clips across 10 songs (human + AI), compared against FLAC‑12. Later we’ll drop in updated runs for v1.6+ / v1.7H.
Headline results
Where 8Z-Audio wins
- Tonal: 3/3 wins (Pink Floyd tonal + AI tonal).
- Dynamic: 1/1 win (Pink Floyd 30s dynamic).
- Biggest win: −10.26% (Lady Gaga clip).
Where it loses (and why that’s useful)
- Industrial / distortion: worst gap (+8.00%).
- DCC-stress short clips: overhead doesn’t repay at 10–30s.
- Points directly to next work: better entropy on hard residuals + transient/noise handling.
Clip-level table (8ZA v1.5 vs FLAC‑12)
| # | Clip | Category | 8ZA v1.5 | FLAC‑12 | Δ (8ZA vs FLAC‑12) |
|---|---|---|---|---|---|
| 1 | HM_03_LG-DWAS_clip_01 | diverse | 0.2090 | 0.2329 | −10.26% |
| 2 | HM_01_PF-SOYCD_clip_02 | tonal | 0.2515 | 0.2654 | −5.24% |
| 3 | AI_01_BD-FH_clip_02 | tonal | 0.2535 | 0.2670 | −5.06% |
| 4 | HM_01_PF-SOYCD_clip_01 | dynamic | 0.2004 | 0.2066 | −3.00% |
| 5 | AI_01_BD-FH_clip_01 | easiest | 0.3405 | 0.3508 | −2.94% |
| 6 | AI_03_BD-AAI_clip_01 | tonal | 0.3722 | 0.3780 | −1.53% |
| 7 | HM_02_ME-LE_clip_02 | buildup | 0.7935 | 0.7938 | −0.04% |
| 8 | AI_05_BD-LRR_clip_01 | diverse | 0.5592 | 0.5583 | +0.16% |
| 9 | HM_02_ME-LE_clip_01 | transient | 0.6449 | 0.6437 | +0.19% |
| 10 | AI_06_BD-BTS_clip_01 | dcc_stress | 0.4770 | 0.4709 | +1.30% |
| 11 | AI_04_BD-EOTS_clip_01 | dcc_stress | 0.5378 | 0.5284 | +1.78% |
| 12 | AI_02_BD-WDIG_clip_01 | dcc_stress | 0.5263 | 0.5157 | +2.06% |
| 13 | HM_04_RA-DH_clip_02 | diverse | 0.4125 | 0.4017 | +2.69% |
| 14 | HM_04_RA-DH_clip_01 | hardest | 0.5504 | 0.5209 | +5.66% |
| 15 | HM_04_RA-DH_clip_03 | dcc_best | 0.5145 | 0.4764 | +8.00% |
Δ is computed from ratios: (8ZA/FLAC − 1)×100%. Negative is a win for 8ZA.
Totals
- 8ZA: 30,730,661 bytes
- FLAC‑12: 30,416,285 bytes
- Aggregate: 8ZA is 1.03% larger than FLAC‑12 overall.
- Interpretation: 8ZA is already in the same regime, but still needs a hard‑content unlock.
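The Δ column and the aggregate can be re-derived from the numbers on this page. A short check, with the ratios copied from the clip table and the byte totals from the list above (`delta_percent` is just a helper name):

```python
def delta_percent(size_8za: float, size_flac: float) -> float:
    """Relative size difference in percent; negative means 8ZA is smaller."""
    return (size_8za / size_flac - 1.0) * 100.0


# (8ZA v1.5 ratio, FLAC-12 ratio) for clips 1..15, copied from the table.
clips = [
    (0.2090, 0.2329), (0.2515, 0.2654), (0.2535, 0.2670), (0.2004, 0.2066),
    (0.3405, 0.3508), (0.3722, 0.3780), (0.7935, 0.7938), (0.5592, 0.5583),
    (0.6449, 0.6437), (0.4770, 0.4709), (0.5378, 0.5284), (0.5263, 0.5157),
    (0.4125, 0.4017), (0.5504, 0.5209), (0.5145, 0.4764),
]
deltas = [round(delta_percent(a, b), 2) for a, b in clips]
wins = sum(d < 0 for d in deltas)

# The aggregate uses total bytes, not the mean of the per-clip deltas.
aggregate = round(delta_percent(30_730_661, 30_416_285), 2)
```

Note that the aggregate is weighted by clip size, which is why seven wins out of fifteen can still add up to a small overall deficit.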
Notable anomaly: OptimFROG vs Rammstein
OptimFROG (OFR), often best-in-class, regressed massively on industrial audio.
- Full song: OFR ratio 0.667 vs FLAC‑12 ratio 0.479 (≈39% worse).
- On three clips, OFR is 26–39% worse than FLAC.
- This is a clue: distortion/noise structure breaks some predictor stacks.
Architecture
The core idea is stable: split the job into (1) signal analysis and (2) MDL-governed coding. We evolved from one-pass brute force to a two-pass pipeline that can go deep without giving up parallelism.
Two-pass pipeline (v1.6+)
Why the scanner matters
- It’s ~100× cheaper than encoding, so it can afford broad analysis.
- It allocates compute where it pays: tonal frames get deeper search.
- It removes the “sequential feedback loop” bottleneck from DCC-era v1.4–v1.5.
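The shape of the two-pass idea, as a rough sketch: a cheap scan produces a budget per frame, then every frame is encoded on its own. The scan heuristic and the use of `zlib` as the "encoder" are stand-ins chosen so the example runs; they are not how 8Z-Audio scans or codes.

```python
import zlib
from typing import Callable, List, Sequence, Tuple


def scan_frame(frame: bytes) -> int:
    """Pass 1: cheap per-frame look that returns an effort level (1 or 9)."""
    # Stand-in heuristic: few distinct byte values -> assume a deep search pays.
    return 9 if len(set(frame)) < 64 else 1


def encode_frame(job: Tuple[bytes, int]) -> bytes:
    """Pass 2 worker: depends only on its own frame and its own budget."""
    frame, effort = job
    return zlib.compress(frame, effort)  # placeholder for the predictor arena


def two_pass_encode(frames: Sequence[bytes], map_fn: Callable = map) -> List[bytes]:
    budget_map = [scan_frame(f) for f in frames]         # pass 1
    jobs = list(zip(frames, budget_map))
    return list(map_fn(encode_frame, jobs))              # pass 2, order preserved


frames = [bytes(4096), bytes(range(256)) * 16]
encoded = two_pass_encode(frames)
```

Since no job reads another frame's result, `map_fn` can be swapped for `multiprocessing.Pool(...).map` without changing the output bytes. That independence is what the sequential DCC-era loop could not offer.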
Why MDL battles matter
- Audio isn’t one distribution. Different parts want different models.
- MDL naturally prevents “magic”: parameters, overhead, and residuals all count toward the total.
- It gives a clean research loop: add a model → measure → keep only if it wins.
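A toy illustration of that accounting, assuming an LPC candidate pays for a header, its quantized coefficients, its warmup samples, and its residual. The field sizes and bit counts are invented placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    header_bits: int
    param_bits: int
    residual_bits: int

    @property
    def total(self) -> int:
        return self.header_bits + self.param_bits + self.residual_bits


def lpc_cost(order: int, coef_bits: int, sample_bits: int,
             residual_bits: int, header_bits: int = 16) -> Cost:
    """Each coefficient and each warmup sample is paid for, per order."""
    params = order * coef_bits + order * sample_bits
    return Cost(header_bits, params, residual_bits)


def keep_if_it_wins(incumbent: Cost, challenger: Cost) -> Cost:
    """A new model replaces the incumbent only on a strictly smaller total."""
    return challenger if challenger.total < incumbent.total else incumbent


low_order = lpc_cost(order=8, coef_bits=12, sample_bits=16, residual_bits=20_000)
high_order = lpc_cost(order=32, coef_bits=12, sample_bits=16, residual_bits=19_500)
kept = keep_if_it_wins(low_order, high_order)
```

Here the order-32 model saves 500 residual bits but spends 672 more on parameters, so the order-8 model stays. A better residual alone is never enough.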
Container & safety invariants (what we don’t compromise)
- Lossless: decoded WAV must be byte-identical to the source PCM.
- Deterministic: same input + settings → identical .8za bytes (strict tie-breaks).
- Auditability: forensic per-frame CSV logs in v1.7+ for winner/runner-up margins.
- Resume safety: checkpoint/resume introduced in v1.3.1 (iterated through v1.6+).
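The first two invariants are easy to express as a test harness. A minimal sketch using `hashlib.sha3_256`, with `zlib` standing in for the real encoder and decoder (`check_invariants` is an illustrative name):

```python
import hashlib
import zlib
from typing import Callable, Dict


def sha3_hex(data: bytes) -> str:
    return hashlib.sha3_256(data).hexdigest()


def check_invariants(pcm: bytes,
                     encode: Callable[[bytes], bytes],
                     decode: Callable[[bytes], bytes]) -> Dict[str, bool]:
    """Encode twice, decode once, and compare hashes and archive bytes."""
    first, second = encode(pcm), encode(pcm)
    return {
        "lossless": sha3_hex(decode(first)) == sha3_hex(pcm),
        "deterministic": first == second,
    }


pcm = bytes(range(256)) * 64  # placeholder PCM payload
report = check_invariants(pcm, zlib.compress, zlib.decompress)
```

Hashing the decoded PCM rather than trusting a length or checksum field means a single flipped sample fails the run.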
How we used LLMs (without hallucinating ourselves)
The “secret” wasn’t prompting. It was setting up selection pressure so only measured improvements survive. LLMs explored; MDL + SHA3 decided.
The collaboration loop
Roles
- Architect (BD): chooses the objective + insists on evidence.
- Builder (LLM): implements variants quickly.
- Skeptic: hunts overhead bugs, determinism leaks, missing costs.
- Editor: turns the chaos into a crisp spec + story.
Rules that keep it real
- Tests outrank opinions: round-trip + SHA3 are mandatory.
- MDL honesty: every byte is counted, especially fixed overhead.
- Keep winners only: variants don’t graduate without benchmark wins.
- Anomalies are gold: Rammstein/OFR regression is a research signal, not a nuisance.
What LLMs accelerated
- Fast implementation of partitioned Rice + correct overhead models.
- System ports: DCCMeter / CodecLearner / FrameFeatures from other 8Z domains.
- Rapid “candidate generator” iterations (PERIODIC, LPC², logging, streaming).
What humans still do better
- Choosing the right experiments (what’s worth measuring next).
- Spotting goal drift and keeping the system honest.
- Making trade-off decisions: compression vs speed vs complexity.
Compact “selection protocol” (how versions become mainline)
- Run the golden suite (same corpus, fixed settings, same machine).
- Record: bytes, encode time, decode time, memory, and SHA3 verification.
- Promote a version only if it improves a declared fitness score (and passes invariants).
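The promotion rule can be written down as a gate. This is a sketch under the assumption that the declared fitness score is simply total bytes; the record fields are illustrative, and the candidate's numbers are placeholders, not results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    version: str
    total_bytes: int
    encode_seconds: float
    sha3_ok: bool        # every decoded clip matched its source hash
    deterministic: bool  # repeated encodes produced identical archives


def fitness(run: RunResult) -> int:
    """Assumed fitness: smaller total output is better."""
    return run.total_bytes


def should_promote(mainline: RunResult, candidate: RunResult) -> bool:
    """Invariants are a hard gate; fitness only matters once they pass."""
    if not (candidate.sha3_ok and candidate.deterministic):
        return False
    return fitness(candidate) < fitness(mainline)


# Mainline bytes are the v1.5 total above; times and candidates are placeholders.
mainline = RunResult("v1.5", 30_730_661, 2400.0, True, True)
smaller_but_broken = RunResult("candidate-A", 30_000_000, 900.0, False, True)
smaller_and_clean = RunResult("candidate-B", 30_500_000, 3000.0, True, True)
```

A fitness score that also weighs encode time would slot into `fitness` without touching the gate.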
Roadmap (the next wins)
The goal is not “more models”. The goal is closing the gap on hard content while keeping the system deterministic and test-governed.
Immediate targets (high leverage)
Hard-content unlock (industrial / transients)
- Entropy upgrade for high-entropy residuals (rANS / better Rice).
- Transient-aware gating (don’t waste deep LPC where it can’t win).
- Noise-floor modeling & better escape handling.
Tonal ceiling (where 8Z already shines)
- Make PERIODIC stronger (better lag detection / YIN-style ideas).
- Prototype harmonic models as MDL candidates (fixed-point, deterministic).
- Temporal locality “model soup” (carry winners to next frame cheaply).
Engineering polish that pays
Repro + audit
- One benchmark runner → CSV + markdown report every time.
- Keep forensic logs for “why this won” debugging.
- Pin exact reference commands for FLAC/OFR comparisons (ffmpeg exact_rice).
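A forensic log can be as small as one CSV row per frame recording who won and by how much. A sketch with Python's `csv` module follows; the column names are hypothetical and not the v1.7 log format.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["frame", "winner", "winner_bits", "runner_up", "margin_bits"]


def forensic_row(frame_index: int, costs: Dict[str, int]) -> Dict[str, object]:
    """Rank candidates by (bits, name) and record the winner's margin."""
    ranked = sorted(costs.items(), key=lambda kv: (kv[1], kv[0]))
    (winner, w_bits), (runner, r_bits) = ranked[0], ranked[1]
    return {
        "frame": frame_index,
        "winner": winner,
        "winner_bits": w_bits,
        "runner_up": runner,
        "margin_bits": r_bits - w_bits,
    }


def write_log(rows: List[Dict[str, object]], fh) -> None:
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder per-frame costs in bits, for illustration only.
rows = [
    forensic_row(0, {"LPC": 900, "PERIODIC": 850, "RAW": 1600}),
    forensic_row(1, {"LPC": 700, "PERIODIC": 700, "RAW": 1600}),
]
buffer = io.StringIO()
write_log(rows, buffer)
```

A margin of zero, as in the second row, flags a frame that was decided by the tie-break rather than by size, which is useful when auditing determinism.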
Performance
- Vectorize hot loops; consider Numba/JIT where it stays deterministic.
- Memory-safe streaming encode/decode for long WAVs.
- Parallel scheduling tuned by the scanner budget map.
Later: “math generators” (8Z differentiators)
- Harmonic + distortion modeling (e.g., Chebyshev) as an MDL candidate, not a belief.
- Residual structure hunting (CA / attractors) only after LPC/PERIODIC are exhausted.
- Strict verification: fixed-point, integer-only, SHA3 test vectors.
Sources used for this v1 story
This site is built from the current code lineage and the project docs in your bundle. It’s meant to stay honest: when numbers change, we update the page.
Encoders
Note: the story text is aligned to docstrings + the benchmark report snapshot. As we plug in v1.7H results, we’ll update KPI blocks and tables.
Docs
- 8Z_Audio_v1.5_Benchmark_Report.md — the 15‑clip snapshot used here.
- CONTINUE_Audio_v1.6.md — pipeline + scanner rationale.
- 8Z_Math_in_Audio_Compression.md — why audio has extra mathematical headroom.
- 8Z_Audio_Peer_Review_Synthesis.md — peer review map (what led to Rice/precision/windowing work).
What we’ll add next (when you provide more data)
- Benchmark snapshots for v1.6.1, v1.7, v1.7H (same corpus).
- Speed table (encode/decode time per clip, per worker count).
- “Why it won” forensic summary (from v1.7 logs): top predictors by category.