Five-phase roadmap to close the OptimFROG gap — C1 through C5
Two encoder families are now validated across 18 files (3 full tracks + 15 clips): aFLAC/avFLAC (valid .flac output, beats lax_t6 on Abyssal and 8 clips) and AC (custom .8za format, MDL frame probe). avFLAC beats OptimFROG on 3 Radiohead clips (up to 9.2% smaller) by exploiting per-segment block size optimization that OFR's fixed architecture cannot match. The .8za codec is where the deeper gains live — unconstrained by FLAC spec, free to use any predictor, any entropy coder, any block size.
| Codec | Ethereal Arc (bytes) | Ratio | Abyssal (bytes) | Ratio | vs lax_t6 / notes |
|---|---|---|---|---|---|
| OptimFROG | 7,530,383 | 43.79% | 16,283,000 | 47.05% | Target |
| WavPack -hh | 7,934,862 | 46.14% | 17,392,334 | 50.26% | -0.2 to +0.4% |
| lax_t6 | 7,951,589 | 46.24% | 17,320,224 | 50.05% | baseline |
| 8Z-avFLAC v1.2 | 7,958,631 | 46.28% | 17,305,557 | 50.01% | +7 KB / -14.7 KB |
| 8Z-AC v1.9 | ~7,993,000 | 46.48% | 17,512,422 | 50.61% | MDL frame probe |
| FLAC -12 (max) | 7,996,121 | 46.49% | 17,445,199 | 50.41% | old baseline |
Each phase is implemented in a separate chat session using its own CONTINUE paper. Each phase is tested and validated before the next begins. MDL is the arbiter — new techniques compete in the arena; they don't replace existing winners.
| Phase | Technique | Expected Gain | Effort | CONTINUE Paper |
|---|---|---|---|---|
| C1 | rANS entropy coder (MDL arena) | 2–4% | 2–3 weeks | CONTINUE_AC_C1_rANS.md |
| C2 | OLS+NLMS cascade predictor | 2–3% | 2–3 weeks | CONTINUE_AC_C2_OLS_NLMS.md |
| C3 | Joint stereo prediction | 1–1.5% | 1–2 weeks | CONTINUE_AC_C3_Joint_Stereo.md |
| C4 | Bitplane coder with SSE | 0.5–1% | 2 weeks | CONTINUE_AC_C4_Bitplane_SSE.md |
| C5 | DDS meta-parameter optimization | 0.5–1% | 1 week | CONTINUE_AC_C5_DDS_Optimization.md |
We do NOT replace Rice with rANS. We ADD rANS as a third entropy backend alongside Rice and LZMA. Per partition, all three compete and MDL picks the cheapest. Rice wins clean tonal partitions (only 4 bits of side-info). rANS wins mixed/transient partitions where Rice's integer k-parameter can't capture the true distribution shape.
Different partitions within the same frame may use different entropy coders. A frame might have Rice on the first partition (tonal, clean Laplace) and rANS on the third (transient, heavy tails). MDL decides per partition, not per frame.
Precision: M=4096 (12-bit frequency tables). State: 32-bit with 16-bit renormalization. Probability model: Laplace shape parameter (1 byte) when distribution fits, full RLE histogram when it doesn't.
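A minimal sketch of the rANS core with the parameters listed above (12-bit frequency table, 32-bit state, 16-bit renormalization). The function names, the `RANS_L` lower bound, and the hand-written frequency table are illustrative assumptions, not the 8Z-AC implementation. The Laplace and RLE model descriptors are left out, and every coded symbol is assumed to have a non-zero frequency.

```python
SCALE_BITS = 12                # M = 4096, as in the text
M = 1 << SCALE_BITS
RANS_L = 1 << 16               # assumed lower bound of the normalized state interval


def build_tables(freqs):
    """Return cumulative starts and a slot -> symbol lookup for a table summing to M."""
    if sum(freqs) != M:
        raise ValueError("frequencies must sum to M")
    starts, slot_to_sym, acc = [], [], 0
    for sym, f in enumerate(freqs):
        starts.append(acc)
        slot_to_sym.extend([sym] * f)
        acc += f
    return starts, slot_to_sym


def rans_encode(symbols, freqs, starts):
    """Encode in reverse so the decoder emits symbols in forward order."""
    x, words = RANS_L, []
    for s in reversed(symbols):
        f = freqs[s]
        x_max = ((RANS_L >> SCALE_BITS) << 16) * f
        if x >= x_max:                     # renormalize: push the low 16 bits out
            words.append(x & 0xFFFF)
            x >>= 16
        x = ((x // f) << SCALE_BITS) + (x % f) + starts[s]
    return x, words[::-1]                  # decoder reads the last-emitted word first


def rans_decode(x, words, count, freqs, starts, slot_to_sym):
    """Inverse of rans_encode: returns `count` symbols in their original order."""
    out, pos = [], 0
    for _ in range(count):
        slot = x & (M - 1)
        s = slot_to_sym[slot]
        x = freqs[s] * (x >> SCALE_BITS) + slot - starts[s]
        if x < RANS_L:                     # renormalize: pull 16 bits back in
            x = (x << 16) | words[pos]
            pos += 1
        out.append(s)
    return out
```

With these constants the state stays below 2^32 after every step, which is what lets a single 16-bit word per renormalization suffice.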
Side-info cost: ~16–32 bits per partition vs Rice's 4 bits. rANS has a 12–28 bit handicap — must save more on residual encoding to justify itself. MDL handles this automatically.
Per partition in each subframe:
- Rice encoded size → X bytes (4 bits side-info: k parameter)
- rANS encoded size → Y bytes (16–32 bits: distribution descriptor)
- LZMA encoded size → Z bytes (existing fallback)

MDL winner = min(X + overhead_X, Y + overhead_Y, Z + overhead_Z)
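The per-partition selection can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the rANS payload is approximated by the empirical entropy of the partition rather than by a real encoder, the descriptor is charged at the upper end of the 16–32 bit range, and the LZMA candidate is omitted for brevity.

```python
import math
from collections import Counter


def zigzag(r):
    """Map signed residuals to non-negative integers: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (r << 1) if r >= 0 else ((-r) << 1) - 1


def rice_cost_bits(residuals, max_k=14, side_info_bits=4):
    """Cheapest Rice size over k, including the 4-bit k parameter."""
    folded = [zigzag(r) for r in residuals]
    payload = min(sum((u >> k) + 1 + k for u in folded) for k in range(max_k + 1))
    return payload + side_info_bits


def rans_cost_bits(residuals, side_info_bits=32):
    """Proxy for rANS: empirical entropy plus an assumed descriptor cost."""
    n = len(residuals)
    payload = sum(c * math.log2(n / c) for c in Counter(residuals).values())
    return math.ceil(payload) + side_info_bits


def mdl_pick(residuals):
    """Return the cheapest backend name and the full cost table (in bits)."""
    costs = {"rice": rice_cost_bits(residuals), "rans": rans_cost_bits(residuals)}
    winner = min(costs, key=costs.get)
    return winner, costs
```

A short all-zero partition goes to Rice because the descriptor overhead dominates, while a strongly bimodal partition goes to rANS because no single k fits both modes.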
Expected gain: 2–4% on mixed/transient content. ~0% on clean tonal (Rice wins there). On Ethereal Arc, expect ~60 KB improvement; on Abyssal, ~340–680 KB.
The encode.su community consensus: "Most of the compression of OptimFROG and SAC comes from low-order OLS." The predictor matters more than the entropy coder. SAC (MIT-licensed) uses OLS+NLMS and achieves compression within ~1% of OptimFROG.
Standard LPC (Levinson-Durbin, what we use now): computes autocorrelation, solves Toeplitz system. Optimizes for the signal's average statistical properties, assuming stationarity.
OLS (Ordinary Least Squares): forms the actual sample matrix, solves X'Xβ = X'y via Cholesky decomposition. Optimizes for what actually happened in this specific block. No stationarity assumption.
On non-stationary audio (which is most music), OLS wins because it adapts to local signal behavior rather than averaging over the block.
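To make the OLS step concrete, here is a pure-Python sketch that accumulates the normal equations X'X β = X'y from the block's own samples and solves them with a Cholesky factorization. The function name and the optional `ridge` term are assumptions for illustration; a production encoder would likely use an optimized linear-algebra routine.

```python
import math


def ols_coefficients(x, order, ridge=0.0):
    """Solve (X'X) beta = X'y for a linear predictor of x[n] from x[n-1..n-order]."""
    p = order
    a = [[0.0] * p for _ in range(p)]      # lower triangle of the normal matrix X'X
    b = [0.0] * p                          # right-hand side X'y
    for n in range(p, len(x)):
        past = [x[n - 1 - i] for i in range(p)]
        for i in range(p):
            b[i] += past[i] * x[n]
            for j in range(i + 1):
                a[i][j] += past[i] * past[j]
    for i in range(p):
        a[i][i] += ridge                   # optional regularization (assumption)

    # Cholesky factorization A = L L^T, using the lower triangle only.
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]

    # Forward substitution L z = b, then back substitution L^T beta = z.
    z = [0.0] * p
    for i in range(p):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (z[i] - sum(low[k][i] * beta[k] for k in range(i + 1, p))) / low[i][i]
    return beta
```

On a block generated by an exact second-order recurrence, the solver recovers the recurrence coefficients, which is the sense in which OLS fits "what actually happened" in the block.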
Stage 1: OLS predictor → residuals_1 (better than LPC residuals)
Stage 2: NLMS adaptive filter on residuals_1 → residuals_2 (catches time-varying patterns OLS missed)
Key advantage: NLMS needs NO side-info. It's a causal filter — the decoder runs identical NLMS on the same residual stream, producing identical adapted weights. Free compression gain.
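The no-side-info property can be seen in a small sketch: encoder and decoder run the same update on the same reconstructed stage-1 residuals, so their weights stay in lockstep. The function names are hypothetical, and the update uses Python floats, which is only deterministic on a single platform; a real codec would presumably need a fixed-point update to stay bit-exact everywhere.

```python
def _nlms_step(weights, history, target, mu, eps=1e-6):
    """Normalized LMS update, applied once the true target is known."""
    energy = sum(h * h for h in history) + eps
    err = target - sum(w * h for w, h in zip(weights, history))
    for i, h in enumerate(history):
        weights[i] += mu * err * h / energy


def nlms_encode(res1, taps=16, mu=0.5):
    """Turn stage-1 residuals into stage-2 residuals."""
    weights, history, out = [0.0] * taps, [0] * taps, []
    for r in res1:
        pred = round(sum(w * h for w, h in zip(weights, history)))
        out.append(r - pred)                   # stage-2 residual
        _nlms_step(weights, history, r, mu)
        history = [r] + history[:-1]
    return out


def nlms_decode(res2, taps=16, mu=0.5):
    """Rebuild stage-1 residuals; no weights are transmitted."""
    weights, history, out = [0.0] * taps, [0] * taps, []
    for e in res2:
        pred = round(sum(w * h for w, h in zip(weights, history)))
        r = e + pred                           # reconstructed stage-1 residual
        out.append(r)
        _nlms_step(weights, history, r, mu)
        history = [r] + history[:-1]
    return out
```

Only `taps` and `mu` have to be agreed on, and those can be fixed by the format or chosen from a small set per frame.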
OLS computed in float64, then quantized to fixed-point (same qlevel path as existing LPC). Decoder uses quantized coefficients — perfectly deterministic, platform-independent. Same principle FLAC uses, just OLS instead of autocorrelation.
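A sketch of the quantize-then-predict path, assuming a FLAC-like scheme of signed integer coefficients plus a right shift. The helper names and the 15-bit default precision are illustrative assumptions. Because prediction uses integers only, encoder and decoder agree exactly regardless of floating-point behavior.

```python
def quantize_coefficients(coefs, precision=15):
    """Scale float coefficients to signed `precision`-bit integers plus a right shift."""
    limit = (1 << (precision - 1)) - 1
    shift = precision - 1
    while shift > 0 and max(abs(round(c * (1 << shift))) for c in coefs) > limit:
        shift -= 1
    return [round(c * (1 << shift)) for c in coefs], shift


def predict(qcoefs, shift, past):
    """Integer-only prediction; `past` holds x[n-1], x[n-2], ... as ints."""
    return sum(q * s for q, s in zip(qcoefs, past)) >> shift


def residuals(samples, qcoefs, shift):
    """Encoder side: warm-up samples verbatim, then prediction errors."""
    p = len(qcoefs)
    out = list(samples[:p])
    for n in range(p, len(samples)):
        past = [samples[n - 1 - i] for i in range(p)]
        out.append(samples[n] - predict(qcoefs, shift, past))
    return out


def reconstruct(res, qcoefs, shift):
    """Decoder side: the same integer predictor, run on reconstructed samples."""
    p = len(qcoefs)
    samples = list(res[:p])
    for n in range(p, len(res)):
        past = [samples[n - 1 - i] for i in range(p)]
        samples.append(res[n] + predict(qcoefs, shift, past))
    return samples
```

Quantization makes the predictor slightly worse than the float64 solution, but the loss shows up only as marginally larger residuals, never as a decoding mismatch.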
Expected gain: 2–3%. Combined with C1: 4–7% cumulative. OLS is expected to win ~30–60% of frames, with LPC winning the rest. MDL selects per frame.
Current approach: transform L/R to Mid/Side, then predict each channel independently. This captures instantaneous correlation but misses delayed cross-channel correlation (room acoustics, mic spacing) and frequency-dependent correlation (mono bass, stereo treble).
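For reference, a lossless Mid/Side transform can be sketched as below. This assumes the FLAC-style convention (mid is the floored average, side is the difference); whether 8Z-AC uses exactly this form is not stated in the text.

```python
def to_mid_side(left, right):
    """Forward transform: mid drops one bit, which side's parity preserves."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Inverse transform: L+R and L-R share parity, so the dropped bit is side & 1."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)      # restores L + R exactly
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right
```

The transform only removes correlation between samples at the same time index, which is why delayed or frequency-dependent correlation is left on the table.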
Joint prediction: Each channel uses past samples from BOTH channels as prediction inputs. For left channel: 16 taps from L (same as now) + 8 taps from R (new). The OLS sample matrix simply gets more columns — no new algorithm, same Cholesky solver.
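The "more columns" idea can be sketched by building the regression rows directly. The function name and the row layout (own-channel taps first, then cross-channel taps) are assumptions for illustration; any least-squares solver can then be applied to the result unchanged.

```python
def joint_design(left, right, own_taps=16, cross_taps=8):
    """Rows of [L[n-1..n-P], R[n-1..n-Q]] with targets L[n], for predicting the left channel."""
    start = max(own_taps, cross_taps)
    rows, targets = [], []
    for n in range(start, len(left)):
        own = [left[n - 1 - i] for i in range(own_taps)]
        cross = [right[n - 1 - i] for i in range(cross_taps)]
        rows.append(own + cross)         # P + Q regressors per row
        targets.append(left[n])
    return rows, targets
```

Setting `cross_taps=0` reduces this to the single-channel OLS case, so the independent predictor remains available as a special case of the joint one.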
6 stereo modes compete per frame: Independent, Mid/Side, Left-Side, Right-Side, Joint Predict, Joint Mid/Side. MDL picks the cheapest. Cross-channel taps cost ~24 bytes extra per frame in coefficients — must save more than that on residuals to justify.
Expected gain: 1–1.5%. Ghido's original paper reported ~1.5%. Cumulative C1+C2+C3: ~5–8.5%.
Rice and rANS treat each symbol independently. A bitplane coder decomposes residuals into individual bits (MSB to LSB) and models each bit conditionally on context: upper bits already coded, magnitude of neighboring samples, channel correlation. This exploits clustering — large residuals follow large residuals — that symbol-level coders miss.
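The benefit of conditioning can be illustrated without a full arithmetic coder by measuring the ideal adaptive code length of the bitplanes with and without context. This is a simplified sketch: the function name, the two-part context (upper bits of the same sample, previous sample's magnitude), and the Laplace-smoothed counters are assumptions, and sign bits are ignored.

```python
import math


def bitplane_cost(mags, nbits, use_context=True):
    """Ideal adaptive code length (bits) for magnitudes coded MSB-first."""
    counts = {}                                  # context -> [zeros seen, ones seen]
    total = 0.0
    for b in range(nbits - 1, -1, -1):
        for i, m in enumerate(mags):
            bit = (m >> b) & 1
            if use_context:
                upper = (m >> (b + 1)) != 0      # higher bits of this sample, already coded
                neighbor = i > 0 and (mags[i - 1] >> b) != 0  # previous sample, planes >= b
                ctx = (b, upper, neighbor)
            else:
                ctx = (b,)
            c = counts.setdefault(ctx, [1, 1])   # Laplace-smoothed counts
            total -= math.log2(c[bit] / (c[0] + c[1]))
            c[bit] += 1
    return total
```

Both context components are causal: the decoder has already seen the higher planes of the current sample and all planes down to `b` of the previous sample, so it can form the same context.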
SSE (Secondary Symbol Estimation) refines the primary context model's probability estimates using recent coding history. This is what separates SAC/PAQ-class compressors from simpler approaches.
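A deliberately small sketch of the SSE idea: a table indexed by a context and by the quantized primary probability learns what the bit actually turned out to be in that situation, and the refined estimate blends the two. The class name, bucket count, learning rate, and the plain 50/50 blend are assumptions; PAQ-class coders typically interpolate in a stretched (logistic) domain instead.

```python
class SSE:
    """Tiny secondary estimator: refine a primary P(bit=1) per (context, probability bucket)."""

    def __init__(self, n_contexts, buckets=32, rate=0.05):
        self.buckets, self.rate = buckets, rate
        centers = [(i + 0.5) / buckets for i in range(buckets)]
        self.table = [list(centers) for _ in range(n_contexts)]
        self._slot = (0, 0)

    def refine(self, p, ctx):
        """Return a corrected probability and remember which table cell was used."""
        idx = min(int(p * self.buckets), self.buckets - 1)
        self._slot = (ctx, idx)
        return 0.5 * (p + self.table[ctx][idx])

    def update(self, bit):
        """Move the last-used cell toward the bit that was actually coded."""
        ctx, idx = self._slot
        self.table[ctx][idx] += self.rate * (bit - self.table[ctx][idx])
```

If the primary model keeps reporting 0.5 in a context where the bit is almost always 1, the refined estimate drifts upward and the coded cost falls accordingly.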
Fourth entropy backend in MDL arena: Rice, rANS, Bitplane, LZMA all compete per partition. Bitplane is expected to win on ~20–40% of partitions, namely the hardest, most non-stationary ones.
Expected gain: 0.5–1% over C1's rANS. Cumulative C1–C4: ~5.5–9.5%.
After C1–C4, the per-frame parameter space is ~15 million combinations. Grid search samples a tiny fraction. DDS (Dynamically Dimensioned Search) is a black-box optimizer that finds near-optimal configurations in 200–500 evaluations. It automatically transitions from broad exploration to local refinement — no tuning parameters except evaluation budget.
DCC + DDS integration: DCC assigns difficulty tier, DDS refines within the effort budget. SILENCE/EASY frames skip DDS (grid search sufficient). MEDIUM gets 100 iterations. HARD gets 300–500. Warm-started from grid search best — DDS only needs to improve an already-good starting point.
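A compact sketch of DDS with the DCC gating described above. The budget table follows the text (HARD uses the low end of 300–500); the function names, the continuous parameter vector, and the neighborhood size `r=0.2` (a commonly used default) are assumptions. Real codec parameters are mostly discrete, so a cost function would need to round or map them.

```python
import math
import random

# Evaluation budgets per DCC tier, taken from the text.
DDS_BUDGET = {"SILENCE": 0, "EASY": 0, "MEDIUM": 100, "HARD": 300}


def dds_minimize(cost, start, bounds, budget, r=0.2, seed=0):
    """Dynamically Dimensioned Search: greedy, perturbs fewer dimensions over time."""
    rng = random.Random(seed)
    best, best_cost = list(start), cost(start)
    for i in range(1, budget + 1):
        # Probability of perturbing each dimension decays from 1 toward 0.
        p = 1.0 - math.log(i) / math.log(budget) if budget > 1 else 1.0
        dims = [d for d in range(len(best)) if rng.random() < p]
        if not dims:
            dims = [rng.randrange(len(best))]   # always perturb at least one dimension
        cand = list(best)
        for d in dims:
            lo, hi = bounds[d]
            v = cand[d] + r * (hi - lo) * rng.gauss(0.0, 1.0)
            if v < lo:                      # reflect at the bound, clamp if still outside
                v = lo + (lo - v)
                v = lo if v > hi else v
            elif v > hi:
                v = hi - (v - hi)
                v = hi if v < lo else v
            cand[d] = v
        c = cost(cand)
        if c <= best_cost:                  # greedy acceptance
            best, best_cost = cand, c
    return best, best_cost


def refine_frame(cost, grid_best, bounds, tier):
    """Warm-start DDS from the grid-search winner; skip it for cheap tiers."""
    budget = DDS_BUDGET[tier]
    if budget == 0:
        return list(grid_best), cost(grid_best)
    return dds_minimize(cost, grid_best, bounds, budget)
```

Because acceptance is greedy and the search starts from the grid-search best, the result can never be worse than the grid search alone; the only cost is encode time.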
SAC uses DDS for per-frame parameter optimization. Proven technique for audio compression.
Expected gain: 0.5–1%. The diminishing-returns scraper. Cumulative C1–C5: ~6–10.5%.
C1 (rANS): Start from 8Z-AC.py v1.9 (MDL frame probe). Add estimate_rans_cost(), rans_encode_partition(), rans_decode_partition(). Test: compression must improve on both Ethereal Arc and Abyssal, with no regression on any frame (MDL falls back to Rice).
C2 (OLS+NLMS): Add compute_ols_coefficients(), nlms_cascade(). New predictor types 3 (OLS) and 4 (OLS_NLMS). OLS tries orders {4, 8, 12, 16, 20, 24, 32}. NLMS: 16-tap, mu∈{0.3, 0.5, 1.0}. Test: OLS should win 30–60% of frames.
C3 (Joint Stereo): Extend the OLS sample matrix with Q cross-channel columns, Q∈{0, 4, 8, 12}. New stereo modes: JOINT_PREDICT, JOINT_MID_SIDE. Test: joint modes should win on spatially complex content. No gain expected on mono-ish tracks.
C4 (Bitplane + SSE): Fourth entropy backend in MDL arena. MSB-to-LSB processing with context model. SSE corrects probability estimates. Test: bitplane should win on 20–40% of hardest partitions.
C5 (DDS): 200–500 evaluations per frame on MEDIUM/HARD content. Warm-start from grid search best. Test: DDS must improve over grid search on at least 20% of frames. Total encode time target: 3–5 minutes per song.
| Configuration | Est. Ethereal Arc (bytes) | Ratio | vs FLAC-12 | vs OFR |
|---|---|---|---|---|
| AC v1.9 (current) | ~7,993,000 | ~46.48% | ≈ 0% | +6.1% |
| + C1 (rANS) | ~7,810,000 | ~45.41% | −2.3% | +3.7% |
| + C1 + C2 (OLS+NLMS) | ~7,570,000 | ~44.02% | −5.3% | +0.5% |
| + C1–C3 (Joint Stereo) | ~7,460,000 | ~43.38% | −6.7% | −0.9% |
| + C1–C5 (Full Classical Max) | ~7,350,000 | ~42.74% | −8.1% | −2.4% |
| OptimFROG (reference) | 7,530,383 | 43.79% | — | 0% |
| Risk | Prob | Impact | Mitigation |
|---|---|---|---|
| rANS overhead exceeds savings on most frames | Low | Low | MDL gating — Rice always available as fallback |
| OLS FP determinism issues across platforms | Medium | High | Fixed-point coefficients (same path as LPC qlevel) |
| NLMS convergence too slow for short frames | Medium | Medium | DCC gates: skip cascade on SILENCE/EASY frames |
| Joint stereo coefficient overhead > savings | Medium | Low | MDL ensures no regression — falls back to M/S or independent |
| Bitplane context model doesn't converge in one frame | Medium | Medium | Adaptive from scratch; warmup cost ~0.5% on 4096-sample frames |
| DDS adds significant encode time for marginal gain | Medium | Low | DCC-gated: DDS only on MEDIUM/HARD frames |
| Gain estimates too optimistic (don't stack linearly) | High | Medium | Applied 70% discount. Each phase independently valuable. |
MDL is non-negotiable. Every technique competes. MDL decides. No hardcoded preferences. If OLS doesn't beat LPC on a frame, LPC wins that frame.
Lossless is non-negotiable. Byte-for-byte reconstruction, SHA3-verified. Fixed-point coefficients ensure determinism.
One phase at a time. Test, validate, measure. Each CONTINUE paper is a self-contained session. Don't start C2 until C1 is validated.
Python first. R&D mode. Speed optimization (C extension, Cython) comes after correctness.
No backward-compatibility constraints on the format. The format version bumps with each phase. Old files stay readable.
Test on both songs. Ethereal Arc AND Abyssal. Plus 13-clip benchmark. Plus 30-song curator set when available.
| Improvement | Target | CONTINUE Paper |
|---|---|---|
| xFLAC-1: Smarter transient detection & block splitting | 0.3–0.5% | CONTINUE_xFLAC_Improvements.md |
| xFLAC-2: LPC search expansion (TOP_K, windows) | 0.3–0.5% | |
| xFLAC-3: Segment boundary optimization (stitching overhead) | 0.2–0.5% | |
| vFLAC --fast / --full adaptive search | Speed control | CONTINUE_vFLAC_FastFull.md ✅ Done |
8Z-AC "Classical Max" — Phase Dependencies
┌─────────────────────────────────────────────────────────┐
│  CURRENT: v1.7H3                                        │
│  LPC (Levinson-Durbin) + Rice + LZMA + DCC + MDL        │
└──────────────────────┬──────────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────────┐
│  C1: rANS Entropy    │  + rANS per-partition selection   │
│  (2–4% gain)         │    Rice stays in arena            │
└──────────────────────┬───────────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────────┐
│  C2: OLS+NLMS        │  + OLS predictor (fixed-point)    │
│  (2–3% gain)         │  + NLMS cascade (no side-info)    │
│  BIGGEST SINGLE GAIN │    LPC stays in arena             │
└──────────────────────┬───────────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────────┐
│  C3: Joint Stereo    │  + cross-channel OLS taps         │
│  (1–1.5% gain)       │  + 6 stereo modes in MDL arena    │
└──────────────────────┬───────────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────────┐
│  C4: Bitplane + SSE  │  + context-dependent bit coding   │
│  (0.5–1% gain)       │  + 4th entropy backend in MDL     │
└──────────────────────┬───────────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────────────┐
│  C5: DDS Optimizer   │  + black-box parameter search     │
│  (0.5–1% gain)       │  + DCC-gated evaluation budget    │
└──────────────────────┬───────────────────────────────────┘
                       │
                       ▼
             ┌───────────────────┐
             │  TARGET: ≤ OFR    │
             │  (~43.8% ratio)   │
             └───────────────────┘