8Z-xFLAC Optimization Sprint — Three-Phase Roadmap

01 Where We Stand: March 2026

WINS · avFLAC beats lax_t6 · Abyssal (−14.7 KB)
−348 KB · avFLAC vs lax_t6 · Radiohead 60s (−6.8%)
+7 KB · Gap to lax_t6 · Ethereal (stitching)
15 / 15 · avFLAC wins · 15-clip vs FLAC-12

The xFLAC bundle — aFLAC v1.3, vFLAC v1.5, and avFLAC v1.2 — produces valid .flac files. As of March 2026, avFLAC beats lax_t6 on Abyssal by 14,667 bytes — our first full-track win over the strongest single FLAC encoder. vFLAC v1.5 with MDL block probe won 21/22 arena segments. On 15 benchmark clips, avFLAC beats FLAC-12 on all 15 and beats the best baseline on 8. Three Radiohead clips beat even OptimFROG by 31–40%.

The remaining gaps: Ethereal Arc is still +7 KB behind lax_t6 (stitching overhead), and on 24-bit content (LG-DWAS) the vFLAC block probe has a bias that needs bit-depth-aware correction.

Sprint goal: Close the remaining gap to lax_t6 on Ethereal Arc (+7,042 B for avFLAC, +13,530 B for aFLAC), fix vFLAC's mixed-content losses, and make the arena stitch beat every single whole-file encoder.

Ethereal Arc Benchmark (48 kHz stereo, 89.5 s)

#	Codec	Bytes	vs lax_t6	Notes
1	OptimFROG	7,530,383	−421,206	Range coder · closed source
2	WavPack -hh	7,934,862	−16,727	Context mixing
3	lax_t6	7,951,589	baseline	Best single FLAC encoder · TARGET
4	8Z-avFLAC v1.2	7,958,631	+7,042	✓ VF_whole · vFLAC 16/16 segs
5	8Z-vFLAC v1.5	7,958,631	+7,042	✓ MDL block probe · 190/280 changed
6	8Z-aFLAC v1.3	7,965,119	+13,530	✓ Arena + stitch
7	FLAC -12 (max)	7,996,121	+44,532	Standard reference

Abyssal Benchmark (48 kHz stereo, 180.2 s)

#	Codec	Bytes	vs FLAC-12	Notes
1	OptimFROG	16,283,191	−1,162,008	Range coder · 6% gap
2	8Z-avFLAC v1.2	17,305,557	−139,642	✓ BEATS lax_t6 by 14,667B · arena stitch
3	lax_t6	17,320,224	−124,975	Previous best single FLAC encoder
4	8Z-aFLAC v1.3	17,369,506	−75,693	✓ Beats FLAC-12
5	FLAC -12 (max)	17,445,199	baseline
6	8Z-vFLAC v1.5	17,512,422	+67,223	Standalone (VF_whole)

02 The Stitching Overhead Problem

This is the single most important finding from the xFLAC R&D cycle so far. The --fast vs --full experiment on Ethereal Arc produced identical output (7,965,119 bytes) despite --full evaluating 42 candidates per subframe vs --fast's 4–11. The DCC pruning isn't losing compression — the candidate pool already contains the winners. The bottleneck is not search depth. It is stitching overhead.

Ethereal Arc Forensics

lax_t6 whole:     7,951,589B
aFLAC raw-sum:    7,936,170B  ← WINS by 15,419B
aFLAC stitched:   7,965,119B  ← headers ADD 28,949B
Net vs lax_t6:   +13,530B     ← loss

Abyssal Forensics (March 2026 — WINS!)

lax_t6 whole:    17,320,224B
avFLAC stitch:   17,305,557B  ← WINS by 14,667B
vFLAC wins:      21/22 segs   ← vFLAC v1.5 MDL block probe dominates
Seg raw sum:     17,301,297B  ← raw is even better

Abyssal is solved. vFLAC v1.5 with MDL block probe won 21/22 arena segments, and the arena stitch beat lax_t6 by 14,667 bytes. Ethereal Arc remains the target: by the figures above, the aFLAC raw segment sum beats lax_t6 by 15,419 bytes, but 28,949 bytes of stitching overhead turn that into a 13,530-byte loss. X3 stitching elimination is designed to close this gap.

Overhead Budget Per Segment Boundary

Source	Bytes	Notes
Frame number UTF-8 growth	1–3	Per frame; larger numbers = more bytes
Cold-start LPC penalty	50–200	Per segment; first frame has no context
Short tail block	100–500	Per segment; remainder < blocksize
Total per boundary	200–700	16 segments × ~400B = ~6,400B structural
Context-loss penalty	~22,500	Remaining gap: per-segment LPC < whole-file
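
The budget above can be reproduced with a few lines of arithmetic. This is a sketch only: the function names are hypothetical, the ~400 B per-segment midpoint and the 28,949 B measured overhead are taken from this section, and coded_number_length follows the UTF-8-style variable-length coding that FLAC frame headers use for frame and sample numbers.

```python
def coded_number_length(n: int) -> int:
    """Bytes needed for a FLAC frame header's UTF-8-style coded number."""
    if n < 0 or n >= 1 << 36:
        raise ValueError("coded numbers are limited to 36 bits")
    limits = [0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
    for nbytes, limit in enumerate(limits, start=1):
        if n < limit:
            return nbytes
    return 7  # 32 to 36 bits


def split_overhead(measured_overhead: int, segments: int, per_segment: int = 400):
    """Split measured stitching overhead into structural and context-loss parts.

    per_segment is the assumed midpoint of the 200-700 B range in the table.
    """
    structural = segments * per_segment
    return structural, measured_overhead - structural


structural, context_loss = split_overhead(28_949, segments=16)
print(structural, context_loss)  # 6400 22549
```

The split makes the priority visible: under these assumptions the structural costs account for under a quarter of the overhead, and the context-loss term dominates.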

03 Three-Phase Roadmap

┌──────────────────────────────────────────────────────────────┐
│  X1  Transient Blocks    │  vFLAC v1.3 → v1.4           │
│      onset detection      │  block plan rewrite            │
│      1024-sample attack   │  LPC order cap per class       │
├───────────────────────────┼──────────────────────────────────┤
│  X2  LPC Search Expand   │  vFLAC v1.4 → v1.5           │
│      TOP_K 16→24         │  DCC-gated tonal boost        │
│      improved proxy       │  +hamming window               │
├───────────────────────────┼──────────────────────────────────┤
│  X3  Stitch Elimination  │  aFLAC v1.3 → v1.4           │
│   a: segment merging     │  avFLAC v1.2 → v1.3          │
│   b: overlapping segs    │  cold-start elimination        │
│   c: frame-level arena   │  (optional, if a+b insufficient)│
└───────────────────────────┴──────────────────────────────────┘

Cumulative gain estimates (65% stacking discount):
  X1  Transient blocks      →  0.1–0.3%  (~8–24 KB)
  X2  LPC search expansion  →  0.2–0.5%  (~16–40 KB)
  X3  Stitch elimination    →  0.1–0.4%  (~8–30 KB)
  Combined:                    0.3–0.8%  (~24–65 KB)
  Need to beat lax_t6:         0.17%     (13.5 KB)
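
The combined figure can be checked as follows. The sketch assumes that "65% stacking discount" means the combined gain is 65% of the simple sum of the per-phase ranges; that reading is an assumption, chosen because it reproduces the 0.3–0.8% line. All names are illustrative.

```python
# Per-phase gain ranges in percent, copied from the estimates above.
PHASES = {"X1": (0.1, 0.3), "X2": (0.2, 0.5), "X3": (0.1, 0.4)}

LAX_T6_BYTES = 7_951_589   # Ethereal Arc, lax_t6
GAP_BYTES = 13_530         # aFLAC v1.3 vs lax_t6


def combined_gain(phases, stacking=0.65):
    """Sum the low and high ends, then keep `stacking` of each sum (assumed reading)."""
    low = sum(lo for lo, _ in phases.values()) * stacking
    high = sum(hi for _, hi in phases.values()) * stacking
    return low, high


low, high = combined_gain(PHASES)
required = 100 * GAP_BYTES / LAX_T6_BYTES

print(f"combined: {low:.2f}% to {high:.2f}%")   # combined: 0.26% to 0.78%
print(f"required: {required:.2f}%")             # required: 0.17%
print("low end clears the gap:", low > required)
```

Even the discounted low end exceeds the required 0.17%, which is the basis for the roadmap's confidence.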

04 X1: Transient Detection & Adaptive Block Splitting

CONTINUE_xFLAC_X1_Transient_Blocks.md

Problem

vFLAC assigns fixed block sizes by signal class (tonal→16384, mixed→8192, transient→4096). A drum hit's attack is ~50–200 samples but wastes an entire 4096-sample block. On the transient-heavy clips this costs +11.19% on EOTS and +11.88% on LRR.

Solution

Two-stage onset detection: coarse scan (energy derivative between 1024-sample windows) followed by fine refinement (64-sample hops). Place 1024–2048 sample blocks precisely around attacks, then immediately return to optimal larger blocks for sustain/decay. FLAC spec allows any block size 16–65535. To our knowledge, no other FLAC encoder places blocks this way.
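
A minimal sketch of the block placement, assuming onset positions are already known. plan_blocks and its parameters are hypothetical names, not vFLAC's actual code; the 4096-sample sustain size and 1024-sample attack size come from the figures in this section.

```python
MIN_BLOCK, MAX_BLOCK = 16, 65535  # FLAC block size limits (the last block may be shorter)


def plan_blocks(total, onsets, sustain=4096, attack=1024):
    """Return block sizes covering `total` samples, with a short block at each onset."""
    blocks, pos = [], 0
    for onset in sorted(onsets):
        if onset <= pos or onset >= total:
            continue  # already inside a previous attack block, or out of range
        # Large blocks up to the last full sustain block before the onset.
        while onset - pos >= sustain:
            blocks.append(sustain)
            pos += sustain
        gap = onset - pos
        if gap >= MIN_BLOCK:
            blocks.append(gap)  # filler so the attack block starts on the onset
            pos += gap
        # If the gap is under MIN_BLOCK, the attack block starts slightly early.
        size = min(attack + (onset - pos), total - pos)
        blocks.append(size)
        pos += size
    # Return to large blocks for the sustain/decay, then the tail.
    while total - pos >= sustain:
        blocks.append(sustain)
        pos += sustain
    if total - pos > 0:
        blocks.append(total - pos)
    return blocks


print(plan_blocks(20000, [5000]))
# [4096, 904, 1024, 4096, 4096, 4096, 1688]
```

The filler block before each onset is the price of exact alignment. A real block plan would weigh that extra frame header against the gain, which is what the merge pass in the risk matrix is for.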

Key Design Choices

Onset Detection Algorithm

Energy derivative + ZCR discontinuity with signal-class-dependent thresholds (tonal=6.0, mixed=3.0, transient=2.0). Stage 2 refinement only triggers on flagged windows (~5–15% of total), keeping overhead low.
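
The two-stage scan might look like the sketch below. It covers only the energy part: the ZCR discontinuity check is omitted, and using an energy ratio as the "derivative" is an assumption. The thresholds are the ones listed above; detect_onsets and energy are hypothetical names.

```python
THRESHOLDS = {"tonal": 6.0, "mixed": 3.0, "transient": 2.0}
EPS = 1e-9  # avoids a zero threshold after pure silence


def energy(x, start, length):
    """Sum of squares over x[start:start + length]."""
    return sum(s * s for s in x[start:start + length])


def detect_onsets(x, signal_class="mixed", window=1024, hop=64):
    """Stage 1 flags windows whose energy jumps; stage 2 refines to hop resolution."""
    thr = THRESHOLDS[signal_class]
    onsets = []
    prev = energy(x, 0, window)
    for w in range(1, len(x) // window):
        start = w * window
        cur = energy(x, start, window)
        if cur > thr * (prev + EPS):
            # Stage 2 runs only on flagged windows: take the first hop whose
            # energy exceeds the threshold relative to the hop before it.
            before = energy(x, start - hop, hop)
            for pos in range(start, start + window, hop):
                after = energy(x, pos, hop)
                if after > thr * (before + EPS):
                    onsets.append(pos)
                    break
                before = after
        prev = cur
    return onsets


# Silence followed by a step at sample 5000: the onset lands on the 64-sample grid.
signal = [0.0] * 5000 + [1.0] * 3192
print(detect_onsets(signal, "mixed"))  # [4992]
```

A flagged window with no qualifying hop yields no onset, so a single attack that straddles two coarse windows is reported once.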

Adaptive LPC Order Cap

Peer review: "Orders above ~16 overfit on mixed content." Cap per block class: transient→12, pre-onset→16, tonal→32 (unchanged). Reduces search time AND improves compression.

Expected Impact

Content Type	Current vFLAC	After X1	Δ
Pure tonal (FH clips)	−2% to −4%	−2% to −4%	0%
Mixed (WDIG, AAI)	+0.5% to +1.5%	−0.5% to +0.5%	~1%
Heavy transient (EOTS, LRR)	+11% to +12%	+3% to +6%	~6–8%
Ethereal Arc (mostly tonal)	+0.5%	≤ +0.5%	~0.1%

05 X2: LPC Search Expansion & Tonal Boost

CONTINUE_xFLAC_X2_LPC_Search.md

Problem

Three-variant CSV comparison: GPT v1.7H beats GEM by 80 KB and CLA by 17 KB on Ethereal Arc (263 frames). GPT finds blackman/o=32/ql=8 where others settle for o=22/ql=10 — saving 50 bytes per frame on tonal content. The advantage concentrates on HIGH-budget tonal frames.

Solution

Wider search: TOP_K 16→24 (50% more candidates survive proxy screening).

Tonal boost: When difficulty < 0.05 AND pred_gain > 35 dB, trigger exhaustive 384-eval search (TOP_K=48, 5 windows). Affects ~40% of frames on Ethereal Arc.

Improved proxy: Coefficient-aware cost estimate includes quantization loss + coefficient overhead. Fewer false rejections of high-order candidates.

Search Parameters

Mode	TOP_K	Windows	QL Vals	Evals	vs v1.3
--fast (default)	8	2	3	24	+33%
--full (normal)	24	4	8	192	+50%
--full (boost)	48	5	8	384	+200%
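
A sketch of how the mode table and the boost gate could fit together. should_boost is named in the risk matrix; SearchParams, pick_params, and the rule Evals = TOP_K × QL values are assumptions, the last one inferred from the table (8×3, 24×8, 48×8).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParams:
    top_k: int
    windows: int
    ql_values: int

    @property
    def evals(self) -> int:
        # Assumed relation, inferred from the table rows.
        return self.top_k * self.ql_values


FAST = SearchParams(top_k=8, windows=2, ql_values=3)
FULL = SearchParams(top_k=24, windows=4, ql_values=8)
BOOST = SearchParams(top_k=48, windows=5, ql_values=8)


def should_boost(difficulty: float, pred_gain_db: float) -> bool:
    """Tonal boost gate: easy frame AND high prediction gain."""
    return difficulty < 0.05 and pred_gain_db > 35.0


def pick_params(mode: str, difficulty: float, pred_gain_db: float) -> SearchParams:
    """--fast never boosts; --full boosts only on qualifying tonal frames."""
    if mode == "fast":
        return FAST
    return BOOST if should_boost(difficulty, pred_gain_db) else FULL


print(FAST.evals, FULL.evals, BOOST.evals)   # 24 192 384
print(pick_params("full", 0.01, 40.0).evals)  # 384
```

Requiring both conditions keeps the 384-eval search confined to the tonal frames where the CSV evidence shows a payoff.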

CSV Evidence: Frame 0

GPT: blackman/o=32/ql=8 → 31,178B. CLA & GEM: blackman/o=22/ql=10 → 31,228B. Δ=50B on one frame. Multiply by ~105 qualifying tonal frames → potential ~5 KB from search expansion alone.

Expected Impact

Component	Ethereal Arc	Abyssal
TOP_K 16→24	0.1–0.2% (8–16 KB)	0.05–0.1%
Tonal boost	0.1–0.3% (8–24 KB)	0.05–0.1%
Improved proxy	0.05–0.1% (4–8 KB)	0.05%
Combined X2	0.2–0.5% (16–40 KB)	0.1–0.2%

06 X3: Stitching Overhead Elimination

CONTINUE_xFLAC_X3_Stitching.md

The Core Problem

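Per the Section 02 forensics, raw segment sums already beat lax_t6: by 15,419 B on Ethereal Arc (aFLAC) and by 18,927 B on Abyssal (avFLAC). Stitching then adds 28,949 B on Ethereal Arc, which turns the win into a 13,530 B loss, and 4,260 B on Abyssal, which trims the win to 14,667 B. We already compress better; the stitching overhead is what we give back.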

X3a: Segment Merging

Simplest Win · 1 Week

After arena picks winners, merge adjacent segments with the same winner and re-encode as one larger segment. On Ethereal Arc where flake_L11 wins 15/16 segments, this collapses to ~2 segments — eliminating ~14 boundary penalties.

Estimated savings: 3,000–5,000B on Ethereal, 4,000–6,000B on Abyssal.
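
The merge step itself is small. The sketch below assumes segments arrive as sorted, contiguous (start, end, winner) tuples; merge_same_winner and the tuple layout are illustrative, not aFLAC's data model. The position of the one non-flake segment is also an assumption, placed last so the result matches the "~2 segments" case.

```python
def merge_same_winner(segments):
    """Merge adjacent segments that share a winner. Re-encoding happens afterwards."""
    merged = []
    for start, end, winner in segments:
        if merged and merged[-1][2] == winner and merged[-1][1] == start:
            prev_start = merged[-1][0]
            merged[-1] = (prev_start, end, winner)  # extend the previous run
        else:
            merged.append((start, end, winner))
    return merged


# 16 equal segments: flake_L11 wins the first 15, another encoder wins the last.
SEG = 268_500  # 89.5 s x 48,000 Hz / 16 segments
segments = [(i * SEG, (i + 1) * SEG, "flake_L11") for i in range(15)]
segments.append((15 * SEG, 16 * SEG, "lax"))

merged = merge_same_winner(segments)
print(len(merged))                          # 2
print(len(segments) - len(merged))          # 14 boundaries removed
```

If the odd segment sits in the middle instead, the result is three segments and 13 boundaries removed, so the savings estimate depends on where the minority winners fall.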

X3b: Overlapping Segments

Cold-Start Elimination · 1 Week

Extend each segment by one block (16384 samples) before its start. Encoder processes extended segment; we keep only frames after the warm-up block. Every frame gets preceding audio context — no cold-start LPC penalty.

Cost: ~3% encoding time increase. Estimated savings: 2,000–4,000B on Ethereal, 4,000–8,000B on Abyssal.
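
The range bookkeeping for the warm-up block can be sketched as below. The names are hypothetical. It also assumes the encoder runs at a fixed 16384-sample block size, so that the warm-up occupies exactly one frame and the first kept frame starts on the segment boundary.

```python
WARMUP = 16384  # one block of preceding audio, per the description above


def extended_ranges(segments, warmup=WARMUP):
    """For each (start, end), return (encode_start, keep_from, end)."""
    ranges = []
    for start, end in segments:
        encode_start = max(0, start - warmup)  # the first segment has nothing before it
        ranges.append((encode_start, start, end))
    return ranges


def frames_to_keep(frame_starts, keep_from):
    """Indices of encoded frames that begin at or after the true segment start."""
    return [i for i, s in enumerate(frame_starts) if s >= keep_from]


ranges = extended_ranges([(0, 100_000), (100_000, 200_000)])
print(ranges)  # [(0, 0, 100000), (83616, 100000, 200000)]

# Frames of the second segment start at 83616, 100000, 116384, ...
print(frames_to_keep([83_616, 100_000, 116_384], keep_from=100_000))  # [1, 2]
```

Kept frames still need their frame numbers rewritten when stitched, as with any other segment.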

X3c: Frame-Level Competition (Optional)

Nuclear Option · 2 Weeks · Defer Until Needed

Encode entire file with each external encoder at the same blocksize, parse into frame arrays, select best encoder per FLAC frame via MDL. Zero stitching overhead by construction. Loses vFLAC variable-blocksize advantage. Only if X3a+X3b don't close the gap.
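
Per-frame selection reduces to a column-wise minimum once every encoder's output has been parsed into frame sizes. The sketch assumes MDL cost equals frame byte count and that all encoders used the same block size, so frame i covers the same samples everywhere. pick_frames and the sample sizes are illustrative, not measured data.

```python
def pick_frames(frame_sizes):
    """frame_sizes maps encoder name -> list of frame sizes in bytes.

    Returns (winner per frame, total bytes). Ties go to the alphabetically
    first encoder so the result is deterministic.
    """
    names = sorted(frame_sizes)
    n = len(frame_sizes[names[0]])
    if any(len(frame_sizes[name]) != n for name in names):
        raise ValueError("encoders must produce the same number of frames")
    choice = [min(names, key=lambda name: frame_sizes[name][i]) for i in range(n)]
    total = sum(frame_sizes[name][i] for i, name in enumerate(choice))
    return choice, total


sizes = {
    "flake": [100, 90, 120],
    "lax": [95, 92, 110],
}
choice, total = pick_frames(sizes)
print(choice, total)  # ['lax', 'flake', 'lax'] 295
```

By construction the total never exceeds the best whole-file encoder (297 bytes for lax in this toy case), which is why X3c has no stitching overhead to recover.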

Gap Closure Projection

Step	Ethereal vs lax_t6	Status
Current (avFLAC v1.2 + vFLAC v1.5)	+7,042B	nearly there
After X3a (merge)	+2,000 to +5,000B	closing
After X3a+X3b	−2,000 to +2,000B	likely win
After X1+X2+X3	−5,000 to −10,000B	confident win

07 Implementation Timeline

X0: --fast / --full Mode (COMPLETED)

Done · CONTINUE_vFLAC_FastFull.md

Added --fast (18 evals/subframe) and --full (128 evals) modes. Proved DCC pruning isn't losing compression — identical output on Ethereal Arc. Established that stitching overhead is the bottleneck.

X1: Transient Detection & Block Splitting

1–2 weeks · ~130 lines new · vFLAC v1.3 → v1.4

Onset detector + block plan rewrite + LPC order cap. Validate on EOTS and LRR clips (key transient tests). No regression on FH tonal clips.

X2: LPC Search Expansion & Tonal Boost

1–2 weeks · ~60 lines changed · vFLAC v1.4 → v1.5

TOP_K expansion + DCC-gated boost + improved proxy. Validate against GPT/GEM/CLA CSVs per-frame.

X3a: Segment Merging

1 week · ~50 lines · aFLAC + avFLAC

Post-arena merge of same-winner segments. Re-encode merged regions. Track boundary count reduction.

X3b: Overlapping Segments

1 week · ~40 lines · aFLAC

Extend segments by one warm-up block. Trim after encoding. Compare per-frame sizes at boundaries.

X3c: Frame-Level Arena (if needed)

2 weeks · ~200 lines · new aFLAC architecture

Whole-file multi-encoder → per-frame MDL → stream rebuild. Only if X3a+X3b insufficient.

08 Expected Outcomes

Ethereal Arc Performance Projection

Stage	avFLAC (bytes)	vs FLAC-12	vs lax_t6
Current (AF_whole)	7,965,119	−0.39%	+13,530
After X1	~7,960,000	−0.45%	+8,400
After X1+X2	~7,945,000	−0.64%	−6,600
After X1+X2+X3	~7,935,000	−0.76%	−16,600

vFLAC Standalone Projection

Stage	vFLAC (bytes)	vs FLAC-8	Notes
Current v1.3	8,032,276	+0.45%	Loses on mixed content
After X1	~8,020,000	+0.30%	Transient blocks help
After X1+X2	~7,995,000	−0.01%	Match FLAC-8 standard

avFLAC < lax_t6

The arena stitch beats the best single external FLAC encoder — validated on two songs.

09 The xFLAC Encoder Family

  WAV Input
    │
    │   8Z_Encode.py (orchestrator, --fast/--full)
    │
    ├───────────────────────────────────────────────┐
    │                                               │
    ▼                ▼                ▼              ▼
  aFLAC v1.3     vFLAC v1.3     avFLAC v1.2    AC v1.7H
  1633 lines     859 lines      960 lines      (separate)
  Arena:         Pure Python    Hybrid:
  segment        FLAC writer    arena +        Non-FLAC
  compete        variable BS    vFLAC as       format
  + stitch                      candidate

    │                │                │
    ▼                ▼                ▼
  _AF.flac       _VF.flac       _AVF.flac
  Best arena     Variable-BS    MDL picks
  stitch         standalone     smallest

aFLAC (Arena FLAC)

Segments the audio, encodes each segment with multiple external FLAC encoders (lax, flake, flaccl, ffmpeg), picks the smallest per segment, stitches winning segments into a valid .flac file. Currently beats FLAC-12 on both test songs. Primary bottleneck: stitching overhead.

vFLAC (Variable-Blocksize FLAC)

Pure Python FLAC encoder with variable block sizes and exhaustive LPC search. Writes standard .flac files. Wins on tonal content (−2% to −4%) but loses badly on mixed/transient (+11%). X1 and X2 target this encoder.

avFLAC (Arena + Variable)

Runs both aFLAC and vFLAC, adds vFLAC as an arena candidate. MDL selects the smallest output from: AF_whole, VF_whole, or arena stitch. As of March 2026, per the benchmark tables in Section 01, MDL picks VF_whole on Ethereal Arc and the arena stitch on Abyssal.
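
The final selection can be sketched as a minimum over valid candidates. select_winner and the tuple layout are hypothetical; the sizes are the Abyssal figures from the benchmark table, and the validity flag stands in for the flac -t and round-trip checks.

```python
def select_winner(candidates):
    """candidates: list of (name, size_bytes, is_valid). Smallest valid output wins."""
    valid = [(size, name) for name, size, ok in candidates if ok]
    if not valid:
        raise ValueError("no valid candidate")
    size, name = min(valid)
    return name, size


abyssal = [
    ("AF_whole", 17_369_506, True),
    ("VF_whole", 17_512_422, True),
    ("arena_stitch", 17_305_557, True),
]
print(select_winner(abyssal))  # ('arena_stitch', 17305557)
```

An invalid candidate is dropped before the comparison, so a smaller but broken file can never win.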

10 Risk Matrix

Risk	Phase	Impact	Mitigation
Onset over-splitting	X1	Header overhead exceeds gains	Merge pass + min block 1024
Tonal regression	X1	Block plan fragments tonal content	High threshold (6.0) + FH validation
Speed regression	X2	Boost mode too slow	Strict should_boost() criteria
Proxy backfire	X2	New proxy rejects good candidates	A/B test on all 7 clips first
Merge re-encode time	X3a	Extra encoding passes	Net neutral: 1 large replaces N small
Overlap context mismatch	X3b	Arena comparison unfair	Full re-benchmark after change
Combined insufficient	All	Still > lax_t6 after X1+X2+X3	X3c frame-level competition

11 Core Principles

FLAC Spec Compliance

Every output must pass flac -t and decode to bit-identical PCM. We operate within the FLAC specification — no custom extensions, no metadata hacks. Any standard FLAC decoder must play our files.

MDL as Sole Judge

Minimum Description Length selects winners. No human intuition about "which encoder should win" — the smallest valid output wins, period. MDL cost penalties are honest: every bit corresponds to an actual message.

Lossless Non-Negotiable

SHA3 round-trip verification on every encode. Decoded PCM must be byte-identical to input WAV. No exceptions, no "close enough," no perceptual shortcuts.
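
A minimal round-trip check, assuming the original and decoded PCM are already in memory as bytes. The text says only "SHA3", so SHA3-256 is an assumed variant, and both function names are illustrative. Decoding the .flac file back to PCM is outside this sketch.

```python
import hashlib


def sha3_hex(pcm: bytes) -> str:
    """SHA3-256 digest of raw PCM bytes (variant assumed)."""
    return hashlib.sha3_256(pcm).hexdigest()


def verify_round_trip(original_pcm: bytes, decoded_pcm: bytes) -> bool:
    """True only when the decoded PCM is byte-identical to the input."""
    return sha3_hex(original_pcm) == sha3_hex(decoded_pcm)


original = bytes(range(256)) * 4
print(verify_round_trip(original, original))              # True
print(verify_round_trip(original, original[:-1] + b"\x00"))  # False
```

Hashing the PCM payload rather than the WAV file keeps header differences, such as chunk ordering, from causing false failures.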

R&D Phase: Forward Only

No backward compatibility constraints. Old versions remain archived. Development moves forward. Every sprint can break the API if it improves compression.

12 Companion Documents

Document	Status	Scope
CONTINUE_vFLAC_FastFull.md	DONE	--fast/--full modes for vFLAC
CONTINUE_xFLAC_X1_Transient_Blocks.md	READY	Onset detection + block plan rewrite
CONTINUE_xFLAC_X2_LPC_Search.md	READY	TOP_K expansion + tonal boost
CONTINUE_xFLAC_X3_Stitching.md	READY	Segment merging + overlap + frame-level
8Z-AC_Classical_Max_Roadmap.html	DONE	Parallel track: AC C1–C5 roadmap
8Z_Audio_Peer_Review_Feedbacks.md	DONE	DreamTeam review of both tracks

Parallel track: The AC “Classical Max” roadmap (C1–C5) targets non-FLAC compression improvements. The xFLAC sprint targets FLAC-format improvements. Both tracks share the same test corpus and peer review framework but are architecturally independent.

Raw compression wins are real.

X1 fixes vFLAC. X2 deepens the search. X3 eliminates the overhead.
The arena stitch will beat lax_t6.