AIM³ Institute · Ljubljana · March 2026

8Z-xFLAC Optimization Sprint

Three-phase roadmap to close the stitching gap — X1 through X3

Author: Bojan Dobrečevič + Claude Opus 4.6
Status: Planning Complete
Date: 2026-03-09

01 Where We Stand: March 2026

WINS      avFLAC BEATS lax_t6 · Abyssal (−14.7 KB)
−348 KB   avFLAC vs lax_t6 · Radiohead 60s (−6.8%)
+7 KB     Gap to lax_t6 · Ethereal (stitching)
15 / 15   avFLAC Wins · 15-Clip vs FLAC-12

The xFLAC bundle (aFLAC v1.3, vFLAC v1.5, and avFLAC v1.2) produces valid .flac files. As of March 2026, avFLAC beats lax_t6 on Abyssal by 14,667 bytes, our first full-track win over the strongest single FLAC encoder. In that run, vFLAC v1.5 with MDL block probe won 21/22 arena segments. On 15 benchmark clips, avFLAC beats FLAC-12 on all 15 and beats the best baseline on 8. Three Radiohead clips beat even OptimFROG by 31–40%.

The remaining gaps: on Ethereal Arc, avFLAC is still 7 KB behind lax_t6 (stitching overhead), and on 24-bit content (LG-DWAS) the vFLAC block probe has a bias that needs bit-depth-aware correction.

Sprint goal: Close the remaining gap to lax_t6 on Ethereal Arc (13.5 KB for aFLAC v1.3, 7.0 KB for avFLAC v1.2), fix vFLAC's mixed-content losses, and make the arena stitch beat every single whole-file encoder.

Ethereal Arc Benchmark (48 kHz stereo, 89.5 s)

#  Codec           Bytes      vs lax_t6  Notes
1  OptimFROG       7,530,383  −421,206   Range coder · closed source
2  WavPack -hh     7,934,862  −16,727    Context mixing
3  lax_t6          7,951,589  baseline   Best single FLAC encoder · TARGET
4  8Z-avFLAC v1.2  7,958,631  +7,042     ✓ VF_whole · vFLAC 16/16 segs
5  8Z-vFLAC v1.5   7,958,631  +7,042     ✓ MDL block probe · 190/280 changed
6  8Z-aFLAC v1.3   7,965,119  +13,530    ✓ Arena + stitch
7  FLAC -12 (max)  7,996,121  +44,532    Standard reference

Abyssal Benchmark (48 kHz stereo, 180.2 s)

#  Codec           Bytes       vs FLAC-12  Notes
1  OptimFROG       16,283,191  −1,162,008  Range coder · 6% gap
2  8Z-avFLAC v1.2  17,305,557  −139,642    BEATS lax_t6 by 14,667B · arena stitch
3  lax_t6          17,320,224  −124,975    Previous best single FLAC encoder
4  8Z-aFLAC v1.3   17,369,506  −75,693     ✓ Beats FLAC-12
5  FLAC -12 (max)  17,445,199  baseline
6  8Z-vFLAC v1.5   17,512,422  +67,223     Standalone (VF_whole)

02 The Stitching Overhead Problem

This is the single most important finding from the xFLAC R&D cycle so far. The --fast vs --full experiment on Ethereal Arc produced identical output (7,965,119 bytes) despite --full evaluating 42 candidates per subframe vs --fast's 4–11. The DCC pruning isn't losing compression — the candidate pool already contains the winners. The bottleneck is not search depth. It is stitching overhead.

Ethereal Arc Forensics
lax_t6 whole:     7,951,589B
aFLAC raw-sum:    7,936,170B  ← WINS by 15,419B
aFLAC stitched:   7,965,119B  ← headers ADD 28,949B
Net vs lax_t6:   +13,530B     ← loss

Abyssal Forensics (March 2026, WINS!)
lax_t6 whole:    17,320,224B
avFLAC stitch:   17,305,557B  ← WINS by 14,667B
vFLAC wins:      21/22 segs   ← vFLAC v1.5 MDL block probe dominates
Seg raw sum:     17,301,297B  ← raw is even better

Abyssal is solved: vFLAC v1.5 with MDL block probe won 21/22 arena segments, and the arena stitch beat lax_t6 by 14,667 bytes. Ethereal Arc remains the target. There the aFLAC raw segment sum beats lax_t6 by 15,419 bytes (7,951,589 − 7,936,170), but 28,949 bytes of stitching overhead turn that lead into a 13,530-byte loss. X3 (stitching elimination) targets this gap.
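
The forensic figures above reduce to three subtractions. A minimal sketch that uses only the byte counts quoted in this section; the function name `stitch_forensics` is illustrative and not part of the xFLAC code:

```python
def stitch_forensics(reference: int, raw_sum: int, stitched: int) -> dict:
    """Break a stitched result into raw lead, stitching overhead, and net gap (bytes)."""
    return {
        "raw_lead": reference - raw_sum,   # positive: raw segments beat the reference
        "overhead": stitched - raw_sum,    # bytes added by stitching
        "net_gap": stitched - reference,   # positive: the stitched file loses
    }


ethereal = stitch_forensics(reference=7_951_589, raw_sum=7_936_170, stitched=7_965_119)
abyssal = stitch_forensics(reference=17_320_224, raw_sum=17_301_297, stitched=17_305_557)
print("Ethereal Arc:", ethereal)
print("Abyssal:     ", abyssal)
```

On these figures Ethereal Arc loses because the overhead exceeds the raw lead, while on Abyssal the overhead is smaller than the raw lead.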

Overhead Budget Per Segment Boundary

Source                     Bytes    Notes
Frame number UTF-8 growth  1–3      Per frame; larger numbers = more bytes
Cold-start LPC penalty     50–200   Per segment; first frame has no context
Short tail block           100–500  Per segment; remainder < blocksize
Total per boundary         200–700  16 segments × ~400B = ~6,400B structural
Context-loss penalty       ~22,500  Remaining gap: per-segment LPC < whole-file
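
The first row of the budget comes from the FLAC frame header, which stores the frame number (fixed-blocksize streams) or the first sample number (variable-blocksize streams) in a UTF-8-style variable-length code extended to 36 bits. A minimal sketch of the resulting byte count; `coded_number_len` is an illustrative helper, not an xFLAC function:

```python
def coded_number_len(value: int) -> int:
    """Bytes used by FLAC's UTF-8-style coded number in a frame header."""
    if value < 0 or value >= 1 << 36:
        raise ValueError("coded number must fit in 36 bits")
    # Payload capacity for encoded lengths of 1..7 bytes.
    for n_bytes, bits in enumerate((7, 11, 16, 21, 26, 31, 36), start=1):
        if value < 1 << bits:
            return n_bytes
    raise AssertionError("unreachable")


# Frame numbers grow slowly; sample numbers grow with every block.
print(coded_number_len(127), coded_number_len(128), coded_number_len(4_296_000))
```

A frame number below 128 costs one byte, while a sample number near the end of Ethereal Arc (89.5 s × 48,000 Hz = 4,296,000 samples) costs five.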

03 Three-Phase Roadmap

┌───────────────────────────┬──────────────────────────────────┐
│  X1  Transient Blocks     │  vFLAC v1.3 → v1.4               │
│      onset detection      │  block plan rewrite              │
│      1024-sample attack   │  LPC order cap per class         │
├───────────────────────────┼──────────────────────────────────┤
│  X2  LPC Search Expand    │  vFLAC v1.4 → v1.5               │
│      TOP_K 16→24          │  DCC-gated tonal boost           │
│      improved proxy       │  +hamming window                 │
├───────────────────────────┼──────────────────────────────────┤
│  X3  Stitch Elimination   │  aFLAC v1.3 → v1.4               │
│   a: segment merging      │  avFLAC v1.2 → v1.3              │
│   b: overlapping segs     │  cold-start elimination          │
│   c: frame-level arena    │  (optional, if a+b insufficient) │
└───────────────────────────┴──────────────────────────────────┘

Cumulative gain estimates (65% stacking discount):
  X1  Transient blocks      →  0.1–0.3%  (~8–24 KB)
  X2  LPC search expansion  →  0.2–0.5%  (~16–40 KB)
  X3  Stitch elimination    →  0.1–0.4%  (~8–30 KB)
  Combined:                    0.3–0.8%  (~24–65 KB)
  Need to beat lax_t6:         0.17%     (13.5 KB)
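
The combined row can be reproduced from the three per-phase ranges. The sketch below assumes that "65% stacking discount" means keeping 65% of the naive sum; the source does not define it, but that reading matches the stated 0.3–0.8%. The byte size is the aFLAC v1.3 stitched result for Ethereal Arc.

```python
# Per-phase gain estimates in percent (low, high), as listed above.
PHASES = {"X1": (0.1, 0.3), "X2": (0.2, 0.5), "X3": (0.1, 0.4)}
STACKING_FACTOR = 0.65        # assumption: keep 65% of the naive sum
ETHEREAL_BYTES = 7_965_119    # aFLAC v1.3 stitched size on Ethereal Arc
GAP_TO_LAX_T6 = 13_530        # bytes


def combined_gain(phases, factor):
    """Sum the low and high estimates, then apply the stacking factor."""
    low = sum(lo for lo, _ in phases.values()) * factor
    high = sum(hi for _, hi in phases.values()) * factor
    return low, high


low, high = combined_gain(PHASES, STACKING_FACTOR)
needed = 100 * GAP_TO_LAX_T6 / ETHEREAL_BYTES
print(f"combined: {low:.2f}% to {high:.2f}%, needed: {needed:.2f}%")
```

Under this assumption the low end of the combined range (0.26%) already exceeds the 0.17% needed.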

04 X1: Transient Detection & Adaptive Block Splitting

CONTINUE_xFLAC_X1_Transient_Blocks.md
Problem
vFLAC assigns fixed block sizes by signal class (tonal→16384, mixed→8192, transient→4096). A drum hit's attack is ~50–200 samples but wastes an entire 4096-sample block. On mixed content: +11.19% on EOTS, +11.88% on LRR.
Solution

Two-stage onset detection: a coarse scan (energy derivative between 1024-sample windows) followed by sample-level refinement (64-sample hops). Place 1024–2048 sample blocks precisely around attacks, then immediately return to optimal larger blocks for sustain/decay. The FLAC spec allows any block size from 16 to 65535. To our knowledge, no other FLAC encoder does this.
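
Once onset positions are known, the block plan is mostly bookkeeping. The sketch below is one possible layout, not the vFLAC implementation: it assumes the short block starts exactly at the onset, that sustain regions use a single fixed size, and that remainders shorter than FLAC's 16-sample minimum are glued onto the attack block. `plan_blocks` and its parameters are hypothetical names.

```python
def plan_blocks(total, onsets, sustain=4096, attack=1024, min_block=16):
    """Return block sizes covering `total` samples, with a short block at each onset."""
    sizes = []
    pos = 0

    def fill(until):
        # Cover [pos, until) with sustain-sized blocks plus one remainder block.
        # A remainder shorter than min_block is returned instead of emitted.
        nonlocal pos
        while until - pos >= sustain:
            sizes.append(sustain)
            pos += sustain
        rest = until - pos
        if rest >= min_block:
            sizes.append(rest)
            pos += rest
            rest = 0
        return rest

    for onset in sorted(onsets):
        if not pos <= onset < total:
            continue                  # inside a previous attack block, or out of range
        carry = fill(onset)           # 0..min_block-1 samples glued onto the attack block
        size = min(attack + carry, total - pos)
        sizes.append(size)
        pos += size

    tail = fill(total)
    if tail:
        sizes.append(tail)            # only the final block may be shorter than min_block
    return sizes


print(plan_blocks(20_000, [5_000]))
```

Every plan sums to the input length, so the block sizes can be handed to the encoder in order without gaps or overlap.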

Key Design Choices

Onset Detection Algorithm
Energy derivative + ZCR discontinuity with signal-class-dependent thresholds (tonal=6.0, mixed=3.0, transient=2.0). Stage 2 refinement only triggers on flagged windows (~5–15% of total), keeping overhead low.
Adaptive LPC Order Cap
Peer review: "Orders above ~16 overfit on mixed content." Cap per block class: transient→12, pre-onset→16, tonal→32 (unchanged). Reduces search time AND improves compression.
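
The two design choices above can be sketched in a few lines. This is a simplified illustration under stated assumptions: it uses an energy ratio between adjacent windows as the "energy derivative", omits the ZCR discontinuity test, and works on a plain list of mono samples. The thresholds and order caps are the values quoted above; the function names are hypothetical.

```python
ONSET_THRESHOLD = {"tonal": 6.0, "mixed": 3.0, "transient": 2.0}   # values from the text
LPC_ORDER_CAP = {"transient": 12, "pre-onset": 16, "tonal": 32}    # values from the text


def energy(samples, start, length):
    """Sum of squares over one window, with a small floor to avoid division by zero."""
    return sum(s * s for s in samples[start:start + length]) + 1e-9


def detect_onsets(samples, signal_class, window=1024, hop=64):
    """Two-stage onset search: coarse window scan, then hop-level refinement."""
    threshold = ONSET_THRESHOLD[signal_class]
    onsets = []
    for start in range(window, len(samples) - window + 1, window):
        ratio = energy(samples, start, window) / energy(samples, start - window, window)
        if ratio < threshold:
            continue                  # stage 1: window not flagged
        # Stage 2: pick the hop with the largest energy jump inside the flagged window.
        best_pos, best_ratio = start, 0.0
        for pos in range(start, start + window, hop):
            r = energy(samples, pos, hop) / energy(samples, pos - hop, hop)
            if r > best_ratio:
                best_pos, best_ratio = pos, r
        onsets.append(best_pos)
    return onsets


# Silence followed by a step: the onset is located to within one hop.
signal = [0] * 4596 + [1000] * 3596
print(detect_onsets(signal, "mixed"))
```

Because stage 2 runs only on flagged windows, its cost scales with the number of candidate onsets rather than with the file length.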

Expected Impact

Content Type                 Current vFLAC   After X1        Δ
Pure tonal (FH clips)        −2% to −4%      −2% to −4%      0%
Mixed (WDIG, AAI)            +0.5% to +1.5%  −0.5% to +0.5%  ~1%
Heavy transient (EOTS, LRR)  +11% to +12%    +3% to +6%      ~6–8%
Ethereal Arc (mostly tonal)  +0.5%           ≤ +0.5%         ~0.1%

05 X2: LPC Search Expansion & Tonal Boost

CONTINUE_xFLAC_X2_LPC_Search.md
Problem
Three-variant CSV comparison: GPT v1.7H beats GEM by 80 KB and CLA by 17 KB on Ethereal Arc (263 frames). GPT finds blackman/o=32/ql=8 where others settle for o=22/ql=10 — saving 50 bytes per frame on tonal content. The advantage concentrates on HIGH-budget tonal frames.
Solution

Wider search: TOP_K 16→24 (50% more candidates survive proxy screening).

Tonal boost: When difficulty < 0.05 AND pred_gain > 35 dB, trigger exhaustive 384-eval search (TOP_K=48, 5 windows). Affects ~40% of frames on Ethereal Arc.

Improved proxy: Coefficient-aware cost estimate includes quantization loss + coefficient overhead. Fewer false rejections of high-order candidates.
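
The boost gate described above is a two-condition test. A minimal sketch, assuming `difficulty` is a unitless score and `pred_gain_db` is prediction gain in dB; the signature of `should_boost()` is a guess, and `top_k_for_frame` is a hypothetical helper:

```python
def should_boost(difficulty: float, pred_gain_db: float) -> bool:
    """Gate for the exhaustive tonal search: an easy frame AND strong prediction gain."""
    return difficulty < 0.05 and pred_gain_db > 35.0


def top_k_for_frame(difficulty: float, pred_gain_db: float) -> int:
    """TOP_K=48 in boost mode, otherwise the widened default of 24."""
    return 48 if should_boost(difficulty, pred_gain_db) else 24


print(top_k_for_frame(0.02, 41.0), top_k_for_frame(0.02, 30.0), top_k_for_frame(0.20, 41.0))
```

Both conditions must hold, so a highly predictable frame with a high difficulty score stays on the normal path.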

Search Parameters

Mode              TOP_K  Windows  QL Vals  Evals  vs v1.3
--fast (default)  8      2        3        24     +33%
--full (normal)   24     4        8        192    +50%
--full (boost)    48     5        8        384    +200%

CSV Evidence: Frame 0
GPT: blackman/o=32/ql=8 → 31,178B. CLA & GEM: blackman/o=22/ql=10 → 31,228B. Δ=50B on one frame. Multiply by ~105 qualifying tonal frames → potential ~5 KB from search expansion alone.
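
The search-parameter figures and the frame-0 extrapolation can be checked with simple arithmetic. The sketch assumes that evaluations equal TOP_K times the number of QL values, a reading that matches all three table rows but is not stated in the source. The v1.3 baselines (18 and 128 evals) are the X0 figures given in section 07.

```python
MODES = {
    "--fast (default)": {"top_k": 8, "ql_vals": 3, "v13_evals": 18},
    "--full (normal)": {"top_k": 24, "ql_vals": 8, "v13_evals": 128},
    "--full (boost)": {"top_k": 48, "ql_vals": 8, "v13_evals": 128},
}

summary = {}
for name, mode in MODES.items():
    evals = mode["top_k"] * mode["ql_vals"]          # assumed relationship
    growth = 100 * (evals - mode["v13_evals"]) / mode["v13_evals"]
    summary[name] = (evals, round(growth))
    print(f"{name}: {evals} evals, {growth:+.0f}% vs v1.3")

# Frame 0: GPT's candidate vs the one CLA and GEM settled on.
per_frame_saving = 31_228 - 31_178
qualifying_frames = 105                              # ~40% of 263 frames
print("projected saving:", per_frame_saving * qualifying_frames, "bytes")
```

The projection treats frame 0 as typical of every qualifying tonal frame, so it is best read as an upper-end estimate.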

Expected Impact

Component       Ethereal Arc         Abyssal
TOP_K 16→24     0.1–0.2% (8–16 KB)   0.05–0.1%
Tonal boost     0.1–0.3% (8–24 KB)   0.05–0.1%
Improved proxy  0.05–0.1% (4–8 KB)   0.05%
Combined X2     0.2–0.5% (16–40 KB)  0.1–0.2%

06 X3: Stitching Overhead Elimination

CONTINUE_xFLAC_X3_Stitching.md
The Core Problem
Raw-optimal segment sums beat lax_t6 by 15 KB (Ethereal) and 31 KB (Abyssal). Segment stitching adds 29 KB and 45 KB of overhead respectively. We already compress better; the advantage is lost at the segment boundaries.

X3a: Segment Merging

Simplest Win · 1 Week

After arena picks winners, merge adjacent segments with the same winner and re-encode as one larger segment. On Ethereal Arc where flake_L11 wins 15/16 segments, this collapses to ~2 segments — eliminating ~14 boundary penalties.

Estimated savings: 3,000–5,000B on Ethereal, 4,000–6,000B on Abyssal.
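
The merge step is a run-length grouping over the per-segment winners. A minimal sketch; `merge_same_winner` is a hypothetical name, and the position and label of the one non-flake segment are assumed for illustration:

```python
from itertools import groupby


def merge_same_winner(winners):
    """Collapse adjacent segments won by the same encoder.

    Returns a list of (encoder, first_segment, last_segment) runs.
    """
    runs = []
    index = 0
    for encoder, group in groupby(winners):
        count = len(list(group))
        runs.append((encoder, index, index + count - 1))
        index += count
    return runs


# 15 of 16 segments won by flake_L11; the odd one out is assumed to come last.
winners = ["flake_L11"] * 15 + ["other"]
runs = merge_same_winner(winners)
print(runs, "boundaries:", len(winners) - 1, "->", len(runs) - 1)
```

If the odd segment sits in the middle of the file instead, the result is three runs and two boundaries, so the saving depends on where the minority winner falls.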

X3b: Overlapping Segments

Cold-Start Elimination · 1 Week

Extend each segment by one block (16384 samples) before its start. Encoder processes extended segment; we keep only frames after the warm-up block. Every frame gets preceding audio context — no cold-start LPC penalty.

Cost: ~3% encoding time increase. Estimated savings: 2,000–4,000B on Ethereal, 4,000–8,000B on Abyssal.
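
The overlap scheme needs three offsets per segment: where to start reading, where the kept audio starts, and where the segment ends. A minimal sketch, assuming boundaries are given in samples and that the warm-up length equals the encoder's block size so the trim lands on a frame boundary; `overlapped_ranges` is a hypothetical name:

```python
WARMUP = 16_384  # one block of preceding context, per the text


def overlapped_ranges(boundaries, warmup=WARMUP):
    """Plan extended reads for each segment.

    `boundaries` holds the sorted segment start offsets plus the final end offset.
    """
    plans = []
    for keep_start, end in zip(boundaries, boundaries[1:]):
        read_start = max(0, keep_start - warmup)   # the first segment has nothing to borrow
        plans.append({
            "read_start": read_start,
            "keep_start": keep_start,
            "end": end,
            "discard": keep_start - read_start,    # samples encoded, then trimmed
        })
    return plans


for plan in overlapped_ranges([0, 100_000, 200_000]):
    print(plan)
```

After trimming, the kept frames still need their frame or sample numbers rewritten for their position in the final stream, which also means recomputing the header CRC-8 and frame CRC-16.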

X3c: Frame-Level Competition (Optional)

Nuclear Option · 2 Weeks · Defer Until Needed

Encode entire file with each external encoder at the same blocksize, parse into frame arrays, select best encoder per FLAC frame via MDL. Zero stitching overhead by construction. Loses vFLAC variable-blocksize advantage. Only if X3a+X3b don't close the gap.
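
Per-frame selection is a column-wise minimum over the encoders' frame sizes. A minimal sketch, assuming every encoder ran at the same fixed blocksize so that frame i covers the same samples in each output, and using byte size as the MDL cost; `frame_level_arena` is a hypothetical name:

```python
def frame_level_arena(frame_sizes):
    """Pick the smallest encoding of each frame.

    `frame_sizes` maps encoder name to a list of per-frame byte counts.
    Returns (winning encoder per frame, total bytes of the winners).
    """
    lengths = {len(sizes) for sizes in frame_sizes.values()}
    if len(lengths) != 1:
        raise ValueError("all encoders must produce the same number of frames")
    n_frames = lengths.pop()
    picks = [
        min(frame_sizes, key=lambda enc: frame_sizes[enc][i])  # ties go to the first encoder
        for i in range(n_frames)
    ]
    total = sum(frame_sizes[enc][i] for i, enc in enumerate(picks))
    return picks, total


sizes = {"enc_a": [10, 20, 30], "enc_b": [12, 18, 30]}
print(frame_level_arena(sizes))
```

Because matching frames carry the same frame number in every output, a winning frame can be copied as-is, which is where the "zero stitching overhead" property comes from.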

Gap Closure Projection

Step                                Ethereal vs lax_t6   Status
Current (avFLAC v1.2 + vFLAC v1.5)  +7,042B              nearly there
After X3a (merge)                   +2,000 to +5,000B    closing
After X3a+X3b                       −2,000 to +2,000B    likely win
After X1+X2+X3                      −5,000 to −10,000B   confident win

07 Implementation Timeline

X0: --fast / --full Mode (COMPLETED)
Done · CONTINUE_vFLAC_FastFull.md

Added --fast (18 evals/subframe) and --full (128 evals) modes. Proved DCC pruning isn't losing compression — identical output on Ethereal Arc. Established that stitching overhead is the bottleneck.

X1: Transient Detection & Block Splitting
1–2 weeks · ~130 lines new · vFLAC v1.3 → v1.4

Onset detector + block plan rewrite + LPC order cap. Validate on EOTS and LRR clips (key transient tests). No regression on FH tonal clips.

X2: LPC Search Expansion & Tonal Boost
1–2 weeks · ~60 lines changed · vFLAC v1.4 → v1.5

TOP_K expansion + DCC-gated boost + improved proxy. Validate against GPT/GEM/CLA CSVs per-frame.

X3a: Segment Merging
1 week · ~50 lines · aFLAC + avFLAC

Post-arena merge of same-winner segments. Re-encode merged regions. Track boundary count reduction.

X3b: Overlapping Segments
1 week · ~40 lines · aFLAC

Extend segments by one warm-up block. Trim after encoding. Compare per-frame sizes at boundaries.

X3c: Frame-Level Arena (if needed)
2 weeks · ~200 lines · new aFLAC architecture

Whole-file multi-encoder → per-frame MDL → stream rebuild. Only if X3a+X3b insufficient.

08 Expected Outcomes

Ethereal Arc Performance Projection

Stage               avFLAC (bytes)  vs FLAC-12  vs lax_t6
Current (AF_whole)  7,965,119       −0.39%      +13,530
After X1            ~7,960,000      −0.45%      +8,400
After X1+X2         ~7,945,000      −0.64%      −6,600
After X1+X2+X3      ~7,935,000      −0.76%      −16,600

vFLAC Standalone Projection

Stage         vFLAC (bytes)  vs FLAC-8  Notes
Current v1.3  8,032,276      +0.45%     Loses on mixed content
After X1      ~8,020,000     +0.30%     Transient blocks help
After X1+X2   ~7,995,000     −0.01%     Match FLAC-8 standard

Target: avFLAC < lax_t6
The arena stitch beats the best single external FLAC encoder, validated on two songs.

09 The xFLAC Encoder Family

  WAV Input
    │
    │   8Z_Encode.py (orchestrator, --fast/--full)
    │
    ├───────────────────────────────────────────────┐
    │                                               │
    ▼                ▼                ▼              ▼
  aFLAC v1.3     vFLAC v1.3     avFLAC v1.2    AC v1.7H
  1633 lines     859 lines      960 lines      (separate)
  Arena:         Pure Python    Hybrid:
  segment        FLAC writer    arena +        Non-FLAC
  compete        variable BS    vFLAC as       format
  + stitch                      candidate

    │                │                │
    ▼                ▼                ▼
  _AF.flac       _VF.flac       _AVF.flac
  Best arena     Variable-BS    MDL picks
  stitch         standalone     smallest

aFLAC (Arena FLAC)
Segments the audio, encodes each segment with multiple external FLAC encoders (lax, flake, flaccl, ffmpeg), picks the smallest per segment, stitches winning segments into a valid .flac file. Currently beats FLAC-12 on both test songs. Primary bottleneck: stitching overhead.
vFLAC (Variable-Blocksize FLAC)
Pure Python FLAC encoder with variable block sizes and exhaustive LPC search. Writes standard .flac files. Wins on tonal content (−2% to −4%) but loses badly on mixed/transient (+11%). X1 and X2 target this encoder.
avFLAC (Arena + Variable)
Runs both aFLAC and vFLAC, and adds vFLAC as an arena candidate. MDL selects the smallest output from AF_whole, VF_whole, or the arena stitch. In the March 2026 benchmarks MDL picks VF_whole on Ethereal Arc (7,958,631 bytes) and the arena stitch on Abyssal (17,305,557 bytes).

10 Risk Matrix

Risk                      Phase  Impact                              Mitigation
Onset over-splitting      X1     Header overhead exceeds gains       Merge pass + min block 1024
Tonal regression          X1     Block plan fragments tonal content  High threshold (6.0) + FH validation
Speed regression          X2     Boost mode too slow                 Strict should_boost() criteria
Proxy backfire            X2     New proxy rejects good candidates   A/B test on all 7 clips first
Merge re-encode time      X3a    Extra encoding passes               Net neutral: 1 large replaces N small
Overlap context mismatch  X3b    Arena comparison unfair             Full re-benchmark after change
Combined insufficient     All    Still > lax_t6 after X1+X2+X3       X3c frame-level competition

11 Core Principles

FLAC Spec Compliance
Every output must pass flac -t and decode to bit-identical PCM. We operate within the FLAC specification — no custom extensions, no metadata hacks. Any standard FLAC decoder must play our files.
MDL as Sole Judge
Minimum Description Length selects winners. No human intuition about "which encoder should win" — the smallest valid output wins, period. MDL cost penalties are honest: every bit corresponds to an actual message.
Lossless Non-Negotiable
SHA3 round-trip verification on every encode. Decoded PCM must be byte-identical to input WAV. No exceptions, no "close enough," no perceptual shortcuts.
R&D Phase: Forward Only
No backward compatibility constraints. Old versions remain archived. Development moves forward. Every sprint can break the API if it improves compression.
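
The round-trip check under "Lossless Non-Negotiable" can be sketched with the standard library alone. This is a minimal illustration, not the project's verifier: it assumes SHA3-256 (the text only says "SHA3"), assumes the decoded audio has already been written back to WAV by a standard FLAC decoder, and hashes the channel count, sample width, and sample rate along with the PCM frames. `pcm_sha3` and `round_trip_ok` are hypothetical names.

```python
import hashlib
import wave


def pcm_sha3(wav_file) -> str:
    """SHA3-256 over the format triple and raw PCM frames of a WAV (path or file object)."""
    with wave.open(wav_file, "rb") as w:
        digest = hashlib.sha3_256()
        # channels, sample width in bytes, sample rate
        digest.update(repr(tuple(w.getparams()[:3])).encode())
        while chunk := w.readframes(65_536):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_ok(original_wav, decoded_wav) -> bool:
    """True only if both files carry identical format and identical PCM bytes."""
    return pcm_sha3(original_wav) == pcm_sha3(decoded_wav)
```

Hashing the PCM frames rather than the whole file keeps the check independent of WAV header differences such as extra chunks written by the decoder.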

12 Companion Documents

Document                               Status  Scope
CONTINUE_vFLAC_FastFull.md             DONE    --fast/--full modes for vFLAC
CONTINUE_xFLAC_X1_Transient_Blocks.md  READY   Onset detection + block plan rewrite
CONTINUE_xFLAC_X2_LPC_Search.md        READY   TOP_K expansion + tonal boost
CONTINUE_xFLAC_X3_Stitching.md         READY   Segment merging + overlap + frame-level
8Z-AC_Classical_Max_Roadmap.html       DONE    Parallel track: AC C1–C5 roadmap
8Z_Audio_Peer_Review_Feedbacks.md      DONE    DreamTeam review of both tracks
Parallel track: The AC “Classical Max” roadmap (C1–C5) targets non-FLAC compression improvements. The xFLAC sprint targets FLAC-format improvements. Both tracks share the same test corpus and peer review framework but are architecturally independent.

Raw compression wins are real.
X1 fixes vFLAC. X2 deepens the search. X3 eliminates the overhead.
The arena stitch will beat lax_t6.