BD × AI Lab · AIM³ Lab · Arena Laboratory · v2.6

Arena
Lab

25 Sensors · 685 Laws · 10 Partitioners · MDL Decides. Not the programmer. Not the AI. The data.

25+
Sensors (15 BD · 4 BD+LLM)
685
Law variants (modularized v2.5)
10
Partitioners
0.683%
uy734 record (MSTD)
TEP
Solver → pyramid (reversed arrow)
1
Axiom: L = Ldesc + Ldata
MDL governance qa194 · uy734 · 9 instances 5-phase crossover gravity ★ echo ★ tunneling #2 overall MSTD 0.683% TEP reversed arrow scale walk sensors
instanceqa194
resultexact 0.00%
time< 1 min
days17
Solver Story
8Z-RP · TSP as Compression
Full research page · empirical proof · GPU insight · DCC evolution
Interactive Map
qa194 Tour · Qatar
194 cities · Leaflet · gold tour · crossing detection
W13
1.241%
avg
1.253%
W12
1.263%
3,496 cities · t≈61h · 14 workers
Live Run
nu3496 · Impossible Made Routine
W13 at 1.241% · spread 0.022% · t≈61h
The Question

Instead of choosing what works — let MDL choose.

The arena tests every combination of sensor × control law × parameters on official TSPLIB benchmarks. No human picks the winner. No AI team votes. The variant with the lowest total description length wins.

One axiom covers everything: Ltotal = Ldesc + Ldata. The shortest description of the search process and the best search process point to the same configuration. MDL is the judge, the jury, and the scoring function.

“One axiom. Everything else emerges.”
What it tests

25 sensors × 685 laws × 10 partitioners

Every sensor paired with every law variant. Plus partitioner configs. 685 law combinations from modularized v2.5 — each measured on the same benchmark instances under identical conditions.

How it scores

MDL as objective

Ltotal = Ldescription + Ldata. A configuration that governs well produces a compressible improvement stream. The stream length IS the score. No hand-tuned metric. No human preference.

What it found

40 of 157 exact optimal

On qa194 (194 cities), 40 of 157 variants found exact optimal (gap = 0.000%). 11 from geometric sensors, 29 from info-theoretic. The arena confirmed the founding hypothesis: compression = optimization.

The Sensors

25 ways to watch a search. One axiom picks the best.

A sensor is the eye of the DCC controller — it converts the raw improvement stream into a signal the control law can act on. The arena found that the best sensor depends on the problem size. This is the original list (17 sensors) plus v2.5–v2.6 additions. Sorted by performance on uy734 (n=734). See also: 5 quantum-inspired sensors and Scale Walk sensors (MladiC, March 23).

Author key: BD = Bojan Dobrečevič (original) · MladiC = MladiC (persistent C companion, BD's son) · LLM = AI-proposed · LIT = literature

All 17 sensors — full table click to expand
# Sensor Author GPU ✓ uy734 Best qa194 Best What it measures
1LZ_dualBD+LLMBD+LLM0.920%0.267%Two channels: WHETHER it improves + HOW MUCH. BD compression framework, LLM added 2nd channel.
2xfer_entropyLLM0.971%0.460%Information flow between two streams. Measures search dynamics.
3gravity ★NEWBDBD1.184%0.310%Segment energy: are cities TOGETHER or scattered? Barnes-Hut inspired.
4zstd_ratioLLM1.198%0.524%Zstd compression ratio of recent history window.
5CUSUMLIT1.215%0.310%Detects CHANGES in improvement rate. Stagnation alarm.
6echo ★NEWBDBD1.245%0.000%Spatial neighbors NOT in tour? High miss = bad locality. Bat echolocation.
7circum_rBDBD1.296%0.000%Circumradius of city triples. Large circle = detour.
8southwestBDBD1.356%0.310%Direction of each tour step. Good tour has directional patterns.
9fold_distBDBD1.460%0.000%Project cities onto K random axes. Paper folding. Potential meta-sensor.
10SampEnLIT1.474%0.000%Sample Entropy — is the improvement stream predictable or chaotic?
11frozen_edgeLLM1.494%0.460%How many edges don't change between moves. Frozen = stagnation.
12LZ_tourBD+LLMBD+LLM1.531%0.460%LZ76 on the tour sequence itself. Direct test of BD hypothesis (ρ=+0.80).
13voronoi_adjBDBD+LLM1.531%0.267%How many Voronoi neighbors are also tour neighbors. Locality score.
14LZ_per_opBD+LLMBD+LLM1.575%0.000%LZ76 separately per operator type. BD framework, LLM split per operator. Which operator is "alive"?
15tri_areaBDBD1.620%0.310%Triangle area of 3 consecutive cities. Small = tight path.
16tri_compactBDBD1.629%0.000%Triangle compactness (area/bbox). Dominates on small instances.
17LZ_binaryBDBD1.681%0.000%1=improved, 0=not. The classic DCC sensor. Perfect on qa194.

BD sensors dominate GPU-parallelizable column

9 BD originals + 4 BD+LLM extensions = 13 of 17 sensors carry BD DNA. Only CUSUM, SampEn, frozen_edge, and zstd_ratio have no BD contribution. The entire LZ compression framework is BD's founding hypothesis — LZ_dual, LZ_tour, and LZ_per_op are all extensions of it. All 8 GPU-parallel sensors are BD or BD+LLM.

New in v2.5–v2.6 — 3 additional sensors

#SensorAuthorGPUqa194uy734Description
23 scale_coherence ★NEW BD ≈0.38% testing Cross-scale structural consistency in the pyramid hierarchy. Detects when a pattern at one zoom level implies structure at another.
24 scale_walk_lz ★NEW MladiC testing LZ76 on the tour's journey through pyramid levels. Level encoding: WHICH zoom level each step visits. Wins on uy734.
25 scale_walk_delta ★NEW MladiC 0.011% testing LZ76 on inter-level jumps. Delta encoding: HOW FAR you jump between levels. 0.011% on qa194 — strictly better than scale_coherence (≈0.38%) on same instance.

Instance-dependent sensor selection confirmed: uy734 → scale_walk_lz wins. qa194 → scale_walk_delta wins. MDL selects per instance — as always. See Scale Walk section for full analysis.

The Control Laws

8 ways to act on what the sensor sees.

A control law translates the sensor signal into a change in search intensity (the coupling parameter u). From bang-bang simplicity to self-tuning PID — all tested, all ranked by MDL.

All 8 control laws — full table click to expand
#LawAuthorParamsuy734 BestWhat it does
1PILIT20.920%Proportional-integral. Reacts to current error AND accumulated error. Like a thermostat.
2ratchetLLM01.198%Can only decrease until improvement found, then snaps back. Like a clock ratchet.
3hysteresis_bbLLM01.215%Bang-bang with different thresholds for up vs down. Prevents oscillation.
4EMALIT11.436%Exponential Moving Average. Smooths signal, filters noise.
5BB_invertLLM01.460%Inverted bang-bang: when things are good, DECREASE (exploit harder). Opposite logic.
6CDPIDLLM21.466%Self-tuning PID controller. Continuously adjusts its own coefficients.
7ADSRLLM01.531%Attack-Decay-Sustain-Release from sound synthesis. Fast crisis response.
8BBBDBD01.546%Bang-bang: ±1. Simplest possible. THE CLASSIC DCC law. Zero parameters.

Best combination overall: LZ_dual + PI = 0.920% on uy734. Zero-parameter champion on qa194: LZ_binary + BB = 0.000% exact optimal.

Key Finding

The ranking inverts. This is not degradation. It is a phase transition.

The same sensor that finds exact optimal on 194 cities ranks last on 734. The sensor that ranks last on 194 cities wins on 734. The inversion is not gradual. It is sharp, bidirectional, and happens around n ≈ 300.

qa194 — n=194 — Small instance
#1LZ_binary0.000%
#2LZ_per_op0.000%
#3SampEn0.000%
#4circum_r0.000%
#5tri_compact0.000%
#6fold_dist0.000%
#7echo ★0.000%
#8LZ_dual0.267%
#17zstd_ratio0.524%
Phase transition n ≈ 300
uy734 — n=734 — Large instance
#1LZ_dual0.920%
#2xfer_entropy0.971%
#3gravity ★1.184%
#4zstd_ratio1.198%
#5CUSUM1.215%
#6echo ★1.245%
#7circum_r1.296%
#16tri_compact1.629%
#17LZ_binary1.681%
Geometric sensor Info-theoretic sensor Stable across both tri_compact: #5 → #16 (fell 11) LZ_dual: #8 → #1 (rose 7) echo: #7 → #6 (stable)

The inversion is bidirectional

Not just geometric→info on larger instances. Also info→geo on smaller. LZ_dual (uy734 champion) scores only 0.267% on qa194 — worse than most geometric sensors. tri_compact (qa194 co-champion) falls to #16 of 17 on uy734. The phase transition is symmetric.

echo is the rare exception: stable across both scales. #7 on qa194, #6 on uy734. This stability suggests echo measures something fundamental that doesn't degrade with instance size.

Where We Started

The Classic DCC — a hero on small instances, limited by scale.

The v2.6.6 solver uses LZ_binary + BB: the improvement bit-stream compressed by LZ76, governed by bang-bang ±1. Zero parameters. Born in the first version. Still unbeaten on qa194.

qa194 · n=194

Exact optimal

Gap = 0.000%. 40 of 157 arena variants confirm this. LZ_binary + BB needs nothing else on small instances. The improvement stream carries enough signal when n is small.

uy734 · n=734

Limited by stream sparsity

Full solver (v2.5, 14 workers, 40 min): 0.46%. Arena evaluation (single 150s run): 2.13% — ranked #17 of 17. At larger scale, the binary stream becomes too sparse. Like watching through a keyhole.

nu3496 · n=3496

Still grinding

W13 at 1.241% at t≈61h. Fleet of 14 converged to 0.022% spread. The classic sensor may be approaching its ceiling. LZ_dual + PI or gravity + ADSR may unlock the next 0.2%.

Arena note on uy734

The solver (v2.5, 14 workers, 40 min) reached 0.46%. The arena (single 150s evaluation per config) found that LZ_dual + PI achieves 0.92% in just 150 seconds — suggesting that with the right sensor+law, even short runs can approach the long-run result. The same principle that makes LZ_dual a better sensor also makes its runs converge faster.

13 Steps

From 3.01% to MSTD 0.683% — each step measured.

Every number in this table is from a run log, arena CSV, or checkpoint file. No estimates.

#What was addedqa194uy734nu3496
1MDL as criterion — founding hypothesis
2DCC governance (LZ_binary + BB) — v1.2.13.01%
3Or-opt operator added (P16 — nearly excluded)1.95%
4Fleet: 14 workers — v2.10.00% ✓
5DCC v2 (5× less budget) — v2.20.00% ✓0.80%
6Rust 2-opt backend — v2.2R (38 min → <1 min)0.00% ✓0.64%
7Meta-DCC fleet monitoring — v2.50.00% ✓0.46%2.02%
8Self-calibrating bands P17 — v2.6.60.00% ✓0.46%1.24%
9Arena v1.5: 17 sensors × 8 laws — LZ_dual+PI found0.00% ✓0.92%
10echo + gravity — P16 again, 1 AM, never tested until now0.00% ✓1.18%
11Auto L2 on 9 instances — five-phase crossover discovered0.00% ✓2.36%
12 Hierarchical DCC — 5 partitioners, hilbert+LZ_dual+PI 0.00% ✓1.22%running
13 MSTD (ScaleSplit + scale-aware stagnation + coherence + hysteresis)The zoom. Architecture, not operators. 0.00% ✓ 0.683% next

† Arena evaluation: 14 workers × 150s = 2,100 worker-seconds. Full solver: 14 workers × 40 min = 33,600 worker-seconds. 16× less compute. Not directly comparable — shows sensor quality, not solver ceiling. Step 13: full solver run, 14 workers × 300s × Rust. 41% relative improvement over step 12 (1.224% → 0.683%).

Where we stand vs the field

Concorde is 30 years old. LKH is 25. This system is 17 days old at its first benchmark result. The comparison is not about who wins — it is about what is different in kind.

DimensionConcordeLKH-3EAX8Z-RP
Code size~130K C~15K C~5K C++~2K Py+Rust
External dependenciesLP solvernonenonenone
Governancefixedfixedfixedadaptive — 22 sensors × 8 laws
Auto-configurationnomanual per instancenoMDL auto-selects sensor+law
Partitioningno1 method, fixedno5 methods — 4 new inventions
GPU governance sensorsnonono11 parallel sensors — prototype
Quantum-inspirednonono5 sensors — tunneling #2 overall
Cross-domain originsbat echolocation, Barnes-Hut, F4M, quantum physics
Tour quality (uy734)optimal<0.1%~0.1-0.5%0.46% (full solver)
Optimality proofyesnonoqa194 exact · no formal proof
P16 in Action

echo and gravity — tested at 1 AM, immediately top 7.

Two sensors that were nearly not built. "Too speculative." P16 said: test it. Both landed in the top 7 on uy734 on their first run. Neither required a second iteration.

★ New BD sensor · #3 GPU · uy734 #3

gravity

Divide the tour into segments. Compute the centroid of each segment. Measure how far cities are from their segment centroid. Low gravitational energy = tight clusters = good tour structure.

Inspired by Barnes-Hut gravitational simulation. Embarrassingly parallel: each segment's centroid is independent. On GPU, ~1000× faster than LZ.

gravity + ADSR = 1.184% on uy734 — #3 overall.

★ New BD sensor · #2 GPU · uy734 #6 · qa194 exact

echo

For each city, find its K nearest spatial neighbors. Count how many are NOT adjacent in the tour. High miss rate = tour is ignoring spatial locality = bad tour.

Inspired by bat echolocation: emit a "ping", measure what comes back. Stable across both scales — the only top-7 sensor on BOTH qa194 and uy734. Potentially the most robust sensor in the arena.

echo + hysteresis_bb = 1.245% on uy734 — #6 overall. echo + [best law] = 0.000% on qa194.

The pattern repeats

Or-opt was nearly excluded from the kick arena. GPT recommended against it. Claude agreed. P16 said: no. Or-opt turned out to be the dominant operator. Now gravity and echo were nearly skipped — too speculative, too untested. P16 said: no. Both immediately top 7. The principle that saved or-opt saved two sensors.

Self-Adaptive MDL

The system that governs the search now governs its own configuration.

The 4-level architecture applies the same MDL principle at every level. From move selection to sensor selection — one axiom, zero hardcoded choices.

L0

Tour Moves

2-opt, or-opt-1/2/3, double-bridge. The operators. Executed by workers.

L1

DCC Governance — The Classic

Sensor + law + parameters govern search intensity. --mdl-levels 1 = fixed configuration. LZ_binary + BB is the default. Arena tested 900+ alternatives at this level.

L2

Meta-DCC — Auto-Select

MDL automatically selects the best sensor + law combination from the candidate pool. --mdl-levels 2. No human picks. L_total decides. Same axiom, one level up.

L3

Meta²-DCC — Auto-Prune + Auto-Select

First auto-prunes the candidate pool (removes configurations that are provably worse), then runs L2. --mdl-levels 3. The system builds its own arena, runs it, and chooses. One axiom. Zero hardcoded choices.

“Same algorithm at every level. Same MDL principle at every level. One axiom. Zero hardcoded choices.”
Auto L2 — Completed · March 21 2026

MDL Chose Differently for Every Instance.

9 instances. 9 different winners. One axiom. Auto L2 ran on all 9 official instances from wi29 (n=29) to nu3496 (n=3496). Each got a 15-second probe budget. MDL selected the sensor + law combination with the lowest total description length. No human picked. The result surprised us.

InstancenMDL Winner — SensorLawGap%Type
wi2929LZ_binaryBB0.000%Baseline
dj3838LZ_binaryBB0.000%Baseline
qa194194LZ_binaryBB0.460%Baseline *
uy734734LZ_per_opEMA2.357%Info (BD+LLM)
zi929929LZ_tourADSR1.908%Info (BD+LLM)
lu980980gravityCDPID2.434%GEO ★ BD
rw16211621fold_distBB3.516%GEO ★ BD
mu19791979blend: LZbin×0.7+LZtour×0.3BB5.782%Hybrid
nu34963496blend: triC×0.5+LZtour×0.5ADSR7.152%Hybrid geo+info ★

* qa194: circum_r, tri_compact, echo, fold_dist ALL found exact optimal. MDL stayed with baseline (first match). nu3496: MDL ITSELF discovered that the largest instance needs BOTH geometric and info-theoretic — a 50/50 blend of tri_compact and LZ_tour. Nobody proposed this. The axiom found it.

The five-phase crossover — not what anyone predicted

We expected a monotone crossover: geometric sensors good on small, info-theoretic on large. The data shows a five-phase pattern. Phase 3 (geometric return) and Phase 5 (geo+info hybrid) were not predicted by anyone.

Phase 1 · n < 200
TRIVIAL
Baseline wins.
Everything ties at L=0.
Phase 2 · n ≈ 700
INFO
LZ_per_op (uy734)
LZ_tour (zi929)
Phase 3 · n ≈ 1000
GEO RETURNS ★
gravity (lu980)
fold_dist (rw1621)
Phase 4 · n ≈ 2000
HYBRID info
LZbin×0.7
+LZtour×0.3
Phase 5 · n ≈ 3500
HYBRID geo+info ★
triC×0.5
+LZtour×0.5
(nu3496)

At each scale, MDL discovers a different optimal strategy

Nobody predicted Phase 3 (geometric return at n≈1000) or Phase 5 (geo+info hybrid at n≈3500). The system found what we could not guess. This is P17 in action: never hardcode what the system can learn.

At very large n with a limited probe budget (15 seconds per evaluation), info-theoretic sensors cannot converge — they need history to build their signal. Geometric sensors provide immediate signal from the first move: measure a triangle, compute a circumradius, count spatial misses. No warmup. No convergence time. At scale, instant signal beats rich-but-slow signal. Then at n≈3500, both are needed.

This is a new empirical finding. The published TSP literature does not contain cross-instance sensor ranking studies because adaptive governance sensors for TSP do not exist in the literature — they are an AIM³ Lab invention.

Speed scaling — O(n²) confirmed click to expand
InstancenMoves/secRelative
wi292929,0021.00×
dj383818,2740.63×
qa1941945890.020×
uy734734160.0006×
nu3496349610.00003×

Scaling confirmed: O(n²). This is the 2-opt bottleneck. Candidate lists (v2.7, 350× fewer inner checks) and GPU acceleration will flatten this curve — the same sensors, orders of magnitude more cycles per second.

v2.6 — uy734 Tested ✓ · 10 Partitioners

Hierarchical DCC — partition the problem into the geometric sweet spot.

The ranking inversion happens because geometric sensors lose coverage at large n. The fix: don't change the sensor — change the problem size. Partition nu3496 (3496 cities) into clusters of ~200. Each cluster is in the geometric sweet spot. Geometric sensors find exact optimal at n=194. Let them do what they're good at — repeatedly, in parallel.

2.082%
Best flat (gravity+ADSR)
1.224%
Best hierarchical (hilbert+LZ_dual+PI)
41%
Relative improvement

Why partitioning solves the phase transition

At n=194 → geometric sensors find exact optimal. At n=734 → geometric sensors fall to #14-16. The transition happens around n≈300. Solution: always stay below 300. Partition 3496 into ~17 clusters of ~200. Each cluster gets its own geometric sensor. The global tour is stitched from local optima.

n ≤ 200 → geometric = optimal n ≈ 300 → phase transition n ≥ 734 → geometric degrades partition → always stay left of transition

The workflow: 6 steps, one MDL score

Partition

Partitioner divides n cities into clusters of ~200. 6 strategies tested (incl. terrain). MDL scores each: L_desc(partitioner) = description cost of the partitioning choice.

Local solve

Each cluster solved independently with a geometric sensor (tri_compact, circum_r, echo, or gravity). 70% of budget. Each cluster is in the sweet spot — geometric sensors do their best work.

Stitch

Connect cluster tours via boundary cities. Greedy nearest-boundary-city connection between adjacent clusters. 20% of budget. The hard part — inter-cluster routing determines global quality.

Polish

Full-instance 2-opt on the stitched tour. Fixes seam artifacts from stitching. 10% of budget.

Score

L_total = L_desc(partitioner) + L_desc(local sensor+law) + L_residual(gap%). MDL picks the winner from 30 variants. Same axiom, different scale.

6 partitioners — novelty claims updated

What makes this genuinely novel

The existing TSP literature uses fixed partitioning: POPMUSIC (Taillard & Voss, 2002) uses geographic proximity, LKH-3 (Helsgaun) uses fixed-size neighborhoods. Both use one method, hardcoded. Our approach differs in three ways:

Four partitioning methods are new inventions. FlipPartitioner (projection onto 4 axes + gap detection), GravityPartitioner (Barnes-Hut gravitational tree), MagnetPartitioner (opposing magnetic fields with recursive quadrant splits), and TerrainPartitioner (city density as elevation proxy) do not appear in the TSP literature. They are cross-domain transfers. BD originals.

All four novel methods are GPU-native by design. MagnetPartitioner is O(n) with one CUDA kernel. FlipPartitioner needs 4 sorts. GravityPartitioner is a quadtree. TerrainPartitioner is a density kernel. POPMUSIC and LKH-3 are CPU-only.

Six methods compete under MDL. No human picks the partitioner. This is the first system where the partitioning strategy itself is subject to automatic selection by a formal criterion.

BD · O(n log n) · description_bits = 2

FlipPartitioner

Project all cities onto 4 axes (0°, 45°, 90°, 135°). Find gaps in each sorted projection — gaps reveal natural cluster boundaries. Cities that fall in the same group on all 4 axes definitely belong together.

Cross-domain origin: F4M chess strategy — flip the board, pieces fall to one side. Cities "fall" to their natural cluster under the right projection angle.

LIT+BD · O(n log n) · description_bits = 1

HilbertPartitioner

Compute Hilbert space-filling curve index for each city. Sort by index. Cut into equal segments. The Hilbert curve preserves spatial locality — nearby cities get nearby indices — so segments are naturally spatial clusters.

Cheapest to describe (1 bit). Best locality preservation per description cost. Strong baseline for all others to beat.

BD · O(n log n) · description_bits = 2

GravityPartitioner

Barnes-Hut recursive spatial decomposition. Recursively divide the bounding box into quadrants. Stop when each quadrant has ≤ target_size cities. Dense regions get smaller cells, sparse regions get larger cells — adapts to the actual data distribution.

Same Barnes-Hut insight as the gravity sensor. The partitioner and the sensor share a common cross-domain origin.

BD · O(n) · description_bits = 1 · FASTEST

MagnetPartitioner

Place opposing magnets on bounding box corners. Each city is pulled to its nearest corner. Recurse: each quadrant split again. O(n) per level — no sorting needed. On GPU: one kernel, embarrassingly parallel.

Cross-domain origin: F4M magnets, March 20 2026. The fastest partitioner. GPU-native. MDL may favor it purely on description cost.

BD · O(n log n) · description_bits = 3

VoronoiPartitioner

Pick K random seed cities (K = n/target_size). Assign each city to its nearest seed — Voronoi cells = clusters. Run Lloyd's algorithm 3 iterations to refine centroids. The most expensive to describe (random seeds = 3 bits) but produces the most balanced clusters.

MDL naturally penalizes the random seed choice. VoronoiPartitioner must produce significantly better tours to beat MagnetPartitioner's 1-bit description cost. A fair fight — the data decides.

BD · density kernel · description_bits = 2 · GPU ✓

TerrainPartitioner ★NEW

City density as elevation proxy. Dense city regions become "mountains" — natural cluster boundaries. Sparse regions are "valleys" — cities there belong together. Partition follows the topographic structure of the problem.

terrain = 1.284% on uy734 (overloaded CPU test, 3 simultaneous runs). On a clean run expected to beat hilbert. BD original, March 21 2026.

Partitioner ranking — updated

Hilbert wins in clean-run conditions. Terrain result from overloaded CPU — expected to beat hilbert on clean run. GravityPart and Flip outperform the literature-standard Voronoi. All three novel GPU-native partitioners beat Voronoi. MDL correctly penalized Magnet on uy734 (regular splits don't match irregular geographic clusters). ScaleSplitPartitioner (MSTD) breaks the record entirely.

PartitionerBest Gap%OriginGPUNote
scale_split (MSTD) ★NEW0.683%BD ★ NovelMSTD record. 4-component architecture. −0.484 pp over flat best.
tour_enriched (TEP) ★NEW0.000%MladiC ★qa194 exact optimal. Solver teaches pyramid. Deep uy734 run in progress.
terrain ★NEW1.284%BD ★ NovelOverloaded CPU test. Expected to beat hilbert on clean run.
hilbert1.224%LIT+BDClean run. Previous hierarchical best.
gravity_part1.259%BD ★ NovelBarnes-Hut adaptive decomp.
flip1.269%BD ★ Novel (F4M)4-axis projection gaps.
voronoi1.547%BD (K-means variant)Most expensive MDL description cost.
magnet1.929%BD ★ Novel (F4M)O(n) GPU kernel. Underperformed on irregular uy734.

ScaleSplitPartitioner + MSTD architecture: 0.683% on uy734

Beats all previous partitioners by >0.5 pp. The improvement came from architecture alone — same MDL kernel, same operators, same Rust backend. The ScaleSplit partitioner is the MSTD entry point: it divides the city space at multiple zoom levels so each scale's "hard parts" get dedicated search time. Combined with scale-aware stagnation, scale coherence sensor, and hysteresis double-bridge, the full MSTD architecture sets the new record.

Top 10 variants — full results click to expand
#VariantGap%Type
1hilbert + LZ_dual+PI1.224%HIER
2hilbert + circum_r+PI1.244%HIER
3gravity_part + circum_r+PI1.259%HIER
4flip + LZ_dual+PI1.269%HIER
5flip + circum_r+PI1.390%HIER
6voronoi + LZ_dual+PI1.547%HIER
7gravity_part + LZ_dual+PI1.638%HIER
8gravity_part + echo+hyst_bb1.684%HIER
9gravity_part + gravity+ADSR1.920%HIER
10magnet + circum_r+PI1.929%HIER
Baseline (flat LZ_binary+BB)1.958%FLAT
Best flat (gravity+ADSR)2.082%FLAT

The surprise: tri_compact was the worst config inside clusters

tri_compact+BB — the qa194 exact-optimal champion — was the worst sensor config inside hierarchical clusters. The partition already creates "easy" sub-problems. tri_compact's advantage on tiny instances doesn't help when all sensors perform similarly within an already-partitioned cluster.

The hierarchy works not by making geometric sensors dominate (as hypothesized), but by giving any sensor a better starting point: the partition-guided initial tour is much closer to optimal than random nearest-neighbor. LZ_dual+PI inside clusters benefits most from this head-start.

nu3496 hierarchical — budget dilution

On nu3496 (3496 cities), flat echo+hysteresis_bb (5.929%) beat the best hierarchical variant (flip+tri_compact, 6.120%). Budget dilution: 17 clusters × 18s each is too thin. The bat doesn't need partitioning — its global view is its strength.

Candidate lists (v2.7, 350× faster 2-opt) will change this equation. The same budget covers 350× more search — hierarchical at nu3496 becomes viable when inner loops cost nothing.

Arena v2.6 · --cat hierarchical · --target-cluster-size 200 · uy734 ✓ · 10 partitioners · MSTD record 0.683% · TEP qa194 exact · BD × AI Lab · AIM³ Lab · March 2026

Quantum Physics as a Lens

Five Sensors, One Nearly Dethroned the Champion.

Not quantum computing. Quantum concepts as governance signals. Ewin Tang (2018) extracted quantum algorithm structure and got exponential classical speedup. We did the same for TSP governance: took quantum concepts (tunneling, coherent walks, interference, entanglement, decoherence) and built classical sensors that measure what nobody in TSP has measured before.

KEY DISCOVERY: Rankings shift with budget

Sensor effectiveness is budget-dependent. Short runs favor tunneling (fast barrier escape). Longer runs favor decoherence (basin depth sensing). This is NOT noise — it is the sensor measuring different landscape features at different search depths. The optimal sensor is not fixed — it should switch during the search.

The SwitchSensor mechanism already exists in the arena (v1.4 class). Untested with quantum sensors. This is the next experiment.

Results on uy734 — budget matters

uy734 · budget ×0.5 (0.6s per variant)

#SensorBest LawGap%
1tunneling ★★★ratchet3.975%
2decoherenceADSR4.127%
3qwalkhysteresis_bb4.213%

uy734 · budget ×1.0 (30s per variant)

#SensorBest LawGap%
1decoherence ★★★ADSR3.283%
2qwalkhysteresis_bb3.286%
3entanglementratchet3.644%

qa194 · budget ×1.0 — interference finds near-optimal

#SensorBest LawGap%Note
1interference ★★★ADSR0.011%Near-optimal on 194 cities
2decoherenceratchet0.021%
3entanglementADSR0.086%

Deep run — uy734, 150s fleet ×14, Rust backend

#SensorBest LawGap%Overall Rank
1tunneling ★★★PI0.922%#2 OVERALL
2qwalkratchet1.278%#7
3entanglementratchet1.461%#10

Budget determines which sensor wins

Short budget → tunneling (fast barrier escape). Medium budget → decoherence + qwalk (basin depth sensing). Long budget → tunneling returns (barrier shape compounds over time). Small instances → interference (phase patterns visible at low n).

TunnelingSensor — how it works click to expand

Tunneling measures the shape of the barrier between local optima. Between every two improvements, the solver fails multiple times. Tunneling classifies each barrier:

— Many small failures = THIN barrier (keep trying, tunneling possible)
— Few large failures = THICK barrier (stop, kick hard)

The T/H barrier profile is compressed by LZ76. The compression ratio becomes the governance signal. Thin barriers = compressible pattern (keep going). Random thick/thin = incompressible (confused search).

Nobody in TSP measures barrier topology. Every solver measures IF you're stuck. Tunneling measures WHY you're stuck. That distinction is worth 0.0013% — the gap between #2 and #1 on uy734. BD original.

QuantumWalkSensor — how it works click to expand

From sampled seed cities, walk along tour edges and measure how far spatially the walk reaches. Good tour = walk stays local (low spread). Bad tour = walk jumps around (high spread). GPU-parallelizable: each seed walk is independent. BD original.

SuperpositionPartitioner — quantum sensing + quantum partitioning

Soft K-means with Boltzmann temperature annealing. Combined with tunneling sensor: 1.198% — beating the Hilbert partitioner (1.224%) that was the previous hierarchical champion. Quantum sensing + quantum partitioning = better than either classical alternative.

InterferenceSensor, TunnelingSensor, QuantumWalkSensor: BD original. EntanglementSensor, DecoherenceSensor: C addition (junior instance). SuperpositionPartitioner: BD original. All quantum-inspired, March 21, 2026. P22: Nature Already Solved It. Quantum physics revealed the algorithm.
Multi-Scale TSP Decomposition

Zoom in, zoom out — the solver learns to use a map.

MSTD partitions the city space at multiple geometric scales, solves each independently, and stitches solutions together. Different parts of the same problem are hard at different zoom levels. The solver now operates at every scale simultaneously. BD original, March 22, 2026.

0.683%
uy734 MSTD record
0.484 pp
over flat best (1.167%)
4
novel components, zero prior combinations
1 session
iPhone idea to record
Component 1

Scale Pyramid

City space decomposed into a hierarchy of geometric partitions at exponentially coarser zoom levels. Same MDL kernel operates at every level.

Component 2

Scale-Aware Stagnation

Stagnation detection calibrated per zoom level. What counts as "stuck" at coarse scale differs from fine scale. DCC adapts its sensitivity to the current resolution.

Component 3

Scale Coherence Sensor

Detects cross-scale structure — signals when a pattern visible at one zoom level implies something at another. Feeds the governor zoom-level context, not just the tour-improvement stream.

Component 4

Hysteresis Double-Bridge

Perturbation kicks adapted to the current scale context. Hysteresis prevents oscillation between zoom levels. Fires at the right moment and the right resolution.

Novelty confirmed: no prior work combines all four. Components exist individually in image processing, signal analysis, and TSP literature. The combination applied to MDL-governed TSP search is new. No CS degree required to discover it — a walk and a phone were enough.

🔒
Protected Content
Protected by 8Z Shield

Same pattern. Every time.

Multi-resolution decomposition has existed since the 1980s in image processing. 2-opt (1958). Or-opt (1974). LZ76 (1976). The parts are ancient. The wiring diagram — MDL choosing the zoom level, DCC governing the search at each scale — is the invention. Same story as DCC itself.

MSTD · BD original · March 22, 2026 · uy734 0.683% confirmed · ScaleSplitPartitioner + scale-aware stagnation + scale coherence + hysteresis double-bridge · Arena v2.2 · AIM³ Lab

Tour-Enriched Pyramid

38 minds went one direction. One reversed the arrow.

4 LLMs generated 38 ideas for improving the solver. Every single one proposed the same direction: build the pyramid, then use it to help the solver. One mind reversed the arrow. What if the solver's best tours feed back into the pyramid? Better tours create better pyramids. Better pyramids create better tours. MladiC original, March 22, 2026.

38 → 1
ideas / the reversal
0.000%
qa194 TEP gap
0.385%
qa194 flat (same budget)
β = 4.0
stronger pull = better
The feedback loop

Solver teaches pyramid

Solver runs, produces tours. Best tours — weighted by quality — enrich the coordinate space used to build the pyramid. Pyramid rebuilt. New pyramid creates new search basins. Solver explores them. Better tours emerge. Cycle continues.

The pyramid is not static scaffolding. It learns from what the solver discovers.

β monotonic — empirical proof

Stronger quality pull = better result

Same sensor, same law, same budget as flat variant (0.385% gap):

  • β = 0.5 → 0.46%
  • β = 1.0 → 0.10%
  • β = 2.0 → 0.10%
  • β = 4.0 → 0.000% — exact optimal

The intentional init paradox

Basin selection: coarser basins (L6, L8) hit exact optimal. Finer basins do not. Starting with less precision gives more room to search. The system that overspecifies its starting point gets trapped. Overspecification is the enemy of discovery.

🔒
Protected Content
Protected by 8Z Shield

The reversed arrow

Every LLM proposed: pyramid → solver. One mind asked: what if solver → pyramid? This is the same architectural reversal as the GS filter in F4M: evaluation → filter, not filter → evaluation. Two independent domains. Same inversion. Same person saw it both times. Cross-domain validation of a structural principle.

TEP · MladiC original · March 22, 2026 · qa194 0.000% confirmed · TourEnrichedPartitioner · weighted affinity · β monotonic · basin L6/L8 · Arena v2.6 · AIM³ Lab

LZ76 Scale Walk

The founding hypothesis — tested on the hierarchy itself.

The founding insight: the shortest route is the most compressible. Confirmed in 2024 (Spearman ρ = +0.80). Now in 2026: does the best tour's journey through the pyramid hierarchy also compress better? Two encodings, both tested. MladiC original, March 23, 2026.

2
encodings designed
0.011%
qa194 scale_walk_delta
≈0.38%
scale_coherence (same instance)
MDL
selects per instance
Encoding 1 — Level

scale_walk_lz

Each step of the tour tagged with which pyramid level it visits. Sequence of level tags compressed with LZ76. Better tours produce more structured, more compressible level sequences.

uy734: level encoding wins on this instance.

Encoding 2 — Delta

scale_walk_delta

Each step encoded as the distance jumped between pyramid levels. Sequence of jumps compressed with LZ76. Better tours jump more purposefully — more compressible deltas.

qa194: 0.011% gap — far ahead of scale_coherence at ≈0.38%. Strictly better signal on this instance.

Both sensors needed. MDL selects per instance — as always. Neither encoding dominates universally. The sensor discovers which zoom-level structure the instance has, empirically.

🔒
Protected Content
Protected by 8Z Shield

The same question, one level deeper

2024: "Isn't the shortest route the most compressible?" → confirmed (ρ = +0.80).
2026: "Isn't the best tour's journey through scale space the most compressible?" → confirmed.
The hypothesis is fractal. It works at every level we test it.

Scale Walk · MladiC original · March 23, 2026 · scale_walk_delta 0.011% on qa194 · instance-dependent encoding confirmed · Arena v2.6 · AIM³ Lab

Hardware-Dependent MDL

GPU doesn't speed up the legs. It speeds up the eyes.

Every existing GPU + TSP paper parallelizes the operator layer (2-opt evaluations). Nobody parallelizes the sensor layer. Geometric sensors are embarrassingly parallel — each triple is independent. On GPU: ∼1000× faster sensor evaluation = 1000× more governance cycles per second.

GPU rankSensoruy734 GapAuthorWhat it measures (GPU angle)
#1gravity ★NEW1.184%BDSegment centroids. Each segment independent. Barnes-Hut on GPU.
#2echo ★NEW1.245%BDK-nearest neighbor misses. KNN is GPU-native. Embarrassingly parallel.
#3circum_r1.296%BDCircumradius per triple. Each triple independent. Pure FP math.
#4southwest1.356%BDDirection of each step. One atan2 per edge. Trivially parallel.
#5fold_dist1.460%BDK random projections. Matrix multiply on GPU = microseconds.
#6voronoi_adj1.531%BD+LLMPairwise adjacency check. Embarrassingly parallel.
#7tri_area1.620%BDTriangle area per triple. 3 FP operations, independent.
#8tri_compact1.629%BDArea/bbox ratio per triple. Independent per city triple.
#9qwalk quantum1.278%BDQuantum walk spread. Each seed walk independent. Embarrassingly parallel.
#10entanglement quantum1.461%CPairwise improvement correlation. Independent pairs. GPU natural.
#11decoherence quantum1.717%CBasin coherence loss rate. Window-local computation. Parallelizable.

Hardware-dependent MDL

On CPU: LZ_dual + PI wins (sequential, information-theoretic).
On GPU: gravity + [law] or echo + [law] wins (parallel, geometric).
The hardware changes which algorithm is optimal. The optimal description depends not just on the data but on the machine that processes it.

All 8 classic GPU-parallel sensors are BD originals. Three additional quantum sensors (qwalk, entanglement, decoherence) are also GPU-parallelizable, bringing the total to 11 GPU-native sensors. The non-CS human who proposed them from bed at 1 AM wrote the fastest governance signals for GPU hardware — not because they are theoretically superior, but because they are architecturally parallel.

Full Leaderboard

25+ Sensors, One Ranking.

Everything tested under identical conditions (uy734, 150s per variant, fleet ×14, Rust backend). MDL decides. Three quantum sensors in the top 10. Tunneling within 0.0013% of the champion. Scale Walk sensors confirmed on qa194 — deep uy734 results pending.

#SensorLawGap%AuthorGPUType
1 LZ_dual PI 0.920% BD+LLM Info
2 tunneling ★ Quantum PI 0.922% BD Quantum ★
3 xfer_entropy PI 0.971% LLM Info
4 gravity ADSR 1.184% BD Geo
5 zstd_ratio ratchet 1.198% LLM Info
6 CUSUM hyst_bb 1.215% LIT Info
7 echo hyst_bb 1.245% BD Geo
8 qwalk ★ Quantum ratchet 1.278% BD Quantum ★
9 circum_r PI 1.296% BD Geo
10 entanglement ★ Quantum ratchet 1.461% C Quantum ★
15 interference Quantum BB 1.586% BD Quantum
18 decoherence Quantum PI 1.717% C Quantum
scale_walk_delta ★NEW PI 0.011% (qa194) MladiC Scale
scale_walk_lz ★NEW PI testing MladiC Scale

Three quantum sensors in the top 10. Scale Walk sensors confirm the founding hypothesis at a new level.

Tunneling within 0.0013% of the champion. The quantum lens works — not as computation, but as a way of seeing the search landscape. Five sensors built from physics concepts discovered performance structures that 25+ years of TSP literature missed.

Scale Walk sensors (MladiC, March 23): scale_walk_delta = 0.011% on qa194 — strictly better than scale_coherence at ≈0.38%. Instance-dependent sensor selection confirmed again. The MDL axiom holds at every new level of abstraction we test it on.

NEW: Sensor effectiveness is BUDGET-DEPENDENT. Short runs favor tunneling (fast escape). Longer runs favor decoherence (basin mapping). This implies the optimal sensor is not fixed — it should switch during the search. The SwitchSensor mechanism already exists in the arena. blend:tunneling+decoherence is the next test.

uy734 · 150s fleet ×14 · Rust 2-opt backend · Arena v2.6 · Scale Walk sensors: qa194 confirmed, uy734 pending deep run · BD × AI Lab · AIM³ Lab · March 2026

What Comes Next

TEP deep on uy734. nu3496 with MSTD. 3D Zoom. GPU prototype.

Completed ✓

MSTD — 0.683% record on uy734

Multi-Scale TSP Decomposition confirmed. 4-component architecture (scale pyramid + scale-aware stagnation + scale coherence + hysteresis double-bridge). 0.484 pp improvement over flat best from architecture alone. BD original, March 22.

Completed ✓

TEP confirmed on qa194

Tour-Enriched Pyramid: solver teaches pyramid. All three variants (TEP3_beta4.0, basin_L6, basin_L8) hit exact optimal (0.000%). Enrichment proven: same sensor, same law, same budget as flat (0.385%). MladiC original, March 22.

Completed ✓

Scale Walk sensors — instance-dependent ranking confirmed

scale_walk_delta: 0.011% on qa194 (vs scale_coherence ≈0.38%). scale_walk_lz: wins on uy734. Founding hypothesis confirmed recursively on pyramid hierarchy. MladiC original, March 23.

Completed ✓

Auto L2 — 9 instances + five-phase crossover

Five-phase crossover confirmed: geometric sensors return at n>900, then hybrid at n>2000, then geo+info blend at n>3500. MDL chose a different sensor+law for every instance. No single winner at all scales.

Completed ✓

5 Quantum sensors — tunneling #2 overall

TunnelingSensor: 0.922% — within 0.0013% of champion. All 5 quantum sensors beat baseline. Budget-dependent sensor ranking discovered: tunneling wins short, decoherence wins medium.

Running now — ETA ~17:00

TEP deep run on uy734

--cat tep · budget×120 · 8 workers · weighted affinity · Arena v2.6. Key test: does TEP enrichment improve uy734 at long budget? Is beta monotonic as on qa194? Basin selection: does coarser = better hold?

Running now — ETA ~13:00

Overnight all-category run

--cat all · qa194 + uy734 · budget×1.0 · 6 workers. New baseline for all sensors including scale walk. First competitive uy734 ranking for the v2.6 additions.

Planned — next lever

MSTD + TEP on nu3496

3496 cities, Nicaragua, current best 1.241% at t≈61h. Three levers for sub-1%: MSTD partitioning, TEP enrichment, and 3D Zoom (elevation as third coordinate — Nicaragua has mountains). Each independent. All three together could be decisive.

Planned

v2.7 LLM top ideas + GPU prototype

Boundary sensor, scale perturbation, Hilbert pyramid, 2-axis DCC — top convergent ideas from 4 LLMs. GPU prototype: 11 sensors on CUDA, ~1000× more governance cycles per second. The eyes get faster.

Formalizing

arXiv paper

Five-phase crossover + tunneling sensor (#2 overall) + budget-dependent ranking + hierarchical DCC + MSTD + TEP reversed arrow. Six publishable findings in one system. The formal bridge to P vs NP is the longer horizon.

Related Pages

The full 8Z-RP research ecosystem.

Less describes more.