Every paper on GPU + TSP does the same thing: take an existing operator (2-opt, 3-opt, genetic), parallelize its move evaluation on the GPU, and go faster. They speed up the legs of the solver. Nobody speeds up the eyes.
The parallelism split
Information-theoretic sensors (LZ76, SampEn, transfer entropy) are sequential — each symbol depends on previous context. LZ compression cannot be parallelized on GPU. These sensors are CPU-bound forever.
Geometric sensors (tri_area, circum_r, tri_compact, voronoi_adj, fold_dist) are embarrassingly parallel: each triple (i, i+1, i+2) is independent. Pure vector math on (x,y) coordinates. For n=3496 cities: ~3500 triples × 3 FP operations = ~10,000 ops. GPU does this in one microsecond. CPU takes milliseconds. Ratio: ∼1000× faster sensor evaluation.
Arena v1.5 evidence — qa194 (194 cities)
157 variants tested (17 sensors × 8 laws × parameter levels). 40 of 157 found exact optimal on qa194 — 11 from geometric sensors, 29 from info-theoretic. Confirms that many configurations can solve small instances; the discrimination happens at larger n.
| Category |
Exact optimal (gap=0.00%) |
GPU-parallel? |
uy734 ranking |
| Geometric sensors |
11 of 40 exact |
Yes · ~1000× |
#3–#16 (degrades) |
| Info-theoretic sensors |
29 of 40 exact |
No (sequential) |
#1–#6 (wins) |
| echo ★NEW (BD) |
exact — #7 on qa194 |
Yes · ~1000× |
#6 on uy734 — stable |
| gravity ★NEW (BD) |
0.310% — #13 on qa194 |
Yes · ~1000× |
#3 on uy734 |
Key finding: On qa194 (n=194), geometric sensors excel. On uy734 (n=734), info-theoretic sensors win and geometric sensors fall to #14–16. The ranking inverts around n≈300 — a phase transition, not gradual degradation. See Arena Lab for the full ranking.
What this means for nu3496
Our current nu3496 run uses LZ_binary DCC governance on CPU. At n=3496, the binary improvement stream is too sparse — like looking through a keyhole at a 3496-city landscape.
Arena v1.5 found that LZ_dual + PI achieves 0.92% on uy734 in just 150 seconds — vs 2.13% for the classic LZ_binary + BB. With GPU geometric sensors (gravity, echo), each cluster in a hierarchical partition would get ~1000× more governance cycles. Fleet leader W13 is at 1.241% at t≈61h — the next improvement may come from better sensor configuration, not longer runtime.
Hardware-dependent MDL — updated ranking
On CPU, large instances (n≥734): LZ_dual + PI wins (#1 on uy734).
On CPU, small instances (n≤194): geometric sensors win (40 variants exact on qa194).
On GPU: gravity + [law] and echo + [law] become optimal — parallel sensors, ~1000× faster cycles.
Top GPU candidates (uy734 rank):
gravity ★NEW — #3 overall, #1 GPU · Barnes-Hut segment energy
echo ★NEW — #6 overall, #2 GPU · spatial miss rate, stable across scales
circum_r — #7 overall, #3 GPU · circumradius of triples
tri_compact — #5 on qa194, #8 GPU (falls to #16 on uy734 — phase transition)
The hardware changes which algorithm is optimal. The optimal description depends not just on the data, but on the machine that processes it.