Every paper on GPU + TSP does the same thing: take an existing operator (2-opt, 3-opt, genetic), parallelize its move evaluation on the GPU, and go faster. They speed up the legs of the solver. Nobody speeds up the eyes.
The parallelism split
Information-theoretic sensors (LZ76, SampEn, transfer entropy) are sequential — each symbol depends on previous context. LZ compression cannot be parallelized on GPU. These sensors are CPU-bound forever.
Geometric sensors (tri_area, circum_r, tri_compact, voronoi_adj, fold_dist) are embarrassingly parallel: each triple (i, i+1, i+2) is independent. Pure vector math on (x,y) coordinates. For n=3496 cities: ~3500 triples × 3 FP operations = ~10,000 ops. GPU does this in one microsecond. CPU takes milliseconds. Ratio: ∼1000× faster sensor evaluation.
Arena evidence — L_total = 0.00 configurations (qa194, 30 sec, 1 worker)
| Sensor |
Governor |
Gap |
L_total |
Sensor type |
GPU-parallelizable? |
| CUSUM |
BB |
0.000% |
0.00 |
Sequential |
No |
| tri_area |
ADSR |
0.000% |
0.00 |
Geometric |
Yes · ~1000× |
| tri_compact |
BB |
0.000% |
0.00 |
Geometric |
Yes · ~1000× |
Two of three perfect-score configurations use GPU-parallelizable sensors. On CPU, all three perform equally. On GPU, the two geometric configurations run ~1000× more governance cycles in the same time budget — meaning faster convergence and likely exact optimal on larger instances where the sequential sensor stalls.
What this means for nu3496
Our current nu3496 run uses LZ-based DCC governance on CPU. At n=3496, each LZ measurement processes a 3496-element sequence sequentially.
With GPU geometric sensors: tri_area measurement = 3496 parallel multiplications = microseconds vs. LZ's milliseconds. More governance cycles = faster convergence = lower gap in same wall time. The geometric sensors proposed by a non-CS human from bed at 1 AM may become the fastest TSP governance signals on GPU hardware — not because they're theoretically better, but because they're architecturally parallel.
Hardware-dependent MDL
On CPU: LZ_binary + CDPID (information-theoretic) is optimal.
On GPU: tri_area + BB (geometric, zero parameters) becomes optimal.
The hardware changes which algorithm wins. The optimal description depends not just on the data, but on the machine that processes it. This is hardware-dependent MDL — a framing that does not appear in the existing literature.