Every AI evaluator looked at this game and filed it under entertainment. It’s a consciousness test.
12 AI models from 7 companies independently evaluated the BD × AI Lab portfolio under a frozen protocol. All rated it highly. All identified consciousness as highest-impact. And all 12 filed Flip4M under “Games” — missing that it’s a falsifiable consciousness experiment already labelled as an AGI benchmark on its own research page.
Between February and March 2026, 12 AI models from 7 companies evaluated the BD × AI Lab portfolio under a frozen protocol. The evaluation covered every page, every paper, every tool.
Every evaluator rated the portfolio highly. Every evaluator identified the consciousness branch as highest-impact. And every single one categorized Flip4M as “Creative & Fun,” “Games,” or “Entertainment.” None of them noticed that a playable board game on the same site tests the same hypothesis they ranked as most important.
The evaluators used the same cognitive architecture that fails at Flip4M itself: sequential search without coherence maintenance across domains. They processed the consciousness section and the games section as separate categories. The connection between them IS the test.
The concrete version: the evaluation protocol instructs models to “evaluate each branch separately.” This literally forces compartmentalization. The evaluators didn’t fail individually — the evaluation methodology itself has the same architectural limitation that Flip4M exploits. The protocol needed a DCC: something to maintain coherence across categories and flag cross-domain connections. This is the A8 pattern (Proactive Connection Duty) at industrial scale.
| Model | Company | Flip4M Categorized As | What It Actually Is |
|---|---|---|---|
| Gemini 3.1 Pro | Google | Creative / Games | Consciousness test |
| Qwen 3.5 Plus | Alibaba | Entertainment | Consciousness test |
| ChatGPT 5.4 Thinking | OpenAI | Creative / Fun | Consciousness test |
| Grok 4 | xAI | Games | Consciousness test |
| Grok 4.2 | xAI | Creative / Fun | Consciousness test |
| GPT-5.4 | OpenAI | Creative / Fun | Consciousness test |
| Claude Sonnet 4.6 | Anthropic | Games | Consciousness test |
| DeepSeek R1 | DeepSeek | Entertainment | Consciousness test |
| Qwen | Alibaba | Creative / Games | Consciousness test |
| Kimi K2.5 Thinking | Moonshot | Games | Consciousness test |
| Claude Opus 4.6 | Anthropic | Creative / Fun | Consciousness test |
| Gemini 3.1 Pro | Google | Games | Consciousness test |
Source: BD Model Assessments · Batch 1 & 2 · Frozen protocol
Not about tree size. About structural collapse. Every AI system can play chess at superhuman level. Current LLMs struggle with Flip4M — testable by anyone right now: ask any AI to play and observe performance in rotation-heavy positions. The difference is exactly what the Claustrum-Consciousness Hypothesis predicts.
Simulated board state. One rotation — nearly every token moves. This is what V ≈ 0.54 looks like.
In chess, complexity comes from the Horizon Effect: the truth is 20 moves deep on a stable board. In Flip4M, complexity comes from Structural Collapse: the truth is 4 moves deep but the board rules change. A depth-4 search is useless when the evaluation landscape can be globally restructured by a single move. This is the “Horizon of Chaos.”
The Shannon Number (10¹²³) estimates the size of the full chess game tree. But in practice, only “sensible” moves matter.
When filtered through “sensible” moves, Flip4M’s effective decision tree is 10¹² times larger than chess. Not because the game is bigger, but because more of the moves matter. Every rotation creates a fundamentally different game.
Standard engines rely on three architectural assumptions. Flip4M destroys all three.
Small changes → incremental eval updates. Destroyed by rotation: 20–45 tokens move at once.
Same position via different paths. Destroyed by gravity settling: identical tokens produce different positions under different gravity.
“Good” features stay valuable across plies. Destroyed when a strong vertical stack becomes scattered debris after rotation.
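The broken state-continuity assumption can be illustrated with a toy model. This is a minimal sketch, assuming a square grid, a 90° clockwise rotation, and tokens that fall to the lowest free cell in their column; the real Flip4M board size and rotation rules are not specified in this section, and `drop`, `rotateClockwise`, `settle`, and `countChanged` are hypothetical names.

```javascript
// Hypothetical board model: 0 = empty, 1 and 2 = player tokens.
// Row 0 is the top; gravity pulls toward the last row.

// Place a token in the lowest free cell of a column (an ordinary move).
function drop(grid, col, player) {
  const out = grid.map((row) => row.slice());
  for (let r = out.length - 1; r >= 0; r--) {
    if (out[r][col] === 0) {
      out[r][col] = player;
      break;
    }
  }
  return out;
}

// Rotate the grid 90 degrees clockwise: cell (r, c) moves to (c, n - 1 - r).
function rotateClockwise(grid) {
  const n = grid.length;
  const out = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let r = 0; r < n; r++) {
    for (let c = 0; c < n; c++) {
      out[c][n - 1 - r] = grid[r][c];
    }
  }
  return out;
}

// Let every token fall to the lowest free cell in its column.
function settle(grid) {
  const n = grid.length;
  const out = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let c = 0; c < n; c++) {
    let write = n - 1;
    for (let r = n - 1; r >= 0; r--) {
      if (grid[r][c] !== 0) {
        out[write][c] = grid[r][c];
        write--;
      }
    }
  }
  return out;
}

// Count cells whose content differs between two grids of the same size.
function countChanged(a, b) {
  let changed = 0;
  for (let r = 0; r < a.length; r++) {
    for (let c = 0; c < a.length; c++) {
      if (a[r][c] !== b[r][c]) changed++;
    }
  }
  return changed;
}

const board = [
  [0, 0, 0, 0],
  [1, 0, 0, 0],
  [2, 0, 0, 0],
  [1, 2, 0, 0],
];

const afterDrop = drop(board, 2, 1);
const afterRotation = settle(rotateClockwise(board));

console.log("cells changed by a drop:", countChanged(board, afterDrop));
console.log("cells changed by a rotation:", countChanged(board, afterRotation));
```

Even on this tiny example a drop touches one cell while a rotation rewrites most of the occupied region, which is why an incremental evaluation update has nothing stable to build on.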
Why Flip4M’s difficulty maps precisely onto what the Claustrum-Consciousness Hypothesis predicts.
Cn (Coherence) = Rotation Resilience. How much strategic structure survives a gravity rotation.
Ψ (Complexity) = Tactical Richness. How many threatening lines and forcing sequences exist.
High Cn, high Ψ. Analog: Conscious (wake). The DCC’s goal: keep the board here.
High Cn, low Ψ. Analog: Seizure (draw). Stable but nothing happening.
Low Cn, high Ψ. Analog: Noise / Delirium. Complex but collapses on perturbation.
Low Cn, low Ψ. Analog: Deep sleep. Empty board, nothing to govern.
LLMs have enormous Ψ — they can calculate, analyze, find tactics. They have near-zero Cn maintenance ability: they cannot hold coherent strategic understanding through a global perturbation. When the board rotates, their “understanding” shatters and they rebuild from scratch.
This is exactly what CCH says happens when you remove the claustrum: not coma (loss of Ψ, like thalamic damage), but delirium (loss of Cn, fragmentation of coherent experience). An LLM playing Flip4M after a rotation is in delirium — it has the processing power but the world broke apart.
Chess: coherence maintenance is trivial (the board barely changes per move). LLMs and classical engines dominate because depth of search is what matters.
Flip4M: search depth is necessary but insufficient. You need coherence maintenance THROUGH perturbation. Classical engines don’t have it. LLMs don’t have it. Humans have it. DCC provides it.
The prediction: Same AI, superhuman at chess, fails at Flip4M. The explanation is CCH. The fix is DCC.
| CCH Requirement | Flip4M Demand | 8Z-DCC Equivalent |
|---|---|---|
| Persistent world model | Board state persists across rotations | State tracking across gravity shifts |
| Sself monitoring | Gravitational Stability metric | GS = 1 − [Eval(State) − Eval(Rotate)]² |
| Edge-of-Chaos stabilization | Structures surviving volatility (V ≈ 0.54) | Policy Layer filters fragile moves |
| Resource-gated action space | Flip/Magnet tokens (2 per player) | Thrift Factor penalizes waste |
| Active control (CCC) | Drop vs. Rotate vs. Magnet selection | DCC Policy Layer re-ranking |
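The Gravitational Stability formula in the table can be written out directly. This is a minimal sketch, assuming that `Eval(Rotate)` means the evaluation of the position after a rotation and that evaluations are normalized to [0, 1]; neither assumption is stated on this page, and `gravitationalStability` is a hypothetical name.

```javascript
// GS = 1 − [Eval(State) − Eval(Rotate)]², as given in the table above.
// Assumption: both evaluations are normalized to [0, 1], so GS is also in [0, 1].
// GS near 1 means the position's value survives a rotation; GS near 0 means it does not.
function gravitationalStability(evalState, evalRotated) {
  const delta = evalState - evalRotated;
  return 1 - delta * delta;
}

// A structure whose value is unchanged by rotation is fully stable.
console.log(gravitationalStability(0.7, 0.7));

// A structure that looks strong now but weak after rotation is less stable.
console.log(gravitationalStability(0.75, 0.25));
```

Because the difference is squared, the metric is symmetric: a position that gets much better after rotation counts as just as unstable as one that gets much worse.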
Same eval function. Same search core. Three levels of governance added.
The current engine: negamax + alpha-beta + transposition table + evaluation function with rotation awareness. About 500 lines of JavaScript. Already plays better than random. Doesn’t adapt, doesn’t govern resources, doesn’t maintain coherence across moves.
Sensor: eval volatility across iterative deepening depths. High volatility → sharp position → deepen search. Low volatility → quiet position → save time.
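The volatility sensor described above could look roughly like this. It is a sketch under stated assumptions: volatility is taken to be the standard deviation of the root evaluation across iterative-deepening depths, and the threshold and depth adjustments are illustrative placeholders, not values from the actual engine. `evalVolatility` and `chooseDepth` are hypothetical names.

```javascript
// Population standard deviation of the root evaluation across search depths.
// Assumption: this is one reasonable reading of "eval volatility".
function evalVolatility(evalsByDepth) {
  const n = evalsByDepth.length;
  if (n < 2) return 0;
  const mean = evalsByDepth.reduce((sum, v) => sum + v, 0) / n;
  const variance =
    evalsByDepth.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  return Math.sqrt(variance);
}

// Hypothetical governance rule: sharp positions earn extra depth,
// quiet positions give depth back to save time.
function chooseDepth(evalsByDepth, baseDepth, threshold = 0.15) {
  const volatility = evalVolatility(evalsByDepth);
  if (volatility > threshold) return baseDepth + 2;
  return Math.max(1, baseDepth - 1);
}

// Evaluation swings from depth to depth: treat as sharp, search deeper.
console.log(chooseDepth([0.1, 0.9, 0.2, 0.8], 4));

// Evaluation is flat across depths: treat as quiet, search shallower.
console.log(chooseDepth([0.5, 0.5, 0.5], 4));
```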
Sensor: game trajectory compression. Repetitive game → stagnation → force disruption (explore). Chaotic game → conserve (exploit). Semantic inversion vs Level 1.
Every N moves: is the position improving? Yes → maintain current governance. No → escalate: flip aggression level. The governor governing itself.
DCC-controlled: spend flips and magnets only when Level 2 demands it. Score = BaseEval − (ResourceCost / Volatility). Prevents “seizure” behavior — wasting all winning assets on Turn 1. Same principle as the claustrum: gate expensive actions behind evidence of necessity.
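The resource-gating score above can be sketched as follows. The formula `Score = BaseEval − (ResourceCost / Volatility)` is from this page; the zero-cost shortcut, the `epsilon` guard against division by zero, and all numeric values in the example are assumptions added for illustration, and `governedScore` is a hypothetical name.

```javascript
// Score = BaseEval − (ResourceCost / Volatility), as stated above.
// Assumptions: ordinary drops have resourceCost 0, and volatility is clamped
// to a small epsilon so a perfectly quiet position does not divide by zero.
function governedScore(baseEval, resourceCost, volatility, epsilon = 1e-6) {
  if (resourceCost === 0) return baseEval;
  return baseEval - resourceCost / Math.max(volatility, epsilon);
}

// Illustrative candidates: a plain drop and a flip that costs a scarce token.
const candidates = [
  { name: "drop", baseEval: 0.5, resourceCost: 0 },
  { name: "flip", baseEval: 0.8, resourceCost: 0.2 },
];

// Pick the best candidate under the governed score for a given volatility.
function pickMove(moves, volatility) {
  return moves.reduce((best, move) =>
    governedScore(move.baseEval, move.resourceCost, volatility) >
    governedScore(best.baseEval, best.resourceCost, volatility)
      ? move
      : best
  );
}

console.log("quiet position:", pickMove(candidates, 0.1).name);
console.log("sharp position:", pickMove(candidates, 0.8).name);
```

Dividing the cost by volatility makes expensive actions cheap only when the position is sharp, so the same flip that is rejected in a quiet position can be selected once the evidence of necessity is there.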
The prediction made concrete and testable.
Same eval function, same time budget. Multiple time controls (500ms, 1s, 2s, 5s, 1min). Hundreds of games per control. Alternating colors. Only variable: governance.
rDCC wins 55–65% overall. Advantage increases with time control. Advantage increases in rotation-heavy games specifically. rDCC conserves resources better and plays longer, more strategic games.
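Whether an observed win rate really sits in the predicted 55–65% band depends on the sample size, so a confidence interval is needed. The sketch below uses the standard Wilson score interval for a binomial proportion; it assumes draws are excluded from `games`, which this page does not specify, and the counts in the example are placeholders, not tournament results.

```javascript
// Wilson score interval for a win rate.
// Assumption: `games` counts decisive games only (draws excluded).
// z = 1.96 corresponds to an approximate 95% confidence level.
function wilsonInterval(wins, games, z = 1.96) {
  if (games === 0) return { low: 0, high: 1 };
  const p = wins / games;
  const z2 = z * z;
  const denom = 1 + z2 / games;
  const center = (p + z2 / (2 * games)) / denom;
  const half =
    (z * Math.sqrt((p * (1 - p)) / games + z2 / (4 * games * games))) / denom;
  return { low: center - half, high: center + half };
}

// Placeholder counts for illustration only; these are NOT measured results.
const example = wilsonInterval(120, 200);
console.log(example.low.toFixed(3), example.high.toFixed(3));

// The advantage is only distinguishable from a coin flip if low > 0.5.
console.log("excludes 50%:", example.low > 0.5);
```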
DCC governance doesn’t help — Cn maintenance adds nothing to Flip4M play.
If rDCC advantage doesn’t correlate with rotation density, the advantage isn’t about coherence maintenance.
Polarity inverted. This happened in trading rDCC v0.1–v0.5, fixed by semantic inversion in v0.6.
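The rotation-density condition above can be checked with an ordinary correlation coefficient. This is a sketch assuming one data point per game, with rotation density measured as rotations per move and the outcome coded as 1 for an rDCC win and 0 for a loss; both codings are assumptions, and the sample arrays are placeholders, not tournament data.

```javascript
// Pearson correlation coefficient between two equal-length samples.
// Returns NaN when either sample has zero variance (correlation undefined).
function pearson(xs, ys) {
  const n = xs.length;
  if (n !== ys.length || n < 2) {
    throw new Error("need two samples of equal length, at least 2 points");
  }
  const meanX = xs.reduce((s, v) => s + v, 0) / n;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  if (sxx === 0 || syy === 0) return NaN;
  return sxy / Math.sqrt(sxx * syy);
}

// Placeholder per-game data for illustration only; NOT measured results.
const rotationDensity = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5];
const rdccWon = [0, 1, 0, 1, 1, 1];

console.log("correlation:", pearson(rotationDensity, rdccWon).toFixed(2));
```

A coefficient near zero across hundreds of games would match the falsification condition stated above: the advantage, if any, would not be about coherence maintenance.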
Tournament in progress. The rDCC engine and classical engine are implemented in JavaScript with identical evaluation functions. The tournament infrastructure — time controls, alternating colors, game logging — is built. Preliminary runs are underway. This section will be updated with full statistical results including win rates, confidence intervals, rotation-density correlation, and resource efficiency metrics. The experiment is designed to be fully reproducible: the game is live, the engine is open, the protocol is fixed.
Flip4M is proposed as Experiment 6 in the DCC-7 Consciousness Testbed specification. The hypothesis: a DCC-7 governed LLM should outperform a baseline LLM at Flip4M. The game is already built and live. Cost: zero beyond API calls. This is CCH Prediction P5 made playable.
If governance doesn’t help an LLM play Flip4M better, CCH’s prediction about coherence maintenance is wrong. If it does, the claustrum analogy is predictive. Either way, the game provides the answer.
Flip4M extends the DCC domain progression. Same MDL compression sensor, same coupling parameter, same escalation ladder. Only the semantic calibration differs.
| # | Domain | DCC Role | Status |
|---|---|---|---|
| 1 | Image | Governance of compression blocks | Verified |
| 2 | Audio | Cross-domain block governance | Verified |
| 3 | FASTA | Biological sequence governance | Verified |
| 4 | TSP | Autonomous route discovery — exact optimal on qa194 | Verified |
| 5 | DNA | Structure detection in genomic data | Verified |
| 6 | Trading | Contrarian signal + regime detection | Verified |
| 7 | Recursive Search | Self-governing configuration search (6.8×) | Verified |
| 8 | Authentication | Adaptive difficulty in 8Z-Auth vault | Verified |
| 9 | Consciousness | DCC-7 testbed — self-referential governance | Proposed |
| 10 | Flip4M | Board-game consciousness marker — Cn maintenance under perturbation | Testing |
One kernel. Ten domains. Same compression sensor at every level. The Flip4M application is the most accessible test: anyone can play it, anyone can challenge an AI to play it, and the difference between human and machine performance is the Cn gap made visible.
Play Flip4M against the CPU. Then ask any AI to play it. Notice the difference. That difference is what this page explains.
Play Flip4M ↗ No download. No account. Desktop and mobile. Built-in AI opponent.
Every claim on this page is typed. Verified, reasoned, speculative. No claim appears without its evidence label.