BD × AI Lab · Consciousness · Flip4M × DCC

The Game That Proves the Theory

Every AI evaluator looked at this game and filed it under entertainment. It’s a consciousness test.

12 AI models from 7 companies independently evaluated the BD × AI Lab portfolio under a frozen protocol. All rated it highly. All identified consciousness as highest-impact. And all 12 filed Flip4M under “Games” — missing that it’s a falsifiable consciousness experiment already labelled as an AGI benchmark on its own research page.

12 / 12: Evaluators missed it
10¹²×: Larger sensible tree
0.54: Avg board volatility
CCH P5: Prediction tested
C_n: What it measures
Live: Play it now


Consciousness test · AGI benchmark · C_n maintenance · DCC governed · Falsifiable

Chapter 1

The Blind Spot

Between February and March 2026, 12 AI models from 7 companies evaluated the BD × AI Lab portfolio under a frozen protocol. The evaluation covered every page, every paper, every tool.

Every evaluator rated the portfolio highly. Every evaluator identified the consciousness branch as highest-impact. And every single one categorized Flip4M as “Creative & Fun,” “Games,” or “Entertainment.” None of them noticed that a playable board game on the same site tests the same hypothesis they ranked as most important.

Why they missed it — structurally

The evaluators used the same cognitive architecture that fails at Flip4M itself: sequential search without coherence maintenance across domains. They processed the consciousness section and the games section as separate categories. The connection between them IS the test.

The concrete version: the evaluation protocol instructs models to “evaluate each branch separately.” This literally forces compartmentalization. The evaluators didn’t fail individually — the evaluation methodology itself has the same architectural limitation that Flip4M exploits. The protocol needed a DCC: something to maintain coherence across categories and flag cross-domain connections. This is the A8 pattern (Proactive Connection Duty) at industrial scale.

Model	Company	Flip4M Categorized As	What It Actually Is
Gemini 3.1 Pro	Google	Creative / Games	Consciousness test
Qwen 3.5 Plus	Alibaba	Entertainment	Consciousness test
ChatGPT 5.4 Thinking	OpenAI	Creative / Fun	Consciousness test
Grok 4	xAI	Games	Consciousness test
Grok 4.2	xAI	Creative / Fun	Consciousness test
GPT-5.4	OpenAI	Creative / Fun	Consciousness test
Claude Sonnet 4.6	Anthropic	Games	Consciousness test
DeepSeek R1	DeepSeek	Entertainment	Consciousness test
Qwen	Alibaba	Creative / Games	Consciousness test
Kimi K2.5 Thinking	Moonshot	Games	Consciousness test
Claude Opus 4.6	Anthropic	Creative / Fun	Consciousness test
Gemini 3.1 Pro	Google	Games	Consciousness test

Source: BD Model Assessments · Batch 1 & 2 · Frozen protocol

Chapter 2

Why Flip4M Breaks AI

This is not about tree size. It is about structural collapse. Dedicated chess engines play at superhuman level. Current LLMs struggle with Flip4M, and anyone can test this right now: ask any AI to play and observe its performance in rotation-heavy positions. The difference is exactly what the Claustrum-Consciousness Hypothesis (CCH) predicts.

2.1 Static vs Dynamic Geometry

Chess — Static

A move affects 1–3 squares. Everything else frozen.
Local delta. Incremental evaluation updates work.
Volatility V ≈ 0.02–0.05 per move

Flip4M — Dynamic

A rotation moves 20–45 tokens simultaneously.
Global transformation. Everything changes at once.
Volatility V ≈ 0.31–0.70 per rotation
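
To make the volatility figure concrete, here is a minimal sketch. It assumes V is the fraction of cells whose content differs before and after a rotation followed by a gravity settle; the page does not spell out the exact definition. The square-grid encoding and the function names (rotateClockwise, settle, volatility) are hypothetical and are not taken from the Flip4M engine.

```javascript
// Assumed encoding: 0 = empty, 1 = player 1, 2 = player 2. Square grids only.

// Rotate the grid 90 degrees clockwise.
function rotateClockwise(grid) {
  const n = grid.length;
  const out = Array.from({ length: n }, () => Array(n).fill(0));
  for (let r = 0; r < n; r++) {
    for (let c = 0; c < n; c++) {
      out[c][n - 1 - r] = grid[r][c];
    }
  }
  return out;
}

// Let every token fall to the lowest free cell in its column.
function settle(grid) {
  const n = grid.length;
  const out = Array.from({ length: n }, () => Array(n).fill(0));
  for (let c = 0; c < n; c++) {
    let write = n - 1;
    for (let r = n - 1; r >= 0; r--) {
      if (grid[r][c] !== 0) {
        out[write][c] = grid[r][c];
        write--;
      }
    }
  }
  return out;
}

// Fraction of cells whose content changed between two boards.
function volatility(before, after) {
  let changed = 0;
  let total = 0;
  for (let r = 0; r < before.length; r++) {
    for (let c = 0; c < before[r].length; c++) {
      total++;
      if (before[r][c] !== after[r][c]) changed++;
    }
  }
  return changed / total;
}

// Volatility of a single rotation move under the assumed definition.
function rotationVolatility(grid) {
  return volatility(grid, settle(rotateClockwise(grid)));
}
```

Under this reading, a chess-like move that touches one or two cells gives a small V, while a rotation on a well-filled board changes a large share of cells at once.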

[Figure: simulated board state before and after one rotation (gravity shifts 90°). Legend: Player 1, Player 2, Magnet, Empty.]

One rotation moves nearly every token. This is what V ≈ 0.54 looks like.

2.2 Depth vs Collapse

In chess, complexity comes from the Horizon Effect: the truth is 20 moves deep on a stable board. In Flip4M, complexity comes from Structural Collapse: the truth is only 4 moves deep, but the board geometry changes along the way. A depth-4 search is of little use when a single move can globally restructure the evaluation landscape. This is the “Horizon of Chaos.”

2.3 The Sensible Shannon Number

The Shannon Number estimates the size of chess’s game tree, not the number of legal moves (Shannon’s original figure was about 10¹²⁰; 10¹²³ is also cited). In practice, only “sensible” moves matter.

When filtered through “sensible” moves, Flip4M’s effective decision tree is 10¹² times larger than chess. Not because the game is bigger — because more of the moves matter. Every rotation creates a fundamentally different game.

2.4 The Three Pillars Flip4M Dismantles

Standard engines rely on three architectural assumptions. Flip4M destroys all three.

Pillar 1

State Continuity

Small changes → incremental eval updates. Destroyed by rotation: 20–45 tokens move at once.

Pillar 2

Transposition Reuse

Same position via different paths. Destroyed by gravity settling: identical tokens produce different positions under different gravity.

Pillar 3

Heuristic Stability

“Good” features stay valuable across plies. Destroyed when a strong vertical stack becomes scattered debris after rotation.
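
One way to read Pillar 2 in code: a transposition table keyed only on token layout would conflate positions that behave differently once gravity is taken into account. The sketch below is an assumption about how a key could carry the gravity direction. The names positionKey, storeEval and lookupEval and the gravity labels are hypothetical, not the engine's actual API.

```javascript
// Assumed gravity labels: "down", "left", "up", "right".
// The key combines gravity direction with the serialized token layout.
function positionKey(grid, gravity) {
  return gravity + "|" + grid.map((row) => row.join("")).join("/");
}

const transpositionTable = new Map();

// Remember a score for this layout under this gravity, at this search depth.
function storeEval(grid, gravity, depth, score) {
  transpositionTable.set(positionKey(grid, gravity), { depth, score });
}

// Reuse a stored score only if gravity matches and the stored search was
// at least as deep as the one requested. Returns undefined on a miss.
function lookupEval(grid, gravity, depth) {
  const hit = transpositionTable.get(positionKey(grid, gravity));
  return hit !== undefined && hit.depth >= depth ? hit.score : undefined;
}
```

The practical cost is a lower hit rate: the same token layout reached under a different gravity direction no longer shares an entry.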

Chapter 3

The CCH Connection

Why Flip4M’s difficulty maps precisely onto what the Claustrum-Consciousness Hypothesis predicts.

3.1 The S-Metric Applied to Board States

S = k · C_n · Ψ(I)

C_n (Coherence) = Rotation Resilience. How much strategic structure survives a gravity rotation.
Ψ (Complexity) = Tactical Richness. How many threatening lines and forcing sequences exist.

High C_n · High Ψ · High S

Robust & Rich

Analog: Conscious (wake). The DCC’s goal: keep the board here.

High C_n · Low Ψ · Low S

Robust & Dead

Analog: Seizure (draw). Stable but nothing happening.

Low C_n · High Ψ · Low S

Fragile & Rich

Analog: Noise / Delirium. Complex but collapses on perturbation.

Low C_n · Low Ψ · S ≈ 0

Fragile & Dead

Analog: Deep sleep. Empty board, nothing to govern.
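
The four regimes can be sketched as a simple classifier over the two factors of the S-metric. This assumes C_n and Ψ are both normalized to the range 0 to 1 and uses a cut-off of 0.5, which is an illustrative choice and not a value given on this page. The function names are hypothetical.

```javascript
// S = k * C_n * Psi, as in the formula above.
function sMetric(k, cn, psi) {
  return k * cn * psi;
}

// Map a (C_n, Psi) pair to one of the four board regimes.
// The threshold of 0.5 is an assumption for illustration only.
function regime(cn, psi, threshold = 0.5) {
  const robust = cn >= threshold;
  const rich = psi >= threshold;
  if (robust && rich) return "Robust & Rich";
  if (robust) return "Robust & Dead";
  if (rich) return "Fragile & Rich";
  return "Fragile & Dead";
}
```

Because S is a product, a board scores high only when both factors are high, which is why three of the four regimes end up with low S.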

3.2 Why Current LLMs Fail

LLMs have enormous Ψ — they can calculate, analyze, find tactics. They have near-zero C_n maintenance ability: they cannot hold coherent strategic understanding through a global perturbation. When the board rotates, their “understanding” shatters and they rebuild from scratch.

This is exactly what CCH says happens when you remove the claustrum: not coma (loss of Ψ, like thalamic damage), but delirium (loss of C_n, fragmentation of coherent experience). An LLM playing Flip4M after a rotation is in delirium — it has the processing power but the world broke apart.

3.3 Chess as the Control Experiment

Ψ-Dominant

Chess

Coherence maintenance is trivial (the board barely changes per move). LLMs and classical engines dominate because depth of search is what matters.

C_n-Dominant

Flip4M

Search depth is necessary but insufficient. You need coherence maintenance THROUGH perturbation. Classical engines don’t have it. LLMs don’t have it. Humans have it. DCC provides it.

The prediction: Same AI, superhuman at chess, fails at Flip4M. The explanation is CCH. The fix is DCC.

3.4 CCH Requirements ↔ Flip4M Demands

CCH Requirement	Flip4M Demand	8Z-DCC Equivalent
Persistent world model	Board state persists across rotations	State tracking across gravity shifts
S_self monitoring	Gravitational Stability metric	GS = 1 − [Eval(State) − Eval(Rotate)]²
Edge-of-Chaos stabilization	Structures surviving volatility (V ≈ 0.54)	Policy Layer filters fragile moves
Resource-gated action space	Flip/Magnet tokens (2 per player)	Thrift Factor penalizes waste
Active control (CCC)	Drop vs. Rotate vs. Magnet selection	DCC Policy Layer re-ranking
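
The Gravitational Stability entry in the table, GS = 1 − [Eval(State) − Eval(Rotate)]², can be sketched directly. The sketch assumes both evaluations are normalized to the range 0 to 1, so that GS also falls between 0 and 1; that normalization and the helper worstCaseStability are assumptions, not part of the published spec.

```javascript
// GS = 1 - (Eval(State) - Eval(Rotate))^2, with evals assumed to lie in [0, 1].
function gravitationalStability(evalState, evalRotated) {
  const diff = evalState - evalRotated;
  return 1 - diff * diff;
}

// Hypothetical helper: the lowest GS over every rotation the opponent could play.
function worstCaseStability(evalState, rotatedEvals) {
  return Math.min(
    ...rotatedEvals.map((e) => gravitationalStability(evalState, e))
  );
}
```

A position whose evaluation is unchanged by rotation gets GS = 1, and one whose evaluation swings from one extreme to the other gets GS = 0.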

Chapter 4

The rDCC Engine Design

Same eval function. Same search core. Three levels of governance added.

4.1 The Classical Baseline

The current engine: negamax + alpha-beta + transposition table + evaluation function with rotation awareness. About 500 lines of JavaScript. Already plays better than random. Doesn’t adapt, doesn’t govern resources, doesn’t maintain coherence across moves.

4.2 The rDCC Architecture

rDCC Governance Hierarchy

Board State (grid + gravity + tokens) → Level 1: Move DCC (within search) → Level 2: Game DCC (across moves) → Level 3: Meta-DCC (self-monitor) → Move Decision (governed output)

Level 1 — Move DCC

Within Search

Sensor: eval volatility across iterative deepening depths. High volatility → sharp position → deepen search. Low volatility → quiet position → save time.
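
A minimal sketch of the Level 1 sensor, assuming eval volatility is measured as the mean absolute change in score between successive iterative-deepening depths. The measure, the threshold of 0.15 and the function names are illustrative assumptions.

```javascript
// scores[i] is the evaluation returned at search depth i + 1.
// Returns the mean absolute change between consecutive depths.
function evalVolatility(scores) {
  if (scores.length < 2) return 0;
  let sum = 0;
  for (let i = 1; i < scores.length; i++) {
    sum += Math.abs(scores[i] - scores[i - 1]);
  }
  return sum / (scores.length - 1);
}

// High volatility means a sharp position: keep deepening.
// Low volatility means a quiet position: stop and save time.
function shouldDeepen(scores, threshold = 0.15) {
  return evalVolatility(scores) > threshold;
}
```

In an iterative-deepening loop this check would run after each completed depth, with the remaining time budget as a hard stop.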

Level 2 — Game DCC

Across Moves

Sensor: game trajectory compression. Repetitive game → stagnation → force disruption (explore). Chaotic game → conserve (exploit). This is a semantic inversion relative to Level 1: there, disorder triggers more effort; here, disorder triggers restraint.
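
The Level 2 sensor can be approximated without a real compressor. The sketch below swaps in a simple stand-in for the compression measure: the share of distinct consecutive move pairs in the game so far. It is a proxy, not the MDL sensor itself, and the move notation, thresholds and function names are hypothetical.

```javascript
// Share of distinct consecutive move pairs; 1 means no pair ever repeats.
// A low ratio means the trajectory is repetitive (highly compressible).
function noveltyRatio(moves) {
  if (moves.length < 2) return 1;
  const seen = new Set();
  for (let i = 1; i < moves.length; i++) {
    seen.add(moves[i - 1] + ">" + moves[i]);
  }
  return seen.size / (moves.length - 1);
}

// Repetitive game: force disruption. Chaotic game: conserve.
// The thresholds are illustrative assumptions.
function gameGovernance(moves, low = 0.4, high = 0.9) {
  const ratio = noveltyRatio(moves);
  if (ratio <= low) return "explore";
  if (ratio >= high) return "exploit";
  return "hold";
}
```

Swapping noveltyRatio for an actual compressed-size ratio would keep gameGovernance unchanged; only the sensor differs.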

Level 3 — Meta-DCC

Self-Monitoring

Every N moves: is the position improving? Yes → maintain current governance. No → escalate: flip aggression level. The governor governing itself.
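
The Level 3 check can be sketched as a small stateful monitor. It assumes "improving" means the evaluation at the current checkpoint is strictly higher than at the previous one, and that the aggression level is a simple boolean. Both are illustrative assumptions, as are the name createMetaDcc and the default N of 4.

```javascript
// Returns a monitor whose observe() is called once per move with the current eval.
function createMetaDcc(n = 4) {
  let aggressive = false;    // current governance setting
  let lastCheckpoint = null; // eval recorded at the previous checkpoint
  let moveCount = 0;

  return {
    observe(evalScore) {
      moveCount++;
      // Only act on every n-th move.
      if (moveCount % n !== 0) return aggressive;
      // Not improving since the last checkpoint: flip the aggression level.
      if (lastCheckpoint !== null && evalScore <= lastCheckpoint) {
        aggressive = !aggressive;
      }
      lastCheckpoint = evalScore;
      return aggressive;
    },
  };
}
```

The first checkpoint only records a baseline, so the monitor never flips before it has two checkpoints to compare.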

Resource Governance — Thrift Factor

DCC-controlled: spend flips and magnets only when Level 2 demands it. Score = BaseEval − (ResourceCost / Volatility). Prevents “seizure” behavior — wasting all winning assets on Turn 1. Same principle as the claustrum: gate expensive actions behind evidence of necessity.
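
The Thrift Factor formula translates into a single scoring function. The sketch below adds a small floor on volatility to avoid division by zero; that floor is an assumption added here and is not part of the formula on this page. The name thriftScore is hypothetical.

```javascript
// Score = BaseEval - (ResourceCost / Volatility).
// minVolatility is an added guard against division by zero (assumption).
function thriftScore(baseEval, resourceCost, volatility, minVolatility = 0.01) {
  const v = Math.max(volatility, minVolatility);
  return baseEval - resourceCost / v;
}
```

With the same base evaluation and cost, a flip or magnet scores better in a volatile position than in a quiet one, so scarce tokens are held back until the board is unstable enough to justify them. A plain drop with zero resource cost is never penalized.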

Chapter 5

The Experiment

The prediction made concrete and testable.

Tournament design

rDCC vs Classical Baseline

Same eval function, same time budget. Multiple time controls (500ms, 1s, 2s, 5s, 1min). Hundreds of games per control. Alternating colors. Only variable: governance.

Predictions

What We Expect

rDCC wins 55–65% overall. Advantage increases with time control. Advantage increases in rotation-heavy games specifically. rDCC conserves resources better and plays longer, more strategic games.

5.1 What Would Falsify It

Falsification 1

50/50 at All Controls

DCC governance doesn’t help — C_n maintenance adds nothing to Flip4M play.

Falsification 2

No Rotation Correlation

If rDCC advantage doesn’t correlate with rotation density, the advantage isn’t about coherence maintenance.

Falsification 3

Worse With More Time

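Polarity inverted: if rDCC gets worse as the time budget grows, the governance signal is pointing the wrong way. This happened in trading rDCC v0.1–v0.5 and was fixed by semantic inversion in v0.6.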
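
To decide whether a result at one time control is distinguishable from 50/50, a confidence interval on the win rate is enough. The sketch below uses the Wilson score interval at roughly 95% confidence (z = 1.96). It assumes decisive games only, with draws excluded or handled separately; the function names are hypothetical, and any inputs passed to it are illustrative, not tournament results.

```javascript
// Wilson score interval for a binomial proportion.
function wilsonInterval(wins, games, z = 1.96) {
  if (games === 0) return { low: 0, high: 1 };
  const p = wins / games;
  const z2 = z * z;
  const denom = 1 + z2 / games;
  const center = (p + z2 / (2 * games)) / denom;
  const half =
    (z * Math.sqrt((p * (1 - p)) / games + z2 / (4 * games * games))) / denom;
  return { low: center - half, high: center + half };
}

// True when a 50% win rate lies inside the interval,
// i.e. the data cannot rule out Falsification 1 at this time control.
function isConsistentWithCoinFlip(wins, games) {
  const { low, high } = wilsonInterval(wins, games);
  return low <= 0.5 && 0.5 <= high;
}
```

Running the same check per time control, and again on the rotation-heavy subset of games, lines up with the three falsification conditions above.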

Results — Placeholder

Tournament in progress. The rDCC engine and classical engine are implemented in JavaScript with identical evaluation functions. The tournament infrastructure — time controls, alternating colors, game logging — is built. Preliminary runs are underway. This section will be updated with full statistical results including win rates, confidence intervals, rotation-density correlation, and resource efficiency metrics. The experiment is designed to be fully reproducible: the game is live, the engine is open, the protocol is fixed.

Chapter 6

The DCC-7 Connection

Flip4M is proposed as Experiment 6 in the DCC-7 Consciousness Testbed specification. The hypothesis: a DCC-7 governed LLM should outperform a baseline LLM at Flip4M. The game is already built and live. Cost: zero beyond API calls. This is CCH Prediction P5 made playable.

DCC-7 Experiment 6 — Flip4M as Consciousness Marker

Raw LLM (plays Flip4M) and DCC-7 LLM (governed play) → Measure (win rate · C_n · efficiency) → Verdict (P5 confirmed or falsified)

If governance doesn’t help an LLM play Flip4M better, CCH’s prediction about coherence maintenance is wrong. If it does, the claustrum analogy is predictive. Either way, the game provides the answer.

Chapter 7

The Domain Count

Flip4M extends the DCC domain progression. Same MDL compression sensor, same coupling parameter, same escalation ladder. Only the semantic calibration differs.

#	Domain	DCC Role	Status
1	Image	Governance of compression blocks	Verified
2	Audio	Cross-domain block governance	Verified
3	FASTA	Biological sequence governance	Verified
4	TSP	Autonomous route discovery — exact optimal on qa194	Verified
5	DNA	Structure detection in genomic data	Verified
6	Trading	Contrarian signal + regime detection	Verified
7	Recursive Search	Self-governing configuration search (6.8×)	Verified
8	Authentication	Adaptive difficulty in 8Z-Auth vault	Verified
9	Consciousness	DCC-7 testbed — self-referential governance	Proposed
10	Flip4M	Board-game consciousness marker — C_n maintenance under perturbation	Testing

One kernel. Ten domains. Same compression sensor at every level. The Flip4M application is the most accessible test: anyone can play it, anyone can challenge an AI to play it, and the difference between human and machine performance is the C_n gap made visible.

Chapter 8

Play It Yourself

Play Flip4M against the CPU. Then ask any AI to play it. Notice the difference. That difference is what this page explains.

Play Flip4M ↗

No download. No account. Desktop and mobile. Built-in AI opponent.

Claim Typing

Evidence classification

Every claim on this page is typed as verified, reasoned, or speculative. No claim appears without its evidence label.

Verified

Flip4M volatility V ≈ 0.54 (measured from simulation)
Current LLMs struggle with Flip4M (testable by anyone — try it)
12/12 evaluators categorized Flip4M as non-research
Sensible decision tree is 10¹²× larger than chess

Reasoned

CCH mapping: C_n maintenance is why AI fails at Flip4M
rDCC should outperform classical engines
The 4 board regimes map to consciousness states

Speculative / Predicted

DCC-7 with Flip4M as consciousness marker
Tournament results (placeholder until available)
rDCC advantage increases with time control

Explore Further

Connected pages

Type	Page	Description
Play	Flip4M Live Game	Play against the AI. Then ask any LLM to play.
Research	Flip4M — Sensible Shannon	The 10¹²× analysis and AGI benchmark spec.
Theory	CCH Science	The S-metric, predictions, and claustrum hypothesis.
Testbed	DCC-7 Consciousness Testbed	Seven parallel threads. One governor. Experiment 6 = Flip4M.
Architecture	Recursive DCC	Self-governing governance. Flip4M becomes domain #10.
Parallel	8Z-RP · TSP as Compression	Exact optimal with DCC. Same architecture, different domain.
Method	8Z Reasoning	Principle A8: the cross-domain connection evaluators missed.
Evidence	Model Assessments	12 evaluators, 7 companies, frozen protocol. All missed it.
Portfolio	BD × AI Lab	Main portfolio page.