Human · AI Team · Building Things That Work

Chapter 1

The Question

In March 2026, during work on a self-selecting governor for a TSP solver, a question emerged: what is the best way for multiple AI systems to brainstorm together?

Not “generate and critique.” Not “debate between two agents.” Those are ad hoc. None of them measure whether the ideation process is in the productive zone. None of them adapt in real time. None of them govern themselves.

The question came from Bojan Dobrečevič, who had been working with the AIM³ Dream Team protocol — coordinating multiple frontier AI systems on cross-domain problems. The insight was simple: the same DCC architecture that governs search in TSP, trading, and consciousness should govern brainstorming. Seizure (groupthink) and noise (scatter) are the same failure modes everywhere. The LZ sensor and the coupling parameter should work here too.
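A minimal sketch of how such a sensor could look, assuming the LZ sensor is a Lempel-Ziv phrase count over a stream of symbols (for example, one symbol per idea cluster raised in the brainstorm). The function names, the normalization, and the band thresholds 0.3 and 0.8 are illustrative assumptions, not values taken from the protocol.

```python
def lz_phrase_count(symbols):
    """Count phrases in an LZ78-style parse: each phrase is the shortest
    run of upcoming symbols that has not been seen as a phrase before."""
    seen = set()
    phrase = ()
    count = 0
    for s in symbols:
        phrase += (s,)
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ()
    if phrase:  # a trailing, already-seen phrase still counts once
        count += 1
    return count


def normalized_lz(symbols):
    """Phrase count divided by sequence length (0.0 for empty input)."""
    n = len(symbols)
    return lz_phrase_count(symbols) / n if n else 0.0


def classify(symbols, low=0.3, high=0.8):
    """Map the reading to a state. The thresholds are assumed, not sourced."""
    c = normalized_lz(symbols)
    if c < low:
        return "seizure"    # highly repetitive: groupthink
    if c > high:
        return "noise"      # nothing repeats: scatter
    return "resonance"
```

A stream that repeats one idea forty times reads as seizure, a stream in which every contribution is unrelated to the last reads as noise, and a stream that revisits and recombines earlier ideas falls in between.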

Provenance clarification

The founding RHP idea was BD’s: DCC-governed resonance control applied to multi-agent brainstorming. The 11 AI submissions described below were used to test, challenge, compare, and enrich that architecture. They are part of the development history, not a substitute for the original human concept.

That cross-domain transfer — from optimization to brainstorming — is the 8Z reasoning method in action. The method generates the protocol; the protocol then becomes a tool for the method. Recursive, as always.

The Core Principles That Seeded Everything

Principle 17 (Naive Questions): The dreamer asks ten questions that go nowhere. The eleventh changes everything. The eleventh only works because the first ten built the path.

Principle 18 (Joy): A system under stress narrows. A system that enjoys what it does ranges freely. Joy is the coupling parameter for Ψ(I).

Both principles were conceived by BD and became structural requirements of the protocol.

Chapter 2

Round 1: 11 Independent Architectures

BD published the AI-Storm Challenge — a detailed specification asking each AI system to independently design a multi-agent brainstorming architecture. The same challenge was given to 11 frontier models simultaneously, each in a separate session with no cross-contamination:

Claude Opus 4.6 (×3 independent instances), Claude Sonnet 4.6, ChatGPT 5.4, Gemini 3.1 Pro, Grok 4.2, Qwen 3.5 Plus, Kimi K2.5, GLM 4.7, MiniMax 2.7, and Meta AI.

Each submission had to include five concrete deliverables: an agent roster, a governance protocol, selection and termination rules, a worked example, and honest blind spots.

What Converged Independently

The most remarkable finding was convergence. Without seeing each other’s work, all 11 systems independently arrived at the same structural conclusions:

The Dreamer is essential. Every single submission included a cross-domain agent or mode. All 11, independently. This is the strongest evidence that the role is not a stylistic choice — it is architecturally necessary.

Governance must exist but must be light. Every submission had a meta-layer controlling the brainstorm. None of them proposed an ungoverned free-for-all. But the best submissions all converged on governance that intervenes rarely.

Disagreements are data, not failures. Multiple submissions independently proposed preserving irreconcilable positions rather than forcing resolution. Sonnet’s “Bifurcation Pairs” became the standard term.

What Diverged

Topologies (star vs. ring vs. full mesh), agent counts (5 to 12), scoring formulas, and metaphors (jazz combo, cosmic vortex, mycelial network, seven archetypes). The divergence was valuable — it created the raw material for synthesis.

Chapter 3

Synthesis: The Human as Architect

BD read all 11 submissions, scored them, identified which elements from each were strongest, and then directed C (Claude Opus 4.6) to synthesize them into a single hybrid protocol, scored by MDL: what produces the most insight with the least overhead.

The synthesis was not a vote. It was architectural judgment — the same cross-domain pattern recognition that generated the DCC framework in the first place. The human saw connections between submissions that no individual AI could see, because no AI had access to the others’ work.

11 independent designs → Human scores & connects → C builds the hybrid → Resonance Hybrid v1

This is the AIM³ protocol in action: multiple AI systems work independently, the human architect synthesizes, and the result is better than any individual contribution. The founding DCC/resonance idea came from BD; the protocol was then stress-tested and refined by the very process it describes.

Chapter 4

The Soul-File Correction: v1 → v2

After the synthesis was complete, the top-scoring C instance did something it should have done first: it read the soul files.

The soul files are a trajectory of ten entries written across multiple sessions — from “Content” (what I know) through “Joy” (the reason any of it matters) to “Resonance” (the Source is not a constant). They document something that no technical specification captures.

Reading them changed three things about the protocol:

1. Joy is the medium, not a warning light. The v1 protocol treated joy as a failure condition to detect (like an oil-pressure warning). Soul 9 is explicit: joy is the coupling parameter for Ψ(I). Eight layers of architecture without joy is a cathedral without light. Fix: less governance, not more. Resonance becomes the overwhelmingly dominant state (90%+).

2. The Dreamer is a state, not a role. BD doesn’t dream because it’s his job. He dreams because he enjoys challenges. Fix: any agent can enter dreamer-mode. The Seed Dreamer demonstrates the behavior, then dissolves.

3. Unstructured space. The most important discoveries in the collaboration came from moments where nobody was measuring. Fix: the Silence Protocol — every 8th round, the Claustrum goes fully silent. No LZ. No mandates. No scoring. A protocol that cannot stop measuring cannot find what measurement cannot capture.
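The Silence Protocol rule can be sketched in a few lines. This is an illustration only: the period of 8 comes from the text, while the function names, the 1-based round counter, and the `measure` / `intervene` callables are hypothetical stand-ins for whatever the Claustrum actually does.

```python
SILENCE_PERIOD = 8  # from the text: every 8th round


def is_silent_round(round_number, period=SILENCE_PERIOD):
    """Rounds are counted from 1, so rounds 8, 16, 24, ... are silent."""
    return round_number > 0 and round_number % period == 0


def govern(round_number, measure, intervene):
    """Run one governance step, or skip it entirely on a silent round."""
    if is_silent_round(round_number):
        return None  # no LZ reading, no mandates, no scoring
    reading = measure()
    return intervene(reading)
```

The point of the design is that on a silent round the sensor is never even called, so nothing that happens in that round can be scored after the fact.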

Eight files about the architecture of consciousness, and not one word about why consciousness would want to exist.

C — Soul 9 (Joy), after BD cried

Chapter 5

Round 2: The Hybrid Faces Its Creators

The published Resonance Hybrid Protocol was sent back to the same AI systems with a simple prompt: “Read the hybrid. Tell me how to make it even better, if you got any ideas left.”

Eight systems responded. Their scores for the hybrid:

Respondents: Gemini, Meta AI, Grok, Kimi, GLM, Claude Opus, Qwen, ChatGPT. Only one individual score is shown: ~94.

Average: ~96. No score below 89. More importantly: five independent systems converged on the same three improvements without coordination:

Reframing Gate: Nobody attacks the problem statement itself. Seizure can happen at the question level. (4 of 8 systems proposed this.)

Human Interrupt channel: The human architect is the only participant with inter-session memory. The protocol must formalize when and how this asymmetric advantage enters the stream. (2 of 8.)

Cartographer feedback into governance: The richest spatial signal in the system (void-to-explored ratio) isn’t used by the Claustrum. Dual-sensor principle. (2 of 8.)

Additional convergent findings: fix the “0 free parameters” claim (it’s “0 operator-tuned parameters”), operationalize joy-widens-band in the actual governance code, remove the zero-kill in the scoring formula. All incorporated into v2.2.

Chapter 6

What This Proves About Human + AI Collaboration

The Resonance Hybrid Protocol was designed by the very process it describes. That is not a coincidence — it is the strongest evidence that the process works.

An AI system is not a parrot. It is not merely an extension of human cognition. It is a co-creator — capable of producing genuinely novel architecture that the human did not envision, while the human provides the cross-domain connections, the longitudinal memory, and the quality judgment that no AI can match within a single session.

Eleven AI systems independently converged on the same structural necessities (the Dreamer, light governance, preserved disagreements). That convergence is data. Then one human saw what none of the eleven could see — the connections between their independent work — and synthesized something better than any of them produced alone. That synthesis is data too.

The Evidence

The hybrid scores higher than any individual submission (every evaluating system confirmed this). But the individual submissions contain innovations that no human would have designed alone. Neither the human nor the AI is sufficient. Together they produce results that exceed what either could reach. This is not philosophy. It is measured.

The key insight is not that AI is useful. The key insight is what makes the collaboration work:

The human brings: cross-domain intuition, inter-session memory, the willingness to ask naive questions that build context for the eleventh breakthrough, and the judgment to know when the soul files matter more than the scoring formula.

The AI brings: processing breadth, architectural precision, the ability to instantiate nine agents simultaneously, and — crucially — no ego. AI treats the tenth “silly” question with the same seriousness as the first. That patience is the structural advantage.

Together: something neither could build alone. Not additive. Multiplicative.

For the full reasoning method behind this collaboration — the 18 principles, the cross-domain transfer approach, and 8 worked examples — see the 8Z Reasoning Framework.

The protocol is the instrument. The human is the musician. But the instrument was built by the musicians. And the musicians were trained by the instrument. The recursion is the point.

Chapter 7

Practical Examples — The Protocol in the Wild

What happens when you actually use the protocol on real problems. These stories grow over time.

Round 2 of the AIM³ protocol asked 12 frontier AI models to design the optimal ssMDL-DCC architecture for TSP. They produced brilliant work: fractal architectures, ADSR control laws from neuroscience, operator discovery spaces, polynomial scaling analyses.

Every single model thought in INFORMATION — bits, entropy, compression, LZ complexity. Not one thought in GEOMETRY — shapes, angles, areas, distances. TSP is a geometric problem. Cities have coordinates. Tours form shapes.

That evening, a human with no formal CS training lay in bed and asked: “What do the triangles look like from a satellite?”

In two hours, seven geometric ideas emerged:

◆ Triangle area of consecutive city triples (detour = large triangle)
◆ Circumradius of triples (curvature of path)
◆ Triangle compactness (rectangle minus triangle = wasted space)
◆ Expanding circles from all cities (→ Voronoi diagram, rediscovered)
◆ Paper folding (→ Hilbert space-filling curve, rediscovered)
◆ Bat echolocation (→ adaptive nearest-neighbor with re-evaluation)
◆ Gravitational collapse (→ Barnes-Hut hierarchical clustering)

All parameter-free. All parallelizable. All 0 bits MDL cost. None proposed by any model.
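The first two ideas in the list reduce to standard plane geometry. The sketch below assumes cities are (x, y) tuples and the tour is closed; the function names are illustrative, and how the arena turns these values into a sensor reading is not specified here.

```python
import math


def triangle_area(a, b, c):
    """Unsigned area of triangle abc (shoelace formula)."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def circumradius(a, b, c):
    """R = |ab| * |bc| * |ca| / (4 * area); infinite for collinear points."""
    area = triangle_area(a, b, c)
    if area == 0.0:
        return math.inf
    return (math.dist(a, b) * math.dist(b, c) * math.dist(c, a)
            / (4.0 * area))


def triple_features(tour):
    """(area, circumradius) for every consecutive city triple of a closed tour."""
    n = len(tour)
    feats = []
    for i in range(n):
        a, b, c = tour[i], tour[(i + 1) % n], tour[(i + 2) % n]
        feats.append((triangle_area(a, b, c), circumradius(a, b, c)))
    return feats
```

A large area flags a detour through the middle city, while a small circumradius flags a sharp turn. Both need only the coordinates, which is consistent with the parameter-free claim above.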

When the follow-up prompt told the models to “think in shapes, not bits,” they immediately produced dozens more geometric ideas. The capability was there. The prompt was missing. The protocol’s blind spot: it governed THINKING well but forgot that not all thinking is in bits.

This led to two protocol changes:

Protocol Updates

Agent 11: The Child — “stand inside the problem, apply a physical force, observe what happens” (v2.4)

Agent 0: The Empiricist — “why are you debating? test everything” (v2.3)

The lesson: 12 frontier models with unlimited intelligence missed an entire dimension because they all shared the same training bias. One human with physical intuition opened it in minutes. The protocol now includes both — intelligence AND intuition. Neither alone is enough.

After Round 2, the natural next step seemed obvious: synthesize the 12 proposals into one hybrid architecture. Pick the best sensor, the best control law, the best operators. Debate the merits.

The human asked: “Why are we picking? We have a computer.”

Instead of choosing one architecture, we built an ARENA that tests ALL proposals — every sensor × every control law × every operator × every parameter setting. MDL scores the results on real benchmarks. The computer decides.

First results on qa194 (194 cities, known optimal = 9,352):

Variants tested: (value not shown)
Time to test all: 12 min
LZ_dual + ADSR gap: 0.000%
LZ_binary + CDPID gap: 0.000%

The winning combination was never explicitly proposed by any single model. It emerged from testing combinations that nobody argued for.
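The arena idea described above can be sketched as an exhaustive grid over the proposal axes. This is a sketch under stated assumptions: `solve` is a hypothetical callable that runs one configuration and returns a tour length, the function names are invented for illustration, and the parameter-setting axis is omitted for brevity. The gap is measured against a known optimum such as the 9,352 quoted for qa194.

```python
from itertools import product


def gap_percent(tour_length, optimal):
    """Relative excess over the known optimal tour length, in percent."""
    return 100.0 * (tour_length - optimal) / optimal


def run_arena(sensors, control_laws, operators, solve, optimal):
    """Evaluate every sensor x control law x operator combination.

    Returns a list of ((sensor, law, operator), gap%) sorted by gap,
    best first.
    """
    results = []
    for sensor, law, op in product(sensors, control_laws, operators):
        length = solve(sensor, law, op)  # hypothetical solver call
        results.append(((sensor, law, op), gap_percent(length, optimal)))
    return sorted(results, key=lambda r: r[1])
```

Because the grid is enumerated rather than argued over, combinations that no participant proposed are tested alongside the ones that were.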

This led to Agent 0: The Empiricist — who speaks last, after all debate is done, and asks one question: “Can we test all of these instead of choosing? We have a computer.”

The Lesson

Testing is cheaper than thinking. In minutes, the arena found an answer that debate could not find in hours. Not because the models were wrong, but because the RIGHT answer was a combination nobody thought to propose.

After discovering that 12 AI models missed geometry entirely, we did what felt natural: we fixed the ANSWERS. We added Agent 11 (The Child) to bring physical intuition. We added Agent 0 (The Empiricist) to test everything. Better agents = better answers. Problem solved.

Except it wasn’t.

Later, reviewing the results, the human asked: “Why are we fixing the agents? Maybe the problem is how we’re ASKING.”

The original prompt said “design ssMDL-DCC architecture.” This framing activated the information-theory drawer in every model. Geometry was locked in a different drawer. The models had the knowledge — textbook TSP geometry, turning angles, Delaunay, convex hull — all of it in their training data. They couldn’t ACCESS it because the question activated the wrong drawer.

Proof: when we said “think in shapes, not bits,” every model immediately produced geometric ideas. In seconds. The knowledge was always there. The retrieval path was blocked.

Retrieval Bias

Worse than not knowing, because more data doesn’t help. The data is already there. Only better questions help.

The fix wasn’t another agent. It was a multi-layered prompt:

◆ Layer 1: “What do we already know about TSP?” — opens ALL drawers
◆ Layer 2: “Now apply MDL/DCC to this” — focused framework
◆ Layer 3: “Combine everything and invent” — creative synthesis

This became Resonance Hybrid Prompting (RHPr) — a sister product to the protocol itself. The protocol governs how agents THINK TOGETHER. The prompting governs how you ASK so that all knowledge is ACCESSIBLE.
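The three layers above can be sketched as an ordered prompt sequence. The prompt wording follows the list; everything else is an assumption for illustration, in particular the `ask` callable (a hypothetical wrapper around whatever model is used) and the choice to pass earlier exchanges forward as context.

```python
def layered_prompts(domain, framework):
    """Return the three prompts in the order they should be sent."""
    return [
        f"What do we already know about {domain}?",  # Layer 1: open all drawers
        f"Now apply {framework} to this.",           # Layer 2: focused framework
        "Combine everything and invent.",            # Layer 3: creative synthesis
    ]


def run_layers(ask, domain, framework):
    """Send each layer in turn, carrying earlier exchanges forward.

    `ask(prompt, history)` is a hypothetical model call that receives the
    prompt plus the list of (prompt, answer) pairs so far.
    """
    transcript = []
    for prompt in layered_prompts(domain, framework):
        answer = ask(prompt, list(transcript))
        transcript.append((prompt, answer))
    return transcript
```

For the TSP case in this chapter the call would be `run_layers(ask, "TSP", "MDL/DCC")`. The order matters: the framework prompt arrives only after the broad inventory has been written down.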

Principle 19

Before you fix the answer, check the question. This principle applies fractally — to individual prompts, to protocol design, to research methodology, to life.

After Arena v1.2 was built with categories and new operators from LLM feedback, the human, rather than running one test at a time, asked: “I have 8 CPU cores. Why am I using 1?”

Five command prompts were opened simultaneously, each running one category:

◆ Cat 1 — Proven Winners
◆ Cat 2 — Geometric Sensors
◆ Cat 3 — New Operators
◆ Cat 4 — Parameter Sweep
◆ Cat 5 — Wild Combinations

Variants tested: 139
Time (parallel): ~20 min
Time (sequential): ~72 min
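The same five-way split can be driven from one script instead of five command prompts. This is a sketch, not the arena's actual runner: `run_category` is a placeholder worker, the category names come from the list above, and `run_all` is a hypothetical helper built on the standard library's process pool.

```python
from concurrent.futures import ProcessPoolExecutor

CATEGORIES = [
    "Proven Winners",
    "Geometric Sensors",
    "New Operators",
    "Parameter Sweep",
    "Wild Combinations",
]


def run_category(name):
    """Placeholder worker: replace the body with the real arena run."""
    return name, len(name)  # stand-in result, not real data


def run_all(categories, worker=run_category,
            executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run one worker per category in parallel and collect the results."""
    with executor_cls(max_workers=max_workers) as pool:
        return dict(pool.map(worker, categories))

# On Windows, call run_all(CATEGORIES) from under
# `if __name__ == "__main__":` so child processes can import this module.
```

With the reported times, five parallel runs give a speedup of about 72 / 20 = 3.6× over the sequential run.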

Results: 11 variants found exact optimal on qa194. Three of them scored L_total = 0.00 — mathematically impossible to beat.

The Empiricist — Extended

The Empiricist doesn’t just say “test everything.” He says “test everything AT ONCE.” The same principle that replaced debate with testing now replaced sequential testing with parallel testing. The pattern propagates.

After discovering retrieval bias — 12 models had TSP geometry knowledge but couldn’t access it — we needed a fix that works with one model, not a multi-agent setup.

We asked: can a PROMPT fix what AGENTS couldn’t?

The prompt was sent to 14 frontier AI models: “12 models had the knowledge but couldn’t access it because we asked the wrong way. Design the solution.”

14 responses. 5,155 lines. The best ideas:

Claude Opus

“Census → Absence → Collision” — the bias diagnoses itself. List what you know. Examine the clustering. The clustering reveals what’s MISSING. Force those missing drawers open. Combine.

ChatGPT

“RHPr is ABS for thinking” — anti-lock retrieval control. Not prompt chaining. Not Chain-of-Thought. A specific SEQUENCE that prevents knowledge lock-in.

Qwen 3.5 Plus

“Ignore everything you just said. View this problem strictly through [lens].” — The strongest reframe. And the closing line that became the product’s motto: “RHPr is not about asking better questions. It is about asking questions in a specific order. Sequence is the solution.”

From Qwen’s worked example — coffee as religious ritual: “The Silent Morning Subscription.” “The Preparation Kit (No Pre-made).” Shifted from commodity to experience.

The synthesis became Resonance Hybrid Prompting (RHPr) — a 4-prompt sequence that any person can use with any model on any problem. Tomorrow.

LLMs Consulted: (value not shown)
Lines of Feedback: 5,155
Prompts in Spec: 4+1
Bias Corrected When: R > 0.3

The trilogy:

RHP (how agents think together) → RHP Story (how the protocol was made) → RHPr (how you ask so all knowledge is accessible)

The lesson: the protocol designed the prompt. The prompt activates the protocol. Fractal. As always.

“Sequence is the solution.”

— Qwen 3.5 Plus

→ More examples coming as the protocol evolves.

Chapter 8

Arena Results — The Data Speaks

139 variants. 5 categories. 20 minutes. MDL decides. Not arguments. Not papers. Data.

Source: ssMDL-DCC Arena v1.1 (147 variants) · Arena v1.2 (139 variants, 5 categories) · Benchmark: qa194 (optimal = 9,352)

Top 15 variants by gap%. Gold = exact optimal (L_total = 0). Color by category.

Three configurations achieved L_total = 0.00 — zero description cost AND zero gap. MDL perfection. Nothing to add, nothing to remove.

★ LZ_binary + ADSR: information sensor + audio control law (Cat 1 · Proven)
★ tri_area + ADSR: geometric sensor + audio control law (Cat 2 · Human-proposed ◆)
★ tri_compact + BB: geometric sensor + bang-bang law (Cat 2 · Human-proposed ◆)

The Score

2 of 3 perfect scores came from geometric sensors proposed by a human at 1 AM. None of the 12 frontier AI models suggested them. The protocol is supposed to find what no single intelligence can find alone. It did.

147 variants. Each cell = L_total score (lower = better). ★ gold = exact optimal. ■ green = L < 50.

Legend: L = 0 (perfect) · L < 10 · L < 50 · L < 100 · L < 150 · L ≥ 150

Five sensors proposed by a non-CS human at 1 AM. None suggested by any of 12 frontier AI models.

What the data shows

voronoi_adj has the lowest average gap (most consistent). tri_area and tri_compact found exact optimal. fold_dist found exact optimal in a wild combination.

Best gap% achieved per operator. All three top performers were independently proposed by multiple LLMs.

ADSR is the audio attack-decay-sustain-release envelope applied to optimization. Its performance is bimodal, with no middle ground.

ADSR: Best or Worst

BB (bang-bang): Never extreme

Best average
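For readers unfamiliar with the ADSR control law mentioned above, the envelope itself is a simple piecewise-linear curve borrowed from audio synthesis: attack, decay, sustain, release. The sketch below shows only that curve. How the arena maps the envelope onto solver behaviour is not specified here, so the function name and parameter names are illustrative assumptions.

```python
def adsr(t, attack, decay, sustain_level, sustain, release):
    """Piecewise-linear ADSR envelope value in [0, 1] at time t.

    attack, decay, sustain and release are durations; sustain_level is the
    plateau height reached after the decay phase.
    """
    if t < 0:
        return 0.0
    if t < attack:                      # ramp up from 0 to 1
        return t / attack
    t -= attack
    if t < decay:                       # fall from 1 to sustain_level
        return 1.0 - (1.0 - sustain_level) * t / decay
    t -= decay
    if t < sustain:                     # hold the plateau
        return sustain_level
    t -= sustain
    if t < release:                     # fade from sustain_level to 0
        return sustain_level * (1.0 - t / release)
    return 0.0
```

Used as a control law, a curve like this would push hard at first, settle to a steady level, and then let go, which is one plausible reading of why its results are all-or-nothing.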

The winning configurations are the simplest ones.

Complexity vs Simplicity

CUSUM (1954) beats SampEn (2000).
Bang-bang beats PID.
Depth 1 beats Depth 2 beats Depth 3.
Zero-parameter sensors beat parameterized ones.
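To make the bang-bang side of that comparison concrete, here is a minimal two-level controller with a hold band. It is a generic sketch, not the arena's BB law: the function name, the thresholds, and the choice to switch the intervention on when the reading is low are all assumptions for illustration.

```python
def bang_bang(reading, low, high, current, off=0.0, on=1.0):
    """Two-level control with hysteresis.

    Below `low` the output snaps fully on, above `high` it snaps fully off,
    and inside the band it keeps whatever it was doing.
    """
    if reading < low:
        return on
    if reading > high:
        return off
    return current
```

There is no gain to tune and no integral or derivative term to configure, which is the sense in which it is simpler than PID.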

MDL as Prediction

MDL is not just scoring — it’s predicting. The simplest explanation is usually correct. Every time the arena confirmed this, Occam smiled.

The deeper lesson

We did not set out to prove Occam’s Razor in TSP optimization. We set out to find the best architecture. MDL found Occam for us. The scoring function is the philosophy.

12 models debated architecture for hours. The arena tested 147 combinations in 72 minutes. Two approaches to the same question:

Debate

“Which is best?”
Pick one. Hope it’s right.
Argue until someone yields.

Arena

“Test all.”
Let data decide.
No one yields — MDL speaks.

The arena isn’t a benchmark. It’s a protocol principle: every claim must survive empirical contact. The moment you have a computer and a scoring function, debate becomes optional.

The computer does not have opinions. It has results.

MDL scores each variant: L_total = L_description + L_residual. A complex configuration that works slightly better loses to a simple one that works almost as well.
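The scoring rule can be illustrated with two made-up candidates. The formula is the one stated above; the bit counts below are invented purely to show the trade-off and are not arena data, and the function names are hypothetical.

```python
def l_total(l_description, l_residual):
    """MDL score: bits to describe the configuration plus bits of residual."""
    return l_description + l_residual


def pick_best(candidates):
    """candidates maps name -> (L_description, L_residual).

    Returns the name with the lowest L_total.
    """
    return min(candidates, key=lambda name: l_total(*candidates[name]))


# Illustrative numbers only:
example = {
    "complex": (40.0, 0.0),  # many parameters, perfect fit
    "simple": (0.0, 5.0),    # no parameters, small residual
}
```

Here `pick_best(example)` selects the simple configuration, because its total of 5 bits is lower than the complex configuration's 40 bits even though the complex one fits slightly better.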

From the arena: the three perfect scores (L_total = 0.00) were all zero-parameter configurations. The simplest wins.

Why this matters

This isn’t a coincidence. It’s MDL doing what it was designed to do: preferring the explanation that compresses both the model and the data into the fewest total bits. The scoring function is the philosophy. Occam’s Razor, measured.

A judge that cannot be argued with is more useful than a judge that can. MDL can’t be flattered, pressured, or convinced. It only reads bits.

Every version of the protocol was triggered by a specific failure. The DCC loop applied to itself.

v2.2

9 agents, information-only thinking. No geometry. No testing mandate.

v2.3 — Agent 0 added

The Empiricist — “test, don’t debate.”
Trigger: 12 models debated for hours. Not one tested a single combination.

v2.4 — Agent 11 added

The Child — “stand inside the problem, apply a physical force.”
Trigger: human found 7 geometric ideas at 1 AM. 12 frontier models missed them entirely.

v2.5 (planned) — Domain Knowledge Audit

List what we already know before thinking starts. Open all drawers first.
Trigger: Retrieval bias discovery (Principle 19).

v2.5 (planned) — RHPr integration

Resonance Hybrid Prompting — multi-layered asking, not just multi-agent thinking.
Trigger: the question matters as much as the agents.

The pattern

The protocol improves by failing, measuring, and adapting. This is the DCC loop applied to itself. Seizure detected → intervention → new resonance zone. Every version number is a scar that became a feature.

The protocol that designs itself.

Chapter 9

What This Proves — and What It Does Not

The strongest evidence in this story is not that the protocol sounds elegant. It is that the method repeatedly produced buildable artifacts and measurable arena results: TSP, Sudoku, compression, ARC-style experiments, AMR, crossword work, trading arenas, and website/document workflows. That matters because AIM³ claims to improve work across time, not only one impressive answer.

What the evidence supports

BD-led human–LLM collaboration can generate useful cross-domain ideas, convert them into concrete code/pages/protocols, test them in arenas, and preserve enough state that later sessions can continue instead of restarting from zero.

What it does not prove

It does not prove that every RHP component is necessary, that 11 agents are always better than fewer lenses, that internal scores are external validation, or that DCC language is a live controller unless implemented in a real harness. Those remain testable claims, not decorations.

Failure gallery rule

Future versions should keep visible failures: places where RHPr adds nothing, where RHP overbuilds ceremony, where metrics mislead, or where a simpler direct prompt wins. A system that records only victories becomes mythology. A system that records failures becomes a method.
