AIm³ MentalArena
mRHP · RHP that improves RHP

A human-led multi-LLM council OS plus a protocol fitness harness for improving the external reasoning workflow around today’s LLMs.

Council Engine · Protocol Fitness Harness · AIm³ / RHPm · human-led · not model-weight training

What it is

MentalArena is not a new LLM and not an AGI claim. It is an external thinking/workflow layer. It coordinates multiple LLMs through structured rounds, preserves memory and lineage, extracts conflicts, asks targeted follow-ups, gathers judge opinions, and produces a final synthesis or builder prompt.

Short version: MentalArena automates BD’s manual multi-LLM RHP/RHPr/RHPm workflow, then adds logging, rounds, conflict maps, judge prompts, and a protocol fitness harness.

1. Council Engine

The heart of the system. It runs the actual thinking loop:

BD seed
→ prompt expansion
→ LLM answers
→ conflict extraction
→ targeted critique
→ judge panel
→ final synthesis
→ next build/test
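
The loop above can be sketched in Python as a chain of recorded stages. This is a minimal illustration, not MentalArena's implementation. The following are all assumptions:

- the `run_council` and `CouncilRun` names;
- the `[RHP]` prompt prefix;
- the string-equality conflict test.

The models are plain callables that stand in for real provider calls.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable

# Assumption: a "model" is any callable mapping a prompt string to an answer string.
# Real provider calls (DeepSeek, GPT, Claude, Gemini) would sit behind this interface.
Model = Callable[[str], str]


@dataclass
class CouncilRun:
    seed: str
    log: list = field(default_factory=list)  # (stage, payload) pairs, in order

    def record(self, stage: str, payload):
        # Every intermediate artifact is kept so the run has memory and lineage.
        self.log.append((stage, payload))
        return payload


def run_council(seed: str, models: dict[str, Model], judges: dict[str, Model]) -> CouncilRun:
    run = CouncilRun(seed)
    # Hypothetical expansion: prefix the seed with a profile tag such as "RHP".
    prompt = run.record("expanded_prompt", f"[RHP] {seed}")
    answers = run.record("answers", {n: m(prompt) for n, m in models.items()})
    # Naive conflict extraction: every pair of models whose answers differ.
    conflicts = run.record(
        "conflicts",
        [(a, b) for a, b in combinations(sorted(answers), 2) if answers[a] != answers[b]],
    )
    # Targeted critique: each model is asked only about the pairs it belongs to.
    critiques = run.record(
        "critiques",
        {
            n: m(f"Critique these disagreements {[c for c in conflicts if n in c]}: {prompt}")
            for n, m in models.items()
        },
    )
    verdicts = run.record("judge_panel", {j: fn(str(critiques)) for j, fn in judges.items()})
    run.record(
        "synthesis",
        f"{len(answers)} answers, {len(conflicts)} conflicts, {len(verdicts)} verdicts",
    )
    # The "next build/test" step is left to the human lead.
    return run
```

With three stub models answering "yes", "no", "yes", the conflicts stage holds two disagreeing pairs. The log also makes it easy to export or replay a run.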

2. Protocol Fitness Harness

The measuring / CI / anti-bullshit layer. It tests protocol genes, runs baselines, ablations, and holdout tasks, and checks for scoring regressions and report quality. It measures; it does not replace LLM judgement.
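
A minimal sketch of two harness checks, ablation and baseline regression, follows. It rests on two assumptions:

- a protocol can be represented as a set of named genes;
- some scorer returns a 0..1 score per task.

`ablation_report`, `regressions`, the gene names, and the tolerance are all hypothetical. The numbers only flag where to look; interpreting them still falls to LLM judges and BD.

```python
from statistics import mean
from typing import Callable, Dict, FrozenSet, List

# Assumption: a protocol is a frozenset of named "genes" (prompt components),
# and score_fn(genes, task) returns a 0..1 quality score for one task.
# Both the gene names and the scorer are hypothetical placeholders.
ScoreFn = Callable[[FrozenSet[str], str], float]


def mean_score(genes: FrozenSet[str], tasks: List[str], score_fn: ScoreFn) -> float:
    return mean(score_fn(genes, t) for t in tasks)


def ablation_report(genes: FrozenSet[str], tasks: List[str], score_fn: ScoreFn) -> Dict[str, float]:
    """Drop each gene in turn; a negative delta means the gene was helping."""
    full = mean_score(genes, tasks, score_fn)
    return {
        g: round(mean_score(genes - {g}, tasks, score_fn) - full, 3)
        for g in sorted(genes)
    }


def regressions(
    baseline: Dict[str, float], candidate: Dict[str, float], tolerance: float = 0.05
) -> List[str]:
    """Tasks (e.g. holdout tasks) where the candidate falls below baseline by more than tolerance."""
    shared = baseline.keys() & candidate.keys()
    return sorted(t for t in shared if candidate[t] < baseline[t] - tolerance)
```

Take a toy scorer that rewards a `falsifiers` gene more than a `style` gene. Ablating `falsifiers` shows a larger negative delta, and `style` shows none.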

3. Local Lab

A private working UI with API keys, local logs, provider calls, council runs, and exports. It is separate from this public page.

4. Public Page

This page explains the architecture only. It should never contain real API keys, private logs, or local working transcripts.

Workflow

1. BD writes a seed/problem.
2. MentalArena expands it using Plain / RHP / RHPr / RHPm / Council profiles.
3. Selected LLMs answer.
4. Python stores answers and builds a conflict map.
5. Round 2 asks targeted critique questions.
6. GPT/Claude/Gemini or other judges score and critique.
7. Final synthesis produces a document, builder prompt, code spec, or test plan.
8. Protocol Harness checks regressions and weak spots.
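
Step 4 can be sketched as an answer store plus a conflict map. This sketch assumes each answer has already been reduced to a `topic -> stance` dictionary; the extraction itself is not shown. `store_answer`, `conflict_map`, `round2_questions`, and the field names are hypothetical. A short SHA-256 prefix gives each stored answer a citable id for later rounds.

```python
import hashlib
import json
from collections import defaultdict


def store_answer(store: list, model: str, round_no: int, claims: dict) -> str:
    """Append one answer with a content hash so later rounds can cite it."""
    body = json.dumps(claims, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:12]
    store.append({"model": model, "round": round_no, "claims": claims, "sha": digest})
    return digest


def conflict_map(store: list, round_no: int) -> dict:
    """topic -> {stance: [models]}, keeping only topics where models disagree."""
    by_topic = defaultdict(lambda: defaultdict(list))
    for entry in store:
        if entry["round"] == round_no:
            for topic, stance in entry["claims"].items():
                by_topic[topic][stance].append(entry["model"])
    return {t: dict(s) for t, s in by_topic.items() if len(s) > 1}


def round2_questions(conflicts: dict) -> list:
    """One targeted critique question per disagreement, for round 2."""
    questions = []
    for topic, stances in sorted(conflicts.items()):
        sides = " vs ".join(f"{s!r} ({', '.join(m)})" for s, m in stances.items())
        questions.append(f"On {topic!r}: {sides}. Which position holds up, and why?")
    return questions
```

Topics where every model agrees drop out of the map. This keeps round 2 focused on useful disagreement.
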

What Python does and does not judge

Python can measure structure: next actions, falsifiers, missing fields, repetition, provenance, logs, hashes, and basic anti-handwave checks. Python should not pretend to know deep truth alone. Semantic judgement comes from LLM judges plus BD’s final override.
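
A small checker can illustrate the structural side of that split. It is a sketch under assumptions: the field labels (`Next action:`, `Falsifier:`, `Source:`), the handwave word list, and `structure_report` itself are hypothetical. Nothing in it says whether an answer is true.

```python
import hashlib
import re

# Assumption: a council answer is plain text with labelled fields such as
# "Next action:" and "Falsifier:". These labels are illustrative, not a fixed spec.
REQUIRED_FIELDS = ("next action", "falsifier", "source")
HANDWAVE = ("obviously", "clearly", "it is well known", "trust me")


def structure_report(text: str) -> dict:
    """Structural flags only; semantic judgement is left to LLM judges and BD."""
    lower = text.lower()
    sentences = [s.strip() for s in re.split(r"[.!?]\s+", lower) if s.strip()]
    return {
        # A field counts as present if its label is followed by a colon.
        "missing_fields": [f for f in REQUIRED_FIELDS if f + ":" not in lower],
        # Exact duplicate sentences are a cheap repetition signal.
        "repeated_sentences": len(sentences) - len(set(sentences)),
        "handwave_hits": [w for w in HANDWAVE if w in lower],
        # Content hash for provenance and log integrity.
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
```
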

Status

v0.6 splits the project into a clean public architecture page, a private Local Lab, and the existing Protocol Harness. The next proof step is real council runs using DeepSeek first, then manual or API-imported GPT/Claude/Gemini judge rounds.

Local Lab

The working Lab is not meant to be publicly hosted with API keys. In the v0.6 package, run the launcher, then open the Lab in a browser:

local_lab/RUN_LOCAL_MENTALARENA_LAB.bat
http://127.0.0.1:8787/AIM3_MentalArena_Lab.html

For future Netlify deployment, provider calls should go through Netlify Functions and secrets, not browser-exposed keys.

AIm³ MentalArena / mRHP · public architecture page · v0.6 Council Engine split

v0.7 direction — Project Context Packs

New design rule: API models should not reason from a bare prompt when BD project materials matter. The private Lab can attach compact JSON context packs from local_lab/context/ so each LLM receives the same distilled project brain.

Context packs are small machine-readable capsules, not huge file dumps. They preserve source manifests, CTX ids, core principles, known failures, and current baselines. The current proof target remains narrow: conflict extraction must save BD time and preserve useful disagreement before larger automation is justified.
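
Loading and attaching context packs might look like this. Only the folder `local_lab/context/` and the kinds of content come from this page. The following are assumptions:

- the key names in `PACK_KEYS`;
- the character budget;
- the `CTX-001` id format;
- the `load_packs` and `attach_context` helpers.

```python
import json
from pathlib import Path

# Assumed pack fields, mirroring the page: CTX id, source manifest, core
# principles, known failures, current baselines. Key names are hypothetical.
PACK_KEYS = ("ctx_id", "sources", "principles", "known_failures", "baselines")
MAX_PACK_CHARS = 4000  # illustrative budget that keeps packs compact


def load_packs(folder: Path) -> list:
    """Read every *.json pack in the folder and reject packs with missing fields."""
    packs = []
    for path in sorted(folder.glob("*.json")):
        pack = json.loads(path.read_text(encoding="utf-8"))
        missing = [k for k in PACK_KEYS if k not in pack]
        if missing:
            raise ValueError(f"{path.name} is missing {missing}")
        packs.append(pack)
    return packs


def attach_context(prompt: str, packs: list) -> str:
    """Prepend the same compact, CTX-tagged context to the prompt every model receives."""
    blocks = []
    for pack in packs:
        text = json.dumps(pack, separators=(",", ":"), sort_keys=True)
        if len(text) > MAX_PACK_CHARS:
            raise ValueError(f"{pack['ctx_id']} exceeds {MAX_PACK_CHARS} chars")
        blocks.append(f"[{pack['ctx_id']}] {text}")
    return "\n".join(blocks + ["", prompt])
```

In the private Lab this could be called as `attach_context(seed, load_packs(Path("local_lab/context")))`. Because every model receives identical prepended text, differences between answers are less likely to come from different context.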