AI8 Components

01 · WHY THIS PAPER EXISTS

The third paper in the AI8 document family

AI8 Architecture defines the deeper continuity architecture. AI8 Companion makes that architecture readable and operational. This third paper has a different job: to ask whether the same underlying kernel can be used not only to coordinate human–AI work, but to improve AI systems themselves.

The claim here is intentionally disciplined. We are not claiming that AI8 has already solved AI design. We are proposing a research program: start with one clean, bounded arena such as Neural Architecture Search, learn to read the terrain properly, improve search inside that terrain, then try to shape better terrains and extend the same logic to other AI components.

Core

AI8 Architecture

The base continuity architecture: governed differentiation, recursive coordination, and process-level structure.

Bridge

AI8 Companion

The readable map: what the layers do, how they relate, and how AI8 avoids collapsing into roleplay, ideology, or noise.

Application

The domain line: can the same kernel help us improve model architecture, data, training flow, routing, memory, and broader AI design choices?

Working stance

This is a living paper. NAS carries the main weight for now. Other components are introduced as targets and future work, not as claims of finished success.

02 · KERNEL

From coordination logic to design logic

The candidate kernel is self-selecting MDL × DCC.

MDL (minimum description length) contributes descriptive pressure: choose structures, routes, or candidate families that explain observed success with fewer wasted bits and less arbitrary complication.
DCC contributes adaptive governance: when a search becomes too predictable, too chaotic, too flat, or too trapped, alter the search regime rather than blindly continuing.
Self-selection means the system does not assume one fixed heuristic is always right. It chooses, compares, and re-routes under pressure.
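Here is a minimal sketch of DCC-style governance with self-selection that makes this division of labour concrete. It assumes a hypothetical `diagnose_regime` rule that labels recent per-step gains as trapped, flat, chaotic, or healthy (the "too predictable" case is left out for brevity), and a hypothetical pool of named heuristics. The thresholds, names, and smoothing scheme are illustrative choices and do not come from the AI8 papers.

```python
from statistics import mean, pstdev

def diagnose_regime(gains, window=5, flat_eps=1e-3, chaos_ratio=3.0):
    """Label recent per-step improvements (hypothetical thresholds)."""
    recent = gains[-window:]
    if len(recent) < window:
        return "warmup"                  # not enough evidence yet
    avg, spread = mean(recent), pstdev(recent)
    if max(recent) <= 0:
        return "trapped"                 # no step improved: stuck in a basin
    if abs(avg) < flat_eps and spread < flat_eps:
        return "flat"                    # gains are tiny and uniform
    if spread > chaos_ratio * abs(avg):
        return "chaotic"                 # noise swamps the average gain
    return "healthy"

class SelfSelectingGovernor:
    """Tracks a smoothed gain per heuristic and re-routes when unhealthy."""

    def __init__(self, heuristics, alpha=0.3):
        self.scores = {h: 0.0 for h in heuristics}
        self.alpha = alpha               # smoothing factor for gain estimates
        self.current = heuristics[0]
        self.history = []

    def record(self, gain):
        """Log the improvement produced by the current heuristic."""
        self.history.append(gain)
        old = self.scores[self.current]
        self.scores[self.current] = (1 - self.alpha) * old + self.alpha * gain

    def next_heuristic(self):
        """Keep the current heuristic unless the regime looks unhealthy."""
        regime = diagnose_regime(self.history)
        if regime in ("trapped", "flat", "chaotic") and len(self.scores) > 1:
            others = {h: s for h, s in self.scores.items() if h != self.current}
            self.current = max(others, key=others.get)
            self.history.clear()         # judge the new regime on fresh evidence
        return self.current, regime

gov = SelfSelectingGovernor(["local_climb", "family_jump", "random_restart"])
for g in [0.0, 0.0, -0.1, 0.0, 0.0]:
    gov.record(g)
print(gov.next_heuristic())   # ('family_jump', 'trapped')
```

After five non-improving steps the governor reads the regime as trapped and re-routes away from `local_climb`. Because the alternatives have no history yet, the tie is broken by list order. A fuller version would also let MDL-style scores decide between alternatives.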

The long goal is not only to navigate one given search space better, but to learn how better spaces themselves might be recognized and eventually generated.

AI8 Components · Program statement

Read existing landscape → Improve search inside it → Shape better landscapes → Extend across AI components

Discipline

Cross-domain ambition only becomes credible if the same kernel survives cheap tests in one arena after another. This paper therefore treats NAS as a first wedge, not as the final destination.

03 · NAS AS THE FIRST WEDGE

Why NAS is the right first arena

Neural Architecture Search is a good first domain because it is bounded enough to test search ideas honestly, yet rich enough to expose family structure, local traps, ridges, corridors, and competing winner regimes. A benchmark NAS world does not reveal the globally ideal architecture; it reveals the best architecture inside a finite, human-designed universe. That is exactly why it is useful.

With a benchmark such as NATS-Bench TSS, the search problem becomes clean: can our method read and navigate the terrain better than simpler baselines? If yes, the next question becomes stronger: can the same method eventually improve the terrain itself?

Question	Benchmark NAS can test	Benchmark NAS cannot yet prove
Search quality	Whether a method finds strong regions, families, and escape routes more effectively than baselines.	Whether the method has discovered the globally best neural architecture beyond the benchmark universe.
Landscape reading	Whether peaks, families, valleys, boundaries, and local climb paths are real and exploitable.	Whether the current search space itself is the right universe to search.
Generative potential	A weak but useful precursor signal: whether the method sees structure rather than only ranks points.	Whether it can already invent better architectures or better search spaces outside the benchmark.

Current reality

Our immediate target is modest: better reading of an existing NAS space. Only after that do we earn the right to ask for generative NAS landscapes and broader AI-system design claims.

Goal / target regime

Small benchmark NAS worlds are wedge tests, not the final judge. The real target of this research program is very large architecture search spaces where simple coverage-heavy search cannot practically map enough terrain, and where the value of a complexity-resilient search/governance kernel should become more visible. In that sense, NAS-101, NAS-201, and similar benchmark worlds are calibration arenas, not the final destination.

04 · FLOOD / DRAIN

Flood-Reveal, Rain-Lift, and Drain-Escape

The current NAS wedge is organized around a dual reading of landscape structure. One lens tells us what kind of terrain exists; the other tells us how movement inside that terrain should be governed.

Flood-Reveal — top-down structure

Imagine the landscape flooded above the highest terrain, then slowly drained. As the water level falls, peaks, ridges, and families emerge. The goal is not merely to ask which single point is best, but to ask:

Which regions emerge first?
Which peaks remain visible for many thresholds?
Which strong architectures appear together and form families?
Which regions are broad and robust, and which are narrow spikes?
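One way to make "which peaks remain visible for many thresholds" measurable is superlevel-set persistence from topological data analysis. You lower the water one node at a time, merge dry regions with union-find, and record how far the water falls before each peak joins a higher one. This is a minimal sketch that assumes the landscape is given as a fitness dict plus an adjacency map (for NAS, for example, single-edit neighbours). It is a standard technique swapped in for illustration, not necessarily the v16 implementation.

```python
def flood_reveal(fitness, neighbors):
    """Drain the water from the top and measure how long each peak persists.

    fitness:   dict node -> score (higher is better)
    neighbors: dict node -> iterable of adjacent nodes
    Returns {peak: persistence}, where persistence is the drop in water level
    between a peak's emergence and its merger into a higher peak's region.
    """
    parent, peak_of, persistence = {}, {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    for node in sorted(fitness, key=fitness.get, reverse=True):
        parent[node] = node                  # node emerges above the water
        peak_of[node] = node
        for nb in neighbors[node]:
            if nb not in parent:
                continue                     # neighbour still under water
            ra, rb = find(node), find(nb)
            if ra == rb:
                continue
            # elder rule: the region with the lower peak dies at this level
            if fitness[peak_of[ra]] >= fitness[peak_of[rb]]:
                high, low = ra, rb
            else:
                high, low = rb, ra
            dying = peak_of[low]
            if dying != node:                # a slope node joining a region is not a peak
                persistence[dying] = fitness[dying] - fitness[node]
            parent[low] = high
    floor = min(fitness.values())
    for node in parent:
        if find(node) == node:               # regions that never merged
            persistence[peak_of[node]] = fitness[peak_of[node]] - floor
    return persistence

scores = [1, 5, 2, 4, 0, 3]                  # a toy 1-D landscape
fitness = dict(enumerate(scores))
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(scores)] for i in fitness}
print(flood_reveal(fitness, neighbors))      # {3: 2, 5: 3, 1: 5}
```

The peak at index 5 (score 3) outlasts the higher peak at index 3 (score 4). It sits in its own basin and merges only when the water reaches the valley floor, so persistence separates a robust region from a shallow spike next to a bigger one.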

Rain-Lift / Boat-in-the-Valley — bottom-up navigation

Now invert the view. We are not looking from above; we are in a valley. Heavy rain raises the water around us and exposes the slopes that actually lead upward. The question becomes practical rather than descriptive:

Which nearby directions are reliably uphill?
Which routes only look good for one step, then collapse?
Which paths preserve optionality instead of trapping the search?
How do we escape mediocre basins without blind randomness?
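The navigation questions above can be scored by looking a few steps ahead instead of trusting the steepest first move. This is a minimal sketch that assumes a best-improvement hill climb as the lookahead probe and a fixed `horizon`. Both are illustrative choices, not the documented Rain-Lift operator.

```python
def climb_gain(start, fitness, neighbors, steps):
    """Best-improvement climb for at most `steps` moves; return the total gain."""
    current = start
    for _ in range(steps):
        best = max(neighbors[current], key=fitness.get)
        if fitness[best] <= fitness[current]:
            break                            # local peak: the route stalls here
        current = best
    return fitness[current] - fitness[start]

def rank_directions(node, fitness, neighbors, horizon=3):
    """Rank uphill neighbours by one step plus a short follow-up climb."""
    ranked = []
    for nb in neighbors[node]:
        first_step = fitness[nb] - fitness[node]
        if first_step <= 0:
            continue                         # not uphill
        total = first_step + climb_gain(nb, fitness, neighbors, horizon - 1)
        ranked.append((nb, first_step, total))
    return sorted(ranked, key=lambda t: t[2], reverse=True)

scores = [1, 3, 0, 1, 4, 7]                  # start in the valley at index 2
fitness = dict(enumerate(scores))
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(scores)] for i in fitness}
print(rank_directions(2, fitness, neighbors))   # [(3, 1, 7), (1, 3, 3)]
```

The steepest first move (index 1, +3) stalls immediately. The gentler move (index 3, +1) opens a climb worth +7 within the horizon. That is the difference between a route that only looks good for one step and one that reliably leads upward.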

Structure Lens

Flood-Reveal

Best for persistence, family emergence, region-first thinking, and top-down reading of terrain.

Navigation Lens

Rain-Lift / Drain-Escape

Best for local movement, valley escape, multi-step climb quality, and route selection from ordinary starting points.

Current signal

Early v16 diagnostics suggest that rain/drain navigation has more immediate teeth than flood alone. Flood still matters as a family/region lens, but local uphill escape may be the stronger practical operator in the near term.

# v16 program logic
Read current NAS terrain
Detect families, boundaries, corridors, valleys
Route through robust uphill escape directions
Compare against simple baselines and random controls
Promote only what survives the cheap tests
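The last two steps of that outline, comparing against controls and promoting only what survives, can be written as an explicit gate. This is a minimal sketch that assumes paired runs (the same seed and budget for the method and a random-search control) and an exact one-sided sign test. The test, `alpha`, and `min_gain` are illustrative choices rather than the v16 criteria.

```python
from math import comb
from statistics import mean

def sign_test_p(wins, trials):
    """One-sided exact sign test: P(X >= wins) for X ~ Binomial(trials, 0.5)."""
    return sum(comb(trials, k) for k in range(wins, trials + 1)) / 2 ** trials

def survives(method, control, alpha=0.05, min_gain=0.0):
    """Paired comparison: same seeds and budgets for method and control."""
    diffs = [m - c for m, c in zip(method, control)]
    signed = [d for d in diffs if d != 0]    # ties carry no sign information
    wins = sum(d > 0 for d in signed)
    p = sign_test_p(wins, len(signed)) if signed else 1.0
    return p < alpha and mean(diffs) > min_gain, p

# Toy scores per seed (invented numbers, not benchmark results)
method = [5, 6, 7, 6, 5, 7, 6, 8]
control = [4, 5, 5, 6, 4, 5, 5, 6]
print(survives(method, control))   # (True, 0.0078125)
```

A sign test only asks whether the method wins more often than chance, so a single lucky seed with a huge margin cannot carry the decision on its own.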

05 · COMPONENT MAP

The broader AI stack this paper targets next

NAS is the first major body of content here, but it is not the only intended target. If self-selecting MDL × DCC is real, it should gradually become useful across a wider set of AI components. The order matters: architecture first, then adjacent layers of the AI design stack.

Current wedge

1. Model architecture

Topology, operator patterns, winner families, ridges, corridors, and family-aware search inside bounded NAS universes.

Next extension

2. Search space design

Not just searching the given terrain better, but deciding which operators, edges, and architectural motifs belong in a better terrain.

Data layer

3. Training data

Selection, mixing, filtering, de-noising, and phase-aware data composition under descriptive pressure and adaptive governance.

Training flow

4. Curriculum and schedule

What to teach first, what to delay, when to harden, when to revisit, and how to move between phases without frozen hand-tuning.

Optimization

5. Optimizer and loss policy

Learning-rate regimes, regularization, objective mixing, augmentation, and other training controls that may benefit from self-selecting governance.

Reasoning layer

6. Retrieval and reasoning governance

Anti-lock retrieval, lens census, forced reframing, idea collision, and arena conversion before a system commits to a frame.

System routing

7. Routing, memory, and tools

Which module or agent should act, when to branch, when to call a tool, what to keep in memory, and what to discard.

Inference layer

8. Inference policy

How much compute to spend, when to go shallow or deep, when to request a second view, and when to stop early.

Quality control

9. Evaluation and self-improvement

Detecting fake progress, separating lucky spikes from true winner families, and deciding which improvements deserve promotion.

Compression side

10. Compression, pruning, distillation

Preserving functional structure while cutting waste, in the same spirit that drives 8Z across other domains.

AI8 bridge

11. Multi-agent collaboration

Role assignment, coordination logic, diversity preservation, and agent-level orchestration as part of a broader intelligent system stack.

What this means

This paper is not just about one better NAS heuristic. It is a first attempt to frame AI system design itself as a family of landscapes that may be read, navigated, and eventually improved by one cross-domain kernel.

06 · RESEARCH PROGRAM

The proof ladder

The right progression is strict. We should not claim the whole ladder before climbing the first steps. This sequencing is the paper's main safeguard against overclaiming.

1. Search better inside an existing NAS world. Beat simple baselines, random controls, and point-only thinking by using families, persistence, and valley escape logic.
2. Read the existing world well enough to describe its structure. Peaks, families, boundaries, corridors, and local route quality should become explicit rather than poetic.
3. Use those readings to propose better NAS worlds. Modify the search space itself, not just the traveler inside it.
4. Transfer the kernel to adjacent AI components. Data, curriculum, optimization, memory, routing, and evaluation become the next testing grounds.
5. Only then argue for a wider AI design kernel. A cross-domain claim is earned, not declared.

Failure mode to avoid

The main danger is overclaiming. A method that reads one bounded benchmark well has not yet proven that it can redesign frontier AI. The strength of this paper should come from its ladder, not from inflation.

07 · V18 FINDINGS AND THE NEXT NAS TEST

What v18 actually showed on NB101

The v18 NAS work sharpened the picture. On smaller benchmark worlds, a simple greedy finisher can look deceptively strong because the benchmark is finite, cached, and already close to saturation. That makes the greedy-vs-arena comparison a useful internal diagnostic but a poor final story.

The cleaner question is different: how far is the arena from the known optimum inside the benchmark, and what kind of structure does it detect on the way? On the harder NB101 run, v18 did not prove final superiority, but it did show something important: the landscape is open enough that strong family structure, signed affinity, macro structure, and polyhedral intersections are all visible at the same time.

Plain reading of v18

Signal yes. Final win not yet. v18 on NB101 produced a strong structural read of the space. It has not yet converted that read into a decisive allocator-over-baseline victory. That is still honest progress, because it separates terrain understanding from terrain exploitation.

Compact evidence card

Benchmark: NAS-Bench-101 · Regime: open · Known optimum: 95.055 · Best simple baseline: 94.600 · Best current governance: 94.532

Strongest Phase A structural signals: signed affinity, family structure, macro structure, and polyhedral filtering. Best poly pool: poly_3face_router with AUC vs random 0.803 and candidate-pool size 33,421.
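The evidence card reads more directly as regret, meaning the distance to the known optimum. The worked calculation below uses only the numbers on the card, in the card's own units. The helper name `gap` is hypothetical.

```python
optimum, baseline, governance = 95.055, 94.600, 94.532   # from the evidence card

def gap(score, best=optimum):
    """Absolute and relative (%) distance from the known optimum."""
    return best - score, 100 * (best - score) / best

for name, score in [("baseline", baseline), ("governance", governance)]:
    absolute, relative = gap(score)
    print(f"{name}: {absolute:.3f} below optimum ({relative:.2f}%)")
print(f"governance vs baseline: {governance - baseline:+.3f}")
```

This prints a gap of 0.455 for the baseline (about 0.48%) and 0.523 for governance (about 0.55%). The governance run trails the simple baseline by 0.068, which is the size of the "final win not yet" margin.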

Reporting caveat

The v18 artifacts already contain the main structural signal, but the reporting layer is not yet fully clean. In particular, some Phase B regime and budget-sweep results are missing or only partially emitted in the current JSON/TXT outputs. That should be fixed in the next pass so the evidence pipeline matches the search quality.

What survived the v18 pass

Family structure is real. Winner families are not poetic language. They show up as measurable, separable clusters.
Signed affinity is real. The space is not only rankable; it contains directional signal.
Macro structure is real. Coarse architectural features already recover useful separation.
Polyhedral filtering is alive. Multiple views of the same space can be intersected to create a smaller, richer candidate pool without obviously discarding the top region.
NB101 is a better test than tiny saturated worlds. It leaves enough headroom that a real allocator should still have room to show itself.
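The "polyhedral" reading can be illustrated with linear views. Each view keeps the candidates on one side of a hyperplane in some feature space, and the pool is whatever every view agrees on, which is a convex polyhedron in that space. This is a minimal sketch that assumes hand-written linear views over two made-up features. The real views behind `poly_3face_router` are not specified here and may not be linear.

```python
def halfspace(weights, threshold):
    """A linear view: keep candidates whose weighted features reach the threshold."""
    def keep(features):
        return sum(w * x for w, x in zip(weights, features)) >= threshold
    return keep

def polyhedral_pool(candidates, views):
    """Keep a candidate only if every view keeps it (an intersection of halfspaces)."""
    return [c for c, feats in candidates.items() if all(view(feats) for view in views)]

def top_retention(pool, fitness, top_k):
    """Fraction of the true top-k candidates that survive the filtering."""
    kept = set(pool)
    top = sorted(fitness, key=fitness.get, reverse=True)[:top_k]
    return sum(c in kept for c in top) / top_k

# Toy candidates with two made-up features and made-up fitness values
candidates = {"a": (3, 1), "b": (2, 2), "c": (1, 3), "d": (0, 0), "e": (3, 3), "f": (1, 0)}
fitness = {"a": 0.7, "b": 0.8, "c": 0.6, "d": 0.1, "e": 0.9, "f": 0.2}
views = [halfspace((1, 0), 1), halfspace((0, 1), 1), halfspace((1, -1), -1)]
pool = polyhedral_pool(candidates, views)
print(pool, top_retention(pool, fitness, top_k=3))   # ['a', 'b', 'e'] 1.0
```

Here three views halve the pool while keeping all three top candidates. This is the pattern of a smaller pool that does not obviously discard the top region.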

What v18 did not yet prove

v18 did not yet prove that the current allocator stack beats the strongest simple finisher inside NB101. That matters, but it does not erase the deeper result. If the reading layer is strong and the exploitation layer is weak, the next move is to improve the exploiter, not to declare the reading false.

Layer	What v18 now supports	What remains open
Landscape reading	Families, signed affinity, macro features, and poly intersections appear to contain real signal.	How stable those signals remain under larger budgets, larger spaces, and stricter controls.
Search quality	The arena can build richer candidate pools than plain local scoring.	Whether the current routing/exploitation layer can systematically convert that structure into better final search outcomes.
Generative promise	Intersected views now look like a plausible bridge toward generative candidate construction.	True generation beyond the benchmark space. That belongs to a later stage, not to v18.

Why the real target is not these benchmarks

The benchmarks we test on are tiny. NAS-Bench-201 has 15,625 architectures. NAS-Bench-101 has 423,624. A simple greedy baseline with a budget of 180 steps at 50 sampled candidates per step (about 9,000 evaluations) spends the equivalent of just over 57% of the search space on NAS-201 and about 2% on NAS-101. On NAS-201, greedy lands within 0.01% of the optimum by sheer reach. On NAS-101, the gap is larger, about 0.5%, but still small. In both cases, the benchmark is small enough that a brute-force style strategy can still stumble onto the best regions.
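The coverage figures follow from simple arithmetic. The sketch assumes that the greedy budget of 180 means 180 steps with 50 sampled candidates each, which is the per-step figure this section cites for greedy and reproduces both percentages. The 10-billion-configuration space is a hypothetical scale added for contrast.

```python
SPACE_SIZES = {"NAS-Bench-201": 15_625, "NAS-Bench-101": 423_624}

def coverage(steps, per_step, space_size):
    """Evaluations spent as a fraction of the space (repeat visits included)."""
    return steps * per_step / space_size

for name, size in SPACE_SIZES.items():
    print(f"{name}: {coverage(180, 50, size):.1%}")
# A hypothetical 10-billion-configuration design space, for contrast
print(f"10^10 space: {coverage(180, 50, 10**10):.5%}")
```

The same 9,000 evaluations cover 57.6% of NAS-Bench-201 and 2.1% of NAS-Bench-101, but less than a millionth of the hypothetical large space.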

That is not where our approach needs to prove itself. It needs to prove itself where brute-force coverage drops to effectively zero.

The scale of real NAS

Consider what architecture search actually looks like for frontier AI systems. DeepSeek V3 is a 671-billion-parameter Mixture-of-Experts model with 256 routed experts in each MoE layer of a 61-layer network, Multi-Head Latent Attention, mixed FP8/BF16 precision, multi-token prediction, and a custom routing strategy with auxiliary-loss-free load balancing. The design space for such a system spans expert count, expert size, routing topology, attention mechanism, precision choices, training recipe, curriculum schedule, and their interactions. The number of meaningful architectural configurations is not thousands or hundreds of thousands. It is combinatorially vast, far beyond exhaustive evaluation.

No public tabular benchmark captures this. The published NAS benchmarks are deliberately small because exhaustive evaluation is their design contract. But the real design decisions behind models like DeepSeek V3, GPT-4, Claude, or Grok involve search spaces where evaluating a single candidate costs thousands of GPU hours. In that regime, greedy sampling of 50 random candidates per step is not “strong.” It is blind.

The Concorde analogy

The 8Z TSP solver cannot beat Concorde on 1,000-city problems. Concorde is exact and provably optimal at that scale. But Concorde cannot practically prove optimality on a million cities, let alone ten million or a billion. The 8Z solver does run at those scales. The same principle applies here: a method that reads landscape structure, navigates by family and multi-resolution signal, and governs its own search regime is not competing with brute force on toy problems. It is designed for the regime where brute force cannot even start.

Complexity resilience as the real claim

The founding hypothesis of the 8Z Research Program is that compressibility correlates with quality across domains. The operational consequence is that self-selecting MDL × DCC should be complexity-resilient: its value should increase, not decrease, as the search space grows. On a small benchmark, simple methods win because coverage is cheap. On a vast design space, coverage collapses and structure-aware navigation becomes the only viable path.

This is the working claim of the program, and it is grounded in how the method is designed to operate:

Family decomposition does not enumerate the space. It clusters observed winners and builds affinity models from a small sample. Cost scales with sample size, not with space size.
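Polyhedral intersection filters candidates by agreement across multiple views. Each view is a compression of the space. The intersection compresses further. On a 423k space, this produces a 33k candidate pool. On a 10-billion-configuration space, the same mechanism should still yield a tractable candidate set, provided the views are applied to generated or sampled candidates rather than by enumeration, and the intersection keeps tightening as views are added. At the same ratio of roughly 8%, the pool would still hold about 790 million candidates.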
MDL governance decides how much complexity each layer of the search deserves. It does not need to visit every candidate. It needs to visit enough to build a reliable compression model, then use that model to navigate.
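The MDL step can be illustrated as a choice among family decompositions of the same sampled winners. More families cost more model bits, fewer families leave more unexplained detail, and the preferred split is the one that minimizes the total. This is a minimal sketch that assumes winners are encoded as fixed-length bit vectors (for example, hypothetical edge/operator indicators) and uses a simple two-part code with majority-vote prototypes. The coding scheme is an illustrative choice, not the program's actual MDL objective.

```python
from math import comb, log2

def prototype(family):
    """Per-position majority vote over a family's bit vectors (ties -> 1)."""
    return tuple(int(2 * sum(col) >= len(family)) for col in zip(*family))

def description_length(families):
    """Two-part code length (bits) for a partition of bit-vector winners."""
    d = len(families[0][0])
    n = sum(len(f) for f in families)
    k = len(families)
    bits = k * d                              # model: one d-bit prototype per family
    bits += n * log2(k)                       # which family each winner belongs to
    for fam in families:
        proto = prototype(fam)
        for vec in fam:
            m = sum(x != y for x, y in zip(vec, proto))
            bits += log2(d + 1) + log2(comb(d, m))   # mismatch count + positions
    return bits

A = [(1, 1, 1, 0, 0, 0), (1, 1, 1, 0, 0, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1, 0, 0, 0)]
B = [(0, 0, 0, 1, 1, 1), (1, 0, 0, 1, 1, 1), (0, 0, 0, 1, 1, 1), (0, 0, 1, 1, 1, 1)]
partitions = {
    "one family": [A + B],
    "two families": [A, B],
    "singletons": [[v] for v in A + B],
}
for name, partition in partitions.items():
    print(f"{name}: {description_length(partition):.1f} bits")
```

This prints about 61.4 bits for one family, 52.8 for two, and 94.5 for singletons, so the two-family split wins. The cost grows with the number of sampled winners, not with the size of the space they came from.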

The benchmarks exist to validate the mechanism on known ground truth, not to demonstrate the method’s ultimate ceiling. The ceiling is in the spaces too large for any tabular benchmark to exist.

What we are actually testing

NAS-201 and NAS-101 are validation grounds, not battlefields. We test there because we can verify every answer. The real question is not “can we beat greedy on 423k architectures?” The real question is: does the method build correct structural understanding that would scale to spaces where greedy is useless? Family splits with AUC 0.999, signed affinity with ρ=0.56, and polyhedral intersections that enrich candidate pools by 4× suggest the answer is yes.
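A "4× enrichment" claim is easiest to audit when the ratio is stated explicitly. This is a minimal sketch that assumes enrichment means the top-region hit rate inside the filtered pool divided by the same rate over the whole space, with an arbitrary `top_frac` cutoff. The document does not say which cutoff produced its 4× figure.

```python
def enrichment(pool, fitness, top_frac=0.01):
    """Top-region hit rate inside the pool divided by the rate in the full space."""
    n_top = max(1, round(len(fitness) * top_frac))
    top = set(sorted(fitness, key=fitness.get, reverse=True)[:n_top])
    pool = set(pool)
    if not pool:
        return 0.0
    pool_rate = len(top & pool) / len(pool)
    base_rate = n_top / len(fitness)
    return pool_rate / base_rate

# Toy space: 100 candidates whose fitness equals their index (invented data)
fitness = {i: float(i) for i in range(100)}
pool = range(80, 100)                 # a filtered pool of 20 candidates
print(enrichment(pool, fitness, top_frac=0.1))
```

In this toy case half the pool is top-decile, against 10% at random, so the enrichment is 5. A pool equal to the whole space always scores 1.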

08 · LIVING ROADMAP

How this document should grow next

For now, NAS carries the most detail because it is the cleanest active laboratory. Over time, this document should accumulate new sections with the same discipline used here: first a bounded arena, then cheap tests, then route-level evidence, then broader design implications. Each added component should deepen this template rather than dilute it.

Section	Current state	Next expansion
NAS	Main active body	v18 evidence pack, reporting cleanup, family/poly maps, budget-sweep analysis, and larger-space NAS validation
Training data	Announced	Selection and curriculum wedge experiments
Optimization / loss	Announced	Self-selecting schedules and adaptive objective blends
Memory / tools / routing	Announced	AI8-linked orchestration tests and policy kernels
Compression / distillation	Announced	Bridge to 8Z-style structure preservation and pruning

Document rule

Every future component section should follow the same contract: state the bounded arena, define the cheap test, show what survived, then cautiously widen the claim toward harder and larger regimes.

09 · REASONING COMPONENTS

Retrieval and Reasoning Governance

NAS is the first concrete arena because it is bounded and verifiable. The next component family is not another model architecture. It is the reasoning stack that decides which ideas, lenses, tests, and memories should exist before any model-design decision is made.

Component stance

RHPr and RHP should not be treated as nice prompting language. They are candidate AI-system components: retrieval governance, idea-space mapping, bifurcation preservation, arena conversion, and continuity writing.

Component	Input	Output	Primary metric
Lens Census Engine	Problem statement + first-pass answers	Inventory of active domains, assumptions, metaphors, and missing views	Drawer count, cluster dominance
Absence Detector	Census output	Missing lenses and blocked knowledge drawers	Diversity delta, absence confidence
Forced-Lens Generator	Missing lens	New candidate ideas from the forced viewpoint	Recovery rate R
Child Mode	Problem representation	Physical/sensory/geometric seed candidates	Non-symbolic seed yield
Collision Composer	Outputs from distant lenses	Bridge candidates and structural hybrids	Compression gain, novelty score
Bifurcation Keeper	Disagreements	Preserved forks with assumptions exposed	Decision-trail clarity
Empiricist Gate	Surviving candidates	Cheap arena / parallel test plan	Testability, cost, falsifier strength
Session Genome Writer	Result + lineage	Correct AI8 file updates	Continuity gain without bloat
Skill Extractor / Registry	Repeated or hard workflow	Named, versioned, tested skill	Future-loop cost reduction without quality loss
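Several metric columns in the table can be pinned down with simple counting. This is a minimal sketch built on four assumptions: each candidate idea is tagged with one lens label, "cluster dominance" is the share of the largest lens, "diversity delta" is a change in Shannon entropy, and recovery rate R is the share of flagged-missing lenses that a forced-lens pass brings back. These are illustrative definitions, not the component specifications.

```python
from collections import Counter
from math import log2

def census(lens_tags):
    """Drawer count, dominance of the largest lens, and lens entropy (bits)."""
    counts = Counter(lens_tags)
    total = sum(counts.values())
    shares = [c / total for c in counts.values()]
    return {
        "drawer_count": len(counts),
        "cluster_dominance": max(shares),
        "entropy_bits": -sum(p * log2(p) for p in shares),
    }

def recovery_rate(missing, recovered):
    """Share of flagged-missing lenses that a forced-lens pass brought back."""
    missing = set(missing)
    return len(missing & set(recovered)) / len(missing) if missing else 1.0

before = ["optimization"] * 3 + ["statistics"]            # invented tags
after = before + ["geometry", "biology"]
delta = census(after)["entropy_bits"] - census(before)["entropy_bits"]
print(census(before), round(delta, 2))
print(recovery_rate({"geometry", "physics"}, {"geometry", "biology"}))   # 0.5
```

Adding two distant lenses drops dominance from 0.75 to 0.5 and raises entropy by about 0.98 bits, which is one concrete reading of a positive diversity delta.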

Cheapest first experiment

Run the same difficult prompt under three conditions: normal prompting, RHP multi-agent debate, and RHPr sequence before RHP. Measure whether the RHPr/RHP condition recovers more distinct structural lenses and produces more testable candidate bridges without increasing hallucinated confidence.

Pass condition

The component earns promotion only if it increases recoverable useful lenses, improves cheap-test yield, and reduces premature convergence. Otherwise it remains a prompt pattern, not an AI8 component.
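The pass condition can be written as an explicit gate over the three experimental conditions. This is a minimal sketch that summarizes each run with four hypothetical numbers: useful lenses recovered, cheap-test yield, the step at which the run committed to a single frame (a proxy for premature convergence), and a false-confidence rate. The field names and all example values are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class RunSummary:
    useful_lenses: int            # distinct structural lenses judged useful
    cheap_test_yield: float       # share of candidates turned into cheap tests
    commit_step: int              # step at which the run locked onto one frame
    overconfidence: float         # rate of confident claims later found wrong

def earns_promotion(treated: RunSummary, baselines: list) -> bool:
    """Promote only if the treated condition beats every baseline on all gates."""
    return all(
        treated.useful_lenses > b.useful_lenses
        and treated.cheap_test_yield > b.cheap_test_yield
        and treated.commit_step > b.commit_step          # later commitment
        and treated.overconfidence <= b.overconfidence   # no rise in false confidence
        for b in baselines
    )

# Invented numbers for the three conditions in the cheapest experiment
normal = RunSummary(3, 0.20, 2, 0.10)
rhp_only = RunSummary(5, 0.25, 4, 0.10)
rhpr_then_rhp = RunSummary(7, 0.40, 6, 0.08)
print(earns_promotion(rhpr_then_rhp, [normal, rhp_only]))   # True
```

Requiring a win against both normal prompting and RHP alone is what separates "RHPr adds something" from "any structured prompting helps".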

10 · BRIDGE BACK

Loop + skill bridge

AI8 should not preserve only ideas, roles, and state. Repeated or difficult AIM³ / RHPm / RHP / RHPr workflows can be promoted into skills: named, versioned, tested procedures that future loops can call without starting from zero. This keeps the public AIM³ tree and the private AI8 roots aligned: prompts ask, loops repeat, skills compound.
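"Named, versioned, tested" suggests a small contract that a registry can enforce. This is a minimal sketch that treats a skill as a callable with example-based tests and a version tuple. `Skill`, `SkillRegistry`, and `normalize_request` are hypothetical names, not existing AI8 or AIM³ interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Skill:
    name: str
    version: tuple                                    # e.g. (1, 0, 0)
    run: Callable[[str], str]
    tests: list = field(default_factory=list)         # (input, expected) pairs

    def passes(self) -> bool:
        return bool(self.tests) and all(self.run(x) == y for x, y in self.tests)

class SkillRegistry:
    """Admits only tested skills, and only newer versions of known names."""

    def __init__(self):
        self._skills = {}

    def promote(self, skill: Skill) -> bool:
        current = self._skills.get(skill.name)
        if not skill.passes():
            return False                              # untested or failing
        if current is not None and skill.version <= current.version:
            return False                              # no silent downgrades
        self._skills[skill.name] = skill
        return True

    def call(self, name: str, arg: str) -> str:
        return self._skills[name].run(arg)

registry = SkillRegistry()
tidy = Skill("normalize_request", (1, 0, 0),
             lambda s: " ".join(s.split()).lower(),
             [("  Find  NAS  Families ", "find nas families")])
print(registry.promote(tidy), registry.promote(tidy))   # True False
print(registry.call("normalize_request", "Read  THE Terrain"))
```

Future loops call the skill by name. A changed procedure has to arrive as a higher version that still passes its tests.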

How this paper connects back to AI8

AI8 Architecture remains the deeper process architecture. AI8 Companion remains the readable map of that architecture. AI8 Components is where the architecture starts touching concrete AI-system design questions.

In that sense, this paper is neither a replacement for the core AI8 framework nor an unrelated tangent. It is the first serious attempt to ask what happens when continuity architecture stops being only about coordination and becomes a wedge for better model design, better training choices, and better system structure.

Simple formulation

Architecture says what AI8 is. Companion says how to read it. Components asks what it can help improve.

RHPm as public interface

RHPm is the practical front door between casual human intent and the heavier AIM³/RHP/RHPr stack. It converts rough requests into strong session prompts with role, goal, files/context, constraints, tests, output format, stop conditions, and optional skill extraction.
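The RHPm conversion amounts to filling a fixed template. This is a minimal sketch that assumes the fields listed above map one-to-one onto template sections. `SessionPrompt` and its rendering layout are hypothetical, not the actual RHPm format.

```python
from dataclasses import dataclass, field

@dataclass
class SessionPrompt:
    role: str
    goal: str
    context: list = field(default_factory=list)          # files or background
    constraints: list = field(default_factory=list)
    tests: list = field(default_factory=list)
    output_format: str = "plain text"
    stop_conditions: list = field(default_factory=list)
    extract_skill: bool = False

    def render(self) -> str:
        """Lay the fields out as labelled sections, skipping empty ones."""
        sections = [
            ("ROLE", [self.role]),
            ("GOAL", [self.goal]),
            ("FILES / CONTEXT", self.context),
            ("CONSTRAINTS", self.constraints),
            ("TESTS", self.tests),
            ("OUTPUT FORMAT", [self.output_format]),
            ("STOP CONDITIONS", self.stop_conditions),
        ]
        if self.extract_skill:
            sections.append(("SKILL EXTRACTION",
                             ["If this workflow repeats, propose a named, versioned, tested skill."]))
        lines = []
        for title, items in sections:
            if items:
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)

prompt = SessionPrompt(
    role="NAS analyst",
    goal="Summarize the v18 family structure on NB101",
    constraints=["No claims beyond the benchmark"],
    stop_conditions=["Stop after one page"],
)
print(prompt.render())
```

Empty sections are dropped rather than left as blank headings. A rough request that names only a role and a goal still renders as a clean, short prompt.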