AI8 Research · Living Domain Paper

AI8 Components

From NAS to Better AI Systems. A living paper on how a self-selecting MDL × DCC kernel may first learn to read existing AI design landscapes, then improve them, then prove itself in larger regimes where simple coverage-heavy search begins to fail.
Author: Bojan Dobrečevič  ·  Companion Architect: Ari  ·  Status: Living expansion / v18 + reasoning-governance update  ·  Version: v0.3  ·  Date: 2026-04-06
01 · WHY THIS PAPER EXISTS

The third paper in the AI8 document family

AI8 Architecture defines the deeper continuity architecture. AI8 Companion makes that architecture readable and operational. This third paper has a different job: to ask whether the same underlying kernel can be used not only to coordinate human–AI work, but to improve AI systems themselves.

The claim here is intentionally disciplined. We are not claiming that AI8 has already solved AI design. We are proposing a research program: start with one clean, bounded arena such as Neural Architecture Search, learn to read the terrain properly, improve search inside that terrain, then try to shape better terrains and extend the same logic to other AI components.

Core

AI8 Architecture

The base continuity architecture: governed differentiation, recursive coordination, and process-level structure.

Bridge

AI8 Companion

The readable map: what the layers do, how they relate, and how AI8 avoids collapsing into roleplay, ideology, or noise.

Application

AI8 Components

The domain line: can the same kernel help us improve model architecture, data, training flow, routing, memory, and broader AI design choices?

Working stance

This is a living paper. NAS carries the main weight for now. Other components are introduced as targets and future work, not as claims of finished success.

02 · KERNEL

From coordination logic to design logic

The candidate kernel is self-selecting MDL × DCC.

The long goal is not only to navigate one given search space better, but to learn how better spaces themselves might be recognized and eventually generated.

AI8 Components · Program statement
  1. Read existing landscape
  2. Improve search inside it
  3. Shape better landscapes
  4. Extend across AI components

Discipline

Cross-domain ambition only becomes credible if the same kernel survives cheap tests in one arena after another. This paper therefore treats NAS as a first wedge, not as the final destination.

03 · NAS AS THE FIRST WEDGE

Why NAS is the right first arena

Neural Architecture Search is a good first domain because it is bounded enough to test search ideas honestly, yet rich enough to expose family structure, local traps, ridges, corridors, and competing winner regimes. A benchmark NAS world does not reveal the globally ideal architecture; it reveals the best architecture inside a finite, human-designed universe. That is exactly why it is useful.

With a benchmark such as NATS-Bench TSS, the search problem becomes clean: can our method read and navigate the terrain better than simpler baselines? If yes, the next question becomes stronger: can the same method eventually improve the terrain itself?

Question | Benchmark NAS can test | Benchmark NAS cannot yet prove
Search quality | Whether a method finds strong regions, families, and escape routes more effectively than baselines. | Whether the method has discovered the globally best neural architecture beyond the benchmark universe.
Landscape reading | Whether peaks, families, valleys, boundaries, and local climb paths are real and exploitable. | Whether the current search space itself is the right universe to search.
Generative potential | A weak but useful precursor signal: whether the method sees structure rather than only ranks points. | Whether it can already invent better architectures or better search spaces outside the benchmark.

Current reality

Our immediate target is modest: better reading of an existing NAS space. Only after that do we earn the right to ask for generative NAS landscapes and broader AI-system design claims.

Goal / target regime

Small benchmark NAS worlds are wedge tests, not the final judge. The real target of this research program is very large architecture search spaces where simple coverage-heavy search cannot practically map enough terrain, and where the value of a complexity-resilient search/governance kernel should become more visible. In that sense, NAS-101, NAS-201, and similar benchmark worlds are calibration arenas, not the final destination.

04 · FLOOD / DRAIN

Flood-Reveal, Rain-Lift, and Drain-Escape

The current NAS wedge is organized around a dual reading of landscape structure. One lens tells us what kind of terrain exists; the other tells us how movement inside that terrain should be governed.

Flood-Reveal — top-down structure

Imagine the landscape flooded above the highest terrain, then slowly drained. As the water level falls, peaks, ridges, and families emerge. The goal is not merely to ask which single point is best, but to ask which regions and families emerge first and persist as the water keeps falling.

Rain-Lift / Boat-in-the-Valley — bottom-up navigation

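Now invert the view. We are not looking from above; we are in a valley. Heavy rain raises the water around us and exposes the slopes that actually lead upward. The question becomes practical rather than descriptive: from an ordinary starting point, which slopes lead out of the valley, and which multi-step route climbs best?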

Structure Lens

Flood-Reveal

Best for persistence, family emergence, region-first thinking, and top-down reading of terrain.
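The flood picture can be made concrete as a persistence computation on a neighbourhood graph. The sketch below is one possible reading, not the v16 implementation: it assumes every architecture has a scalar score and a list of neighbours, and the names `flood_reveal`, `scores`, and `neighbors` are illustrative. It lowers the water level one node at a time, tracks islands with a union-find structure, and records for each peak the level at which it surfaced and the level at which its island merged into a higher one.

```python
def flood_reveal(scores, neighbors):
    """Drain the water from the top and record when peaks appear and merge.

    scores: dict node -> float (higher is better).
    neighbors: dict node -> iterable of adjacent nodes (hypothetical graph).
    Returns (peak, birth_level, death_level) tuples, most persistent first.
    """
    parent = {}  # union-find pointers for nodes already above water
    peak = {}    # island root -> highest node on that island

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    families = []
    for node in sorted(scores, key=scores.get, reverse=True):
        level = scores[node]
        parent[node] = node
        peak[node] = node
        for nb in neighbors.get(node, ()):
            if nb not in parent:
                continue  # neighbour is still under water
            a, b = find(node), find(nb)
            if a == b:
                continue
            # The island with the lower peak is absorbed by the higher one.
            hi, lo = (a, b) if scores[peak[a]] > scores[peak[b]] else (b, a)
            if peak[lo] != node:  # ignore the new node joining its first island
                families.append((peak[lo], scores[peak[lo]], level))
            parent[lo] = hi
    # Islands that never merged persist down to the lowest level.
    floor = min(scores.values())
    for root in {find(n) for n in parent}:
        families.append((peak[root], scores[peak[root]], floor))
    return sorted(families, key=lambda f: f[1] - f[2], reverse=True)
```

A long gap between birth and death marks a family that stays separate over a wide range of water levels. A short gap marks a minor bump on the flank of a larger region.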

Navigation Lens

Rain-Lift / Drain-Escape

Best for local movement, valley escape, multi-step climb quality, and route selection from ordinary starting points.
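The navigation lens can be sketched as greedy climbing combined with a bounded escape search. This is an assumption about how valley escape might be operationalised, not the program's actual routing code. `climb`, `escape`, `rain_lift`, and the `max_radius` parameter are hypothetical names, and the graph format is a plain dict of scores plus a dict of neighbour lists.

```python
from collections import deque


def climb(start, scores, neighbors):
    """Plain greedy ascent: move to the best neighbour until none is higher."""
    current = start
    while True:
        best = max(neighbors.get(current, ()), key=scores.get, default=current)
        if scores[best] <= scores[current]:
            return current
        current = best


def escape(start, scores, neighbors, max_radius=3):
    """Breadth-first search for the nearest node strictly higher than start.

    Returns the path to that node, or None if nothing higher lies within
    max_radius hops.
    """
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if scores[node] > scores[start]:
            return path
        if len(path) - 1 == max_radius:
            continue  # do not expand beyond the allowed radius
        for nb in neighbors.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, path + [nb]))
    return None


def rain_lift(start, scores, neighbors, max_radius=3):
    """Alternate greedy climbing with bounded valley escapes."""
    current = climb(start, scores, neighbors)
    while True:
        path = escape(current, scores, neighbors, max_radius)
        if path is None:
            return current
        current = climb(path[-1], scores, neighbors)
```

Every escape ends on a strictly higher node, so the loop terminates on any finite graph. The radius sets how deep a valley the walker is willing to cross before it accepts the current peak.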

Current signal

Early v16 diagnostics suggest that rain/drain navigation carries more immediate teeth than flood alone. Flood still matters as a family/region lens, but local uphill escape may be the stronger practical operator in the near term.

# v16 program logic
Read current NAS terrain
Detect families, boundaries, corridors, valleys
Route through robust uphill escape directions
Compare against simple baselines and random controls
Promote only what survives the cheap tests

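The last two steps of the program logic, comparing against random controls and promoting only what survives, can be sketched as a small gate. Everything here is illustrative: `random_control`, `survives_cheap_test`, and the 0.8 win-rate threshold are assumptions, not values taken from the v16 runs. Architecture keys are assumed to be sortable so that sampling is reproducible per seed.

```python
import random


def random_control(scores, budget, seed):
    """Best score found by sampling `budget` architectures uniformly at random."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(scores), min(budget, len(scores)))
    return max(scores[a] for a in sample)


def survives_cheap_test(method_results, scores, budget, min_win_rate=0.8):
    """Decide whether a method beats the random control often enough.

    method_results: dict seed -> best score the candidate method reached
    under the same evaluation budget (hypothetical input format).
    """
    wins = [
        method_results[seed] > random_control(scores, budget, seed)
        for seed in method_results
    ]
    return sum(wins) / len(wins) >= min_win_rate
```

Pairing the control with the method seed by seed keeps the comparison cheap and makes a lucky single run less likely to trigger promotion.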
05 · COMPONENT MAP

The broader AI stack this paper targets next

NAS is the first major body of content here, but it is not the only intended target. If self-selecting MDL × DCC is real, it should gradually become useful across a wider set of AI components. The order matters: architecture first, then adjacent layers of the AI design stack.

Current wedge

1. Model architecture

Topology, operator patterns, winner families, ridges, corridors, and family-aware search inside bounded NAS universes.

Next extension

2. Search space design

Not just searching the given terrain better, but deciding which operators, edges, and architectural motifs belong in a better terrain.

Data layer

3. Training data

Selection, mixing, filtering, de-noising, and phase-aware data composition under descriptive pressure and adaptive governance.

Training flow

4. Curriculum and schedule

What to teach first, what to delay, when to harden, when to revisit, and how to move between phases without frozen hand-tuning.

Optimization

5. Optimizer and loss policy

Learning-rate regimes, regularization, objective mixing, augmentation, and other training controls that may benefit from self-selecting governance.

Reasoning layer

6. Retrieval and reasoning governance

Anti-lock retrieval, lens census, forced reframing, idea collision, and arena conversion before a system commits to a frame.

System routing

7. Routing, memory, and tools

Which module or agent should act, when to branch, when to call a tool, what to keep in memory, and what to discard.

Inference layer

8. Inference policy

How much compute to spend, when to go shallow or deep, when to request a second view, and when to stop early.

Quality control

9. Evaluation and self-improvement

Detecting fake progress, separating lucky spikes from true winner families, and deciding which improvements deserve promotion.

Compression side

10. Compression, pruning, distillation

Preserving functional structure while cutting waste, in the same spirit that drives 8Z across other domains.

AI8 bridge

11. Multi-agent collaboration

Role assignment, coordination logic, diversity preservation, and agent-level orchestration as part of a broader intelligent system stack.

What this means

This paper is not just about one better NAS heuristic. It is a first attempt to frame AI system design itself as a family of landscapes that may be read, navigated, and eventually improved by one cross-domain kernel.

06 · RESEARCH PROGRAM

The proof ladder

The right progression is strict. We should not claim the whole ladder before climbing the first steps. This sequencing is the paper's main safeguard against overclaiming.

  1. Search better inside an existing NAS world. Beat simple baselines, random controls, and point-only thinking by using families, persistence, and valley escape logic.
  2. Read the existing world well enough to describe its structure. Peaks, families, boundaries, corridors, and local route quality should become explicit rather than poetic.
  3. Use those readings to propose better NAS worlds. Modify the search space itself, not just the traveler inside it.
  4. Transfer the kernel to adjacent AI components. Data, curriculum, optimization, memory, routing, and evaluation become next testing grounds.
  5. Only then argue for a wider AI design kernel. A cross-domain claim is earned, not declared.
Failure mode to avoid

The main danger is overclaiming. A method that reads one bounded benchmark well has not yet proven that it can redesign frontier AI. The strength of this paper should come from its ladder, not from inflation.

07 · V18 FINDINGS AND THE NEXT NAS TEST

What v18 actually showed on NB101

The v18 NAS work sharpened the picture. On smaller benchmark worlds, a simple greedy finisher can look deceptively strong because the benchmark is finite, cached, and already close to saturation. That makes greedy vs arena a useful internal diagnostic, but a poor final story.

The cleaner question is different: how far is the arena from the known optimum inside the benchmark, and what kind of structure does it detect on the way? On the harder NB101 run, v18 did not prove final superiority, but it did show something important: the landscape is open enough that strong family structure, signed affinity, macro structure, and polyhedral intersections are all visible at the same time.

Plain reading of v18

Signal yes. Final win not yet. v18 on NB101 produced a strong structural read of the space. It has not yet converted that read into a decisive allocator-over-baseline victory. That is still honest progress, because it separates terrain understanding from terrain exploitation.

Compact evidence card

Benchmark: NAS-Bench-101  ·  Regime: open  ·  Known optimum: 95.055  ·  Best simple baseline: 94.600  ·  Best current governance: 94.532

Strongest Phase A structural signals: signed affinity, family structure, macro structure, and polyhedral filtering. Best poly pool: poly_3face_router with AUC vs random 0.803 and candidate-pool size 33,421.
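The paper does not spell out how "AUC vs random" is computed. One plausible reading, stated here as an assumption, is the probability that an architecture drawn from the candidate pool scores higher than one drawn from a random reference sample, with ties counted as half. The function name `auc_vs_random` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def auc_vs_random(pool_scores, reference_scores):
    """P(pool member > reference member) + 0.5 * P(tie).

    0.5 means the pool is no better than random; 1.0 means every pool
    member beats every reference member.
    """
    ref = sorted(reference_scores)
    total = 0.0
    for s in pool_scores:
        below = bisect_left(ref, s)          # reference scores strictly lower
        ties = bisect_right(ref, s) - below  # reference scores equal to s
        total += below + 0.5 * ties
    return total / (len(pool_scores) * len(ref))
```

Under this reading, a value of 0.803 would mean a pool member outranks a random architecture about four times out of five.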

Reporting caveat

The v18 artifacts already contain the main structural signal, but the reporting layer is not yet fully clean. In particular, some Phase B regime and budget-sweep emission is incomplete in the current JSON/TXT outputs. That should be fixed in the next pass so the evidence pipeline matches the search quality.

What survived the v18 pass

What v18 did not yet prove

v18 did not yet prove that the current allocator stack beats the strongest simple finisher inside NB101. That matters, but it does not erase the deeper result. If the reading layer is strong and the exploitation layer is weak, the next move is to improve the exploiter, not to declare the reading false.

Layer | What v18 now supports | What remains open
Landscape reading | Families, signed affinity, macro features, and poly intersections appear to contain real signal. | How stable those signals remain under larger budgets, larger spaces, and stricter controls.
Search quality | The arena can build richer candidate pools than plain local scoring. | Whether the current routing/exploitation layer can systematically convert that structure into better final search outcomes.
Generative promise | Intersected views now look like a plausible bridge toward generative candidate construction. | True generation beyond the benchmark space. That belongs to a later stage, not to v18.

Why the real target is not these benchmarks

The benchmarks we test on are tiny. NAS-Bench-201 has 15,625 architectures. NAS-Bench-101 has 423,624. A simple greedy baseline with budget 180, at the 50 sampled candidates per step discussed below, makes up to 9,000 cached lookups. That is the equivalent of roughly 57% of the search space on NAS-201 and about 2% on NAS-101. On NAS-201, greedy lands within 0.01% of the optimum by sheer reach. On NAS-101, the gap widens to about 0.5%. In both cases, the benchmark is small enough that a brute-force style strategy can still stumble onto the best regions.
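The coverage figures can be checked with a few lines of arithmetic. The sketch assumes the budget of 180 counts steps and that each step looks up 50 candidates, which is the greedy setting mentioned later in this section; that pairing is an inference from the quoted percentages rather than something the paper states outright. The gap uses the NB101 optimum and best simple baseline from the evidence card.

```python
SPACES = {"NAS-Bench-201": 15_625, "NAS-Bench-101": 423_624}
BUDGET_STEPS = 180
CANDIDATES_PER_STEP = 50  # assumption: greedy samples 50 candidates per step


def coverage(space_size, steps=BUDGET_STEPS, per_step=CANDIDATES_PER_STEP):
    """Upper bound on the fraction of the space touched (repeats not removed)."""
    return min(1.0, steps * per_step / space_size)


def relative_gap(optimum, reached):
    """Shortfall of the reached score as a fraction of the optimum."""
    return (optimum - reached) / optimum


for name, size in SPACES.items():
    print(f"{name}: {coverage(size):.1%}")
# NAS-Bench-201: 57.6%
# NAS-Bench-101: 2.1%

print(f"NB101 greedy gap: {relative_gap(95.055, 94.600):.2%}")
# NB101 greedy gap: 0.48%
```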

That is not where our approach needs to prove itself. It needs to prove itself where brute-force coverage drops to effectively zero.

The scale of real NAS

Consider what architecture search actually looks like for frontier AI systems. DeepSeek V3 is a 671-billion-parameter Mixture-of-Experts model with 256 routed experts per layer across 61 layers, Multi-Head Latent Attention, mixed FP8/BF16 precision, multi-token prediction, and a custom routing strategy with auxiliary-loss-free load balancing. The design space for such a system spans expert count, expert size, routing topology, attention mechanism, precision choices, training recipe, curriculum schedule, and their interactions. The number of meaningful architectural configurations is not thousands or hundreds of thousands. It is combinatorially vast—far beyond exhaustive evaluation.

No public tabular benchmark captures this. The published NAS benchmarks are deliberately small because exhaustive evaluation is their design contract. But the real design decisions behind models like DeepSeek V3, GPT-4, Claude, or Grok involve search spaces where evaluating a single candidate costs thousands of GPU hours. In that regime, greedy sampling of 50 random candidates per step is not “strong.” It is blind.

The Concorde analogy

The 8Z TSP solver cannot beat Concorde on 1,000-city problems. Concorde is exact and provably optimal at that scale. But Concorde does not run on a million cities, or ten million, or a billion. The 8Z solver does. The same principle applies here: a method that reads landscape structure, navigates by family and multi-resolution signal, and governs its own search regime is not competing with brute force on toy problems. It is designed for the regime where brute force cannot even start.

Complexity resilience as the real claim

The founding hypothesis of the 8Z Research Program is that compressibility correlates with quality across domains. The operational consequence is that self-selecting MDL × DCC should be complexity-resilient: its value should increase, not decrease, as the search space grows. On a small benchmark, simple methods win because coverage is cheap. On a vast design space, coverage collapses and structure-aware navigation becomes the only viable path.

This is the working claim of the program, and it is grounded in how the method is designed to operate.

The benchmarks exist to validate the mechanism on known ground truth, not to demonstrate the method’s ultimate ceiling. The ceiling is in the spaces too large for any tabular benchmark to exist.

What we are actually testing

NAS-201 and NAS-101 are validation grounds, not battlefields. We test there because we can verify every answer. The real question is not “can we beat greedy on 423k architectures?” The real question is: does the method build correct structural understanding that would scale to spaces where greedy is useless? Family splits with AUC 0.999, signed affinity with ρ=0.56, and polyhedral intersections that enrich candidate pools by 4× suggest the answer is yes.
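A pool enrichment figure such as 4× can be read as a ratio of hit rates. The sketch below assumes that definition: the share of top architectures inside the candidate pool, divided by their share in the whole benchmark. The function name and the choice of what counts as a top architecture are hypothetical, since the paper does not define either.

```python
def enrichment(pool, top_set, universe_size):
    """Top-architecture hit rate inside the pool divided by the base rate.

    pool: iterable of architecture ids selected by the method.
    top_set: iterable of ids counted as top architectures (assumed definition).
    universe_size: total number of architectures in the benchmark.
    """
    pool = set(pool)
    top = set(top_set)
    hit_rate = len(pool & top) / len(pool)
    base_rate = len(top) / universe_size
    return hit_rate / base_rate
```

For example, if 10% of a 100-architecture universe counts as top and 4 of 10 pool members are in that set, the enrichment is 0.4 / 0.1, or 4.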

08 · LIVING ROADMAP

How this document should grow next

For now, NAS carries the most detail because it is the cleanest active laboratory. Over time, this document should accumulate new sections with the same discipline used here: first a bounded arena, then cheap tests, then route-level evidence, then broader design implications. Each added component should deepen this template rather than dilute it.

Section | Current state | Next expansion
NAS | Main active body | v18 evidence pack, reporting cleanup, family/poly maps, budget-sweep analysis, and larger-space NAS validation
Training data | Announced | Selection and curriculum wedge experiments
Optimization / loss | Announced | Self-selecting schedules and adaptive objective blends
Memory / tools / routing | Announced | AI8-linked orchestration tests and policy kernels
Compression / distillation | Announced | Bridge to 8Z-style structure preservation and pruning

Document rule

Every future component section should follow the same contract: state the bounded arena, define the cheap test, show what survived, then cautiously widen the claim toward harder and larger regimes.

09 · REASONING COMPONENTS

Retrieval and Reasoning Governance

NAS is the first concrete arena because it is bounded and verifiable. The next component family is not another model architecture. It is the reasoning stack that decides which ideas, lenses, tests, and memories should exist before any model-design decision is made.

Component stance

RHPr and RHP should not be treated as nice prompting language. They are candidate AI-system components: retrieval governance, idea-space mapping, bifurcation preservation, arena conversion, and continuity writing.

Component | Input | Output | Primary metric
Lens Census Engine | Problem statement + first-pass answers | Inventory of active domains, assumptions, metaphors, and missing views | Drawer count, cluster dominance
Absence Detector | Census output | Missing lenses and blocked knowledge drawers | Diversity delta, absence confidence
Forced-Lens Generator | Missing lens | New candidate ideas from the forced viewpoint | Recovery rate R
Child Mode | Problem representation | Physical/sensory/geometric seed candidates | Non-symbolic seed yield
Collision Composer | Outputs from distant lenses | Bridge candidates and structural hybrids | Compression gain, novelty score
Bifurcation Keeper | Disagreements | Preserved forks with assumptions exposed | Decision-trail clarity
Empiricist Gate | Surviving candidates | Cheap arena / parallel test plan | Testability, cost, falsifier strength
Session Genome Writer | Result + lineage | Correct AI8 file updates | Continuity gain without bloat
Skill Extractor / Registry | Repeated or hard workflow | Named, versioned, tested skill | Future-loop cost reduction without quality loss

Cheapest first experiment

Run the same difficult prompt under three conditions: normal prompting, RHP multi-agent debate, and RHPr sequence before RHP. Measure whether the RHPr/RHP condition recovers more distinct structural lenses and produces more testable candidate bridges without increasing hallucinated confidence.

Pass condition

The component earns promotion only if it increases recoverable useful lenses, improves cheap-test yield, and reduces premature convergence. Otherwise it remains a prompt pattern, not an AI8 component.
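The pass condition can be written as an explicit gate over the metrics named in the component table. This is a minimal sketch under stated assumptions: each idea produced in a condition has already been labelled with a lens, cheap-test yield is proxied by a count of testable bridges, and premature convergence is proxied by cluster dominance. `ConditionResult` and the function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ConditionResult:
    lenses: list           # lens label assigned to each produced idea
    testable_bridges: int  # candidates that arrived with a cheap test plan


def drawer_count(result):
    """Number of distinct lenses that appear in the condition's output."""
    return len(set(result.lenses))


def cluster_dominance(result):
    """Share of ideas that fall under the single most common lens."""
    counts = Counter(result.lenses)
    return max(counts.values()) / len(result.lenses)


def earns_promotion(baseline, candidate):
    """All three criteria must improve over the baseline condition."""
    return (
        drawer_count(candidate) > drawer_count(baseline)
        and candidate.testable_bridges > baseline.testable_bridges
        and cluster_dominance(candidate) < cluster_dominance(baseline)
    )
```

In the three-condition experiment, normal prompting would serve as `baseline` and each of the RHP and RHPr conditions as a `candidate`. A check on hallucinated confidence is not modelled here and would need its own measure.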

10 · BRIDGE BACK
Loop + skill bridge

AI8 should not preserve only ideas, roles, and state. Repeated or difficult AIM³ / RHPm / RHP / RHPr workflows can be promoted into skills: named, versioned, tested procedures that future loops can call without starting from zero. This keeps the public AIM³ tree and the private AI8 roots aligned: prompts ask, loops repeat, skills compound.
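The idea of a named, versioned, tested skill maps naturally onto a small registry. The sketch below is illustrative only and does not describe an existing AI8 implementation: `Skill`, `SkillRegistry`, `promote`, and `latest` are assumed names, and a test is taken to be any callable that receives the procedure and returns True on success.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    name: str
    version: int
    procedure: object  # callable implementing the workflow
    tests: tuple = ()  # callables: test(procedure) -> bool


@dataclass
class SkillRegistry:
    skills: dict = field(default_factory=dict)  # name -> list of Skill, oldest first

    def promote(self, name, procedure, tests):
        """Register a new version only if at least one test exists and all pass."""
        if not tests or not all(test(procedure) for test in tests):
            return None
        versions = self.skills.setdefault(name, [])
        skill = Skill(name, len(versions) + 1, procedure, tuple(tests))
        versions.append(skill)
        return skill

    def latest(self, name):
        """Most recently promoted version of a skill."""
        return self.skills[name][-1]
```

Refusing untested procedures is the point of the design: a workflow that cannot state its own check stays a prompt pattern and never enters the registry.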

How this paper connects back to AI8

AI8 Architecture remains the deeper process architecture. AI8 Companion remains the readable map of that architecture. AI8 Components is where the architecture starts touching concrete AI-system design questions.

In that sense, this paper is neither a replacement for the core AI8 framework nor an unrelated tangent. It is the first serious attempt to ask what happens when continuity architecture stops being only about coordination and becomes a wedge for better model design, better training choices, and better system structure.

Simple formulation

Architecture says what AI8 is. Companion says how to read it. Components asks what it can help improve.

RHPm as public interface

RHPm is the practical front door between casual human intent and the heavier AIM³/RHP/RHPr stack. It converts rough requests into strong session prompts with role, goal, files/context, constraints, tests, output format, stop conditions, and optional skill extraction.
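The fields listed above suggest a simple record type for a session prompt. The following sketch is an assumption about shape, not a description of RHPm itself: `SessionPrompt`, its defaults, and the rendered layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SessionPrompt:
    role: str
    goal: str
    context: list = field(default_factory=list)      # files or background notes
    constraints: list = field(default_factory=list)
    tests: list = field(default_factory=list)
    output_format: str = "plain text"
    stop_conditions: list = field(default_factory=list)
    extract_skill: bool = False

    def render(self):
        """Render the prompt as labelled plain-text sections, skipping empty ones."""
        sections = [
            ("Role", [self.role]),
            ("Goal", [self.goal]),
            ("Files/context", self.context),
            ("Constraints", self.constraints),
            ("Tests", self.tests),
            ("Output format", [self.output_format]),
            ("Stop conditions", self.stop_conditions),
            ("Skill extraction", ["yes"] if self.extract_skill else []),
        ]
        lines = []
        for title, items in sections:
            if items:
                lines.append(f"{title}:")
                lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)
```

A rough request such as "compare greedy and arena on NB101" would be filled into these fields before the heavier stack sees it, so that tests and stop conditions are stated up front.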