private research prototype · v0.4 completed overnight snapshot · v0.5 planned

ARC-AGI × MDLxDCC Arena

A dynamic primitive arena for ARC-style grid tasks: ARC JSON loader, category routing, wide candidate families, MDL selector, transparent DCC trace, LOO validation, and visual failure mining.

Current read: v0.4 is a real diagnostic arena. It is not an ARC solve claim. The completed overnight run shows that category × family routing is the main signal: easy_signature, macro, symmetry, motion, integer_expand, and crop_like→core/object are useful lanes. Blind global chain2 is weak; routed chain2 has real but narrow value.

Runs: 40 (finished v0.4 overnight key bundle)
Run-task evals: 5,284 (processed rows across category/family runs)
Exact rows: 217 (train-fit exacts across all runs)
Unique exact tasks: 39 (33 with best LOO = 1.0)
HIGH rows: 184 (confidence HIGH across run summaries)
Total runtime: 67.3h (10-worker batch, summed run wall times)

boundary

What this page is — and is not

This is a private research prototype for failure mining, primitive discovery, routing, and MDL/DCC diagnostics. It is not an ARC Prize submission, not an official benchmark claim, and not an exposure of proprietary MDLxDCC core internals.

The useful measurements here are train-fit exacts, HIGH confidence, LOO survival, new primitive-family wins, near-miss structure, exact per runtime, and whether the arena learns where to spend DCC budget.
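
To make "LOO survival" concrete, here is a minimal sketch of leave-one-out scoring over a task's train pairs. It assumes LOO means: hold out one train pair, induce the rule from the remaining pairs, and require an exact match on the held-out pair. The color-map rule and the names `fit_color_map`, `apply_color_map`, and `loo_score` are hypothetical stand-ins for illustration. They are not the arena's selector and expose nothing about the MDLxDCC core.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[int]]
Pair = Tuple[Grid, Grid]


def fit_color_map(pairs: List[Pair]) -> Optional[Dict[int, int]]:
    """Induce a cell-wise color mapping, or None if the pairs are inconsistent."""
    mapping: Dict[int, int] = {}
    for src, dst in pairs:
        # A cell-wise map only makes sense when shapes match.
        if len(src) != len(dst) or any(len(a) != len(b) for a, b in zip(src, dst)):
            return None
        for row_s, row_d in zip(src, dst):
            for a, b in zip(row_s, row_d):
                if mapping.setdefault(a, b) != b:
                    return None  # same input color maps to two outputs
    return mapping


def apply_color_map(grid: Grid, mapping: Dict[int, int]) -> Grid:
    """Apply the mapping; colors never seen during fitting are left unchanged."""
    return [[mapping.get(c, c) for c in row] for row in grid]


def loo_score(pairs: List[Pair]) -> float:
    """Fraction of train pairs reproduced exactly when held out of fitting."""
    if len(pairs) < 2:
        return 0.0  # nothing to fit on once a single pair is held out
    hits = 0
    for i, (src, dst) in enumerate(pairs):
        rest = pairs[:i] + pairs[i + 1:]
        mapping = fit_color_map(rest)
        if mapping is not None and apply_color_map(src, mapping) == dst:
            hits += 1
    return hits / len(pairs)


# Toy task: every pair recolors 1 -> 2 and keeps 0.
pairs = [
    ([[1, 0], [0, 1]], [[2, 0], [0, 2]]),
    ([[0, 1], [1, 1]], [[0, 2], [2, 2]]),
    ([[1, 1], [0, 0]], [[2, 2], [0, 0]]),
]
print(loo_score(pairs))
```

Under this reading, LOO = 1.0 means every held-out pair was reproduced from the remaining pairs, which is a stronger signal than a train-fit exact alone.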

visual diagnostics

Visual evidence layer

The local arc_report.html pages are valuable because they show train input, target, winner prediction, exact/near-miss status, categories, candidate counts, and failure buckets. Instead of embedding every run report, the v0.4 package should carry a compact atlas plus links back to per-run reports.

Small preview from selected exacts

natural-law fill_holes · task 00d62c1b
run 02_dev50_wide_nochain2 · family fill_holes · LOO 1.000 · fill holes bg=0 with color=4
[panels: train input · target · winner prediction]

routed chain2 recolor · task 62ab2642
run 44_color_family_color · family chain2 · LOO 1.000 · chain2: component_recolor -> component_recolor
[panels: train input · target · winner prediction]

substitution expansion · task 007bbfb7
run 43_expand_family_macro · family substitution_expand · LOO 1.000 · substitution expand k=3 source=full_input active=non_bg recolor=preserve bg=0
[panels: train input · target · winner prediction]

Package rule: place the generated key bundle beside this page as arc_v04_key_bundle/. Keep the huge original out_arc_v04_overnight_w10 outside the MDLxDCC package unless needed locally.

results

Completed v0.4 overnight findings

Top runs by exact / HIGH / LOO

| Run | Cat | Family cat | Tasks | Exact | HIGH | LOO avg | Exact/hour | Runtime | Chain2 exact | Exact families |
|---|---|---|---|---|---|---|---|---|---|---|
| 28_cat_symmetry_wide | symmetry | all | 160 | 14 | 12 | 0.091 | 6.5 | 2:09:37 | 0 | erode_color:1; fill_holes:2; line_connect:1; object_to_marker:1; outline_foreground:1; ray_cast:1; recolor_by_map:1; scale_up_integer:1 |
| 32_cat_easy_signature_wide | easy_signature | all | 160 | 14 | 12 | 0.084 | 150.5 | 0:05:35 | 0 | erode_color:1; global_color_replace:1; line_connect:1; recolor_by_map:2; remove_background:1; rotate:1; scale_up_integer:2; substitution_expand:1 |
| 21_cat_macro_wide | macro | all | 160 | 14 | 11 | 0.082 | 6.5 | 2:08:16 | 0 | crop_largest_object:1; crop_non_bg_bbox:1; diagonal_connect:1; extract_object:1; fill_holes:1; gravity_pack:1; line_connect:2; recolor_by_map:1 |
| 15_cat_crop_like_wide | crop_like | all | 160 | 11 | 10 | 0.080 | 1.9 | 5:40:03 | 0 | block_compress:1; compress_blank_rows_cols:1; crop_largest_object:2; crop_non_bg_bbox:1; extract_object:1; pad_resize:1; remove_background:1; tile_repeat:3 |
| 11_cat_in_canvas_wide | in_canvas | all | 160 | 11 | 9 | 0.066 | 2.6 | 4:12:26 | 0 | diagonal_connect:1; fill_holes:1; gravity_pack:1; line_connect:2; outline_foreground:1; recolor_by_map:1; rotate:1; symmetry_complete:1 |
| 10_cat_same_shape_wide | same_shape | all | 160 | 11 | 9 | 0.066 | 2.4 | 4:36:48 | 0 | diagonal_connect:1; fill_holes:1; gravity_pack:1; line_connect:2; outline_foreground:1; recolor_by_map:1; rotate:1; symmetry_complete:1 |
| 45_paint_family_color_morph | recolor_or_paint | color,morphology,line | 220 | 11 | 9 | 0.050 | 5.0 | 2:12:52 | 2 | chain2:2; diagonal_connect:1; erode_color:1; fill_holes:1; line_connect:2; outline_foreground:1; ray_cast:1; recolor_by_map:1 |
| 17_cat_sequence_wide | sequence | all | 160 | 10 | 9 | 0.064 | 3.9 | 2:35:35 | 0 | crop_largest_object:1; crop_non_bg_bbox:1; extract_object:1; fill_holes:1; gravity_pack:1; line_connect:1; recolor_by_map:1; substitution_expand:1 |
| 24_cat_frame_wide | frame | all | 160 | 10 | 9 | 0.063 | 4.7 | 2:08:05 | 0 | block_compress:1; crop_largest_object:1; crop_non_bg_bbox:1; extract_object:1; outline_foreground:1; substitution_expand:2; tile_repeat:3 |
| 51_crop_family_core_object | crop_like | core,object | 186 | 10 | 8 | 0.060 | 107.8 | 0:05:34 | 1 | chain2:1; crop_largest_object:2; crop_non_bg_bbox:1; extract_object:1; pad_resize:1; remove_background:1; tile_repeat:3 |
| 23_cat_multicolor_object_wide | multicolor_object | all | 160 | 10 | 8 | 0.057 | 4.4 | 2:15:57 | 0 | block_compress:1; crop_largest_object:1; extract_object:1; gravity_pack:1; pad_resize:1; recolor_by_map:1; rotate:1; scale_up_integer:1 |
| 30_cat_hard_wide | hard | all | 120 | 8 | 7 | 0.066 | 1.2 | 6:34:19 | 0 | block_compress:1; compress_blank_rows_cols:1; crop_largest_object:1; extract_object:1; pad_resize:1; tile_repeat:2; wire_connect:1 |
| 20_cat_recolor_or_paint_wide | recolor_or_paint | all | 160 | 8 | 6 | 0.047 | 3.7 | 2:10:09 | 0 | diagonal_connect:1; fill_holes:1; line_connect:2; outline_foreground:1; recolor_by_map:1; symmetry_complete:1; wire_connect:1 |
| 25_cat_holes_wide | holes | all | 160 | 6 | 6 | 0.048 | 2.1 | 2:54:49 | 0 | crop_largest_object:1; extract_object:1; fill_holes:1; mirror:1; pad_resize:1; tile_repeat:1 |

Best ROI routes

| Run | Cat | Family cat | Tasks | Exact | HIGH | LOO avg | Exact/hour | Runtime | Chain2 exact | Exact families |
|---|---|---|---|---|---|---|---|---|---|---|
| 43_expand_family_macro | expand | macro | 68 | 4 | 4 | 0.070 | 348.0 | 0:00:41 | 0 | scale_up_integer:2; substitution_expand:2 |
| 49_motion_family_motion_object | motion | motion,object | 19 | 2 | 2 | 0.105 | 346.6 | 0:00:21 | 0 | extract_multicolor_object:1; translate_object:1 |
| 60_full_integer_expand_macro | integer_expand | macro | 38 | 4 | 4 | 0.113 | 167.9 | 0:01:26 | 0 | scale_up_integer:2; substitution_expand:2 |
| 14_cat_integer_expand_wide | integer_expand | all | 38 | 4 | 4 | 0.121 | 154.4 | 0:01:33 | 0 | scale_up_integer:2; substitution_expand:2 |
| 32_cat_easy_signature_wide | easy_signature | all | 160 | 14 | 12 | 0.084 | 150.5 | 0:05:35 | 0 | erode_color:1; global_color_replace:1; line_connect:1; recolor_by_map:2; remove_background:1; rotate:1; scale_up_integer:2; substitution_expand:1 |
| 51_crop_family_core_object | crop_like | core,object | 186 | 10 | 8 | 0.060 | 107.8 | 0:05:34 | 1 | chain2:1; crop_largest_object:2; crop_non_bg_bbox:1; extract_object:1; pad_resize:1; remove_background:1; tile_repeat:3 |
| 27_cat_motion_wide | motion | all | 19 | 5 | 5 | 0.307 | 68.4 | 0:04:23 | 0 | crop_non_bg_bbox:1; mirror:1; remove_background:1; rotate:1; translate_object:1 |
| 44_color_family_color | color | color | 220 | 2 | 1 | 0.007 | 51.8 | 0:02:19 | 1 | chain2:1; recolor_by_map:1 |
| 46_object_family_object | object | object | 220 | 4 | 3 | 0.025 | 49.6 | 0:04:50 | 0 | extract_multicolor_object:2; extract_object:1; translate_object:1 |
| 48_morphology_family_morph | morphology | morphology | 220 | 3 | 2 | 0.017 | 45.0 | 0:04:00 | 1 | chain2:1; fill_holes:2 |
| 47_multicolor_family_object_color | multicolor_object | object,color | 220 | 3 | 1 | 0.009 | 33.1 | 0:05:26 | 0 | extract_multicolor_object:1; extract_object:1; recolor_by_map:1 |
| 62_full_color_family | color | color | 80 | 1 | 0 | 0.006 | 26.8 | 0:02:14 | 0 | recolor_by_map:1 |

Routing signal: 32_cat_easy_signature_wide gives 14 exact in 5:35; 51_crop_family_core_object gives 10 exact in 5:34; 43_expand_family_macro gives 4 exact in 41 seconds. These are the lanes v0.5 should remember and route toward first.
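
The Exact/hour column can be approximately recomputed from the Exact and Runtime columns. The sketch below does that for four runs quoted on this page and ranks them. The helper names `runtime_to_hours` and `exact_per_hour` are hypothetical. Because displayed runtimes are rounded to whole seconds, recomputed values can differ slightly from the table (for example about 351 versus the reported 348.0 for 43_expand_family_macro).

```python
def runtime_to_hours(hms: str) -> float:
    """Convert an H:MM:SS string to fractional hours."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h + m / 60 + s / 3600


def exact_per_hour(exact: int, runtime: str) -> float:
    """Exact train-fit wins per hour of wall time (0.0 for a zero runtime)."""
    hours = runtime_to_hours(runtime)
    return exact / hours if hours > 0 else 0.0


# (exact, runtime) pairs copied from the tables on this page.
lanes = {
    "32_cat_easy_signature_wide": (14, "0:05:35"),
    "51_crop_family_core_object": (10, "0:05:34"),
    "43_expand_family_macro": (4, "0:00:41"),
    "28_cat_symmetry_wide": (14, "2:09:37"),
}

# Rank lanes by ROI, highest exact/hour first.
ranked = sorted(lanes, key=lambda run: exact_per_hour(*lanes[run]), reverse=True)
for run in ranked:
    print(f"{run}: {exact_per_hour(*lanes[run]):.1f} exact/hour")
```

Note that 28_cat_symmetry_wide ties 32_cat_easy_signature_wide on exact count but ranks last here, which is the point of routing on ROI rather than raw exacts.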

composition

Chain2: weak globally, useful when routed

The 50-task global baseline with chain2 and the no-chain2 ablation both produced 4 exact / 3 HIGH. The no-chain2 run used roughly half the runtime. That keeps chain2 out of the global default.

But routed chain2 produced 5 exact rows across 5 unique tasks, with 3 robust LOO=1.0 rows. The useful cases are mostly color/morph/crop compositions such as line_connect → component_recolor or fill_holes → global_color_replace.

v0.5 rule: add --chain2-policy off|always|routed|exact-only. Default should be routed, with category evidence and per-family caps.
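
One way the planned flag could be wired up is sketched below with `argparse`. Only the flag name, its four values, and the `routed` default come from the rule above. The `--chain2-family-cap` flag, the `chain2_allowed` helper, and the meaning given to each policy (especially `exact-only`) are assumptions for illustration, not the v0.5 design.

```python
import argparse

CHAIN2_POLICIES = ("off", "always", "routed", "exact-only")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="chain2 policy sketch")
    parser.add_argument(
        "--chain2-policy",
        choices=CHAIN2_POLICIES,
        default="routed",
        help="when depth-2 compositions may be generated",
    )
    # Hypothetical flag: maximum chain2 expansions allowed per primitive family.
    parser.add_argument("--chain2-family-cap", type=int, default=2)
    return parser


def chain2_allowed(policy: str, has_category_evidence: bool,
                   used_for_family: int, family_cap: int) -> bool:
    """Decide whether to spend chain2 budget on one (task, family) slot."""
    if policy == "off":
        return False
    if policy == "always":
        return True
    under_cap = used_for_family < family_cap
    if policy == "routed":
        # Assumed meaning: need prior category evidence and stay under the cap.
        return has_category_evidence and under_cap
    if policy == "exact-only":
        # Assumed meaning: candidates are generated under the cap, and a later
        # stage keeps only those that fit the train pairs exactly.
        return under_cap
    raise ValueError(f"unknown chain2 policy: {policy}")


args = build_parser().parse_args(["--chain2-policy", "routed"])
print(chain2_allowed(args.chain2_policy, True, 0, args.chain2_family_cap))
```

Keeping the decision in one pure function makes it easy to log why chain2 was or was not attempted for a given task.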

family signal

Primitive-family signal

| Exact win family | Rows |
|---|---|
| tile_repeat | 20 |
| fill_holes | 17 |
| substitution_expand | 16 |
| line_connect | 16 |
| recolor_by_map | 15 |
| wire_connect | 14 |
| scale_up_integer | 12 |
| crop_largest_object | 12 |
| extract_object | 11 |
| translate_object | 9 |
| diagonal_connect | 8 |
| outline_foreground | 7 |
| gravity_pack | 6 |
| symmetry_complete | 6 |
| crop_non_bg_bbox | 6 |
| pad_resize | 6 |

| Near-miss family | Rows |
|---|---|
| global_color_replace | 747 |
| identity | 690 |
| pad_resize | 537 |
| line_connect | 341 |
| erode_color | 240 |
| fill_holes | 232 |
| translate_object | 187 |
| diagonal_connect | 184 |
| frame_extract | 172 |
| crop_color_bbox | 170 |
| row_col_project | 156 |
| tile_repeat | 135 |
| extract_multicolor_object | 125 |
| crop_largest_object | 103 |
| component_recolor | 91 |
| chain2 | 82 |

Row counts are not unique-task counts because the same task can appear in several category slices. Still, the repeated winners are informative: tile_repeat, fill_holes, substitution_expand, line_connect, recolor_by_map, wire_connect, scale_up_integer, crop_largest_object, and extract_object are the current strongest families.
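
The gap between row counts and unique-task counts can be shown with a short sketch. The rows below are placeholder data, not taken from the run outputs, and `family_counts` is a hypothetical helper; the real per-row fields would come from the arena's JSON outputs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (run, task_id, winning_family) triples; placeholder values for illustration.
rows = [
    ("slice_a", "task_1", "tile_repeat"),
    ("slice_b", "task_1", "tile_repeat"),
    ("slice_c", "task_1", "tile_repeat"),
    ("slice_a", "task_2", "fill_holes"),
    ("slice_b", "task_3", "fill_holes"),
]


def family_counts(exact_rows: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[int, int]]:
    """Return family -> (row count, unique task count)."""
    row_count: Dict[str, int] = defaultdict(int)
    tasks: Dict[str, Set[str]] = defaultdict(set)
    for _run, task_id, family in exact_rows:
        row_count[family] += 1
        tasks[family].add(task_id)
    return {fam: (row_count[fam], len(tasks[fam])) for fam in row_count}


for family, (n_rows, n_tasks) in family_counts(rows).items():
    print(f"{family}: {n_rows} rows, {n_tasks} unique tasks")
```

In this toy data tile_repeat leads on rows (3) but covers a single task, while fill_holes has fewer rows (2) and more unique tasks, so both numbers are worth reporting side by side.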

discovery

First-discovery progression

Across the 40-run bundle, v0.4 finds 39 unique exact train-fit task ids; 33 have best LOO=1.0. Baseline global runs found 4; the rest came from slicing and routing.

| First run that found task | New exact task ids | Robust LOO=1.0 | Task ids |
|---|---|---|---|
| 01_dev50_wide_all | 4 | 3 | 007bbfb7, 00d62c1b, 070dd51e, 0d3d703e |
| 10_cat_same_shape_wide | 8 | 7 | 1e0a9b12, 1f876c06, 22168020, 22eb0ac0, 25ff71a9, 3c9b0459, 4347f46a, 496994bd |
| 13_cat_expand_wide | 3 | 3 | 5b6cbef5, 60c09cac, c59eb873 |
| 15_cat_crop_like_wide | 11 | 10 | 1cf80156, 1f85a75f, 2013d3e2, 23b5c85d, 28bf18c6, 5614dbcf, 5bd6f4ac, 68b67ca3, 73182012, a740d043, be94b721 |
| 25_cat_holes_wide | 1 | 1 | 67a3c6ac |
| 28_cat_symmetry_wide | 4 | 4 | 623ea044, 6f8cd79b, 88a10436, a5313dff |
| 32_cat_easy_signature_wide | 3 | 2 | 9dfd6313, aabf363d, c8f0f002 |
| 44_color_family_color | 1 | 1 | 62ab2642 |
| 45_paint_family_color_morph | 2 | 2 | 0d87d2a6, 7b6016b9 |
| 48_morphology_family_morph | 1 | 0 | 91714a58 |
| 51_crop_family_core_object | 1 | 0 | b9b7f026 |
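
A first-discovery table of this kind can be derived by walking the runs in order and keeping only task ids not seen in an earlier run. The sketch below assumes runs are processed in run-name order and uses placeholder task ids; `first_discovery` is a hypothetical helper, not the summarizer's actual code.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def first_discovery(
    runs: Iterable[Tuple[str, Sequence[str]]],
) -> List[Tuple[str, List[str]]]:
    """For each run, in order, list the exact task ids no earlier run found."""
    seen: Set[str] = set()
    progression: List[Tuple[str, List[str]]] = []
    for run, exact_ids in runs:
        new_ids = sorted(set(exact_ids) - seen)
        if new_ids:
            progression.append((run, new_ids))
        seen.update(exact_ids)
    return progression


# Placeholder runs and task ids, for illustration only.
runs = [
    ("run_01", ["t1", "t2"]),
    ("run_02", ["t2", "t3"]),
    ("run_03", ["t1", "t3"]),  # rediscovers known tasks only, so it is omitted
    ("run_04", ["t4"]),
]
for run, new_ids in first_discovery(runs):
    print(run, len(new_ids), ", ".join(new_ids))
```

Runs that only rediscover known tasks drop out of the progression, which is why a run can show several exacts in the results tables and still be absent here.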

failure mining

Failure buckets and runtime bombs

| Failure bucket | Rows |
|---|---|
| needs_depth2_composition | 4985 |
| near_miss_small_rule_gap | 2801 |
| needs_natural_law_primitive | 1795 |
| needs_in_canvas_rule | 1180 |
| needs_relational_object_reasoning | 1086 |
| needs_wire_crossing_or_signal_priority | 302 |
| needs_crop_then_recolor_or_mask_cleanup | 220 |
| needs_substitution_or_morphogenesis | 206 |
| needs_crop_resize_or_canvas_transform | 137 |
| unknown_or_small_missing_step | 25 |
| needs_transform_then_crop_or_recolor | 17 |

Runtime bombs (sorted by ascending Exact/hour)

| Run | Cat | Family cat | Tasks | Exact | HIGH | LOO avg | Exact/hour | Runtime | Chain2 exact | Exact families |
|---|---|---|---|---|---|---|---|---|---|---|
| 81_holdout_line_wide | line | line | 100 | 0 | 0 | 0.003 | 0.0 | 3:51:30 | 0 | |
| 41_sequence_family_line | sequence | line | 220 | 1 | 1 | 0.006 | 0.4 | 2:49:12 | 0 | line_connect:1 |
| 52_holes_family_morph_natural | holes | morphology,natural | 220 | 2 | 2 | 0.018 | 0.4 | 5:00:51 | 0 | fill_holes:2 |
| 31_cat_slow_risk_quick | slow_risk | all | 116 | 2 | 2 | 0.022 | 0.6 | 3:29:37 | 0 | crop_largest_object:1; wire_connect:1 |
| 30_cat_hard_wide | hard | all | 120 | 8 | 7 | 0.066 | 1.2 | 6:34:19 | 0 | block_compress:1; compress_blank_rows_cols:1; crop_largest_object:1; extract_object:1; pad_resize:1; tile_repeat:2; wire_connect:1 |
| 40_line_family_line | line | line | 220 | 4 | 3 | 0.019 | 1.4 | 2:50:41 | 0 | diagonal_connect:1; line_connect:2; wire_connect:1 |
| 61_full_line_family | line | line | 80 | 2 | 2 | 0.025 | 1.9 | 1:03:26 | 0 | diagonal_connect:1; wire_connect:1 |
| 15_cat_crop_like_wide | crop_like | all | 160 | 11 | 10 | 0.080 | 1.9 | 5:40:03 | 0 | block_compress:1; compress_blank_rows_cols:1; crop_largest_object:2; crop_non_bg_bbox:1; extract_object:1; pad_resize:1; remove_background:1; tile_repeat:3 |
| 26_cat_morphology_wide | morphology | all | 160 | 6 | 6 | 0.048 | 2.0 | 2:57:28 | 0 | crop_largest_object:1; extract_object:1; fill_holes:1; mirror:1; pad_resize:1; tile_repeat:1 |
| 25_cat_holes_wide | holes | all | 160 | 6 | 6 | 0.048 | 2.1 | 2:54:49 | 0 | crop_largest_object:1; extract_object:1; fill_holes:1; mirror:1; pad_resize:1; tile_repeat:1 |

needs_depth2_composition is too broad. It appears almost everywhere and must be split into object matching, macro-grid, line completion, marker instruction, crop/resize, color-role transfer, local cellular rule, and canvas transformation buckets. identity and global_color_replace near-misses are also too noisy and should be treated as diagnostic hints, not strong routes.
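
How broad the top bucket is can be put in numbers. The sketch below assumes the bucket rows are counted over the same 5,284 run-task rows reported in the summary and that one row may carry several bucket tags. Both are assumptions, although the bucket total exceeding 5,284 is consistent with them. `bucket_shares` is a hypothetical helper.

```python
from typing import Dict

TOTAL_ROWS = 5284  # "Run-task evals" from the summary; assumed denominator

# Row counts copied from the failure bucket table.
bucket_rows: Dict[str, int] = {
    "needs_depth2_composition": 4985,
    "near_miss_small_rule_gap": 2801,
    "needs_natural_law_primitive": 1795,
    "needs_in_canvas_rule": 1180,
    "needs_relational_object_reasoning": 1086,
    "needs_wire_crossing_or_signal_priority": 302,
    "needs_crop_then_recolor_or_mask_cleanup": 220,
    "needs_substitution_or_morphogenesis": 206,
    "needs_crop_resize_or_canvas_transform": 137,
    "unknown_or_small_missing_step": 25,
    "needs_transform_then_crop_or_recolor": 17,
}


def bucket_shares(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Share of rows carrying each bucket tag (shares can sum above 1.0)."""
    return {name: n / total for name, n in counts.items()}


shares = bucket_shares(bucket_rows, TOTAL_ROWS)
tags_per_row = sum(bucket_rows.values()) / TOTAL_ROWS

top3 = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]
for name, share in top3:
    print(f"{name}: {share:.1%}")
print(f"average bucket tags per row: {tags_per_row:.2f}")
```

Under these assumptions needs_depth2_composition tags roughly 94% of rows, so it barely discriminates between tasks, which supports splitting it before using it as a routing signal.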

next build

v0.5 update plan

Next overnight stance: ready for v0.5-alpha once routing/memory/failure-bucket changes land. Do not spend the next run on blind all-family full-budget expansion.

reproducibility

Reproducibility and artifact layout

Canonical source files remain the arena code, batch file, summarizer, and JSON outputs. The compact key bundle is the MDLxDCC-facing artifact.

python arc_mdlxdcc_arena_v0_4.py --data-dir D:\8Z\8z\ARC\ARC-AGI-2 --split dev --max-tasks 50 --outdir out_arc_v04_quick_w10 --mode both --fresh --budget wide --workers 10 --progress-every 5
python summarize_arc_overnight.py out_arc_v04_overnight_w10
python arc_collect_keydata_v04.py out_arc_v04_overnight_w10 --out out_arc_v04_keybundle

Expected local package link targets: arc_v04_key_bundle/arc_visual_atlas.html, arc_v04_key_bundle/arc_report_hub.html, arc_v04_key_bundle/run_index.csv, and arc_v04_key_bundle/task_index.csv.
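
A quick check that the bundle folder contains the four expected link targets might look like the sketch below. The file names come from the list above; the `missing_bundle_files` helper is hypothetical and is not part of the arena scripts.

```python
from pathlib import Path
from typing import List

# Link targets the page expects inside the key bundle folder.
EXPECTED_FILES = [
    "arc_visual_atlas.html",
    "arc_report_hub.html",
    "run_index.csv",
    "task_index.csv",
]


def missing_bundle_files(bundle_dir: str) -> List[str]:
    """Return the expected file names that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_bundle_files("arc_v04_key_bundle")
    if missing:
        print("missing from bundle:", ", ".join(missing))
    else:
        print("bundle complete")
```

Running this from the folder that holds the page reports any link target that would otherwise show up as a broken link.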