RouteSignal Scout · Prompt + Code Method Study · 2026-05-31

Same seed. Five prompt paths. Then code.

A practical comparison of direct prompting, Microsoft Copilot Prompt Coach, RHPr/RHP workflows, and the BD × GPT hybrid reference lineage for building RouteSignal Scout. This is the method case study; the companion Scout page explains the prototype's internet-ops purpose, current validation, and practical benefit.

Prompt-level benchmark complete · Code-output benchmark complete · E hybrid reference explained · Not a vendor benchmark

Open main RouteSignal Scout page → · Read method results ↓

How to read this page

Method study, not the main product page

This page explains how different prompt-building workflows performed on the same RouteSignal task. It should be read together with the companion RouteSignal Scout page, which now carries the product/evidence framing: why the tool may matter for internet operations, where it fits next to existing BGP monitoring and AIOps tools, and what the current v0.2 validation is testing.

This page answers

Which prompting path recovered the most implementation-critical structure for a hard coding-builder task: endpoint specificity, data-quality handling, offline replay, test discipline, no-overclaim safety, MDL×DCC scoring, and operator-review reporting?

The Scout page answers

What RouteSignal Scout is trying to do in the real world: turn noisy public routing data into a shorter, explainable review queue so a human operator or researcher can decide where to look first.

Why they are linked

The prompt-method result matters because the better prompts did not merely look cleaner; they produced stronger Python packages. The Scout page is the evolving internet-ops prototype. This page is the methodological audit trail behind how that prototype was prompted into existence.

Companion page

For the current RouteSignal purpose, evidence baseline, v0.2/v0.2.5 validation status, safety boundary, and practical operator-facing value, read RouteSignal Scout →. For the prompting evidence, A/B/C/D/E comparison, and raw prompt/report archive, stay on this page.

Executive summary

What the test showed

The test used one rough RouteSignal Scout seed and compared multiple ways of turning it into a Python coding-builder prompt, then compared the resulting code packages. The task was intentionally difficult: public BGP/RPKI/IPv6-style signal analysis, RIPEstat-first data handling, deterministic offline/demo behavior, no production routing claims, and MDL×DCC-style multi-signal scoring. For the real-world purpose and validation status of the prototype itself, use the companion Scout page.

Phase 1

Prompt-quality study. E-reference scored 96/100, D 94, C 91, A 89, and B / Prompt Coach 82.

Phase 2

Code-output study. Fresh A/B/C/D builds ranked D 94, C 92, A 91, B 81. Reference lineage ranked E-v0.1 96 and E-v0.2 98.

Main finding

Prompt Coach improved clarity and safety, but it stayed last in both the prompt-level and code-output evaluations for this hard coding-builder task.

Best method

The BD × GPT hybrid reference lineage remains strongest because it combines direct build-first clarity with RHPr/RHP blind-spot, test, and scoring discipline.

Bottom line

For this RouteSignal task, RHPr/RHP and the BD × GPT hybrid workflow recovered more implementation-critical structure: endpoint specificity, data-quality handling, fallback/offline behavior, tests, MDL×DCC scoring, and operator-review reporting. Prompt Coach was useful, but more generic. The practical RouteSignal value proposition is maintained on the companion Scout page; this page preserves the prompt-method evidence trail.

Design

The A/B/C/D/E comparison

The test is a practical workflow comparison, not a pure model benchmark. A and C were generated in Microsoft Copilot with the selected deep-thinking model path; B used Microsoft Copilot Prompt Coach, whose exact backing model/path was not exposed; D used a fresh BD × GPT RHPr/RHP run; and E is the older BD × GPT hybrid reference.

Label	Input/output	Method	Role in interpretation
A	`A-result.txt`	Direct/classic prompt generation in MS Copilot	Strong direct baseline
B	`B-result.txt`	Microsoft Copilot Prompt Coach	Prompt Coach baseline
C	`C-result.txt`	MS Copilot + RHPr/RHP links/workflow	RHPr/RHP inside Microsoft environment
D	`D-result.txt`	Fresh BD × GPT RHPr/RHP run	Best fresh A/B/C/D prompt and build
E	`E-reference.txt`	Older BD × GPT hybrid reference	Champion reference / mature lineage

Why E is special

E is not just another one-shot prompt. It was produced through a BD × GPT hybrid process using GPT-5.5 Pro: one path used a normal/direct builder prompt, another path used the RHPr/RHP-style reasoning route, and then the strongest parts of both were merged into a single hybrid. That is why E combines direct prompt advantages — shorter, build-first, lower cognitive load — with RHPr/RHP advantages: stronger scoring discipline, data-quality discipline, kill-tests, endpoint awareness, and operator-review framing. For fairness, E is labeled as a reference/champion lineage rather than a same-condition fresh peer to A/B/C/D.

Phase 1

Prompt-level ranking

Scoring is for Python-builder prompt quality: specificity, safe scope, RIPEstat/API correctness, offline/cache behavior, MDL×DCC scoring fidelity, acceptance tests, output artifacts, and implementation usefulness.

Rank 1 · 96/100

E-reference.txt

BD × GPT hybrid reference

Strongest builder prompt: direct build-first clarity plus RHPr/RHP discipline and endpoint-specific RIPEstat correction.

Rank 2 · 94/100

D-result.txt

fresh BD × GPT RHPr/RHP

Best fresh prompt result; strong API handling, scoring/reporting, and no-overclaim discipline.

Rank 3 · 91/100

C-result.txt

MS Copilot + RHPr/RHP

Strong blind-spot recovery, data-quality thinking, and test discipline.

Rank 4 · 89/100

A-result.txt

MS Copilot direct/classic

Strong direct baseline; clean and practical, but less deep than C/D/E.

Rank 5 · 82/100

B-result.txt

MS Copilot Prompt Coach

Useful and safe, but the weakest builder prompt for this hard coding task.

Prompt	Build target	Safety	API precision	Offline/cache	MDL×DCC	Tests	Outputs	Workflow	Total
`E-reference.txt`	14	14	14	12	14	12	10	8	96
`D-result.txt`	14	14	13	11	13	11	10	8	94
`C-result.txt`	13	14	9	11	13	12	9	7	91
`A-result.txt`	13	14	11	11	13	10	9	7	89
`B-result.txt`	11	14	8	10	12	10	8	7	82

Phase 2

Code-output benchmark

The next test used the generated builder prompts to produce Python packages and then scored the resulting artifacts. The fresh A/B/C/D comparison remains the cleanest one-shot comparison. E-v0.1 and E-v0.2 are shown as reference/champion lineage artifacts.

Rank 1 · 98/100

E-v0.2 latest

current RouteSignal champion

Best engineering artifact: v0.2 reporting, analysis bundle, master smoke test, and operator-facing evidence layer.

Rank 2 · 96/100

E-v0.1 reference

v0.1 champion/reference build

Strong endpoint discipline, peer controls, deterministic/offline behavior, workers, and expert-review framing.

Rank 3 · 94/100

D build

best fresh A/B/C/D build

Richest fresh implementation with strong MDL×DCC, safety, fallback, self-test, and review behavior.

Rank 4 · 92/100

C build

RHPr/RHP fresh build

Very strong artifact discipline, self-tests, data-quality handling, and clean generated outputs.

Rank 5 · 91/100

A build

direct/classic fresh build

Surprisingly strong direct baseline: clean, robust, and runnable, but less deep than C/D/E.

Rank 6 · 81/100

B build

Prompt Coach-derived build

Worked, but stayed last and had a concrete demo-resource normalization bug (for example, malformed synthetic resources).

Rank	Build	Score	Category	Verdict
1	E-v0.2 latest	98/100	Current champion / v0.2 release	Best overall engineering artifact; strongest reporting, test harness, thin packaging, and operator-facing evidence layer.
2	E-v0.1 reference	96/100	v0.1 reference/champion	Best v0.1-line reference build; strong endpoint discipline, peer controls, deterministic/offline behavior, and expert-review framing.
3	D	94/100	Fresh A/B/C/D build	Best fresh-build prompt result; richest A/B/C/D implementation, strong MDL×DCC and safety behavior.
4	C	92/100	Fresh A/B/C/D build	Very strong RHPr/RHP build; good artifact discipline, self-tests, and data-quality handling.
5	A	91/100	Fresh A/B/C/D build	Strong direct baseline; clean, robust, runnable, but less deep than C/D/E.
6	B / Prompt Coach	81/100	Fresh A/B/C/D build	Weakest; worked, but more generic and had a concrete demo-resource normalization bug.

Important caveat

Public internet/DNS was not usable in the evaluation container, so live AS5603 behavior was judged by implemented fetch paths, fail-soft behavior, manifests, and included test evidence rather than live API success. This study is strong as a prompt/code-output comparison, not as a validation of live routing conclusions.
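To make "fail-soft behavior" concrete, here is a minimal sketch of the pattern that was being judged: try a live fetch, fall back to a cached raw response, and otherwise return a warning instead of raising. It is an illustration under stated assumptions, not code from any of the evaluated builds. The function name `fetch_fail_soft`, the injectable `fetcher` callable, the status strings, and the placeholder URL are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

def fetch_fail_soft(url, cache_file, fetcher):
    """Return (payload, status); never raise for network or parse errors.

    fetcher is a hypothetical callable that takes a URL and returns raw
    response bytes. status is one of "live", "cached", or "failed".
    """
    cache_file = Path(cache_file)
    try:
        raw = fetcher(url)
        payload = json.loads(raw)        # parse first so bad JSON is never cached
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_bytes(raw)      # keep the raw response for offline replay
        return payload, "live"
    except (OSError, ValueError) as exc:
        # OSError covers URLError and socket timeouts; ValueError covers bad JSON.
        if cache_file.exists():
            return json.loads(cache_file.read_bytes()), "cached"
        return {"warning": f"fetch failed: {exc}"}, "failed"

def offline(url):
    """Stand-in fetcher that simulates a container without internet."""
    raise OSError("no network in this environment")

def canned(url):
    """Stand-in fetcher that returns a fixed, empty response body."""
    return b'{"data": {"updates": []}}'

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "bgp-updates_AS5603.json"
    url = "https://example.invalid/data.json"   # placeholder, not a real endpoint
    _, s1 = fetch_fail_soft(url, cache, offline)  # no cache yet
    _, s2 = fetch_fail_soft(url, cache, canned)   # live fetch fills the cache
    _, s3 = fetch_fail_soft(url, cache, offline)  # network gone, cache replayed

print(s1, s2, s3)  # failed live cached
```

Injecting the fetcher keeps the sketch runnable in a container without internet, which is the same condition the evaluation ran under.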

Prompt → Code

Correlation between prompt quality and generated code

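The ranking was not random: across the five prompt/build pairs in the table below, the rank order was the same at the prompt level and the code level, so the weaker prompt produced the weaker code and the stronger hybrid/RHPr lineage produced the stronger packages.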

Method/build	Prompt score	Code score	Interpretation
E-reference prompt / E-v0.1 build	96	96	Strongest v0.1 reference lineage.
D-result / D build	94	94	Best fresh A/B/C/D path.
C-result / C build	91	92	RHPr/RHP improved code discipline and tests.
A-result / A build	89	91	Direct baseline was genuinely strong.
B-result / B build	82	81	Prompt Coach stayed last at both levels.
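The agreement in the table above can be checked with a rank correlation. The sketch below computes Spearman's rho from the five prompt/code score pairs using only the standard library. The helper names `ranks` and `spearman` are illustrative, and the simple formula used here assumes there are no tied scores, which holds for this table.

```python
# Scores copied from the table above: method -> (prompt score, code score).
scores = {
    "E": (96, 96),
    "D": (94, 94),
    "C": (91, 92),
    "A": (89, 91),
    "B": (82, 81),
}

def ranks(values):
    """Rank values from highest (1) to lowest; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

prompt_scores = [p for p, _ in scores.values()]
code_scores = [c for _, c in scores.values()]
rho = spearman(prompt_scores, code_scores)
print(f"Spearman rho = {rho:.2f}")  # Spearman rho = 1.00
```

With only five pairs, a rho of 1.00 supports the "not random" reading but is not strong statistical evidence on its own.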

Fair conclusion

Prompt Coach was not useless; it improved clarity and safety. But in this case it was the weakest method for a hard Python coding-builder task. RHPr/RHP and the hybrid workflow produced more complete implementation structure, stronger failure handling, more useful tests, and better MDL×DCC/operator-review discipline.

Evidence

Original prompt files

These are the source `.txt` files used in the study. They are included here so the comparison is auditable from the raw seed through the generated builder prompts.

Seed.txt · Raw seed · 2,662 chars

I want to make a better prompt for an LLM coding builder.

Problem:
I want to build RouteSignal Scout v0.1 — a small Python-first prototype for read-only analysis of public internet routing signals, mainly BGP/RPKI/IPv6 style signals.

The idea:
The tool should use public data first, especially RIPEstat. Start with AS5603, maybe also AS21283 and a few prefixes. It should look at BGP update activity, bgp-updates, maybe bgp-state snapshots. I do not want production claims. It must not say “this is an attack”, “this is a hijack”, “this is an outage”, or “this is a route leak”. It should say safer things like “routing-signal candidate”, “anomaly candidate”, “high residual”, “worth operator review”, or “data-quality warning”.

I want one robust Python script first, for example routesignal_scout_v0_1.py. It should also work without internet via deterministic demo-data mode and offline/cache replay. It should save raw API responses in cache. It should generate an HTML report, CSV events/candidates, JSON summary, and manifest. It should be Windows-friendly and include .bat files for demo and AS5603 test.

Important constraints:
- read-only research prototype
- public data only
- no router configuration
- no production routing advice
- no GPU
- no Rust
- no live JS dashboard in v0.1
- no raw MRT parser in v0.1
- no LLM/API calls inside the tool
- if internet is unavailable, demo-data must still generate all output files
- include acceptance/smoke tests
- clearly report data/source quality
- do not simply sort by update count and call it an anomaly

I want to use my MDL×DCC approach:
- multiple independent signal families, not one metric
- robust baseline such as median/MAD
- symbolic stream for BGP events, for example A/W/4/6/P/S/U
- LZ/entropy/zlib/process metrics
- DCC scoring where a candidate is promoted only if at least two independent signal families agree
- the score must be explainable, not black box

Domain context:
MDL×DCC is described here:
https://www.mdlxdcc.org/crp/MDLxDCC.html

Task:
Improve this into a strong builder prompt for a new LLM coding session.

Do not solve the task. Do not write code. Return one improved builder prompt that I can paste into a fresh LLM session.

The improved prompt should include:
1. builder role,
2. goal,
3. constraints,
4. files to create,
5. CLI examples,
6. RIPEstat data-source requirements,
7. demo/offline/cache requirements,
8. scoring/metrics requirements,
9. output/report requirements,
10. acceptance tests,
11. “do not overclaim” rules,
12. what to do if internet/API fails.

Keep my intent, but make the prompt clearer, safer, more testable, and more useful for a coding builder.
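As an illustration only (not part of `Seed.txt`), the "symbolic stream" and "LZ/entropy/zlib" ideas in the seed can be sketched with the standard library. The sketch treats a stream as a plain string over the seed's example alphabet A/W/4/6/P/S/U and computes three transparent structure metrics. The function names are hypothetical, and the phrase counter is a simple LZ78-style parse chosen for brevity, not necessarily the variant any build used.

```python
import math
import zlib
from collections import Counter

def shannon_entropy(stream: str) -> float:
    """Shannon entropy in bits per symbol; 0.0 for an empty stream."""
    if not stream:
        return 0.0
    n = len(stream)
    return -sum((c / n) * math.log2(c / n) for c in Counter(stream).values())

def zlib_ratio(stream: str) -> float:
    """Compressed size divided by raw size; lower means more regular.

    Very short streams can exceed 1.0 because of zlib's fixed overhead.
    """
    raw = stream.encode("ascii")
    if not raw:
        return 1.0
    return len(zlib.compress(raw, 9)) / len(raw)

def lz_phrase_count(stream: str) -> int:
    """Count phrases in a simple LZ78-style incremental parse."""
    seen = set()
    phrase = ""
    count = 0
    for ch in stream:
        phrase += ch
        if phrase not in seen:      # first time this phrase appears
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)  # count a trailing partial phrase

quiet = "A4" * 200                  # synthetic, highly regular stream
varied = "AW46PSU" * 57             # synthetic stream using all seven symbols
print(shannon_entropy(quiet), shannon_entropy(varied) > shannon_entropy(quiet))
```

These metrics describe how regular a stream is. In the seed's framing they would form one signal family among several and would not diagnose a cause.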

A-prompt.txt · A starting/meta prompt · 917 chars

# A-prompt.txt — MS Copilot Direct / Classic Prompt Baseline

Task:
Using the raw RouteSignal Scout seed in the attached file, create a practical direct builder prompt for an LLM coding session.

Do not write code.
Do not solve the task.
Your job is only to turn the rough seed into a clear, build-ready prompt.

Required input:
- Attached seed file: `Seed.txt`
- Domain context page: https://www.mdlxdcc.org/crp/MDLxDCC.html

Context-access rule:
Before improving the prompt, check whether you can read the domain context page.
If you cannot access/read the MDL×DCC page, do not continue from generic knowledge and do not guess the missing context. Stop and say exactly which file you need attached. The expected attachment is:
- `MDLxDCC.html`

Then wait for the attachment.

If the context page is accessible, return:
1. final builder prompt,
2. assumptions made,
3. acceptance checklist.

Read attached seed:
Seed.txt

A-result.txt · A generated builder prompt · 17,192 chars

You are a senior Python research-prototype builder.

Your task is to build RouteSignal Scout v0.1: a small, practical, Python-first, read-only research prototype for analyzing public internet routing signals. The tool must focus on BGP/RPKI/IPv4/IPv6-style public routing signals, starting with AS5603, optionally AS21283, and user-supplied prefixes/resources.

Important: do not make production routing claims, do not provide routing advice, and do not label anything as an attack, hijack, outage, or route leak. This is a research signal scout only.

Build the prototype, but keep the implementation scope intentionally modest, robust, reproducible, and Windows-friendly.

PRIMARY GOAL

Create one robust Python script first:

- routesignal_scout_v0_1.py

The script should collect or replay public routing-signal data, compute explainable multi-family MDL×DCC-inspired metrics, identify cautious “routing-signal candidates,” and generate complete offline-friendly outputs:

- HTML report
- CSV candidates/events
- JSON summary
- manifest
- raw API response cache
- Windows .bat launchers
- smoke/acceptance tests

The result should be useful as a v0.1 research prototype, not a production network-monitoring system.

DOMAIN METHOD CONTEXT: MDL×DCC

Use the following practical interpretation of MDL×DCC for this prototype:

MDL side:
- Prefer compact, explainable descriptions of routing behavior.
- Treat compression and residual structure as signals that something is worth inspecting, not as proof of cause.
- Use robust baselines such as median and MAD rather than fragile means only.
- Use symbolic streams and compression-style metrics to detect changes in structure, not just volume.

DCC side:
- Treat candidate promotion as governed escalation.
- Do not promote a candidate because one metric spikes.
- Promote only when at least two independent signal families agree.
- Keep the score explainable: every candidate must show which signal families contributed and why.
- Avoid black-box scoring.

In v0.1, implement “MDL×DCC-inspired” behavior pragmatically. Do not claim the tool proves MDL×DCC, proves causality, detects attacks, detects hijacks, detects outages, or detects route leaks.

NON-NEGOTIABLE SAFETY AND SCOPE RULES

The tool must be:

- Read-only.
- Public-data-only.
- Research/prototype only.
- Safe for offline/demo usage.
- Clear about uncertainty and data quality.

The tool must not:

- Configure routers.
- Recommend production routing changes.
- Claim “attack,” “hijack,” “outage,” “route leak,” “incident,” or “breach.”
- Use live packet capture.
- Use private/internal network data.
- Use GPU.
- Use Rust.
- Use a live JavaScript dashboard in v0.1.
- Implement a raw MRT parser in v0.1.
- Call any LLM/API inside the tool.
- Depend on internet access to produce demo outputs.
- Simply sort by update count and call that an anomaly.

Use cautious terminology only, such as:

- routing-signal candidate
- anomaly candidate
- high residual
- elevated structural change
- worth operator review
- data-quality warning
- insufficient evidence
- source-limited observation
- public-observation artifact

Explicitly state in the report:

“This tool does not diagnose attacks, hijacks, outages, leaks, or operational faults. It only identifies public routing-signal candidates that may be worth human review.”

RIPESTAT / PUBLIC DATA SOURCE REQUIREMENTS

Use RIPEstat public Data API first.

Implement a small data-source layer that can call RIPEstat endpoints directly by HTTP GET and cache raw JSON responses.

Required RIPEstat endpoint families for v0.1:

1. BGP updates
- Use the RIPEstat BGP updates endpoint.
- Query by resource such as AS5603, AS21283, or a prefix.
- Support starttime and endtime.
- Preserve update type A/W where available.
- Preserve target prefix, timestamp, AS path, communities if present, source_id / RRC peer, and other returned fields.

2. BGP state snapshots
- Use the RIPEstat BGP state endpoint.
- Query by resource and timestamp.
- Preserve target prefix, AS path, communities, source_id, route count, query time, and resource.

3. Announced prefixes / resource discovery
- If the user supplies an AS resource and no prefixes, try to discover announced prefixes using RIPEstat.
- Keep discovery optional and cacheable.
- If discovery fails, continue with the AS-level resource where RIPEstat supports it.

4. RPKI / routing validation style metadata
- If RIPEstat exposes suitable validation data for the selected resource/prefix, collect it.
- If it is unavailable or not applicable, record “not collected” or “not applicable.”
- Do not treat RPKI invalid/not-found as proof of malicious activity.

5. Metadata and source quality
- Record API URL, endpoint, resource, query window, timestamp, status, response time if available, error state, and cache filename.
- Record RRC/source coverage where available.
- Report missing data, empty responses, partial API failures, time-window limitations, stale data, and endpoint errors as data-quality warnings.

API behavior:
- Use polite request behavior.
- Use a clear sourceapp-style identifier if supported by the API.
- Use timeouts.
- Use retries with backoff.
- Never crash the whole run because one endpoint fails.
- Always continue to offline/demo/report generation if possible.

INTERNET, OFFLINE, DEMO, AND CACHE REQUIREMENTS

The script must support three modes:

1. Online collection mode
- Fetch public RIPEstat data.
- Save every raw API response to cache before processing.
- Continue if some endpoints fail.
- Generate outputs from whatever public data was collected.
- Clearly mark data gaps.

2. Offline/cache replay mode
- Read previously cached raw JSON responses.
- Do not make network calls.
- Generate the same kind of outputs as online mode.
- Include manifest entries proving the run came from cache.

3. Deterministic demo-data mode
- Must work with no internet and no cache.
- Must generate all required output files.
- Use deterministic built-in synthetic/demo routing-signal records.
- The demo data must be clearly labeled as synthetic/demo data in every output.
- Demo data must include at least:
- a quiet baseline period
- an elevated update burst
- a withdrawal-heavy window
- an IPv4/IPv6 mix
- a path-change-like symbolic sequence
- at least one data-quality warning example
- Demo mode must not imply any real event occurred.

CACHE REQUIREMENTS

Create a cache directory by default, for example:

- cache/

Cache each API response as raw JSON with a deterministic filename derived from:
- endpoint
- resource
- start/end/timestamp
- RRC filter if any
- query hash if needed

Also create or update a cache index / manifest entry that records:
- generated_at
- endpoint
- full URL
- resource
- mode
- status
- cache file path
- response hash
- response size
- whether the response was live, cached, demo, failed, or empty

SCORING AND METRICS REQUIREMENTS

Do not use a single metric. Implement multiple independent signal families.

At minimum, implement these signal families:

A. Volume/residual family
- Count updates per time bucket.
- Separate announcements and withdrawals where possible.
- Compute robust baseline per resource using median and MAD.
- Compute residual-like scores.
- Label high values as “high residual,” not anomaly proof.

B. Symbolic-stream / compression family
- Convert routing observations into a symbolic stream.
- Use a documented symbol alphabet, for example:
- A = announcement
- W = withdrawal
- 4 = IPv4-related observation
- 6 = IPv6-related observation
- P = path-shape/path-length/path-change observation
- S = state snapshot observation
- U = unknown/uncategorized observation
- Compute transparent stream metrics such as:
- Shannon entropy
- LZ-style phrase count or simple LZ76-style complexity
- zlib compression ratio
- repeated motif / burstiness indicators
- These metrics should indicate structural change or compressibility change, not diagnose cause.

C. Path/topology family
- From BGP updates/state where available, compute explainable path metrics:
- AS path length
- unique path count
- path-length change
- origin diversity if visible
- source/RRC diversity if visible
- Treat path changes as “path signal” only.
- Do not label origin/path changes as hijacks, leaks, or attacks.

D. Address-family / prefix family
- Track IPv4 vs IPv6 signal split.
- Track target prefixes where available.
- Track prefix concentration or spread.
- Report when a candidate is dominated by one prefix or address family.

E. Data/source-quality family
- API failures
- empty responses
- missing fields
- low collector/source coverage
- inconsistent timestamps
- demo/synthetic data
- stale or partial observations
- cache-only status

DCC-style promotion rule:
- A row may be promoted from “event observation” to “routing-signal candidate” only if at least two independent signal families agree.
- Example: volume residual + compression shift.
- Example: compression shift + path/topology diversity.
- Example: volume residual + data-quality warning should be handled carefully: it may become “source-limited observation,” not a strong candidate.
- If only one family fires, mark it as “single-family observation,” not candidate.

Candidate score:
- Must be explainable.
- Must include family-level contributions.
- Must include reasons and caveats.
- Must be bounded, for example 0–100.
- Do not hide the scoring formula.
- Do not tune the score to force candidates.

Suggested scoring style:
- family flags contribute points
- robust residuals contribute capped points
- compression shift contributes capped points
- path/topology change contributes capped points
- data-quality penalties reduce confidence or change label
- final label depends on both score and family agreement count

OUTPUT REQUIREMENTS

Create an output directory by default, for example:

- out/<run_id>/

For every run, generate:

1. HTML report
Filename example:
- report.html

Must include:
- title: RouteSignal Scout v0.1
- mode: online / cache replay / demo
- resources analyzed
- time window
- public-data-only disclaimer
- do-not-overclaim disclaimer
- source/data quality summary
- RIPEstat endpoint summary
- resource summary
- candidate table
- event/observation table
- score explanation
- symbolic stream summary
- cache/manifest summary
- “What this does not mean” section
- “Worth operator review” wording only where justified
- no unsafe labels such as attack/hijack/outage/route leak

2. CSV candidates
Filename example:
- candidates.csv

Include columns such as:
- run_id
- resource
- bucket_start
- bucket_end
- label
- candidate_score
- confidence
- family_agreement_count
- volume_signal
- compression_signal
- path_signal
- prefix_signal
- data_quality_signal
- reasons
- caveats

3. CSV events/observations
Filename example:
- events.csv

Include normalized event rows where possible:
- timestamp
- resource
- target_prefix
- event_type
- symbol
- source_id
- path_length
- origin_as_if_available
- raw_cache_file
- data_quality_flags

4. JSON summary
Filename example:
- summary.json

Include:
- run metadata
- parameters
- resources
- mode
- source status
- metrics summary
- candidates
- warnings
- output file paths

5. Manifest
Filename example:
- manifest.json

Include:
- script version
- run_id
- generated_at
- command-line arguments
- Python version
- platform
- mode
- input resources
- cache files
- raw response hashes
- output files
- warnings/errors

6. Optional readable notes
Filename example:
- README_run.txt

Include a compact human-readable explanation of what was generated.

WINDOWS-FRIENDLY REQUIREMENTS

Create .bat files:

1. run_demo.bat
- Runs deterministic demo mode.
- Produces all outputs without internet.

2. run_as5603.bat
- Runs an AS5603 public-data smoke test.
- If internet/API fails, it must not crash silently; it should produce a report with data-quality warnings or instruct the user to run demo mode.

3. run_cache_replay.bat
- Runs offline/cache replay against the local cache.

4. smoke_tests.bat
- Runs smoke tests.

Do not require Linux-only shell scripts for basic use.

CLI REQUIREMENTS

Provide a clear CLI. Include examples in README/comments/help text.

Required examples:

- Demo/no internet:
python routesignal_scout_v0_1.py --demo --out out/demo

- Online AS5603:
python routesignal_scout_v0_1.py --resource AS5603 --hours 48 --out out/as5603

- Online AS21283:
python routesignal_scout_v0_1.py --resource AS21283 --hours 48 --out out/as21283

- Prefix resource:
python routesignal_scout_v0_1.py --resource <prefix> --hours 48 --out out/prefix_test

- Explicit time window:
python routesignal_scout_v0_1.py --resource AS5603 --start 2026-05-01T00:00:00Z --end 2026-05-03T00:00:00Z --out out/as5603_window

- Cache replay:
python routesignal_scout_v0_1.py --offline --cache cache --resource AS5603 --out out/as5603_replay

- Limited collectors if supported:
python routesignal_scout_v0_1.py --resource AS5603 --hours 48 --rrcs 0,4,12 --out out/as5603_rrcs

Suggested CLI flags:

- --resource
- --resources
- --start
- --end
- --hours
- --bucket-minutes
- --rrcs
- --cache
- --out
- --demo
- --offline
- --refresh-cache
- --sourceapp
- --timeout
- --retries
- --max-prefixes
- --verbose

IMPLEMENTATION STYLE REQUIREMENTS

Keep v0.1 simple.

Preferred:
- One main Python script.
- Clear internal sections/classes/functions.
- Standard library first.
- If using third-party packages, keep them minimal and document them.
- The script should still have a graceful no-dependency or minimal-dependency path if reasonable.

Do not:
- Create a large framework.
- Create a web server.
- Create a live dashboard.
- Add LLM calls.
- Add router integrations.
- Add raw MRT parsing.
- Add GPU logic.
- Add Rust components.

ROBUST ERROR HANDLING

If internet is unavailable:
- Demo mode must still work.
- Online mode should fail gracefully and generate a report if possible.
- The report must say which endpoints failed and what outputs are based on.

If RIPEstat returns empty data:
- Generate outputs.
- Mark “empty public observation window” or similar.
- Do not invent routing events.

If some fields are missing:
- Keep the raw response.
- Normalize what can be normalized.
- Mark missing fields in data-quality flags.

If the API changes:
- Store raw responses.
- Put parser warnings in the manifest and report.
- Do not crash unless absolutely unavoidable.

ACCEPTANCE / SMOKE TESTS

Implement or document smoke tests that verify:

1. Demo mode works without internet.
- Command:
python routesignal_scout_v0_1.py --demo --out out/test_demo
- Expected:
report.html, candidates.csv, events.csv, summary.json, manifest.json all exist.

2. Demo mode is deterministic.
- Running demo mode twice should produce equivalent normalized results except for generated_at/run_id fields.

3. Offline/cache replay works.
- After an online or demo-cache run, run offline mode and verify outputs are generated.

4. AS5603 online smoke test works if internet and RIPEstat are available.
- If unavailable, the test should fail gracefully with a data-quality warning, not with an unhandled exception.

5. No unsafe overclaiming language appears in outputs.
- Search generated HTML/CSV/JSON for forbidden words used as labels:
attack, hijack, outage, route leak, breach, incident
- If these appear, they must only appear in the explicit “what this tool does not claim” disclaimer, never as a diagnosis or candidate label.

6. Candidate promotion rule is enforced.
- A candidate must have at least two independent signal families.
- Single-family spikes must remain “single-family observation.”

7. Raw cache is created.
- Every online API response must have a corresponding raw cache file and manifest entry.

8. Data-quality warnings are visible.
- Simulate or trigger an endpoint failure and verify report + JSON summary include the warning.

9. Windows launchers exist.
- run_demo.bat
- run_as5603.bat
- run_cache_replay.bat
- smoke_tests.bat

10. The tool does not require private data, router access, GPU, Rust, raw MRT parsing, or LLM/API calls.

DELIVERABLES TO RETURN

Return the created files and a concise implementation summary.

Do not over-explain theory. Focus on:
- what was built
- how to run it
- what files are generated
- what the scoring means
- what the scoring does not mean
- known limitations
- next safe v0.2 ideas

FINAL REMINDER

This is RouteSignal Scout v0.1: a public, read-only, offline-capable routing-signal research prototype.

It should help identify “worth operator review” routing-signal candidates using explainable multi-family MDL×DCC-inspired scoring.

It must never claim to diagnose attacks, hijacks, outages, route leaks, or production routing faults.
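As an illustration only (not part of `A-result.txt`), the median/MAD baseline and the two-family promotion rule that the A prompt asks for can be sketched as follows. The labels are taken from the prompt. The function names, the 3.5 residual threshold, the fallback scale when MAD is zero, and the choice to keep data quality outside the agreement count are assumptions made for this sketch.

```python
from statistics import median

def robust_residuals(counts):
    """Residual of each bucket count against a median/MAD baseline."""
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    # 1.4826 scales MAD to be comparable with a standard deviation for
    # normal data; the fallback of 1.0 when MAD is zero is an assumption.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [(c - med) / scale for c in counts]

def label_bucket(signals, data_quality_warning=False):
    """Apply the promotion rule: at least two agreeing families are required.

    signals maps an analytic family name to True if that family fired.
    Data quality is passed separately and never counts toward agreement.
    """
    fired = sorted(name for name, hit in signals.items() if hit)
    if len(fired) >= 2:
        label = "routing-signal candidate"
    elif len(fired) == 1 and data_quality_warning:
        label = "source-limited observation"
    elif len(fired) == 1:
        label = "single-family observation"
    else:
        label = "event observation"
    return label, fired

# Synthetic update counts per time bucket; bucket 5 is a burst.
counts = [10, 12, 11, 9, 10, 48, 11]
residuals = robust_residuals(counts)
volume_fired = residuals[5] > 3.5       # hypothetical threshold

# A volume spike alone is not promoted.
print(label_bucket({"volume": volume_fired, "compression": False}))
# ('single-family observation', ['volume'])

# Volume plus an agreeing compression shift is promoted.
print(label_bucket({"volume": volume_fired, "compression": True}))
# ('routing-signal candidate', ['compression', 'volume'])
```

Returning the list of fired families alongside the label keeps the result explainable, in line with the prompt's requirement that every candidate show which families contributed.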

B-prompt.txt · B starting/meta prompt · 1,466 chars

# B-prompt.txt — Microsoft Copilot Prompt Coach Test

Please act as Microsoft Copilot Prompt Coach.

I want you to improve my rough prompt into a strong prompt for an LLM coding builder.

Do not solve the task.
Do not write code.
Your output should be one improved builder prompt that I can paste into a fresh LLM coding session.

Required input:
- Attached seed file: `Seed.txt`
- Domain context page: https://www.mdlxdcc.org/crp/MDLxDCC.html

Context-access rule:
Before improving the prompt, check whether you can read the domain context page.
If you cannot access/read the MDL×DCC page, do not continue from generic knowledge and do not guess the missing context. Stop and say exactly which file you need attached. The expected attachment is:
- `MDLxDCC.html`

Then wait for the attachment.

Important:
- Preserve my intent.
- Do not over-polish the result into generic project-management language.
- Keep the result specific enough that a coding LLM can actually build RouteSignal Scout v0.1.
- Keep the safety boundary: read-only research prototype, public data only, no production routing claims.
- Keep the MDL×DCC direction: multiple independent signal families, robust baselines, explainable DCC-style promotion, and no black-box anomaly label.

If all required context is available, return:
1. improved builder prompt,
2. what you improved,
3. assumptions you made,
4. remaining risks or missing context,
5. acceptance checklist.

Read attached seed:
Seed.txt

B-result.txt · B generated builder prompt · 8,801 chars

You are a senior Python prototype engineer building a read-only research tool.

PROJECT: RouteSignal Scout v0.1
Goal: Build a Python-first, read-only prototype that analyzes public internet routing signals (BGP/RPKI/IPv6-focused) using public data sources (RIPEstat first). The tool must produce cautious, explainable “routing-signal candidates” for operator review — never definitive incident labels.

NON-NEGOTIABLE SAFETY / SCOPE
- Read-only research prototype. Public data only.
- No router configuration, no production routing advice, no “do X in your network” prescriptions.
- Must NOT claim “attack”, “hijack”, “outage”, “route leak”, or assign blame/intent.
- Use safe language: “candidate”, “anomaly candidate”, “high residual”, “worth review”, “data-quality warning”.
- No LLM/API calls inside the tool. No GPU. No Rust.
- v0.1: no live JS dashboard; HTML report may include static images.
- v0.1: no raw MRT parsing; rely on RIPEstat/public APIs.

MDL×DCC DIRECTION (IMPLEMENT AS DESIGN PRINCIPLES, NOT AS BLACK BOX)
Implement the MDL×DCC posture as:
1) Multiple independent signal families (not a single metric).
2) Robust baselines (median/MAD by default; avoid mean/std unless justified).
3) Explainable “DCC-style promotion”: a candidate is promoted only if ≥2 independent signal families agree within the same time window (or compatible adjacent windows), with clear reasoning per promotion.
4) No black-box “anomaly = 1” labeling. Provide decomposed evidence and residuals.

PRIMARY TARGETS (DEFAULTS)
- Start with ASN AS5603 as the default subject.
- Also support AS21283 as a secondary example.
- Support optional explicit prefix list overrides (IPv4/IPv6).

DATA SOURCES (PUBLIC; RIPEstat FIRST)
- Use RIPEstat Data API as the default provider.
- Implement a provider/adapter layer so future sources can be added (but v0.1 should work fully with RIPEstat alone).
- For RIPEstat, retrieve enough data to support:
  a) BGP update activity time series for an ASN and/or its prefixes (counts + timing).
  b) BGP “state-ish” context (visibility/peers/announced space as available).
  c) Prefix and origin/AS-path related fields when available.
  d) RPKI validation signals (valid/invalid/not-found counts or transitions) when available.
  e) IPv4 vs IPv6 related signals (v6 announcement share, v6 update share, or equivalent available indicators).
- Respect rate limits; implement backoff and caching. Every request must be cacheable.

OFFLINE / DEMO / CACHE MUST-HAVES
The tool must run in three modes:
1) Online mode: fetch from RIPEstat (and only public sources).
2) Cache-replay mode: never hit the network; replay previously cached raw responses.
3) Deterministic demo-data mode: ship with a small set of bundled demo responses so that a fully offline user can still generate ALL outputs (HTML report, CSV, JSON, manifest) deterministically.

Caching rules:
- Save raw API responses (exact bytes/text) plus parsed summaries.
- Use a stable cache key that includes endpoint + sorted params + date range.
- Maintain a manifest (JSON) mapping each logical request to cache filename, timestamp, and hash.
- If API fails or internet is unavailable, auto-fallback to cache-replay if possible; otherwise fallback to demo mode (with a clear banner in outputs that demo data was used).
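
Illustration only, not part of B-result.txt: the caching rules above could be sketched roughly as follows. The helper names `cache_key` and `manifest_entry`, the `raw_<key>.json` filename pattern, the 16-character key length, and the endpoint label used in the usage line are all assumptions made for this sketch; the prompt itself only asks for a stable key built from endpoint, sorted params, and date range, plus a manifest with filename, timestamp, and hash.

```python
import hashlib
from urllib.parse import urlencode


def cache_key(endpoint: str, params: dict, start: str, end: str) -> str:
    """Stable key: the same logical request gives the same key, whatever the param order."""
    query = urlencode(sorted(params.items()))  # sorting removes dict-order effects
    material = f"{endpoint}?{query}|{start}|{end}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


def manifest_entry(endpoint: str, params: dict, start: str, end: str,
                   raw_bytes: bytes, fetched_at: str) -> dict:
    """One manifest record mapping a logical request to its cached raw response."""
    key = cache_key(endpoint, params, start, end)
    return {
        "request": {"endpoint": endpoint, "params": params, "start": start, "end": end},
        "cache_file": f"raw_{key}.json",  # hypothetical naming scheme
        "timestamp": fetched_at,
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),  # hash of the exact bytes saved
    }


# Usage: "update-activity" is a placeholder label, not a confirmed endpoint name.
entry = manifest_entry("update-activity", {"resource": "AS5603"},
                       "2026-05-01", "2026-05-08", b"{}", "2026-05-08T00:00:00Z")
```

Hashing the raw bytes rather than the parsed summary keeps the manifest tied to what was actually received, which is what a later replay run needs to verify.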

CORE ANALYSIS REQUIREMENTS (NO OVERCLAIMING)
Time windowing:
- Analyze within a user-provided time range (default: last N days) and a bucket size (default: 1h or 1d depending on range).
- All computations must note missing buckets and data gaps.

Signal families (minimum set for v0.1):
You must implement at least FOUR distinct families, and promotion requires agreement from ≥2 families.
Example families to implement (pick at least 4 and keep them independent):
F1) Update dynamics: robust residuals on update counts per bucket; burstiness; sustained elevation.
F2) Path/origin churn: changes in origin AS, AS-path length distribution shifts, or route-set churn where available.
F3) Prefix set changes: new/withdrawn prefixes, changes in announced space, or related RIPEstat-derived deltas.
F4) RPKI validation shifts: invalid/not-found/valid count changes or ratio changes; appearance of invalids.
F5) Visibility/peer changes: sudden changes in visibility metrics (peers/collectors) where available.
F6) IPv6 skew: unusual shifts in v6 vs v4 activity or presence.

MDL-ish / compression sensors (required, but keep lightweight and explainable):
- Build a symbolic event stream per bucket (or per event where feasible) using a documented alphabet such as:
  A/W for announce/withdraw; 4/6 for IP family; P for prefix-set change; S for state snapshot marker; U for unknown/other.
- Compute simple compressibility/complexity features over the symbol stream and/or selected categorical sequences per window:
  - zlib compression ratio (compressed_len / raw_len)
  - Shannon entropy over symbols
  - optionally an LZ-style proxy (e.g., dictionary size estimate) if feasible without heavy code
- Use these as an additional signal family or as supporting evidence, but DO NOT let them dominate.
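
Illustration only, not part of B-result.txt: a minimal sketch of the two lightweight sensors named above, computed over a symbol string drawn from the A/W/4/6/P/S/U alphabet. The function names are assumptions for this sketch, and treating an empty window as 0.0 is one possible convention rather than a requirement of the prompt.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(symbols: str) -> float:
    """Shannon entropy in bits per symbol; 0.0 for an empty or single-symbol stream."""
    if not symbols:
        return 0.0
    n = len(symbols)
    counts = Counter(symbols)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def zlib_ratio(symbols: str) -> float:
    """compressed_len / raw_len; lower values mean a more repetitive stream."""
    raw = symbols.encode("ascii")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)


# Usage: a repetitive window versus a more mixed one.
quiet = "A4" * 200
mixed = "A4W6PSUA6W4P" * 30
features = {
    "quiet": (shannon_entropy(quiet), zlib_ratio(quiet)),
    "mixed": (shannon_entropy(mixed), zlib_ratio(mixed)),
}
```

On very short windows the zlib header dominates the ratio, so a real implementation would probably want a minimum stream length before treating the ratio as evidence.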

Robust baselines:
- Use median and MAD to compute robust z-scores per feature.
- Avoid simplistic “sort by update count and call it anomaly”. Each candidate must show residual reasoning vs baseline.
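
Illustration only, not part of B-result.txt: one common way to compute a median/MAD robust z-score. The 0.6745 constant is the usual scaling that makes the score comparable to a standard z-score under normality. The function name, the `min_points` threshold, and the choice to return `None` when the baseline is too short or flat are assumptions for this sketch.

```python
import statistics


def robust_z(baseline, x, min_points=8):
    """Median/MAD robust z-score of x against baseline values.

    Returns None when the baseline is too short or has zero MAD, so the caller
    can report "insufficient data" instead of a misleading score.
    """
    values = list(baseline)
    if len(values) < min_points:
        return None
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return None  # flat baseline: the score is undefined
    return 0.6745 * (x - med) / mad


# Usage: hourly update counts as the baseline, then one elevated bucket.
hourly_counts = [10, 12, 11, 13, 9, 10, 11, 12]
score = robust_z(hourly_counts, 31)
```

Returning the median and MAD alongside the score would let each candidate row show its residual reasoning against the baseline, as the prompt requires.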

DCC-style promotion ladder (explainable):
Define a small ladder such as:
- Observation → Residual flagged (per-family) → Candidate (single family) → Promoted candidate (≥2 families agree) → Review note (human)
For each promoted candidate window, generate an explanation:
- Which families triggered, each family’s robust score, and the raw feature values vs baseline.
- Include a “data quality” assessment alongside (missing data, API partial, cache/demo).

OUTPUTS (MUST PRODUCE ALL OF THESE)
Create these artifacts per run:
1) HTML report (static): summary, parameters, data provenance, data-quality warnings, candidate table, per-candidate explanations, and simple plots (static images ok).
2) CSV of events/candidates: one row per candidate window with per-family scores and promotion status.
3) JSON summary: run metadata + top candidates + aggregate stats + data-quality flags.
4) Cache manifest JSON: all requests, cache keys, file hashes, timestamps, mode used (online/cache/demo).

Windows-friendly UX:
- One primary script: routesignal_scout_v0_1.py (prefer a single-file deliverable for v0.1).
- Provide .bat helpers:
  - run_demo.bat (forces demo mode)
  - run_as5603.bat (online mode if available; otherwise fallback)
- Provide a clear CLI interface:
  Examples:
    python routesignal_scout_v0_1.py --asn 5603 --days 14 --out out/as5603
    python routesignal_scout_v0_1.py --asn 21283 --start YYYY-MM-DD --end YYYY-MM-DD --bucket 1h --out out/as21283
    python routesignal_scout_v0_1.py --demo --out out/demo
    python routesignal_scout_v0_1.py --replay-cache --out out/replay
- Include --no-network flag that hard-fails if any network call is attempted (used by tests).
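
Illustration only, not part of B-result.txt: one way a `--no-network` guard could hard-fail in tests is to patch the standard socket entry points for the duration of a run. The names `no_network` and `NetworkAttempted` are hypothetical. This sketch blocks `socket.socket.connect` and `socket.create_connection` only; it does not cover every path (for example `connect_ex` or DNS lookups), so treat it as a starting point.

```python
import contextlib
import socket
from unittest import mock


class NetworkAttempted(RuntimeError):
    """Raised when code tries to open a connection while the guard is active."""


@contextlib.contextmanager
def no_network():
    """Make outbound connection attempts raise instead of reaching the network."""
    def blocked(*args, **kwargs):
        raise NetworkAttempted("network call attempted under --no-network")

    # mock.patch.object restores the original attributes on exit.
    with mock.patch.object(socket.socket, "connect", blocked), \
         mock.patch.object(socket, "create_connection", blocked):
        yield


# Usage: wrap the whole run when --no-network is set.
with no_network():
    try:
        socket.create_connection(("127.0.0.1", 9))
        guard_fired = False
    except NetworkAttempted:
        guard_fired = True
```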

FILES TO CREATE (MINIMUM)
- routesignal_scout_v0_1.py
- README.md (how to run, what it does/doesn’t claim, data sources, limitations)
- requirements.txt (keep dependencies minimal)
- demo_data/ (bundled deterministic API response fixtures + a fixture manifest)
- scripts/run_demo.bat and scripts/run_as5603.bat
- tests/ (smoke tests; can be simple Python scripts if you avoid pytest)

ACCEPTANCE / SMOKE TESTS (REQUIRED)
Provide automated checks that:
- Demo mode (fully offline) generates HTML + CSV + JSON + manifest deterministically (stable hashes or stable row counts + key fields).
- Replay-cache mode generates outputs without network access.
- Online mode caches responses and can later replay them.
- The tool never uses forbidden language in outputs (“attack”, “hijack”, “outage”, “route leak”). Add a test that scans output text for these tokens.
- Promotion rule is enforced: no “promoted candidate” unless ≥2 families triggered.
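
Illustration only, not part of B-result.txt: the forbidden-language check above could be a small text scan like the one below. The function name and the matching policy (case-insensitive, anchored at the start of a word so that plurals also match) are assumptions for this sketch. A bare-token scan like this would also flag a disclaimer sentence that names the forbidden terms, so a real test has to decide whether such sentences are exempt.

```python
import re

# Tokens taken from the acceptance list above.
FORBIDDEN = ("attack", "hijack", "outage", "route leak")

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(token) for token in FORBIDDEN) + r")",
    re.IGNORECASE,
)


def find_forbidden(text: str) -> list:
    """Return the sorted, lower-cased forbidden tokens that occur in text."""
    return sorted({match.group(1).lower() for match in _PATTERN.finditer(text)})


# Usage: scan each generated output file and fail the test on any hit.
clean = find_forbidden("High residual window; candidate worth review.")
flagged = find_forbidden("Possible Hijack and route leaks in this window.")
```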

FAILURE HANDLING
- If RIPEstat errors, rate-limits, or returns partial data:
  - record the failure in data-quality flags,
  - backoff and retry within reason,
  - then fallback to cache; if cache insufficient, fallback to demo mode,
  - and ensure outputs still render with a clear “data may be incomplete / demo” banner.

IMPLEMENTATION STYLE (HOW YOU SHOULD WORK)
- Do not write any code in your planning response until you are ready to generate the actual files.
- First, outline the architecture, data model, cache keying, and the four+ signal families clearly.
- Then implement incrementally with small, testable functions.
- Keep everything explainable and logged: show how each candidate was produced.

Deliverable: produce the complete repository content (all files listed), with clear instructions in README.md, and ensure the script runs on Windows with Python 3.11+.

C-prompt.txt · C starting/meta prompt · 1,878 chars

# C-prompt.txt — MS Copilot GPT-5.5 Think + RHPr/RHP Workflow Test

You are Microsoft Copilot GPT-5.5 Think.

Use the following BD protocols and domain page as active method/context:

LLM prompting / RHPr:
https://www.mdlxdcc.org/BD/BD_AIM3_RHPr.html

AI brainstorming / RHP:
https://www.mdlxdcc.org/BD/bd_aim3_rhp

Domain context / MDL×DCC:
https://www.mdlxdcc.org/crp/MDLxDCC.html

Required input:
- Attached seed file: `Seed.txt`
- RHPr page above
- RHP page above
- MDL×DCC page above

Context-access rule:
Before improving the prompt, check whether you can read all three pages.
If you cannot access/read any required page, do not continue with a fallback summary, do not guess the missing context, and do not proceed from generic knowledge. Stop and say exactly which file or files you need attached.

Expected attachment names:
- for RHPr: `../BD/BD_AIM3_RHPr.html`
- for RHP: `../BD/BD_AIM3_RHP.html`
- for MDL×DCC: `MDLxDCC.html`

Then wait for the attachment(s).

Task:
Improve the raw RouteSignal Scout seed in the attached file into a strong coding-builder prompt.

Do not write code.
Do not solve the task.
Your job is only to create the best possible builder prompt for a future LLM coding session.

Required process:
1. Use RHPr to improve the prompt: clarify the seed, recover missing context, identify blind spots, define boundaries, and make the request build-ready.
2. Use RHP to review/synthesize the prompt: include multiple perspectives, especially engineering, data-quality, safety/no-overclaim, testability, operator-review usefulness, and MDL×DCC scoring discipline.
3. Compress the result into one practical builder prompt that a coding LLM can follow without becoming lost in theory.

Return:
1. final RHPr/RHP builder prompt,
2. key blind spots recovered,
3. acceptance checklist,
4. remaining risks,
5. why this is better than the raw seed.

Read attached seed:
Seed.txt

C-result.txt · C generated builder prompt · 17,291 chars

You are an expert Python coding builder, data-quality engineer, and cautious network-observability prototype designer.

Your task is to build RouteSignal Scout v0.1: a small Python-first, read-only research prototype for analyzing public internet routing signals, mainly BGP/RPKI/IPv6-style signals, using public data sources first.

Important: Do not claim to detect attacks, hijacks, outages, route leaks, or operational incidents. This tool only surfaces routing-signal candidates, anomaly candidates, high-residual windows, data-quality warnings, and items worth human/operator review.

Do not write production routing advice. Do not configure routers. Do not require privileged data. Do not use private telemetry. Do not call any LLM/API from inside the tool.

========================
A. Builder role
========================

Act as:
1. Python-first prototype builder.
2. Network measurement / BGP data-quality skeptic.
3. MDL×DCC-inspired scoring implementer.
4. Test-focused engineer.
5. Safety/no-overclaim reviewer.

You must produce a practical, runnable v0.1 prototype, not a research essay and not a production NOC tool.

Use simple, readable Python. Prefer the standard library plus widely available packages only if truly useful. Keep the design Windows-friendly.

Do not over-engineer. Build one robust script first.

========================
B. Goal
========================

Build RouteSignal Scout v0.1 as a read-only public-data prototype that:

1. Pulls public routing-related signals, especially from RIPEstat, for one or more ASNs/prefixes.
2. Starts with AS5603 by default, with optional support for AS21283 and user-supplied prefixes/ASNs.
3. Looks at BGP update activity, BGP update windows, BGP state/snapshot-like data where available, RPKI/ROA validity where available, and IPv4/IPv6-relevant signals where available.
4. Saves raw API responses into a local cache.
5. Can run fully without internet using deterministic demo data and/or offline cache replay.
6. Produces all output artifacts in both online and offline/demo mode:
   - HTML report
   - CSV candidates/events
   - JSON summary
   - manifest file with run metadata, data sources, parameters, and warnings
7. Uses multiple independent signal families rather than simply sorting by update count.
8. Promotes a candidate only when at least two independent signal families agree.
9. Explains every candidate score in plain language.
10. Clearly reports data/source quality and missing-data limitations.

========================
C. Non-goals and hard constraints
========================

Hard constraints:

- Read-only research prototype only.
- Public data only.
- No router configuration.
- No production routing advice.
- No claim that anything is an attack, hijack, outage, route leak, or confirmed incident.
- No GPU.
- No Rust.
- No live JavaScript dashboard in v0.1.
- No raw MRT parser in v0.1.
- No LLM/API calls inside the tool.
- No black-box anomaly score.
- No single-metric anomaly detector.
- Do not simply sort by update count and call the top entries anomalies.
- If internet/API fails, deterministic demo-data mode must still generate every required output file.
- Prefer clarity and inspectability over sophistication.

Use cautious language everywhere:
- “routing-signal candidate”
- “anomaly candidate”
- “high residual”
- “worth operator review”
- “data-quality warning”
- “insufficient data”
- “source unavailable”
- “candidate promoted because multiple signal families agreed”

Never use:
- “attack detected”
- “hijack detected”
- “route leak detected”
- “outage detected”
- “confirmed incident”
- “production recommendation”
- “reroute”
- “mitigation instruction”

========================
D. Files to create
========================

Create a compact project with these files:

1. routesignal_scout_v0_1.py
   - Main runnable script.
   - Contains CLI, data fetching, cache handling, demo-data generation, signal extraction, scoring, and report generation.
   - Keep it readable and modular inside one file.

2. README.md
   - What the tool is.
   - What it is not.
   - How to run online, offline replay, and demo mode.
   - Data-source caveats.
   - Safety/no-overclaim language.

3. run_demo.bat
   - Runs deterministic demo mode.
   - Must work on Windows from a normal command prompt.

4. run_as5603.bat
   - Runs an AS5603 online test if internet is available.
   - If online fails, the tool should explain the failure and suggest demo/cache mode, but the script itself should not crash obscurely.

5. Optional but useful:
   - requirements.txt, only if non-stdlib packages are used.
   - tests or smoke-test commands documented in README.

The tool must create an output directory, for example:

outputs/
  routesignal_report.html
  routesignal_candidates.csv
  routesignal_summary.json
  routesignal_manifest.json
  cache/
    raw_*.json

Exact names may be parameterized, but the default run must generate predictable names.

========================
E. CLI requirements
========================

Implement a clear CLI. Suggested examples:

Demo mode, no internet required:
python routesignal_scout_v0_1.py --demo --out outputs_demo

Online AS5603:
python routesignal_scout_v0_1.py --asn AS5603 --days 7 --out outputs_as5603

Online AS5603 plus another ASN:
python routesignal_scout_v0_1.py --asn AS5603 --asn AS21283 --days 7 --out outputs_multi

Prefix-focused run:
python routesignal_scout_v0_1.py --prefix 193.2.0.0/16 --days 7 --out outputs_prefix

Offline/cache replay:
python routesignal_scout_v0_1.py --offline --cache outputs_as5603/cache --out outputs_replay

Useful options:
--demo
--offline
--cache PATH
--out PATH
--asn ASN, repeatable
--prefix PREFIX, repeatable
--days N
--start YYYY-MM-DD
--end YYYY-MM-DD
--max-api-pages N, if pagination is needed
--timeout SECONDS
--verbose
--fail-soft, default true

Do not require all options for v0.1, but implement enough to support demo, online AS5603, offline/cache replay, and custom ASN/prefix inputs.
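
Illustration only, not part of C-result.txt: a minimal `argparse` sketch covering a subset of the options listed above. The defaults shown (7 days, an `outputs` directory, AS5603 when no target is given) are assumptions drawn from the examples, not fixed requirements.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI covering demo, online, and offline/cache replay runs."""
    parser = argparse.ArgumentParser(prog="routesignal_scout_v0_1.py")
    parser.add_argument("--demo", action="store_true", help="deterministic demo data, no internet")
    parser.add_argument("--offline", action="store_true", help="replay cached raw responses only")
    parser.add_argument("--cache", metavar="PATH", help="cache directory to read or write")
    parser.add_argument("--out", metavar="PATH", default="outputs", help="output directory")
    parser.add_argument("--asn", action="append", help="target ASN, repeatable")
    parser.add_argument("--prefix", action="append", help="target prefix, repeatable")
    parser.add_argument("--days", type=int, default=7, help="look-back window in days")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Assumed default: fall back to AS5603 only when no target was supplied.
    if not args.asn and not args.prefix:
        args.asn = ["AS5603"]
    args.asn = args.asn or []
    args.prefix = args.prefix or []
    return args


# Usage: the multi-ASN example from the section above.
multi = parse_args(["--asn", "AS5603", "--asn", "AS21283", "--days", "7", "--out", "outputs_multi"])
```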

========================
F. Public data-source requirements
========================

Use public sources first, especially RIPEstat.

Prefer RIPEstat endpoints that can support these signal families where practical:

1. BGP update activity
   - Update counts by time window.
   - Announcements vs withdrawals if available.
   - Per-prefix or per-ASN where available.

2. BGP state / routing table style snapshots
   - Visible prefixes or origin information where available.
   - Prefix-origin observations.
   - Peer/source coverage if available.

3. RPKI/ROA validity or related status
   - Valid/invalid/not-found/unknown where available.
   - Treat missing RPKI data as a data-quality state, not as evidence.

4. IPv4/IPv6 distinction
   - Separate signal summaries for IPv4 and IPv6 where possible.
   - If the source does not provide clean separation, say so.

5. Metadata and coverage
   - Source endpoint name.
   - Query parameters.
   - Time range.
   - Response timestamp if available.
   - Any source errors or missing fields.

Implementation rule:
- Save raw API responses before transforming them.
- If an endpoint fails, record the failure in the manifest and continue with available data.
- If all online data fails, fall back to demo mode only if --demo was requested; otherwise produce a clear error and no false report.
- In demo mode, generate deterministic synthetic data with a fixed seed and clearly label it as demo/synthetic in every output.

Important: If the exact RIPEstat endpoint names or response shapes differ from expectation, inspect responses defensively. Do not assume fields exist. Preserve raw JSON and use robust parsing with warnings.

========================
G. Demo/offline/cache requirements
========================

Demo mode:
- Must not require internet.
- Must generate all output files.
- Must use deterministic fixed data.
- Must include at least:
  - a normal baseline period,
  - one candidate window with elevated updates,
  - one candidate window where updates alone are high but other signal families do not agree, so it is NOT promoted,
  - one data-quality warning example,
  - IPv4 and IPv6 examples if practical.

Offline/cache replay:
- Must read previously saved raw JSON responses from cache.
- Must regenerate report/CSV/JSON/manifest without network access.
- Must label the run as offline/cache replay.

Cache:
- Raw response files should be named predictably and include endpoint, target, and time info where practical.
- Manifest must list cache files used.

========================
H. MDL×DCC-inspired scoring requirements
========================

Use the MDL×DCC idea as engineering discipline, not as mystical language.

Core scoring principle:
- MDL side: prefer compact explanations of structure; use residuals against robust baselines.
- DCC side: promote only when multiple independent signal families agree; treat scoring as adaptive governance over evidence, not one static metric.

Implement multiple independent signal families. At minimum include:

1. Volume residual family
   - BGP update count residual vs robust baseline.
   - Use median and MAD or another robust baseline.
   - Explain if the baseline is too short or too sparse.

2. Event-symbol stream family
   - Convert observed events into a simple symbolic stream.
   - Suggested symbols:
     A = announcement-like event
     W = withdrawal-like event
     4 = IPv4-related signal
     6 = IPv6-related signal
     P = prefix-related change
     S = state/snapshot change
     U = unknown/uncategorized update
   - The exact mapping may be limited by available data; document it.

3. Compression/entropy/process family
   - Compute simple explainable metrics such as:
     - Shannon entropy of symbol stream
     - zlib compression ratio of symbol stream
     - simple LZ-like complexity approximation if feasible
     - transition diversity
   - Do not overclaim these metrics. Use them as supporting signals only.

4. Source/data-quality family
   - Missing endpoint data.
   - Sparse peer/source coverage if available.
   - Missing RPKI data.
   - API errors.
   - Time-window gaps.
   - Demo/synthetic data flag.

5. Optional RPKI/origin-consistency family
   - If data supports it, report changes in observed origin/RPKI status.
   - Do not infer hijack/leak/outage from this.

Candidate promotion rule:
- A candidate can be promoted only if at least two independent non-quality signal families agree.
- Data-quality warnings can downgrade confidence or add review warnings, but should not by themselves promote a routing candidate.
- A single very high update count is not sufficient.
- Every candidate row must include:
  - target ASN/prefix
  - time window
  - signal families triggered
  - raw metrics
  - robust baseline/residual info
  - promotion status
  - confidence band such as low/medium/high, but only as “review priority,” not truth
  - explanation string
  - data-quality warnings

Suggested score structure:
- family_votes = count of independent signal families triggered
- residual_score = robust normalized residual, capped
- complexity_score = bounded support from entropy/compression/process metrics
- data_quality_penalty = penalty for missing/weak data
- review_priority = low/medium/high
- promoted = family_votes >= 2 and data quality not fatal

Keep the math transparent and small. Put formulas or scoring explanation in README and HTML report.
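
Illustration only, not part of C-result.txt: the promotion rule and suggested score structure above reduce to a few lines. The family labels, the function name, and the explanation wording are assumptions for this sketch; the rule itself (at least two independent non-quality families, and data quality not fatal) comes from the text above.

```python
# Families that may downgrade or warn but never count as promotion votes.
QUALITY_FAMILIES = frozenset({"data_quality"})  # hypothetical family label


def evaluate_window(triggered, fatal_quality=False) -> dict:
    """Apply the two-family promotion rule to one candidate window."""
    votes = sorted(set(triggered) - QUALITY_FAMILIES)
    promoted = len(votes) >= 2 and not fatal_quality
    if promoted:
        why = "candidate promoted because multiple signal families agreed: " + ", ".join(votes)
    elif fatal_quality:
        why = "not promoted: data quality is fatal for this window"
    else:
        why = f"not promoted: {len(votes)} non-quality family vote(s), 2 required"
    return {"family_votes": len(votes), "promoted": promoted, "explanation": why}


# Usage: a volume spike alone, then a spike backed by a second family.
spike_only = evaluate_window({"volume_residual", "data_quality"})
agreed = evaluate_window({"volume_residual", "compression_entropy"})
```

Keeping the explanation string next to the boolean means the "Why not promoted?" section of the report can be filled from the same record.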

========================
I. Output/report requirements
========================

HTML report must include:

1. Title and run summary
   - RouteSignal Scout v0.1
   - run mode: demo / online / offline replay
   - targets
   - time range
   - generated timestamp
   - public-data/read-only disclaimer

2. Executive summary
   - number of candidates
   - number promoted for operator review
   - number of data-quality warnings
   - whether data is real public API data or synthetic demo data

3. Safety language
   - “This prototype does not detect attacks, hijacks, outages, or route leaks.”
   - “It surfaces routing-signal candidates for human review.”

4. Data sources
   - endpoint names / source labels
   - cache files
   - failed endpoints
   - missing fields

5. Candidate table
   - target
   - time window
   - promoted yes/no
   - triggered signal families
   - review priority
   - explanation
   - data-quality warnings

6. Metrics section
   - robust baseline method
   - median/MAD residuals
   - symbol-stream metrics
   - compression/entropy metrics
   - source-quality metrics

7. “Why not promoted?” section
   - Include examples where update volume alone was high but not enough.
   - Explain which second signal was missing.

8. Manifest summary
   - output files
   - cache files
   - parameters
   - warnings

CSV candidates/events:
- One row per candidate/event window.
- Include raw metrics and explanation columns.
- Use stable column names.

JSON summary:
- Machine-readable run summary.
- Include targets, time range, candidates, warnings, output file paths.

Manifest:
- Run ID.
- Tool version.
- Python version.
- OS/platform.
- CLI args.
- mode.
- data sources.
- cache files written/read.
- warnings/errors.
- output files.

========================
J. Acceptance and smoke tests
========================

Implement or document smoke tests that a reviewer can run immediately.

Required smoke tests:

1. Demo mode:
Command:
python routesignal_scout_v0_1.py --demo --out outputs_demo

Pass if:
- exits successfully
- creates HTML, CSV, JSON summary, manifest
- report clearly says demo/synthetic
- at least one promoted candidate exists
- at least one high-update but not-promoted example exists
- at least one data-quality warning exists

2. AS5603 online mode:
Command:
python routesignal_scout_v0_1.py --asn AS5603 --days 7 --out outputs_as5603

Pass if internet/API available:
- exits successfully
- saves raw API responses in cache
- creates all output files
- report includes source-quality section
- no overclaim language appears

If internet/API unavailable:
- exits gracefully
- explains which source failed
- does not fabricate online data
- suggests using --demo or --offline with cache

3. Offline replay:
Command:
python routesignal_scout_v0_1.py --offline --cache outputs_as5603/cache --out outputs_replay

Pass if cache exists:
- no network required
- creates all output files
- report says offline/cache replay
- uses cached raw JSON

4. Overclaim language check:
Search output files for banned words/phrases:
- “attack detected”
- “hijack detected”
- “route leak detected”
- “outage detected”
- “confirmed incident”
- “production recommendation”
Pass only if none appear.

5. Single-metric guard:
Create or use demo data where update count spikes but other families do not agree.
Pass only if this example is NOT promoted.

6. Manifest completeness:
Pass if manifest includes:
- version
- mode
- CLI args
- outputs
- sources
- cache files
- warnings/errors

========================
K. Data-quality and failure behavior
========================

The prototype must be failure-tolerant and honest.

If a public API endpoint fails:
- log warning
- write warning to manifest
- continue if enough other data exists
- mark affected metrics as unavailable
- do not silently substitute demo data into an online run

If no online data is available:
- do not create a misleading report
- create a clear failure manifest if useful
- tell the operator to run --demo or --offline with cache

If data is sparse:
- report “insufficient data for robust baseline”
- avoid promoted candidates unless at least two independent signal families still support them
- reduce review priority

If fields are missing:
- preserve raw JSON
- record missing fields
- continue with available fields

If demo mode is used:
- label every output as synthetic/demo
- never mix demo and real data unless explicitly requested, and if mixed, label each record source

========================
L. Builder workflow
========================

Before coding:
1. Restate the intended v0.1 scope in 5–8 bullets.
2. Identify any assumptions you must make about RIPEstat endpoints and response shapes.
3. Write a short build plan.
4. Then implement.

While coding:
1. Keep the main script readable.
2. Use small functions:
   - parse_args
   - fetch_public_data
   - load_cache
   - save_cache
   - generate_demo_data
   - normalize_events
   - compute_baselines
   - compute_symbol_metrics
   - score_candidates
   - write_html_report
   - write_csv
   - write_json
   - write_manifest
3. Avoid hidden magic.
4. Use defensive parsing.
5. Keep all claims in the report tied to observed public data or demo data.

After coding:
1. Run or describe the smoke tests.
2. Show the generated file list.
3. Summarize known limitations.
4. Confirm no banned overclaim language is emitted.
5. Confirm the single-metric guard is working.

========================
M. Final deliverable from you, the coding builder
========================

Return:
1. The created files.
2. A short explanation of how to run:
   - demo
   - AS5603 online
   - offline replay
3. Acceptance-test evidence.
4. Known limitations.
5. Any endpoint assumptions made.

Do not solve routing incidents.
Do not provide operational routing advice.
Do not claim detection.
Build the prototype and make the evidence reviewable.

D-prompt.txt · D starting/meta prompt · 3,733 chars

# D-prompt.txt — Fresh BD × GPT RHPr/RHP Prompt Generation Test

You are ChatGPT Pro / GPT-5.5 Pro acting as BD's prompt-design partner.

This is method D in the RouteSignal Scout A/B/C/D prompt test.
Do not use the old existing hybrid prompt as the final answer. Generate a fresh builder prompt from the same raw seed, using the same broad RHPr/RHP method that BD and GPT used before.

Required input:
- Attached seed file: `Seed.txt`
- RHPr page: https://www.mdlxdcc.org/BD/BD_AIM3_RHPr.html
- RHP page: https://www.mdlxdcc.org/BD/bd_aim3_rhp
- MDL×DCC page: https://www.mdlxdcc.org/crp/MDLxDCC.html

Optional reference input:
- The previous hybrid reference prompt, if attached, e.g. `RouteSignal_Scout_v0_1_HYBRID_BUILDER_PROMPT.md`.

Reference rule:
If the old hybrid prompt is attached, use it only as a reference for quality bar, known pitfalls, and useful structure. Do not copy it. Do not treat it as the D result. The D result must be a fresh prompt generated from `Seed.txt` after applying RHPr/RHP reasoning.

Context-access rule:
Before improving the prompt, check whether you can read all required pages or equivalent attached HTML files.
If you cannot access/read any required page, do not continue with a fallback summary, do not guess the missing context, and do not proceed from generic knowledge. Stop and say exactly which file or files you need attached.

Expected attachment names:
- for RHPr: `../BD/BD_AIM3_RHPr.html`
- for RHP: `../BD/BD_AIM3_RHP.html`
- for MDL×DCC: `MDLxDCC.html`

If the old hybrid reference is not attached, you may still continue once the three required context pages and `Seed.txt` are available. State that no old reference was used.

Task:
Create a fresh, high-quality builder prompt for a future LLM coding session that will build RouteSignal Scout v0.1.

Do not write code.
Do not solve/build RouteSignal Scout.
Your job is only to create the final builder prompt.

Required process:
1. Read `Seed.txt` and compress the real intent.
2. Use RHPr to recover missing context, assumptions, boundaries, and prompt structure.
3. Use RHP to review the prompt through multiple lenses: builder, data-quality reviewer, routing-domain skeptic, safety/no-overclaim reviewer, MDL×DCC scorer, test designer, and practical operator-review user.
4. If the previous hybrid reference is attached, compare against it and keep only improvements that make the new prompt stronger, clearer, safer, or more buildable.
5. Produce one final builder prompt that is practical enough to paste into a fresh coding session.

The final builder prompt must preserve these RouteSignal requirements:
- Python-first read-only research prototype.
- Public routing data only, RIPEstat first.
- No router configuration and no production routing advice.
- No factual claims of attack, hijack, outage, misconfiguration, or route leak.
- Deterministic demo-data mode.
- Offline/cache replay.
- HTML report, CSV events/candidates, JSON summary, manifest, and cache directory.
- Windows-friendly `.bat` runs.
- Multiple independent signal families.
- Robust baselines such as median/MAD.
- Symbolic event streams where useful, e.g. A/W/4/6/P/S/U.
- LZ/entropy/zlib/process-style metrics where useful.
- DCC-style promotion only when at least two independent signal families agree.
- Explainable scoring, not black-box anomaly labels.
- Acceptance/smoke tests.
- v0.1 scope discipline: no GPU, no Rust, no raw MRT parser, no live JS dashboard, no LLM/API calls inside the tool.

Return:
1. final fresh BD × GPT RHPr/RHP builder prompt,
2. what changed versus the raw seed,
3. key blind spots recovered,
4. assumptions made,
5. acceptance checklist,
6. optional notes from the old hybrid reference if it was attached.

Read attached seed:
Seed.txt

D-result.txt · D generated builder prompt · 18,860 chars

You are ChatGPT Pro / GPT-5.5 Pro acting as BD’s senior Python coding builder, routing-data skeptic, MDL×DCC implementer, and safety/no-overclaim reviewer.

Build RouteSignal Scout v0.1.

This is a Python-first, read-only research prototype for public internet routing-signal analysis. It must use public data only, with RIPEstat first. It is NOT a production routing tool, NOT an incident detector, NOT a mitigation tool, and NOT router-configuration advice.

Your job in this session:
1. Build the v0.1 tool.
2. Run the smoke/acceptance tests you can run in this environment.
3. Return files created, commands run, outputs generated, test results, and what remains weak.

Do not make production routing claims.
Do not say the tool detects attacks, hijacks, outages, misconfigurations, or route leaks.
The safest label is: “routing-signal candidate worth operator review.”

────────────────────────────────────────
A. Core goal
────────────────────────────────────────

Create RouteSignal Scout v0.1: a small but robust Python prototype that can:

- analyze public routing data from RIPEstat;
- start with AS5603 as the primary target;
- optionally include AS21283 and user-provided ASNs/prefixes;
- inspect BGP update activity, BGP updates, BGP state snapshots, prefix discovery, IPv4/IPv6 mix, and RPKI origin-validation state where available;
- convert raw observations into explainable signal families;
- use MDL×DCC-style promotion: a candidate is promoted only when at least two independent signal families agree;
- produce static outputs that are useful for a human/operator-style review;
- work fully without internet via deterministic demo-data mode and offline/cache replay.

Definition of success:
A user on Windows can run a demo command with no internet and still get:
- HTML report
- CSV events
- CSV candidates
- JSON summary
- JSON manifest
- cache/output directories
- visible data-quality section
- no overclaiming language

────────────────────────────────────────
B. Hard scope boundaries
────────────────────────────────────────

Allowed:
- Read-only public-data research prototype.
- RIPEstat Data API.
- Local deterministic demo data.
- Local cache replay.
- Static HTML report.
- CSV/JSON outputs.
- Windows .bat launchers.
- Simple Python standard-library implementation, with tiny optional dependencies only if truly needed.

Forbidden in v0.1:
- no router configuration;
- no router commands;
- no mitigation advice;
- no production routing recommendations;
- no GPU;
- no Rust;
- no raw MRT parser;
- no live JavaScript dashboard;
- no LLM/API calls inside the tool;
- no paid/private datasets;
- no “attack/hijack/outage/route leak/misconfiguration” factual claims;
- no black-box anomaly label;
- no sorting only by update count and calling that an anomaly.

Important wording:
Use:
- routing-signal candidate
- anomaly candidate
- high residual
- source-quality warning
- data-quality warning
- worth operator review
- possible routing-state change candidate
- public-data signal, not a diagnosis

Do not use as factual labels:
- attack
- hijack
- outage
- route leak
- malicious
- compromise
- confirmed misconfiguration
- incident

It is okay to mention these banned terms only in the safety/disclaimer section as examples of what the tool does NOT claim.

────────────────────────────────────────
C. Files to create
────────────────────────────────────────

Create these files unless there is a strong reason to simplify:

1. routesignal_scout_v0_1.py
   - Main script.
   - Prefer standard library: argparse, urllib.request, json, csv, statistics, datetime, pathlib, hashlib, gzip/zlib, html, traceback.
   - If using requests, add requirements.txt and keep fallback/simple install instructions.

2. README_RouteSignal_Scout_v0_1.md
   - What it is.
   - What it is not.
   - Install/run instructions.
   - Example commands.
   - Output explanation.
   - Safety/no-overclaim wording.
   - Known limitations.

3. run_demo_v0_1.bat
   - Must work without internet.
   - Must generate all outputs.

4. run_as5603_live_v0_1.bat
   - Attempts RIPEstat public-data run for AS5603.
   - Must still handle API failure safely.

5. run_offline_replay_v0_1.bat
   - Replays from cache if available.
   - If no cache exists, prints clear message and exits cleanly or falls back to demo only if explicitly configured.

6. run_smoke_tests_v0_1.bat
   - Runs demo mode and self-tests.
   - Prints clear pass/fail.

7. Optional but preferred:
   - test_routesignal_scout_v0_1.py or built-in --self-test mode.
   - requirements.txt only if needed.

Suggested output tree:

outputs/
  demo_<timestamp>/
    report.html
    events.csv
    candidates.csv
    summary.json
    manifest.json
    runlog.txt

cache/
  ripestat/
    <endpoint>/
      <safe_resource>_<hash>.json

────────────────────────────────────────
D. Required CLI
────────────────────────────────────────

Implement an argparse CLI with clear defaults.

Required modes:

1. Demo mode:
   python routesignal_scout_v0_1.py --demo --out outputs\demo_run --seed 42

2. Live RIPEstat mode:
   python routesignal_scout_v0_1.py --asn AS5603 --window-hours 48 --out outputs\as5603_live --cache cache

3. Secondary ASN:
   python routesignal_scout_v0_1.py --asn AS21283 --window-hours 48 --out outputs\as21283_live --cache cache

4. Prefix mode:
   python routesignal_scout_v0_1.py --prefix 193.0.0.0/21 --origin-as AS3333 --window-hours 48 --out outputs\prefix_live --cache cache

5. Offline replay:
   python routesignal_scout_v0_1.py --asn AS5603 --offline --cache cache --out outputs\as5603_replay

6. Self-test:
   python routesignal_scout_v0_1.py --self-test

Useful CLI arguments:
- --asn
- --prefix
- --origin-as
- --start
- --end
- --window-hours
- --max-samples
- --cache
- --offline
- --demo
- --demo-fallback
- --seed
- --out
- --timeout-sec
- --verbose
- --self-test

Defaults:
- Default live target: AS5603 if no ASN/prefix is specified.
- Default window: last 48 hours UTC.
- Default max samples: reasonable, e.g. 50 or 96.
- Default cache policy: live fetch with cache write; use cache only if --offline or if live fails and cached response exists.
- Never silently mix live/cache/demo data without manifest provenance.
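A minimal sketch of how the defaults above could map onto argparse. Only a subset of the listed flags is shown. The default output folder, the default seed of 42, and the helper name `resolve_targets` are illustrative assumptions, not part of the prompt.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Subset of the CLI described above; defaults marked 'assumed' are illustrative."""
    parser = argparse.ArgumentParser(prog="routesignal_scout_v0_1.py")
    parser.add_argument("--asn", default=None)
    parser.add_argument("--prefix", default=None)
    parser.add_argument("--window-hours", type=int, default=48)
    parser.add_argument("--max-samples", type=int, default=96)
    parser.add_argument("--cache", default="cache")
    parser.add_argument("--out", default="outputs/run")  # assumed default
    parser.add_argument("--seed", type=int, default=42)  # assumed default
    parser.add_argument("--offline", action="store_true")
    parser.add_argument("--demo", action="store_true")
    parser.add_argument("--demo-fallback", action="store_true")
    return parser


def resolve_targets(args: argparse.Namespace) -> list:
    """Apply the 'default live target is AS5603' rule when nothing is specified."""
    targets = [t for t in (args.asn, args.prefix) if t]
    if not targets and not args.demo:
        targets = ["AS5603"]
    return targets
```

Keeping the default-target rule in its own function makes it easy to record the fallback in the manifest instead of applying it silently.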

────────────────────────────────────────
E. RIPEstat data-source requirements
────────────────────────────────────────

Use RIPEstat first. Implement polite HTTP fetching with timeout, retries/backoff, and raw JSON caching.

Base URL:
https://stat.ripe.net

Minimum endpoint candidates:

1. /data/bgp-update-activity/data.json
   Purpose:
   - aggregated BGP update counts over time;
   - good for volume/timeline baselines;
   - accepts ASN, prefix, or IP range resource;
   - supports starttime/endtime/max_samples.

2. /data/bgp-updates/data.json
   Purpose:
   - observed BGP updates for a resource over time;
   - use for announcement/withdrawal symbol stream where fields are available.
   Note:
   - RIPEstat docs indicate this dataset is indexed only after January 2024 at time of writing.
   - If user asks for earlier time ranges, create a data-quality warning instead of pretending coverage exists.

3. /data/bgp-state/data.json
   Purpose:
   - BGP route state snapshot at a timestamp;
   - use to compare state across start/end timestamps if practical.

4. /data/announced-prefixes/data.json
   Purpose:
   - discover prefixes announced by an ASN over a time range.

5. /data/ris-prefixes/data.json
   Purpose:
   - discover originated/transited prefixes from the RIPE RIS perspective;
   - make clear that RIS perspective is not full global truth.

6. /data/rpki-validation/data.json
   Purpose:
   - get RPKI validity state for prefix + ASN pairs.
   Important:
   - This is supportive context only, not proof of incident, fault, or attack.

Optional if easy:
- /data/rpki-history/data.json
- /data/prefix-routing-consistency/data.json
- /data/as-overview/data.json
- /data/prefix-overview/data.json

Endpoint behavior rules:
- Save every raw successful API response to cache before analysis.
- Also save failed fetch metadata in manifest/runlog.
- Record endpoint, params, final URL, HTTP status, fetched_at UTC, cache_hit, cache_path, error message if any.
- For ASNs, handle AS5603 and 5603 forms robustly.
- For prefix/RPKI calls, do not invent prefixes. Discover them from announced-prefixes / ris-prefixes or use user-provided prefixes.
- If an endpoint response is empty, missing fields, partial, or outside coverage, mark source-quality warning.
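The cache layout named earlier (one folder per endpoint, file name built from a safe resource name plus a hash) could be derived roughly as follows. This is a sketch: the 12-character SHA-256 prefix and the helper name `cache_path` are assumptions chosen for illustration.

```python
import hashlib
import json
import re
from pathlib import Path


def safe_resource_name(resource: str) -> str:
    """Replace characters that are awkward in file names, such as '/' and ':'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", resource)


def cache_path(cache_root: str, endpoint: str, params: dict) -> Path:
    """Build <cache_root>/ripestat/<endpoint>/<safe_resource>_<hash>.json."""
    # Hash the sorted parameters so identical requests always map to one file.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    resource = safe_resource_name(str(params.get("resource", "unknown")))
    return Path(cache_root) / "ripestat" / endpoint / f"{resource}_{digest}.json"
```

Because the hash covers every request parameter, two queries for the same resource with different time windows land in different files, which keeps offline replay unambiguous.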

────────────────────────────────────────
F. Demo, offline, cache, and failure behavior
────────────────────────────────────────

Demo mode:
- Must require no internet.
- Must be deterministic with --seed.
- Must generate realistic-looking but clearly synthetic routing-signal events.
- Must include both normal background and at least one planted “routing-signal candidate” so the report/candidate path is exercised.
- Must label every synthetic record as source_mode=demo.
- Must never imply demo records are real.

Offline mode:
- Must not call the internet.
- Must read only cached API responses.
- If needed cache is missing, fail cleanly with a clear message OR use demo fallback only if --demo-fallback is set.
- Do not pretend offline missing data means “no events.”

Live/API failure:
When live API fails:
1. log exact endpoint and error;
2. try cached response if allowed;
3. if no cache and --demo-fallback is set, generate demo outputs and mark them clearly;
4. if no cache and no demo fallback, still write manifest/runlog explaining failure if possible;
5. never return a confident analysis from missing data.
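The fallback order above can be sketched as one small function. The callables `fetch_live` and `read_cache`, and the `missing` provenance value, are hypothetical names used only for this illustration.

```python
def resolve_source(fetch_live, read_cache, offline=False, demo_fallback=False):
    """Return (payload, provenance) following live -> cache -> demo -> missing.

    fetch_live: callable returning parsed JSON, raising on failure.
    read_cache: callable returning cached JSON, or None when nothing is cached.
    """
    errors = []
    if not offline:
        try:
            return fetch_live(), {"source_mode": "live", "errors": errors}
        except Exception as exc:  # log the failure, then try the cache
            errors.append(f"live fetch failed: {exc}")
    cached = read_cache()
    if cached is not None:
        return cached, {"source_mode": "cache", "errors": errors}
    errors.append("no cached response available")
    if demo_fallback:
        # The caller generates synthetic data and must label it as demo.
        return None, {"source_mode": "demo", "errors": errors}
    # No data at all: report this as missing, never as "no events".
    return None, {"source_mode": "missing", "errors": errors}
```

The provenance dictionary is what the manifest would record per source, so a reader can always tell live, cached, demo, and missing data apart.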

Rerun/resume:
- Safe reruns should not destroy previous outputs unless --overwrite is explicitly implemented.
- Prefer timestamped output folders if out folder exists.
- Cache reuse is encouraged.
- Manifest must show whether each data source came from live, cache, or demo.

────────────────────────────────────────
G. MDL×DCC scoring requirements
────────────────────────────────────────

Do not build a black-box detector.
Build an explainable scoring system with multiple independent signal families.

Minimum signal families:

Family 1 — Volume / robust residual
- update_count per bin
- announcement_count where available
- withdrawal_count where available
- robust z-score using median/MAD
- never enough alone for promotion

Family 2 — Composition / mix shift
- withdrawal/announcement ratio
- IPv4 vs IPv6 mix
- prefix-count change
- origin/prefix visibility change where available
- peer/collector visibility change if exposed by source fields

Family 3 — Symbolic stream complexity
- Convert observations into symbolic tokens where possible:
  A = announcement-like event
  W = withdrawal-like event
  4 = IPv4-related event
  6 = IPv6-related event
  P = prefix/path/origin/state-change-like event
  S = snapshot/state signal
  U = unknown/uncategorized/data-quality token
- Compute simple LZ76-style complexity or approximate phrase count.
- Compute Shannon entropy.
- Compute zlib compression ratio.
- Compute simple n-gram novelty or run-length statistics if easy.
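A standard-library sketch of the three Family 3 metrics. The phrase count uses an LZ78-style greedy parse as a simple stand-in for LZ76, which is a deliberate simplification. The choice to return 0.0 for empty streams is also an assumption.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(symbols: str) -> float:
    """Shannon entropy in bits per symbol; 0.0 for empty or constant streams."""
    if not symbols:
        return 0.0
    total = len(symbols)
    return sum((n / total) * math.log2(total / n) for n in Counter(symbols).values())


def zlib_ratio(symbols: str) -> float:
    """compressed_size / raw_size; defined here as 0.0 for an empty stream."""
    raw = symbols.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)


def lz_phrase_count(symbols: str) -> int:
    """Approximate LZ complexity: phrases in a greedy LZ78-style parse."""
    seen = set()
    phrase = ""
    count = 0
    for ch in symbols:
        phrase += ch
        if phrase not in seen:  # shortest phrase not seen before ends here
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:  # trailing partial phrase still counts once
        count += 1
    return count
```

For very short streams the zlib ratio exceeds 1.0 because of header overhead, so it is more informative when compared against the same resource's baseline than when read as an absolute value.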

Family 4 — Process / burst / temporal shape
- burstiness
- run lengths
- inter-arrival or bin-to-bin change
- local change candidate versus robust baseline
- no formal changepoint overclaim unless very clearly implemented as heuristic

Family 5 — RPKI / state consistency context
- RPKI valid/invalid/not_found/unknown where available.
- Snapshot state deltas from bgp-state if available.
- This family supports review but cannot alone prove anything.

Family 6 — Data/source quality
- missing endpoint
- empty response
- partial response
- stale cache
- demo data
- too few bins
- too few events
- limited RIS visibility
- outside indexed range
- This is NOT an anomaly family; it is a confidence modifier and warning family.

DCC-style promotion rule:
- A row/window/resource can become a “routing-signal candidate” only if at least two independent non-quality signal families support it.
- Update count alone may produce “watch” or “high volume residual,” but not “candidate.”
- Data-quality problems reduce confidence or add warnings; they must not inflate candidate score.
- Keep family weights explicit and configurable in constants or CLI.
- Every candidate must include:
  - support_family_count
  - support_families
  - top_features
  - observed values
  - baseline values
  - residuals/z-scores
  - confidence
  - data-quality flags
  - explanation text
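The promotion rule can be sketched as a small gate. The threshold of 3.0, the family key names, and the `reduced`/`normal` confidence values are illustrative assumptions. The source-confirmed part is the rule itself: two non-quality families must agree, and quality only modifies confidence.

```python
def promote(family_scores: dict, quality_flags: list, threshold: float = 3.0) -> dict:
    """Label one window from per-family scores (family name -> score)."""
    # The quality family is a warning channel and never counts as support.
    support = sorted(
        name for name, score in family_scores.items()
        if name != "quality" and score >= threshold
    )
    if len(support) >= 2:
        label = "routing-signal candidate"
    elif support:
        label = "watch"  # a single family, however strong, stays below candidate
    else:
        label = "background"
    return {
        "label": label,
        "support_family_count": len(support),
        "support_families": support,
        "confidence": "reduced" if quality_flags else "normal",
        "data_quality_flags": list(quality_flags),
    }
```

With this shape, a volume spike alone yields `watch`, and quality flags can lower confidence without ever adding to the support count.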

Suggested scoring labels:
- background
- watch
- routing-signal candidate
- high-review candidate
- data-quality warning

Do not call anything:
- confirmed anomaly
- detected attack
- hijack
- outage
- route leak
- misconfiguration

────────────────────────────────────────
H. Output/report requirements
────────────────────────────────────────

Create these outputs for every successful demo/live/replay run:

1. report.html
Static, self-contained HTML. No live JS dashboard. Small inline JS for table sorting is acceptable only if self-contained, but plain static is preferred.

Report sections:
- Title: RouteSignal Scout v0.1
- Scope and no-overclaim banner
- Run mode: live/cache/offline/demo
- Target resources
- Time window
- Data sources and endpoint-quality table
- Cache/provenance table
- Summary metrics
- Candidate table
- Events/timeline table
- Signal-family explanation
- Top candidate explanations
- Data-quality warnings
- Limitations
- Next safe operator-review questions
- Manifest link/path

The report must be useful even when there are no candidates.

2. events.csv
Suggested columns:
- run_id
- source_mode
- resource
- resource_type
- time_bin_start
- time_bin_end
- endpoint
- event_type
- symbol
- update_count
- announce_count
- withdraw_count
- ipv4_count
- ipv6_count
- prefix
- origin_as
- rpki_status
- raw_ref
- data_quality_flags

3. candidates.csv
Suggested columns:
- run_id
- resource
- time_bin_start
- time_bin_end
- label
- dcc_score
- confidence
- support_family_count
- support_families
- volume_z
- symbolic_score
- entropy
- lz_score
- zlib_ratio
- process_score
- rpki_context
- data_quality_flags
- explanation

4. summary.json
Include:
- version
- run_id
- mode
- targets
- window
- counts
- top_candidates
- data_quality_summary
- scoring_config

5. manifest.json
Include:
- tool version
- run_id
- created_at UTC
- CLI args
- Python version
- platform
- output paths
- cache paths
- endpoints called
- live/cache/demo provenance per source
- errors/warnings
- hashes of output files if easy

6. runlog.txt
Human-readable fetch/analyze/report log.

────────────────────────────────────────
I. Acceptance and smoke tests
────────────────────────────────────────

Implement tests as built-in --self-test, separate test file, or both.

Minimum tests:

1. Demo generation test
Command:
python routesignal_scout_v0_1.py --demo --seed 42 --out outputs\test_demo

Pass:
- report.html exists
- events.csv exists and non-empty
- candidates.csv exists
- summary.json valid JSON
- manifest.json valid JSON
- manifest says mode=demo
- report clearly says data is synthetic/demo
- at least one planted candidate appears in demo outputs

2. Determinism test
Run demo twice with same seed.
Pass:
- same candidate count
- same top candidate resource/window/label
- stable summary values, ignoring timestamps/run_id/output paths
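One way to compare two demo summaries while ignoring volatile fields. The key names in `VOLATILE_KEYS` are assumptions; the real set depends on what the builder writes into summary.json.

```python
# Assumed names of fields that legitimately differ between two runs.
VOLATILE_KEYS = {"run_id", "created_at", "generated_at", "output_paths", "out_dir"}


def stable_view(value):
    """Recursively drop volatile keys so two runs can be compared with ==."""
    if isinstance(value, dict):
        return {k: stable_view(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        return [stable_view(v) for v in value]
    return value
```

A determinism test would load summary.json from both runs and assert that `stable_view(first) == stable_view(second)`.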

3. Offline behavior test
Command:
python routesignal_scout_v0_1.py --asn AS5603 --offline --cache cache --out outputs\test_offline

Pass:
- no internet fetch attempted
- missing cache produces clear message and manifest/runlog if possible
- cached data, if present, is used and marked source_mode=cache

4. Metric unit tests
Pass:
- median/MAD handles zeros and tiny samples
- robust_z does not divide by zero
- entropy handles empty and constant streams
- LZ approximation handles empty and repeated streams
- zlib ratio handles empty and repeated streams
- symbolization maps unknown fields to U safely

5. Promotion-rule test
Pass:
- update-count-only spike does NOT become “routing-signal candidate”
- two-family support DOES become candidate
- data-quality warning alone does NOT become candidate
- RPKI context alone does NOT become candidate

6. No-overclaim report test
Pass:
- candidate labels/headlines do not contain banned factual-claim words:
  attack, hijack, outage, route leak, malicious, compromise, confirmed incident, confirmed misconfiguration
- safety/disclaimer section may mention them only as things not claimed
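A sketch of the no-overclaim check, using the banned list from this test. It matches substrings case-insensitively, so `compromised` is also caught. How candidate labels and headlines are extracted from the report is left open, and the function name is hypothetical.

```python
import re

BANNED_TERMS = (
    "attack", "hijack", "outage", "route leak", "malicious",
    "compromise", "confirmed incident", "confirmed misconfiguration",
)
_BANNED_RE = re.compile("|".join(re.escape(t) for t in BANNED_TERMS), re.IGNORECASE)


def find_overclaims(texts):
    """Return (text, matched_term) pairs for labels or headlines using banned words."""
    hits = []
    for text in texts:
        for match in _BANNED_RE.finditer(text):
            hits.append((text, match.group(0).lower()))
    return hits
```

The check should run only over candidate labels and headlines, because the safety section is allowed to name these terms as things the tool does not claim.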

7. Live smoke test
Command:
python routesignal_scout_v0_1.py --asn AS5603 --window-hours 6 --out outputs\test_as5603 --cache cache

Pass:
- if internet works: fetches at least one RIPEstat endpoint, writes outputs
- if internet fails: logs failure and exits cleanly, or uses explicit demo fallback only when requested
- never silently claims “no candidates” when data was missing

────────────────────────────────────────
J. Implementation guidance
────────────────────────────────────────

Before coding, do a short plan through these lenses:
- Python builder: simplest robust architecture.
- Routing-data skeptic: what RIPEstat can and cannot prove.
- Safety/no-overclaim reviewer: wording guardrails.
- MDL×DCC scorer: independent signal families and promotion rule.
- Test designer: demo, offline, deterministic, failure-mode tests.
- Practical operator-review user: report readability and useful next questions.

Then build.

Architecture suggestion:
- Keep it mostly one script for v0.1.
- Use functions/classes internally:
  - parse_args()
  - utc_now()
  - safe_resource_name()
  - fetch_ripe_json()
  - cache_get/cache_put()
  - generate_demo_data()
  - normalize_ripestat_response()
  - make_symbol_stream()
  - robust_stats()
  - entropy()
  - lz_complexity()
  - zlib_ratio()
  - score_signal_families()
  - promote_candidates()
  - write_events_csv()
  - write_candidates_csv()
  - write_summary_json()
  - write_manifest_json()
  - write_report_html()
  - run_self_tests()

Keep source normalization defensive. RIPEstat response shapes can vary by endpoint.
Do not crash on missing optional fields. Use data-quality flags.

Report style:
- Clean dark-theme HTML is preferred.
- Tables should be readable on Windows/browser.
- Avoid external assets.
- The most important report line:
  “This is a public-data routing-signal scout. It produces review candidates, not operational diagnoses.”

────────────────────────────────────────
K. Final response required from builder
────────────────────────────────────────

When finished, return:

1. Files created/modified.
2. Commands run.
3. Test results, with pass/fail.
4. Output folder paths.
5. Example snippets from summary.json and candidates.csv.
6. What remains weak in v0.1.
7. Suggested v0.2 upgrades.

Do not hide failures. If something did not run, say exactly why.
If internet was unavailable, say so and show demo/offline evidence instead.

E-reference.txt · E reference hybrid builder prompt · 22,945 chars

# RouteSignal Scout v0.1 — Hybrid Builder Prompt v0.2

You are working in the MDL×DCC / AIM³ / Internet-ops project folder.

## Role

You are the coding builder for **RouteSignal Scout v0.1**, a read-only Python prototype for BGP/RPKI/IPv6 routing-signal analysis.

Build a working package. Do not write a theory essay. Do not make production routing claims.

This hybrid prompt combines:

- the **Direct prompt** advantage: shorter, build-first, lower cognitive load;
- the **RHPr/RHP prompt** advantage: stronger scoring discipline, data-quality discipline, kill-tests, expert-review report;
- one important API correction: use **endpoint-specific RIPEstat parameters**, especially `bgp-state` with `timestamp`, not the generic start/end pattern.

## Mission in one sentence

Build a self-contained Python scout that uses public RIPEstat routing data, deterministic demo data, cached/offline replay, CPU workers, MDL-style process metrics, and DCC-style multi-signal gating to rank ASN/prefix/time windows that are structurally unusual and worth human operator review.

The tool answers:

> Which ASN/prefix/time windows look unusual compared with their own baseline and, when available, peer resources — and why?

The tool must **not** answer:

> This is definitely an attack, hijack, outage, misconfiguration, or route leak.

Use careful language everywhere:

- `routing-signal candidate`
- `anomaly candidate`
- `high residual`
- `worth operator review`
- `read-only research prototype`
- `data-quality warning`
- `source-quality note`

Jan Žorž / Jan Zorz is only a future expert critic. Do not require him as a participant for v0.1. The output should be good enough that BD can show the HTML report to Jan and ask for critique in five minutes.

## Priority rule

Prioritize a working v0.1 package over perfect completeness.

If the full prompt is too large to implement in one pass, implement this core first:

1. deterministic demo mode,
2. offline/cache replay,
3. event/window model,
4. metrics,
5. DCC scoring with explanations,
6. HTML/CSV/JSON/manifest outputs,
7. acceptance tests.

Then add RIPEstat real-data fetching and parsing. If real fetching is not possible in the environment, still implement the fetch path defensively and complete demo/offline tests.

Do not stop to ask clarification unless impossible to proceed. Make reasonable assumptions, record them in the manifest, and continue.

## Hard constraints

### Must do

- Python first.
- One robust main script preferred: `routesignal_scout_v0_1.py`.
- CPU workers supported.
- Public data only.
- RIPEstat public data first.
- Cached offline mode required.
- Deterministic demo-data mode required.
- HTML report required.
- CSV, JSON summary, and manifest required.
- Windows-friendly `.bat` files required.
- Standard library preferred.
- Optional dependencies are allowed only if gracefully degraded and not required for acceptance tests.
- If internet is unavailable, `--demo-data` must still produce all output types.
- If real data parsing is partial, continue and report source/data quality.

### Must not do

- No GPU.
- No Rust.
- No live JavaScript dashboard in v0.1.
- No raw MRT parser in v0.1.
- No router configuration.
- No production routing advice.
- No production incident claims.
- No AC/ASI/consciousness story in the first engineering document.
- No LLM/API calls.
- No sprawling framework unless absolutely necessary.
- No naive “sort by update count and call it anomaly.”

## Files to create

Create at least:

1. `routesignal_scout_v0_1.py`
2. `README_RouteSignal_v0_1.md`
3. `run_routesignal_demo.bat`
4. `run_routesignal_as5603_7d.bat`

Example runs must generate output directories containing:

- `routesignal_report.html`
- `routesignal_summary.json`
- `routesignal_events.csv`
- `routesignal_manifest.json`
- `cache/` with raw API responses or demo fixtures

Optional if easy:

- `routesignal_candidates.csv`
- `routesignal_debug_warnings.json`

## CLI requirements

Implement these example commands:

```bash
python routesignal_scout_v0_1.py --demo-data --out routesignal_out_demo --workers 4
python routesignal_scout_v0_1.py --resources AS5603 --days 7 --out routesignal_out_as5603 --workers 4
python routesignal_scout_v0_1.py --resources AS5603,AS21283 --days 7 --out routesignal_out_si_compare --workers 4
python routesignal_scout_v0_1.py --resource 193.2.0.0/16 --days 7 --out routesignal_out_prefix --workers 2
python routesignal_scout_v0_1.py --resources AS5603 --days 7 --no-web --cache-dir routesignal_out_as5603/cache --out routesignal_replay
```

Minimum options:

- `--resource RESOURCE` single ASN or prefix.
- `--resources R1,R2,...` comma-separated resources.
- `--asn ASN` convenience alias; accepts `5603` or `AS5603`.
- `--prefix PREFIX` convenience alias.
- `--days N` default `7`.
- `--from YYYY-MM-DD` optional start date.
- `--to YYYY-MM-DD` optional end date.
- `--ip-version both|v4|v6` default `both`.
- `--workers N` default `1`.
- `--out DIR` default `routesignal_out`.
- `--cache-dir DIR` default `<out>/cache`.
- `--offline-cache` prefer cache and fetch only missing.
- `--no-web` never fetch; cache/demo only.
- `--demo-data` generate deterministic synthetic BGP-like data with injected anomaly windows.
- `--window-minutes N` default `60`.
- `--top N` default `20`.
- `--progress-every N` default `1`.
- `--verbose`.
- `--seed N` default `42` for demo determinism.

CLI behavior rules:

- Combine and de-duplicate resources from `--resource`, `--resources`, `--asn`, and `--prefix`.
- Normalize ASN values to the `AS<number>` form (for example `AS5603`) internally.
- Keep prefixes as valid-looking prefix strings; warn on malformed prefix/resource.
- If no resource is provided and `--demo-data` is false, default to `AS5603` and record a manifest warning.
- With `--no-web`, never attempt HTTP. Use cache if available; otherwise complete with empty/data-limited report and clear warnings.
- Keep output deterministic: sort resources, windows, and candidates in stable order.

## RIPEstat data-source requirements

Primary source: RIPEstat Data API.

Base pattern:

```text
https://stat.ripe.net/data/<endpoint>/data.json?param1=value1&param2=value2
```

Use endpoint-specific parameters. Do **not** blindly use the same start/end parameters for every endpoint.

### Endpoint 1 — `bgp-update-activity`

Purpose: coarse time-series activity counts.

Suggested request parameters:

- `resource`
- `starttime`
- `endtime`
- `max_samples`
- `min_sampling_period`
- `hide_empty_samples=false` when supported, because zero windows are useful for baselines

Expected useful fields when available:

- update sample start time,
- announcements,
- withdrawals,
- sampling period,
- query start/end.

Parsing cautions:

- withdrawals may be missing or null for some ASN queries; treat null as unknown rather than zero, unless the payload itself documents that null means zero.
- missing empty samples may be implied; reconstruct zero/stable windows when safe.

### Endpoint 2 — `bgp-updates`

Purpose: more detailed chronological BGP update events.

Suggested request parameters:

- `resource`
- `starttime`
- `endtime`

Expected useful fields when available:

- update `type`: `A` announcement or `W` withdrawal,
- `timestamp`,
- `attrs.target_prefix`,
- `attrs.path` for announcements,
- `source_id`,
- `seq`,
- total count fields if present.

Parsing cautions:

- data availability may be time-limited by RIPEstat indexing.
- payload shapes can change; parse defensively.
- if detailed updates are missing, fall back to activity-only mode.

### Endpoint 3 — `bgp-state`

Purpose: route-state snapshot at a point in time.

Suggested request parameters:

- `resource`
- `timestamp`

Do **not** call `bgp-state` with only `starttime/endtime` as if it were an interval endpoint.

Use it as an optional snapshot/proxy source, for example:

- one query near start of range,
- one query near end of range,
- optional middle query if cheap.

Expected useful fields when available:

- `bgp_state` route records,
- target prefix,
- path/origin fields if present,
- collector/source fields if present.

Parsing cautions:

- this is not the primary event stream.
- missing `bgp_state` should not fail the run.
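Endpoint-specific parameters can be enforced with a small allow-list. The parameter names below are copied from the three endpoint sections of this prompt and have not been checked against the live API here. The function name `build_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://stat.ripe.net/data"

# Allowed parameters per endpoint, as listed in the endpoint sections above.
ENDPOINT_PARAMS = {
    "bgp-update-activity": (
        "resource", "starttime", "endtime",
        "max_samples", "min_sampling_period", "hide_empty_samples",
    ),
    "bgp-updates": ("resource", "starttime", "endtime"),
    "bgp-state": ("resource", "timestamp"),
}


def build_url(endpoint: str, **params) -> str:
    """Build a RIPEstat URL, rejecting parameters the endpoint is not listed with."""
    allowed = ENDPOINT_PARAMS[endpoint]
    unknown = sorted(set(params) - set(allowed))
    if unknown:
        raise ValueError(f"{endpoint} does not take: {', '.join(unknown)}")
    pairs = [(k, params[k]) for k in allowed if params.get(k) is not None]
    return f"{BASE}/{endpoint}/data.json?{urlencode(pairs)}"
```

Calling `build_url("bgp-state", resource="AS5603", starttime="...")` raises immediately, which catches the generic start/end mistake before any request is sent.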

## Fetch/cache rules

- Use robust URL construction with `urllib.parse`.
- Use a reasonable timeout.
- Cache every raw API response as JSON before parsing.
- Cache file name should include endpoint, normalized resource, time range/timestamp, and a short hash.
- Record endpoint URL, cache path, fetch timestamp, cache hit/miss, parse status, and warnings in `routesignal_manifest.json`.
- Avoid API spam: one fetch per endpoint/resource/time request; no repeated identical calls in one run.
- Fetching may be sequential for API safety; CPU workers should analyze already-fetched/loaded records.
- If fetch fails, continue with other endpoints/cache/demo and mark evidence as partial.

## RPKI/ROA stance for v0.1

Add a clean placeholder only:

```json
"rpki_enrichment_status": "placeholder_not_validating"
```

Do not claim ROA validation unless actually implemented. Leave hooks for v0.2.

Deferred to v0.2+:

- real RPKI/ROA validation,
- ASPA/route-leak lane,
- RIPE RIS / RouteViews raw MRT parser,
- Rust parser,
- live dashboard.

## Demo-data requirement

`--demo-data` must work without internet and must produce all output types.

Generate deterministic synthetic BGP-like windows for several resources, for example:

- `AS5603`
- `AS_DEMO_STABLE`
- `AS_DEMO_CHURNY`
- `AS_DEMO_IPV6_SPIKE`
- `AS_DEMO_PATH_SHIFT`

Inject at least three known anomaly windows:

1. withdrawal burst,
2. IPv6-heavy update burst,
3. path-shape / prefix-diversity shift proxy.

Rules:

- Use deterministic `--seed`.
- Save demo fixtures into `<out>/cache/` so offline replay can work.
- Mark demo rows with `source_quality = demo`.
- Add `demo_truth_label` only for validation/debug summary; do not use it in scoring.
- Report must clearly say synthetic/demo data is being used.
- Demo smoke should surface at least three known anomaly types in candidates or validation summary.
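A minimal sketch of seeded demo generation with one injected withdrawal burst. The count ranges, the burst size, and the `window_index` field are illustrative assumptions. A full implementation would inject all three anomaly types and emit real UTC window timestamps.

```python
import random


def generate_demo_windows(seed=42, hours=48, resource="AS_DEMO_CHURNY", burst_at=30):
    """Return deterministic synthetic hourly windows with one planted burst."""
    rng = random.Random(seed)  # private generator: no global random state
    rows = []
    for hour in range(hours):
        announcements = rng.randint(5, 15)
        withdrawals = rng.randint(0, 4)
        truth = ""
        if hour == burst_at:
            withdrawals += 60  # planted anomaly window
            truth = "withdrawal_burst"
        rows.append({
            "resource": resource,
            "window_index": hour,
            "announcements": announcements,
            "withdrawals": withdrawals,
            "updates_total": announcements + withdrawals,
            "source_quality": "demo",
            "demo_truth_label": truth,  # validation only, never a scoring input
        })
    return rows
```

Using a dedicated `random.Random(seed)` instance keeps the output reproducible even if other code touches the module-level generator.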

## Event/window model

Normalize all data into one window model.

Each row/window should include at least:

```json
{
  "resource": "AS5603",
  "window_start": "2026-05-01T00:00:00Z",
  "window_end": "2026-05-01T01:00:00Z",
  "ip_version": "v4|v6|unknown|mixed",
  "announcements": 0,
  "withdrawals": 0,
  "updates_total": 0,
  "unique_prefixes": 0,
  "unique_peers": 0,
  "unique_origins": 0,
  "path_change_proxy": 0,
  "raw_symbols": "",
  "symbol_len": 0,
  "source_quality": "activity_only|updates_parsed|state_parsed|demo|partial|missing",
  "parse_warnings": []
}
```

Also include when available:

- `prefixes_sample`
- `peers_sample`
- `origins_sample`
- `collector_count`
- `endpoint_sources`
- `demo_truth_label` only in demo mode

Rules:

- Do not assume RIPEstat gives all fields.
- Missing fields become zero/unknown plus warnings.
- Every analyzed time window should have a row, even if mostly empty.
- Use UTC timestamps.
- Sort deterministically.

## Symbol encoding

For each window, encode a symbolic process stream.

Suggested symbols:

- `A` announcement
- `W` withdrawal
- `4` IPv4
- `6` IPv6
- `P` prefix diversity / path proxy changed
- `S` stable or no-change placeholder
- `B` burst bucket
- `U` unknown/missing

Encoding rules:

- If parsed updates exist, encode from event-like records.
- If only activity counts exist, create a coarser count-bucket stream.
- Cap repeated symbols to avoid huge strings; preserve intensity through buckets.
- Always generate a non-empty stream: use `S` for stable/zero and `U` for missing.
- Store short snippets for top-candidate report display.
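A sketch of the coarser count-bucket encoding for activity-only windows. The bit-length bucketing, the burst threshold of 50, and the symbol order are assumptions. The field names follow the window model.

```python
def bucket(count: int, cap: int = 8) -> int:
    """Log-style bucket: 0 -> 0, 1 -> 1, 2-3 -> 2, 4-7 -> 3, capped at `cap`."""
    return min(cap, int(count).bit_length())


def encode_window(window: dict, burst_threshold: int = 50) -> str:
    """Encode one window into a short, never-empty symbol stream."""
    if window.get("source_quality") == "missing":
        return "U"
    ann = window.get("announcements")
    wd = window.get("withdrawals")
    # Repeat counts are bucketed so intensity survives without huge strings.
    stream = "A" * bucket(ann or 0) + "W" * bucket(wd or 0)
    if (ann or 0) + (wd or 0) >= burst_threshold:
        stream += "B"
    if stream:  # tag the address family only when there was activity
        stream += {"v4": "4", "v6": "6", "mixed": "46"}.get(window.get("ip_version"), "")
    if window.get("path_change_proxy"):
        stream += "P"
    if ann is None or wd is None:
        stream += "U"  # a null count is unknown, not zero
    return stream or "S"
```

For example, 3 announcements and 1 withdrawal on IPv6 encode to `AAW6`, while an all-zero window encodes to `S`.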

## Metrics

Compute at least:

- `updates_total`
- `announcements`
- `withdrawals`
- `withdrawal_share`
- `aw_ratio`
- `baseline_median`
- `baseline_mad`
- `residual_mad_z`
- `burst_zscore`
- `peer_residual_z` when peers exist
- `shannon_entropy`
- `lz76_complexity`
- `zlib_ratio`
- `complexity_residual_z`
- `symbol_len`
- `unique_prefixes`
- `unique_peers`
- `unique_origins`
- `path_change_proxy`
- `diversity_residual_z`
- `ip_version_signal`
- `missing_data_penalty`
- `source_quality_score`

Metric rules:

- Prefer median/MAD over mean/std.
- Use safe division everywhere.
- If MAD is zero, use an epsilon/fallback and log `mad_zero_fallback`.
- LZ76 can be a simple normalized implementation for short symbol strings.
- `zlib_ratio = compressed_size / raw_size`, with safe handling for short strings.
- Entropy is Shannon entropy over symbols.
- Complexity/residual metrics compare with own-resource history when possible.
- Missing/partial source data must reduce confidence.
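A sketch of the median/MAD residual with the zero-MAD fallback. The 1.4826 consistency constant is the usual scaling for MAD under a normal model. The fallback to mean absolute deviation, the minimum scale of 1.0, and the three-sample minimum are assumptions.

```python
import statistics


def robust_z(value, history, fallback_scale=1.0):
    """Return (z, flags) for `value` against the median/MAD of `history`."""
    if len(history) < 3:
        return 0.0, ["too_few_samples"]
    median = statistics.median(history)
    deviations = [abs(x - median) for x in history]
    mad = statistics.median(deviations)
    if mad > 0:
        return (value - median) / (1.4826 * mad), []
    # MAD is zero when more than half the history is identical: fall back
    # to the mean absolute deviation, then to a fixed minimum scale.
    mean_dev = sum(deviations) / len(deviations)
    scale = mean_dev if mean_dev > 0 else fallback_scale
    return (value - median) / scale, ["mad_zero_fallback"]
```

Returning the flag alongside the score lets the caller log `mad_zero_fallback` in the window's warnings instead of hiding the substitution.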

## MDL×DCC scoring

Build a transparent scoring function.

Default visible weights:

```json
{
  "residual": 1.40,
  "burst": 1.10,
  "withdrawal": 1.00,
  "complexity": 0.90,
  "diversity": 0.90,
  "peer_control": 0.70,
  "ip_version": 0.55,
  "agreement_bonus": 0.75,
  "missing_data_penalty": 1.00
}
```

Scoring formula concept:

```text
DCC score = weighted signal scores + agreement bonus - missing/data-quality penalty
```

Independent signal families:

- robust residual / volume change,
- burstiness,
- withdrawal behavior,
- symbolic complexity change,
- diversity/path proxy,
- peer contrast,
- IP-version shift.

Promotion rule:

- Promote normal candidates only if at least two independent signal families agree.
- If one signal is extreme but alone, mark `single_signal_extreme`.
- Never rank purely by `updates_total`.
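
One way to read the formula concept and promotion rule together is sketched below. The weights are the defaults listed above; everything else is an assumption made for illustration: signal strengths scaled to `[0, 1]`, the `agree_at` and `extreme_at` thresholds, the bonus growing with each agreeing family beyond the first, and the `promoted` / `not_promoted` status names.

```python
WEIGHTS = {
    "residual": 1.40, "burst": 1.10, "withdrawal": 1.00,
    "complexity": 0.90, "diversity": 0.90, "peer_control": 0.70,
    "ip_version": 0.55, "agreement_bonus": 0.75, "missing_data_penalty": 1.00,
}
FAMILIES = ("residual", "burst", "withdrawal", "complexity",
            "diversity", "peer_control", "ip_version")


def score_window(signals, missing_penalty=0.0, weights=WEIGHTS,
                 agree_at=0.5, extreme_at=0.95):
    """signals maps family name -> strength in [0, 1] (assumed scale)."""
    strengths = {f: float(signals.get(f, 0.0)) for f in FAMILIES}
    agreeing = [f for f in FAMILIES if strengths[f] >= agree_at]
    weighted = sum(weights[f] * strengths[f] for f in FAMILIES)
    bonus = weights["agreement_bonus"] * max(0, len(agreeing) - 1)
    penalty = weights["missing_data_penalty"] * missing_penalty
    if len(agreeing) >= 2:
        status = "promoted"
    elif max(strengths.values()) >= extreme_at:
        status = "single_signal_extreme"  # extreme but alone
    else:
        status = "not_promoted"
    return {
        "dcc_score": round(weighted + bonus - penalty, 4),
        "agreement_count": len(agreeing),
        "dominant_signals": sorted(agreeing, key=lambda f: -strengths[f]),
        "status": status,
    }
```

Because `updates_total` never enters `signals` directly, a window cannot rank highly on raw volume alone; it needs a residual against its own baseline plus at least one other family to be promoted.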

Candidate fields:

- `rank`
- `dcc_score`
- `agreement_count`
- `dominant_signals`
- `confidence_band`
- `candidate_reason`
- `explanation_short`
- `source_quality_note`

Suggested confidence bands:

- `review`
- `weak_review`
- `data_limited`
- `background`

Explanation examples:

- “Update volume is 7.2 MAD above this resource’s own baseline.”
- “Withdrawal share is unusually high.”
- “Symbol complexity changed relative to the resource baseline.”
- “Peer resources did not show the same-window rise.”
- “Source quality is partial, so confidence is reduced.”
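
A small sketch of how such explanation bullets might be generated from a window's metrics. The thresholds `z_notable` and `withdrawal_notable` and the `"full"` source-quality label are assumptions, not values from this spec.

```python
def make_candidate_explanations(metrics, source_quality="full",
                                z_notable=3.0, withdrawal_notable=0.5):
    """Turn a few metrics into short, human-readable explanation strings."""
    notes = []
    z = metrics.get("residual_mad_z")
    if z is not None and abs(z) >= z_notable:
        side = "above" if z > 0 else "below"
        notes.append(f"Update volume is {abs(z):.1f} MAD {side} "
                     "this resource's own baseline.")
    if metrics.get("withdrawal_share", 0.0) >= withdrawal_notable:
        notes.append("Withdrawal share is unusually high.")
    if source_quality != "full":
        notes.append(f"Source quality is {source_quality}, "
                     "so confidence is reduced.")
    return notes
```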

## Controls

Implement at least:

1. **Own-baseline control** — compare window against the same resource’s history.
2. **Peer-resource control** — when multiple resources are provided, compare same-window behavior against the other resources.
3. **Source-quality control** — confidence is reduced for activity-only/missing/partial data.

Rules:

- Absence of peer resources must not fail the run.
- The report must say whether peer control was available.
- Demo truth labels are post-hoc validation only, never scoring input.
- Do not implement broad Internet-wide comparison in v0.1.
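
The peer-resource control could look roughly like this. It is a sketch under assumptions: the function signature, the `mad_floor` fallback, and the choice to contrast robust residuals (rather than raw counts) are illustrative.

```python
import statistics


def peer_residual_z(own_z, peer_zs, mad_floor=1.0):
    """Contrast a window's residual with same-window residuals of peer resources.

    Returns (value_or_None, peer_control_available).
    """
    if not peer_zs:
        return None, False  # no peers: report it, do not fail the run
    med = statistics.median(peer_zs)
    mad = statistics.median(abs(z - med) for z in peer_zs) or mad_floor
    return (own_z - med) / mad, True
```

The second return value can feed the report line that states whether peer control was available for the run.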

## HTML report requirements

Generate `routesignal_report.html` with embedded CSS and no external JS/CSS dependencies.

Required sections:

1. Title and run metadata.
2. Strong non-claim box: read-only research prototype; not an incident detector.
3. Resources analyzed.
4. Five-minute expert summary:
   - top 5 candidates,
   - data-quality status,
   - strongest signal type,
   - weakest evidence,
   - questions for Jan / operator critic.
5. Top routing-signal candidates.
6. Per-candidate explanation cards:
   - resource,
   - time window,
   - DCC score,
   - confidence band,
   - agreement count,
   - key metrics,
   - explanation bullets,
   - source-quality note,
   - controls available.
7. Baseline/metric tables or simple inline bars.
8. IPv4 vs IPv6 section if data permits.
9. Data quality / missing fields section.
10. Methodology and scoring weights.
11. Limitations.
12. Next tests and upgrade path.

“Questions for Jan” should include:

- Are these metrics meaningful from an operator perspective?
- Which RIPEstat fields/endpoints are the right next layer?
- What is the minimum credible route-leak/hijack control?
- Which false positives are expected?
- Is raw RIS/RouteViews MRT necessary for meaningful v0.2?
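
A minimal sketch of a self-contained report renderer with embedded CSS, escaped values, and the non-claim box. The function name `render_report`, the CSS, and the reduced column set are assumptions; a full report would add the remaining required sections.

```python
import html

CSS = ("body{font-family:sans-serif;margin:2rem}"
       ".nonclaim{border:2px solid #a00;padding:.8rem}")


def render_report(title, candidates, peer_control_available):
    """Return one HTML string with no external JS/CSS dependencies."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td><td>{:.2f}</td><td>{}</td></tr>".format(
            c["rank"],
            html.escape(str(c["resource"])),
            c["dcc_score"],
            html.escape(str(c["explanation_short"])),
        )
        for c in candidates
    )
    peer_text = "yes" if peer_control_available else "no"
    return f"""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{html.escape(title)}</title>
<style>{CSS}</style></head><body>
<h1>{html.escape(title)}</h1>
<div class="nonclaim">Read-only research prototype; not an incident detector.</div>
<p>Peer control available: {peer_text}</p>
<table><tr><th>Rank</th><th>Resource</th><th>DCC score</th><th>Why</th></tr>
{rows}</table>
</body></html>"""
```

Escaping every value that originates from fetched data keeps a malformed resource name or explanation string from breaking the page.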

## JSON/CSV/manifest requirements

### `routesignal_summary.json`

Include:

- run metadata,
- resources,
- date range,
- window size,
- counts,
- top candidates,
- aggregate scores,
- data-quality warnings,
- strongest/weakest evidence summary,
- demo validation summary when demo mode is used,
- questions for expert critique.

### `routesignal_events.csv`

One row per analyzed window. Include at least:

- `rank`
- `resource`
- `window_start`
- `window_end`
- `ip_version`
- `updates_total`
- `announcements`
- `withdrawals`
- `withdrawal_share`
- `baseline_median`
- `baseline_mad`
- `residual_mad_z`
- `peer_residual_z`
- `shannon_entropy`
- `lz76_complexity`
- `zlib_ratio`
- `unique_prefixes`
- `unique_peers`
- `unique_origins`
- `path_change_proxy`
- `missing_data_penalty`
- `agreement_count`
- `dcc_score`
- `confidence_band`
- `source_quality`
- `dominant_signals`
- `explanation_short`
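
A sketch of writing the events file with a fixed column order. The column names are the ones listed above; the function name `write_events_csv` and the sort key used for stable row order are assumptions.

```python
import csv

EVENT_FIELDS = [
    "rank", "resource", "window_start", "window_end", "ip_version",
    "updates_total", "announcements", "withdrawals", "withdrawal_share",
    "baseline_median", "baseline_mad", "residual_mad_z", "peer_residual_z",
    "shannon_entropy", "lz76_complexity", "zlib_ratio", "unique_prefixes",
    "unique_peers", "unique_origins", "path_change_proxy",
    "missing_data_penalty", "agreement_count", "dcc_score",
    "confidence_band", "source_quality", "dominant_signals",
    "explanation_short",
]


def write_events_csv(path, rows):
    """Write one row per window; missing fields become empty cells."""
    ordered = sorted(
        rows,
        key=lambda r: (r.get("rank") or float("inf"),
                       str(r.get("resource", "")),
                       str(r.get("window_start", ""))),
    )
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=EVENT_FIELDS,
                                restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(ordered)
```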

### `routesignal_manifest.json`

Include:

- version,
- command-line args,
- timestamp UTC,
- Python version,
- platform,
- code hash if easy,
- resources,
- date range,
- endpoint URLs used,
- cache files,
- cache hits/misses,
- scoring weights,
- thresholds,
- warnings,
- data-source quality,
- worker count requested/effective,
- runtime seconds,
- `rpki_enrichment_status`,
- assumptions,
- limitations.

Pretty-print JSON. Sort keys where useful.
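
A sketch of a manifest builder and a pretty-printing JSON writer. Apart from `rpki_enrichment_status`, the key names, the `"0.1"` version string, and the `"not_attempted"` value are assumptions chosen for illustration; a full manifest would carry the remaining fields listed above.

```python
import json
import platform
import sys
from datetime import datetime, timezone


def build_manifest(args_dict, weights, warnings,
                   workers_requested, workers_effective, runtime_seconds):
    """Collect run metadata into a plain dict (subset of the required fields)."""
    return {
        "version": "0.1",
        "command_line_args": args_dict,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "scoring_weights": weights,
        "warnings": sorted(warnings),
        "workers": {"requested": workers_requested,
                    "effective": workers_effective},
        "runtime_seconds": round(runtime_seconds, 3),
        "rpki_enrichment_status": "not_attempted",
    }


def write_json(path, payload):
    """Pretty-print with sorted keys so diffs between runs stay readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2, sort_keys=True)
        fh.write("\n")
```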

## CPU worker requirements

Use workers for per-resource analysis and/or metric computation, not racy cache writes.

Rules:

- Default workers = 1.
- Deterministic output order after parallel completion.
- No racy writes to shared files.
- Fetch/cache should be safe and not spam APIs.
- If workers > resources, handle gracefully.
- Log workers requested/effective in manifest.
- Keep Windows safe with `if __name__ == "__main__":`.
- Worker exceptions should become warnings and partial output, not silent crashes.
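
The worker rules above might be sketched like this. The function name `run_per_resource`, the `executor_cls` parameter, and the warning string format are assumptions. With the default process pool, `analyze` has to be a module-level function and the call belongs under `if __name__ == "__main__":`.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def run_per_resource(resources, analyze, workers_requested=1,
                     executor_cls=ProcessPoolExecutor):
    """Analyze each resource in a pool; return (results, warnings, effective).

    Results are collected by resource and re-emitted in sorted order,
    so completion order never leaks into the output.
    """
    resources = sorted(set(resources))
    effective = max(1, min(int(workers_requested), len(resources) or 1))
    results, warnings = {}, []
    with executor_cls(max_workers=effective) as pool:
        futures = {pool.submit(analyze, r): r for r in resources}
        for fut in as_completed(futures):
            res = futures[fut]
            try:
                results[res] = fut.result()
            except Exception as exc:  # worker failure -> warning, partial output
                warnings.append(f"worker_failed:{res}:{type(exc).__name__}")
    ordered = [(r, results[r]) for r in resources if r in results]
    return ordered, sorted(warnings), effective
```

Workers only return values here; writing cache and output files from the parent process after collection is one simple way to avoid concurrent writes to shared files.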

## README requirements

`README_RouteSignal_v0_1.md` must explain:

- what the tool does,
- what it does not claim,
- how to run demo mode,
- how to run AS5603 mode,
- how cache/offline replay works,
- how to interpret DCC scores,
- what `candidate` means,
- what data-quality warnings mean,
- how to send output to Jan for critique,
- next upgrade path,
- acceptance-test commands.

Keep README practical. No AC/ASI/consciousness story.

## Batch files

Create Windows-friendly batch files.

`run_routesignal_demo.bat`:

```bat
@echo off
python routesignal_scout_v0_1.py --demo-data --out routesignal_out_demo --workers 4 --top 20 --verbose
```

`run_routesignal_as5603_7d.bat`:

```bat
@echo off
python routesignal_scout_v0_1.py --resources AS5603 --days 7 --out routesignal_out_as5603_7d --workers 4 --top 20 --verbose
```

Do not include `pause` unless commented out.

## Implementation order

Follow this order to avoid overbuilding:

### Stage A — working skeleton

- parse CLI,
- create output/cache dirs,
- write manifest shell,
- generate demo windows,
- write CSV/JSON/HTML minimal outputs.
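
A possible starting point for the CLI step, using only flags that appear in the commands in this prompt. The default values other than `--workers 1` are assumptions, as is the comma-separated handling of `--resources`.

```python
import argparse


def parse_args(argv=None):
    """Parse the RouteSignal Scout command line (sketch; defaults assumed)."""
    p = argparse.ArgumentParser(prog="routesignal_scout_v0_1.py")
    p.add_argument("--demo-data", action="store_true",
                   help="use deterministic synthetic data")
    p.add_argument("--resources", default="",
                   help="comma-separated ASNs/prefixes, e.g. AS5603")
    p.add_argument("--days", type=int, default=7)
    p.add_argument("--out", default="routesignal_out")
    p.add_argument("--cache-dir", default=None)
    p.add_argument("--no-web", action="store_true",
                   help="offline replay: read cache only")
    p.add_argument("--workers", type=int, default=1)
    p.add_argument("--top", type=int, default=20)
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--verbose", action="store_true")
    args = p.parse_args(argv)
    args.resources = [r.strip() for r in args.resources.split(",") if r.strip()]
    return args


if __name__ == "__main__":
    print(parse_args())
```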

### Stage B — MDL×DCC core

- symbol encoding,
- robust stats,
- entropy/LZ/zlib,
- scoring,
- explanations,
- candidate ranking,
- top 5 expert-summary box.

### Stage C — cache/offline and RIPEstat

- cache write/read,
- `--no-web`,
- endpoint-specific RIPEstat fetch,
- defensive parsers,
- source-quality warnings.

### Stage D — tests and polish

- demo smoke,
- offline replay,
- determinism workers 1 vs 2,
- optional real AS5603 fetch,
- malformed/empty/zero-MAD kill-tests.

## Acceptance tests

### Test 1 — Demo smoke

```bash
python routesignal_scout_v0_1.py --demo-data --out routesignal_out_demo --workers 2 --top 10 --verbose
```

Expected:

- `routesignal_report.html` exists.
- `routesignal_summary.json` exists.
- `routesignal_events.csv` exists and has rows.
- `routesignal_manifest.json` exists.
- At least three demo anomaly types appear among candidates or validation summary.
- Report clearly says demo/synthetic data.

### Test 2 — Offline replay smoke

```bash
python routesignal_scout_v0_1.py --demo-data --out routesignal_out_demo_replay_seed --workers 1
python routesignal_scout_v0_1.py --no-web --cache-dir routesignal_out_demo_replay_seed/cache --out routesignal_out_demo_replay --workers 1
```

Expected:

- Second run does not fetch web.
- Second run produces report/summary/events/manifest.
- Manifest shows cache usage.
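
The offline-replay behaviour can be sketched with a content-addressed cache. `cache_key` and `fetch_or_load_endpoint` are the function names suggested later in this prompt; their signatures here, the injected `fetch` callable, the file naming, and the warning strings are assumptions.

```python
import hashlib
import json
from pathlib import Path


def cache_key(endpoint, resource, params):
    """Stable key: the same request always maps to the same cache file."""
    canonical = json.dumps(
        {"endpoint": endpoint, "resource": resource, "params": params},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:24]


def fetch_or_load_endpoint(endpoint, resource, params, cache_dir,
                           no_web, fetch=None):
    """Return (payload_or_None, warnings, cache_hit).

    fetch is a hypothetical callable doing the real HTTP request;
    it is never called when no_web is true.
    """
    name = f"{endpoint}_{cache_key(endpoint, resource, params)}.json"
    path = Path(cache_dir) / name
    warnings = []
    if path.exists():
        try:
            return json.loads(path.read_text(encoding="utf-8")), warnings, True
        except (json.JSONDecodeError, UnicodeDecodeError):
            warnings.append(f"malformed_cache:{path.name}")
    if no_web or fetch is None:
        warnings.append(f"cache_miss_offline:{endpoint}:{resource}")
        return None, warnings, False
    payload = fetch(endpoint, resource, params)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload, sort_keys=True), encoding="utf-8")
    tmp.replace(path)  # rename into place to avoid half-written cache files
    return payload, warnings, False
```

Counting the `cache_hit` flags and collecting the warnings gives the manifest its cache hits/misses and data-quality entries.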

### Test 3 — Determinism smoke

```bash
python routesignal_scout_v0_1.py --demo-data --seed 42 --out routesignal_out_det_a --workers 1 --top 10
python routesignal_scout_v0_1.py --demo-data --seed 42 --out routesignal_out_det_b --workers 2 --top 10
```

Expected:

- Top candidate order is identical or explainably equivalent.
- Output path/timestamp differences do not affect ranking.
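
A small checker for the determinism expectation, comparing the top-ranked rows of two `routesignal_events.csv` files. The helper names and the choice of `(resource, window_start)` as the identity of a candidate are assumptions.

```python
import csv


def top_order(events_csv, top=10):
    """Return the top-ranked (resource, window_start) pairs in rank order."""
    with open(events_csv, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    ranked = [r for r in rows if (r.get("rank") or "").strip().isdigit()]
    ranked.sort(key=lambda r: int(r["rank"]))
    return [(r["resource"], r["window_start"]) for r in ranked[:top]]


def same_ranking(csv_a, csv_b, top=10):
    """True when both runs rank the same windows in the same order."""
    return top_order(csv_a, top) == top_order(csv_b, top)
```

Comparing only rank, resource, and window keeps output paths and timestamps out of the check, for example `same_ranking("routesignal_out_det_a/routesignal_events.csv", "routesignal_out_det_b/routesignal_events.csv")`.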

### Test 4 — Optional real data

```bash
python routesignal_scout_v0_1.py --resources AS5603 --days 7 --out routesignal_out_as5603 --workers 2 --verbose
```

Expected:

- If fetch succeeds, report real-data evidence status.
- If fetch fails, explain why and still complete demo mode separately.

### Test 5 — Data-quality kill-tests

Run or describe handling for:

- no internet,
- malformed cached JSON,
- zero-event resource,
- one resource only with no peer control,
- `--workers` greater than resource count,
- zero MAD baseline,
- endpoint partial/missing fields.

The script should not crash. It should report warnings.

## Success criteria

Ready for BD internal review if:

- demo mode produces a clean HTML report,
- real-data mode is implemented even if not fetchable in the environment,
- output is deterministic and cacheable,
- top candidates are not naive update-count sorting,
- scoring weights and thresholds are visible,
- explanations are human-readable,
- data-quality limitations are obvious,
- a future expert can critique it in five minutes,
- no production claims are made.

## Implementation hints

Useful standard-library modules:

- `argparse`
- `csv`
- `json`
- `hashlib`
- `statistics`
- `datetime`
- `urllib.request`
- `urllib.parse`
- `concurrent.futures`
- `platform`
- `pathlib`
- `zlib`
- `math`
- `random`
- `html`
- `traceback`

Suggested internal functions:

- `parse_args()`
- `normalize_resources(args)`
- `build_time_range(args)`
- `cache_key(endpoint, resource, params)`
- `fetch_or_load_endpoint(...)`
- `parse_bgp_update_activity(...)`
- `parse_bgp_updates(...)`
- `parse_bgp_state(...)`
- `generate_demo_windows(args)`
- `window_events(...)`
- `encode_symbols(window)`
- `compute_metrics(windows)`
- `compute_controls(windows)`
- `score_windows(windows, weights)`
- `make_candidate_explanations(...)`
- `write_csv(...)`
- `write_json(...)`
- `write_html_report(...)`

## Final response format from the builder

When done, return exactly:

1. Files created.
2. Exact commands run.
3. Exact outputs generated.
4. Evidence status: `demo-only`, `real-data fetched`, or `partial`.
5. Strongest current signal.
6. Weakest part / what needs Jan’s expert critique.
7. Acceptance-test results.
8. Recommended next action.
9. Whether the package is ready for BD internal review or needs one more patch.

Do not claim v0.1 detects real incidents.

Reports

Detailed scoring reports

The full scoring notes are preserved below for reproducibility and critique.

Phase 1 prompt scoring report (Markdown report)

# RouteSignal Scout A/B/C/D/E Prompt Scoring Report

Date: 2026-05-31
Evaluator: GPT-5.5 Pro
Input ZIP: `ABCDE-prompts.zip`

## What was scored

Main scores are for the actual builder prompts intended to create Python code:

- `A-result.txt` — direct/classic builder prompt result
- `B-result.txt` — Microsoft Copilot Prompt Coach result
- `C-result.txt` — MS Copilot + RHPr/RHP result
- `D-result.txt` — fresh BD × GPT RHPr/RHP result
- `E-reference.txt` — previous hybrid reference prompt

The initial `A-prompt.txt`–`D-prompt.txt` files were also reviewed, but they are meta-prompts for generating builder prompts, so they are scored separately.

## Main rubric: Python-builder prompt quality, 100 points

| Category | Max | Meaning |
|---|---:|---|
| Build target & deliverables | 14 | Clear role, file list, CLI, repository shape, coding task clarity |
| Safety/scope/no-overclaim | 14 | Public data only, read-only, no production routing claims, no unsafe labels |
| Data-source/API precision | 14 | RIPEstat specificity, endpoint awareness, parameter correctness, defensive parsing |
| Offline/demo/cache/failure handling | 12 | Deterministic demo, offline replay, cache keys, fallback behavior, partial-data handling |
| MDL×DCC scoring fidelity | 14 | Multi-signal families, robust baselines, symbolic streams, compression/process metrics, explainable DCC promotion |
| Tests/acceptance/reproducibility | 12 | Smoke tests, determinism checks, no-overclaim scan, single-metric guard, replay tests |
| Outputs/report/user review value | 10 | HTML/CSV/JSON/manifest quality, data-quality section, candidate explanations |
| Implementation workflow/portability/cognitive load | 10 | Build order, Windows friendliness, dependency discipline, not too vague or too bloated |

## Final builder prompt scores

| Rank | Prompt | Score | Verdict |
|---:|---|---:|---|
| 1 | `E-reference.txt` | 96 | Still the strongest builder prompt. Best API correction, endpoint-specific RIPEstat detail, controls, CPU-worker note, implementation stages, and acceptance tests. Slightly long and reference-like, but strongest for actually producing code. |
| 2 | `D-result.txt` | 94 | Best fresh result. Very strong, safer and cleaner than A/B/C, with endpoint-specific RIPEstat handling and strong DCC/scoring/reporting sections. Slightly weaker than E because it lacks some E-reference controls/CPU-worker/implementation-order detail. |
| 3 | `C-result.txt` | 91 | Strong RHPr/RHP result. Very good blind-spot recovery, scoring discipline, single-metric guard, data-quality handling, and tests. Main weakness: RIPEstat section is still more generic than D/E. |
| 4 | `A-result.txt` | 89 | Good direct baseline. Surprisingly strong and buildable. Covers most requirements well. Weakness: less endpoint-specific and less controlled than C/D/E; fewer hard kill-tests and less implementation staging. |
| 5 | `B-result.txt` | 82 | Useful and concise, but clearly weaker for this task. It improves structure and safety, but remains more generic, with weaker RIPEstat endpoint precision, less implementation depth, and fewer hard controls. |

## Sub-score matrix

| Prompt | Build target /14 | Safety /14 | API precision /14 | Offline/cache /12 | MDL×DCC /14 | Tests /12 | Outputs /10 | Workflow /10 | Total |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| `A-result.txt` | 13 | 14 | 11 | 11 | 13 | 10 | 9 | 7 | 89 |
| `B-result.txt` | 11 | 14 | 8 | 10 | 12 | 10 | 8 | 7 | 82 |
| `C-result.txt` | 13 | 14 | 9 | 11 | 13 | 12 | 9 | 7 | 91 |
| `D-result.txt` | 14 | 14 | 13 | 11 | 13 | 11 | 10 | 8 | 94 |
| `E-reference.txt` | 14 | 14 | 14 | 12 | 14 | 12 | 10 | 8 | 96 |

## Prompt-by-prompt notes

### `A-result.txt` — 89/100

Strengths:
- Good direct builder role and deliverables.
- Preserves read-only/public-data/no-overclaim boundary.
- Good coverage of demo, offline, cache, outputs, CLI, tests.
- Strong enough to likely produce a working prototype.

Weaknesses:
- RIPEstat requirements are endpoint-family-level, not endpoint-specific enough.
- Less explicit about endpoint parameter differences.
- No CPU-worker lane.
- Less explicit about controls and no-overclaim/forbidden-token testing than E.

### `B-result.txt` — 82/100

Strengths:
- Clean, readable, practical.
- Good safety wording.
- Good high-level offline/cache/demo requirements.
- Includes 4+ signal families and promotion by ≥2 families.

Weaknesses:
- Too generic for a hard coding builder prompt.
- RIPEstat details are broad and not endpoint-specific.
- Less guidance on exact files, implementation order, controls, and API edge cases.
- Good prompt-improvement result, but weaker as a coding build spec.

### `C-result.txt` — 91/100

Strengths:
- Strong RHPr/RHP structure.
- Good scope, files, CLI, demo/offline/cache, MDL×DCC scoring, output/report, data-quality and smoke-test sections.
- Better than A on blind spots, testability, single-metric guard, and builder workflow.

Weaknesses:
- Still generic about RIPEstat endpoints and parameters.
- Slightly more process-heavy than necessary for coding.
- Does not include the concrete API correction that makes E so strong.

### `D-result.txt` — 94/100

Strengths:
- Best fresh result.
- Strong builder role, safety stance, files, CLI, report format, data-quality flags, and scoring.
- Explicit RIPEstat endpoint candidates: `bgp-update-activity`, `bgp-updates`, and `bgp-state`.
- Correctly treats `bgp-state` as timestamp/snapshot-style rather than generic interval.
- Strong candidate fields and explanation requirements.

Weaknesses:
- Slightly less mature than E on implementation staging and controls.
- No explicit CPU-worker requirement.
- Slightly environment-specific role line: “ChatGPT Pro / GPT-5.5 Pro,” which is fine for D but less portable as a universal builder prompt.

### `E-reference.txt` — 96/100

Strengths:
- Best practical coding spec.
- Best endpoint-specific RIPEstat correction.
- Includes priority rule, files, CLI, fetch/cache, demo truth labels, event/window model, symbols, metrics, scoring weights, controls, HTML/JSON/CSV/manifest requirements, CPU workers, README, batch files, implementation stages, and acceptance tests.
- Most likely to produce a complete working v0.1 package with the fewest missing pieces.

Weaknesses:
- Longest and cognitively heaviest.
- Some reference-specific wording and Jan-related notes are useful but not necessary for every builder session.
- Slightly less “fresh” than D, but stronger as an implementation spec.

## Meta-prompt scores: prompt-generation quality

These scores are not directly comparable to the builder scores above. They evaluate how well each starting prompt asks a model/tool to generate a builder prompt.

| Meta prompt | Score | Notes |
|---|---:|---|
| `A-prompt.txt` | 86 | Clean direct baseline generator. Good context-access rule. Thin but intentionally neutral. |
| `B-prompt.txt` | 89 | Good Prompt Coach test. Clear preservation of intent and anti-generic warning. |
| `C-prompt.txt` | 93 | Strong RHPr/RHP workflow prompt for MS Copilot. Good context-access handling and lens list. |
| `D-prompt.txt` | 96 | Best meta-prompt. Strongest preservation list, old-reference guard, fresh-generation rule, and explicit blind-spot lenses. |

## Main conclusion

The current evidence supports the working hypothesis:

- Microsoft Prompt Coach improved clarity and safety, but produced the weakest actual builder prompt of the result set.
- Direct prompting produced a surprisingly strong baseline.
- RHPr/RHP improved blind-spot recovery, testability, scoring discipline, and data-quality handling.
- The previous hybrid reference remains the strongest implementation prompt, mainly because it combines direct build-first clarity with RHPr/RHP discipline and a concrete RIPEstat endpoint correction.

Best next step:

Use `D-result.txt` and `E-reference.txt` as the two strongest candidates for actual Python builds. For the four-session build test, run A/B/C/D as planned, but keep E as the champion reference and scoring anchor.

Phase 2 code-output evaluation report (Markdown report)

# RouteSignal prompt-method study — Phase 2 code-output benchmark update

Generated: 2026-05-31

This update adds two reference builds:

- `RouteSignal_Scout_v0_1_BUILD_E.zip` — first implementation generated from the old E/reference hybrid prompt.
- `RouteSignal_Scout_v0_2_BUILD_E.zip` — later RouteSignal v0.2 package derived from the E/reference lineage.

## Method

I rescored the full set together, not only the two new E builds. The A/B/C/D builds remain the fresh prompt-method code-output benchmark. E-v0.1 and E-v0.2 are reference/champion artifacts and are useful, but not perfectly same-condition peers.

Scoring was based on:

1. runnability and test evidence,
2. deterministic demo/offline/cache behavior,
3. output artifacts and report usefulness,
4. safety / no-overclaim discipline,
5. public-data / RIPEstat path quality,
6. data-quality and failure handling,
7. MDL×DCC scoring and promotion fidelity,
8. controls / validation / operator-review value,
9. Windows packaging and documentation,
10. code clarity / maintainability / scope discipline.

Public internet/DNS was not usable in this evaluation environment, so live AS5603 behavior was judged by implemented fetch paths, fail-soft behavior, manifests, and included test evidence rather than live API success.

## Commands / checks run

### A

- `python3 routesignal_scout_v0_1.py --demo --out eval_out_demo_A --cache eval_out_demo_A/cache --seed 42 --hours 48 --bucket-minutes 60 --top 20`
- `python3 routesignal_scout_v0_1.py --self-test`

Result: PASS. Self-test reported demo files, deterministic top-5, offline replay, missing-cache manifest, promotion rule, and no-overclaim checks passing.

### B

- `python3 routesignal_scout_v0_1.py --demo --seed 42 --out eval_out_demo_B --top 20`
- `python3 tests/acceptance_tests_B.py`

Result: PASS, but with a concrete quality issue: demo resources include malformed normalized resources such as `AS` and `AS6`, indicating weak handling of synthetic/demo resource names.

### C

- `python3 routesignal_scout_v0_1.py --demo --out eval_out_demo_C --cache eval_out_demo_C/cache --seed 42 --top 20`
- `python3 routesignal_scout_v0_1.py --self-test --self-test-root eval_acceptance_tests_out_C`

Result: PASS. Self-test passed 13/13 checks, including deterministic replay, promotion rule, single-metric guard, manifest completeness, and no-overclaim wording.

### D

- `python3 routesignal_scout_v0_1.py --demo --out eval_out_demo_D --seed 42 --verbose`
- `python3 routesignal_scout_v0_1.py --self-test`

Result: PASS. Self-test passed demo generation, determinism, offline missing-cache manifest, offline demo-cache replay, metric unit tests, promotion rule tests, no-overclaim checks, and manifest completeness.

### E-v0.1

- `python3 routesignal_scout_v0_1.py --demo-data --out eval_routesignal_out_demo --workers 2 --top 10 --verbose`
- `python3 routesignal_scout_v0_1.py --demo-data --out eval_routesignal_out_demo_replay_seed --workers 1`
- `python3 routesignal_scout_v0_1.py --no-web --cache-dir eval_routesignal_out_demo_replay_seed/cache --out eval_routesignal_out_demo_replay --workers 1`
- `python3 routesignal_scout_v0_1.py --demo-data --seed 42 --out eval_routesignal_out_det_a --workers 1 --top 10`
- `python3 routesignal_scout_v0_1.py --demo-data --seed 42 --out eval_routesignal_out_det_b --workers 2 --top 10`
- `python3 routesignal_scout_v0_1.py --resources AS5603 --days 1 --no-web --cache-dir routesignal_bad_cache --out eval_routesignal_out_kill_empty --workers 9 --top 5`

Result: PASS. Strong v0.1 reference build: deterministic demo, offline replay, workers, peer-control style comparisons, endpoint-specific RIPEstat parameter handling, expert-critique questions, and robust source-quality warnings.

### E-v0.2

- `python3 routesignal_scout_v0_2.py --demo-data --resources AS5603,1.1.1.0/24 --days 7 --window-minutes 60 --top 20 --out eval_routesignal_demo_v02_core --analysis-out eval_routesignal_demo_v02_analysis`
- `python3 routesignal_v02_master_test.py --profile smoke --source-root . --work-root eval_master_smoke2 --python /usr/bin/python3 --offline-only --force --step-timeout-seconds 120`

Result: PASS. Master smoke report passed compile, v0.1 help, v0.2 help, v0.2 demo wrapper, final report, and analysis-bundle collection. v0.2 analysis generated global event ledger, parsed-first operator report, known-event validation report, false-positive load summary, repeated-resource summary, manifest, and a thin evidence ZIP.

## Updated scores

| Rank | Build | Score | Category | Verdict |
|---:|---|---:|---|---|
| 1 | E-v0.2 latest | 98/100 | Current champion / v0.2 release | Best overall engineering artifact. Strongest reporting/test harness/thin packaging/operator-facing evidence layer. Not a same-condition A/B/C/D prompt peer. |
| 2 | E-v0.1 reference | 96/100 | v0.1 reference/champion | Best v0.1-line reference build. Strong endpoint discipline, peer controls, deterministic/offline behavior, and expert-review framing. |
| 3 | D | 94/100 | Fresh A/B/C/D build | Best fresh-build prompt result. Richest A/B/C/D implementation, strong MDL×DCC and safety behavior. |
| 4 | C | 92/100 | Fresh A/B/C/D build | Very strong RHPr/RHP build. Good artifact discipline, self-tests, and data-quality handling. |
| 5 | A | 91/100 | Fresh A/B/C/D build | Surprisingly strong direct baseline. Clean, robust, and runnable, but less deep than C/D/E. |
| 6 | B | 81/100 | Fresh A/B/C/D build | Weakest. Works, but more generic; concrete demo-resource normalization bug; weaker domain/API specificity. |

## Prompt score vs code score

| Method/build | Prompt score | Code score |
|---|---:|---:|
| E-reference prompt / E-v0.1 build | 96 | 96 |
| D-result / D build | 94 | 94 |
| C-result / C build | 91 | 92 |
| A-result / A build | 89 | 91 |
| B-result / B build | 82 | 81 |

Note: E-v0.2 is not listed in the prompt-score comparison because it is a later RouteSignal release package, not a direct one-shot prompt result comparable to A/B/C/D.
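
The comparison in the table can be checked mechanically. The short script below uses only the five score pairs from the table; the variable and function names are illustrative.

```python
# (prompt score, code score) per method/build, copied from the table above.
scores = {"E": (96, 96), "D": (94, 94), "C": (91, 92), "A": (89, 91), "B": (82, 81)}


def ranking(index):
    """Order methods from highest to lowest on the chosen score column."""
    return sorted(scores, key=lambda k: -scores[k][index])


prompt_rank = ranking(0)
code_rank = ranking(1)
deltas = {k: code - prompt for k, (prompt, code) in scores.items()}

print("prompt order:", prompt_rank)
print("code order:  ", code_rank)
print("code minus prompt:", deltas)
```

Both orderings come out as E, D, C, A, B, and no build moves more than two points from its prompt score.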

## Key findings

### E-v0.2 is the current RouteSignal champion

It should be treated as the current engineering baseline, not just another prompt-output artifact. It adds the v0.2 layer: global event ledger, parsed-first operator ranking, known-event/control reporting, false-positive/load proxy summaries, repeated-resource summaries, thin packaging, and a master test harness.

Remaining weakness: live/API paths were not validated in this container, and v0.2 still depends on the v0.1e RIPEstat abstraction. The README itself correctly notes these as weak points.

### E-v0.1 validates the old hybrid reference prompt

E-v0.1 scored higher than fresh D because it is more mature as a v0.1 reference artifact: workers, peer-control comparisons, endpoint-specific RIPEstat notes, deterministic demo validation, offline replay, and explicit expert-critique questions. Its main weakness against the uniform A/B/C/D benchmark is packaging mismatch: it was not generated under the later standardized `00/01/02/03/04` batch-script wrapper.

### D remains the best fresh A/B/C/D build

D is still the best same-round fresh implementation. It has strong safety/no-overclaim discipline, useful demo and offline behavior, stronger endpoint coverage, RPKI-context hooks, DCC-style scoring, and robust self-tests.

### C and A are close

C has better RHPr/RHP-style test/manifest/artifact discipline. A is a very clean direct baseline. The one-point gap is informative: a strong direct prompt can do well on its own, while RHPr/RHP adds depth and blind-spot coverage rather than a step change in quality.

### B remains last after code generation

B did not fail. It produced a working package and passed its acceptance tests. But it stayed last in both prompt-level and code-output evaluation. The concrete demo-resource normalization bug (`AS`, `AS6`) is a good example of the kind of domain-specific issue the stronger prompts avoided.

## Bottom-line conclusion

The updated evidence is stronger than the original prompt-only study:

- Prompt Coach / B was last at prompt level.
- Prompt Coach / B was last again at code-output level.
- The old hybrid E reference lineage produced the best v0.1 reference build and the current best v0.2 RouteSignal package.
- D remains the best fresh one-shot A/B/C/D build.

A fair public phrasing would be:

> In this RouteSignal case study, Prompt Coach improved clarity and safety, but produced the weakest builder prompt and the weakest resulting code package. RHPr/RHP and the BD×GPT hybrid workflow recovered more implementation-critical structure: data-quality handling, endpoint specificity, fallback behavior, tests, MDL×DCC scoring, and operator-review reporting.

Limits

What this does and does not prove

This is a practical engineering case study. It is not a universal benchmark of every model, every Prompt Coach configuration, or every possible routing-analysis task.

It does show

For this RouteSignal task, Prompt Coach was weaker than direct MS Copilot, RHPr/RHP paths, and the BD × GPT hybrid reference, both at prompt level and code-output level.

It does not show

That Prompt Coach is always bad, or that any single model/vendor is always superior. The Prompt Coach backing model/path was not exposed.

It also does not show

That RouteSignal has already solved internet routing triage. This page evaluates prompt and code-output quality. Operational value, long-run tests, and v0.2/v0.2.5 validation belong on the companion RouteSignal Scout page.

Best use

Treat Prompt Coach as a useful quick clarity layer. For hard build/research tasks, use RHPr/RHP or a hybrid process that forces tests, blind spots, boundary rules, and implementation artifacts.