Add RCA (root cause analysis) for plan reports #534

Merged
neoneye merged 40 commits into main from feature/flaw-tracer
Apr 8, 2026

@neoneye neoneye commented Apr 5, 2026

Summary

  • Adds worker_plan_internal/rca/ package — a CLI tool that traces problems in PlanExe reports upstream through the pipeline DAG to find where they originated and classify the root cause
  • DAG registry is auto-generated from Luigi task introspection via extract_dag.py, so it needs no manual updates when the pipeline changes
  • Three-phase recursive algorithm: (1) identify problems via LLM, (2) trace each upstream with dedup, (3) analyze source code at origin and classify as prompt_fixable, domain_complexity, or missing_input
  • The classification feeds the self-improve loop: prompt_fixable origins can be fixed automatically by editing the prompt at that node
  • Produces JSON (root_cause_analysis.json) and markdown (root_cause_analysis.md) reports, sorted by trace depth (deepest root cause first)
  • Live event log (events.jsonl) for monitoring progress via tail -f
  • 47 tests, all passing
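
To illustrate the "deepest root cause first" ordering of the JSON report, here is a minimal sketch. The `TracedProblem` name appears later in this PR, but the fields used here (`description`, `origin_node`, `depth`, `category`) and the `build_report` helper are assumptions for illustration, not the actual schema of root_cause_analysis.json.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TracedProblem:
    # Hypothetical fields; the real dataclass may differ.
    description: str
    origin_node: str
    depth: int
    category: str


def build_report(problems: list[TracedProblem]) -> str:
    """Serialize problems to JSON, deepest trace first."""
    ordered = sorted(problems, key=lambda p: p.depth, reverse=True)
    return json.dumps({"problems": [asdict(p) for p in ordered]}, indent=2)
```

Sorting by depth puts the problems whose origin lies furthest upstream at the top of the report.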

Usage

python -m worker_plan_internal.rca \
    --dir /path/to/output \
    --file 030-report.html \
    --problem "The budget appears unvalidated..." \
    --verbose

Test plan

  • All 47 rca tests pass
  • CLI --help works
  • Module is importable
  • Manual test on paperclip_automation output (3 problems found, 48 LLM calls, deepest origin depth 12)
  • Manual test on india_census and minecraft_escape outputs

🤖 Generated with Claude Code

neoneye and others added 30 commits April 5, 2026 22:52
Static DAG registry mapping all 48 PlanExe pipeline stages to their
output files, upstream dependencies, and source code paths. Includes
lookup functions (find_stage_by_filename, get_upstream_files,
get_source_code_paths) and 14 passing tests.
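
A static registry of this kind can be sketched as follows. The function names and `StageInfo` come from this PR, but the signatures, the field names other than `upstream_stages`, and the two example entries (file names and source paths) are assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StageInfo:
    name: str
    output_files: tuple[str, ...]
    upstream_stages: tuple[str, ...]
    source_code_paths: tuple[str, ...]


# Two made-up entries; the registry described above covers 48 stages.
STAGES = (
    StageInfo("assumptions", ("010-assumptions.md",), (), ("assume/assumptions.py",)),
    StageInfo("report", ("030-report.html",), ("assumptions",), ("report/report.py",)),
)
_BY_NAME = {stage.name: stage for stage in STAGES}


def find_stage_by_filename(filename: str) -> StageInfo | None:
    """Return the stage that produces `filename`, or None if unknown."""
    return next((s for s in STAGES if filename in s.output_files), None)


def get_upstream_files(stage_name: str) -> list[str]:
    """Collect the output files of every direct upstream stage."""
    stage = _BY_NAME[stage_name]
    return [f for up in stage.upstream_stages for f in _BY_NAME[up].output_files]


def get_source_code_paths(stage_name: str) -> list[str]:
    """Return the source files that implement the stage."""
    return list(_BY_NAME[stage_name].source_code_paths)
```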

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FlawTracer orchestrates three-phase flaw tracing through the pipeline DAG:
- Phase 1: LLM-based flaw identification in starting file
- Phase 2: Recursive upstream tracing with deduplication and max depth
- Phase 3: Source code analysis at flaw origin stages

Tests mock the LLM-calling methods to verify tracing logic, deduplication,
depth limits, multi-flaw handling, and depth-sorted output.
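
The Phase 2 recursion with deduplication and a depth limit might look roughly like the sketch below. The toy DAG, the `has_problem` callback (standing in for the LLM check, much as the tests mock it), and the default `max_depth` are assumptions, not the actual FlawTracer code.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical toy DAG: node -> upstream nodes it depends on.
UPSTREAM: dict[str, list[str]] = {
    "report": ["budget", "schedule"],
    "budget": ["assumptions"],
    "schedule": ["assumptions"],
    "assumptions": [],
}


def trace_upstream(
    node: str,
    has_problem: Callable[[str], bool],
    max_depth: int = 15,
    _trace: list[str] | None = None,
    _checked: set[str] | None = None,
) -> list[str]:
    """Follow a problem upstream; the last element of the result is the origin."""
    trace = (_trace or []) + [node]
    checked = _checked if _checked is not None else set()
    if len(trace) - 1 >= max_depth:
        return trace  # depth limit reached, stop here
    for parent in UPSTREAM[node]:
        if parent in checked:
            continue  # dedup: never ask about the same node twice
        checked.add(parent)
        if has_problem(parent):
            # First match wins: follow this branch and ignore the siblings.
            return trace_upstream(parent, has_problem, max_depth, trace, checked)
    return trace  # no upstream node shows the problem, so this node is the origin
```

Passing a stub for `has_problem` is what makes the tracing logic testable without any LLM calls.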

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously, _analyze_source_code was only called when no upstream origin
was found (the fallback path). When _trace_upstream successfully identified
a deeper origin, Phase 3 was skipped entirely. Now Phase 3 runs whenever
an origin stage is known, regardless of how it was determined.
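
The control-flow change described above can be sketched like this. `trace_problem` and its parameters are hypothetical; `analyze_source_code` stands in for the private `_analyze_source_code` method.

```python
from __future__ import annotations

from typing import Callable


def trace_problem(
    start_node: str,
    upstream_origin: str | None,
    analyze_source_code: Callable[[str], str],
) -> dict:
    """Pick the origin node, then always run Phase 3 on it."""
    # Fallback: if nothing deeper was found, the starting node is the origin.
    origin = upstream_origin if upstream_origin is not None else start_node
    # Phase 3 runs for any known origin, not only on the fallback path.
    analysis = analyze_source_code(origin)
    return {"origin": origin, "analysis": analysis}
```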

Also removes unused imports (json in tracer.py, MagicMock and json in
test_tracer.py) and adds a test verifying Phase 3 is called at a deep
upstream origin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ze types

- Remove unused `field` import from dataclasses
- Remove unused `source_code_base` parameter from FlawTracer.__init__()
  (registry handles source code path resolution via its own _SOURCE_BASE)
- Replace `Optional[X]` with `X | None` using `from __future__ import annotations`
- Add clarifying comments for dedup strategy and first-match-wins logic
- Remove dead `mock_analysis` variable and unused `SourceCodeAnalysisResult`
  import from test_tracer.py
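
As a small illustration of the `Optional[X]` to `X | None` change: with `from __future__ import annotations`, annotations are stored as strings and not evaluated at definition time, so the `|` syntax is safe even on Python versions older than 3.10. The `suggestion` field and the `describe` helper are made up for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OriginInfo:
    stage: str
    suggestion: str | None = None  # previously Optional[str]


def describe(origin: OriginInfo | None) -> str:
    """Return a one-line summary, tolerating a missing origin."""
    if origin is None:
        return "no origin"
    return f"{origin.stage}: {origin.suggestion or 'no suggestion'}"
```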

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Appends a JSONL line for each significant event during tracing:
phase1_start/done, trace_flaw_start/done, upstream_check, upstream_found,
origin_found, phase3_start, trace_complete.

Monitor progress with: tail -f events.jsonl
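
A minimal sketch of such an append-only JSONL logger is shown below. The event names come from the list above; the `log_event` helper and the record layout (an `event` key plus free-form fields) are assumptions.

```python
import json


def log_event(path: str, event: str, **data) -> None:
    """Append one JSON object per line, so `tail -f` sees each event as it is written."""
    record = {"event": event, **data}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Opening the file in append mode for every event keeps each line complete on disk as soon as the call returns, which is what makes `tail -f` useful during a long run.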

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Format: 2026-04-05T23:40:03Z
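
The format shown is UTC with second precision and a trailing `Z`. One way to produce it with the standard library is sketched below; the `utc_timestamp` helper is hypothetical, and where the PR applies this format is not stated in the commit body.

```python
from __future__ import annotations

from datetime import datetime, timezone


def utc_timestamp(now: datetime | None = None) -> str:
    """Format a timezone-aware datetime as e.g. 2026-04-05T23:40:03Z."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```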

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents what works, what needs fixing (Phase 1 anchoring,
loose upstream checks, long evidence quotes), and test run results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1 prompt now requires the user's specific flaw as the first
result, with additional flaws limited to the same problem family.

Phase 2 prompt now requires causal mechanism (not just topical
overlap) and limits evidence quotes to 200 characters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shows "stages/identify_purpose.py" and "assume/identify_purpose.py"
instead of "identify_purpose.py" twice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 3 always blames the prompt, but some flaws are inherent domain
complexity. Future improvement: classify root causes into prompt-fixable,
domain complexity, and missing input data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…xity, missing_input

Phase 3 now categorizes each root cause so suggestions are honest:
- prompt_fixable: the prompt has a gap that can be edited
- domain_complexity: inherently uncertain/contentious, no prompt change resolves it
- missing_input: the user's plan didn't provide enough detail
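
The three categories map naturally onto an enum. The sketch below also shows how a downstream consumer might select only the origins worth an automatic prompt edit; the `Category` class name, the dict layout, and `auto_fixable` are assumptions.

```python
from enum import Enum


class Category(str, Enum):
    PROMPT_FIXABLE = "prompt_fixable"
    DOMAIN_COMPLEXITY = "domain_complexity"
    MISSING_INPUT = "missing_input"


def auto_fixable(origins: list[dict]) -> list[dict]:
    """Keep only origins whose root cause a prompt edit could address."""
    # Category(...) raises ValueError on an unknown label, which catches
    # malformed classifier output early.
    return [o for o in origins if Category(o["category"]) is Category.PROMPT_FIXABLE]
```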

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…esults

README: document category field, events.jsonl, updated examples and typical run stats.
AGENTS: move Phase 3 to fixed, add India census v3 results, update what-works-well.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
README: add Tips section (start from self_audit, trust chains over
suggestions, check category, results are non-deterministic) and
Limitations section (LLM subjectivity, first-match-wins, static
registry, text-only, diagnostic not prescriptive).

AGENTS: add non-determinism and registry drift as MEDIUM open issues,
add honest assessment section with guidance on what to trust.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion

The flaw tracer's registry.py now builds from extract_dag() at import
time instead of a 780-line static listing. The public API is unchanged.

Also rename upstream_stages → depends_on in StageInfo and tests to
match the extract_dag JSON schema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- StageInfo → NodeInfo (previous commit)
- STAGES → NODES
- find_stage_by_filename → find_node_by_filename
- TraceEntry.stage → TraceEntry.node
- OriginInfo.stage → OriginInfo.node
- origin_stage → origin_node
- All local variables, comments, docstrings, prompts, and test data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The heuristic is now applied inline in get_upstream_files() instead
of being stored on the dataclass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ject

Each node now has implementation.files with role (workflow_node or
business_logic) and path, instead of a flat source_files list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
neoneye and others added 10 commits April 8, 2026 01:23
Each artifact is now {"path": "filename"} instead of a flat string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace nested "implementation": {"files": [...]} with a flat
"source_files": [...] array of {role, path} objects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nd artifact_path

Each input now specifies which upstream node it reads from and which
specific artifact file it consumes, instead of a flat list of node names.
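
Put together, a node in the DAG JSON might look roughly like the example below. Only `artifact_path`, `source_files`, `role`, `path`, and the two role values are taken from the commit messages; the keys `id`, `inputs`, `from_node`, and `artifacts`, and all file names, are guesses made for illustration.

```python
import json

# Hypothetical node entry; key names other than those quoted in the
# commit messages are assumptions.
NODE_JSON = """
{
  "id": "budget",
  "inputs": [{"from_node": "assumptions", "artifact_path": "assumptions.md"}],
  "artifacts": [{"path": "budget.md"}],
  "source_files": [
    {"role": "workflow_node", "path": "stages/budget.py"},
    {"role": "business_logic", "path": "plan/budget.py"}
  ]
}
"""


def upstream_artifacts(node: dict) -> list[tuple[str, str]]:
    """List (upstream node, artifact file) pairs this node reads."""
    return [(i["from_node"], i["artifact_path"]) for i in node["inputs"]]


def source_paths(node: dict, role: str) -> list[str]:
    """Return source file paths that carry the given role."""
    return [f["path"] for f in node["source_files"] if f["role"] == role]


node = json.loads(NODE_JSON)
```

Recording the exact artifact per input lets a tracer open only the upstream files a node actually consumed, without having to check every output of every dependency.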

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Analyzes the current DAG JSON format's strengths for root cause
analysis and identifies gaps (claim-level provenance, runtime
context, artifact semantics).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nalysis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dule

Renames classes (TracedFlaw→TracedProblem, FlawTraceResult→RCAResult,
IdentifiedFlaw→IdentifiedProblem, FlawIdentificationResult→
ProblemIdentificationResult), fields, methods, JSON keys, LLM prompts,
and markdown report output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove 700-line inline DAG registry (now auto-generated), replace all
"flaw" terminology with "problem" in code samples, JSON keys, CLI args,
and prose.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both the fallback path and the upstream-origin path now use
len(trace) - 1, fixing an off-by-one that inflated depth when
the origin was found after checking upstream inputs.
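
A sketch of the unified depth rule, assuming the trace is a list that starts with the file where the problem was observed and ends at the origin (the `trace_depth` helper is hypothetical):

```python
def trace_depth(trace: list[str]) -> int:
    """Depth is the number of upstream hops, so the starting node itself counts as 0."""
    if not trace:
        raise ValueError("trace must contain at least the starting node")
    return len(trace) - 1
```

Under this rule a problem that originates in the starting file has depth 0, and each upstream hop adds one, regardless of which code path found the origin.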

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@neoneye neoneye changed the title Add flaw tracer for root-cause analysis of plan reports Add RCA (root cause analysis) for plan reports Apr 8, 2026
@neoneye neoneye merged commit 537f273 into main Apr 8, 2026
3 checks passed
@neoneye neoneye deleted the feature/flaw-tracer branch April 8, 2026 14:12