Add RCA (root cause analysis) for plan reports by neoneye · Pull Request #534 · PlanExeOrg/PlanExe

neoneye · 2026-04-05T22:00:40Z

Summary

Adds the worker_plan_internal/rca/ package: a CLI tool that traces problems in PlanExe reports upstream through the pipeline DAG to find where they originated and classify the root cause
DAG registry auto-generated from Luigi task introspection via extract_dag.py, so it needs no manual maintenance when the pipeline changes
Three-phase recursive algorithm: (1) identify problems via LLM, (2) trace each problem upstream with deduplication, (3) analyze the source code at the origin node and classify the root cause as prompt_fixable, domain_complexity, or missing_input
The classification feeds the self-improve loop: prompt_fixable origins can be fixed automatically by editing the prompt at that node
Produces JSON (root_cause_analysis.json) and markdown (root_cause_analysis.md) reports, sorted by trace depth (deepest root cause first)
Live event log (events.jsonl) for monitoring progress via tail -f
47 tests, all passing
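The three phases above can be sketched as a small recursive search. This is an illustrative sketch, not the module's code. The toy DAG, the `is_caused_by` and `classify` callables (standing in for the Phase 2 and Phase 3 LLM calls), the `TracedProblem` fields, and the default `max_depth` are all assumptions. In this sketch, dedup means a node is checked at most once per problem, and the first upstream input that explains the problem wins, matching the first-match-wins behavior described in this PR's commits.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical toy DAG: node name -> upstream nodes whose artifacts it reads.
Dag = dict[str, list[str]]
CauseCheck = Callable[[str, str], bool]  # stands in for the Phase 2 LLM call
Classifier = Callable[[str, str], str]   # stands in for the Phase 3 LLM call


@dataclass
class TracedProblem:
    description: str
    trace: list[str]               # starting node first, origin node last
    category: str | None = None    # Phase 3 classification at the origin

    @property
    def depth(self) -> int:
        # Upstream hops from the starting node to the origin.
        return len(self.trace) - 1


def trace_upstream(dag: Dag, node: str, problem: str, is_caused_by: CauseCheck,
                   max_depth: int, visited: set[str], trace: list[str]) -> list[str]:
    """Phase 2: follow the problem upstream until no input explains it."""
    trace = trace + [node]
    visited.add(node)
    if len(trace) - 1 >= max_depth:
        return trace  # depth limit reached: treat the current node as the origin
    for upstream in dag.get(node, []):
        if upstream in visited:
            continue  # dedup: each node is checked at most once per problem
        if is_caused_by(upstream, problem):
            # First match wins: the remaining inputs of this node are not explored.
            return trace_upstream(dag, upstream, problem, is_caused_by,
                                  max_depth, visited, trace)
    return trace  # no upstream input explains the problem: origin found


def analyze(dag: Dag, start_node: str, problems: list[str], is_caused_by: CauseCheck,
            classify: Classifier, max_depth: int = 20) -> list[TracedProblem]:
    results = []
    for problem in problems:  # Phase 1 output: problems found in the starting file
        trace = trace_upstream(dag, start_node, problem, is_caused_by,
                               max_depth, set(), [])
        # Phase 3 runs at the origin regardless of how the origin was reached.
        results.append(TracedProblem(problem, trace, classify(trace[-1], problem)))
    # Deepest root cause first, mirroring the report ordering.
    return sorted(results, key=lambda p: p.depth, reverse=True)
```

For example, with `dag = {"report": ["budget", "plan"], "budget": ["assumptions"], "plan": ["assumptions"], "assumptions": []}` and a check that attributes a problem to `budget` and then to `assumptions`, the trace is `["report", "budget", "assumptions"]` with depth 2. That problem sorts ahead of one that no upstream input explains, which has depth 0.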

Usage

python -m worker_plan_internal.rca \
    --dir /path/to/output \
    --file 030-report.html \
    --problem "The budget appears unvalidated..." \
    --verbose

Test plan

All 47 rca tests pass
CLI --help works
Module is importable
Manual test on paperclip_automation output (3 problems found, 48 LLM calls, deepest origin depth 12)
Manual test on india_census and minecraft_escape outputs

🤖 Generated with Claude Code

Static DAG registry mapping all 48 PlanExe pipeline stages to their output files, upstream dependencies, and source code paths. Includes lookup functions (find_stage_by_filename, get_upstream_files, get_source_code_paths) and 14 passing tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
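The lookup functions named in this commit can be sketched over an in-memory registry. This sketch uses the post-rename names from later commits (`NodeInfo`, `NODES`, `depends_on`, `find_node_by_filename`). The node names, the file names other than `030-report.html`, and the `output_files` field are assumptions. The real registry is generated from `extract_dag()` rather than hand-written.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    name: str
    output_files: tuple[str, ...]
    depends_on: tuple[str, ...]


# Hypothetical three-node registry; file names other than 030-report.html are made up.
NODES: dict[str, NodeInfo] = {
    node.name: node
    for node in (
        NodeInfo("identify_purpose", ("002-identify_purpose.json",), ()),
        NodeInfo("make_assumptions", ("003-assumptions.json",), ("identify_purpose",)),
        NodeInfo("report", ("030-report.html",), ("identify_purpose", "make_assumptions")),
    )
}


def find_node_by_filename(filename: str) -> NodeInfo | None:
    """Map an output file back to the node that produced it."""
    for node in NODES.values():
        if filename in node.output_files:
            return node
    return None


def get_upstream_files(node_name: str) -> list[tuple[str, str]]:
    """(upstream node, file) pairs a node reads: the candidates Phase 2 inspects."""
    node = NODES[node_name]
    return [(up, f) for up in node.depends_on for f in NODES[up].output_files]
```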

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

FlawTracer orchestrates three-phase flaw tracing through the pipeline DAG:
- Phase 1: LLM-based flaw identification in starting file
- Phase 2: Recursive upstream tracing with deduplication and max depth
- Phase 3: Source code analysis at flaw origin stages
Tests mock the LLM-calling methods to verify tracing logic, deduplication, depth limits, multi-flaw handling, and depth-sorted output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Previously, _analyze_source_code was only called when no upstream origin was found (the fallback path). When _trace_upstream successfully identified a deeper origin, Phase 3 was skipped entirely. Now Phase 3 runs whenever an origin stage is known, regardless of how it was determined. Also removes unused imports (json in tracer.py, MagicMock and json in test_tracer.py) and adds a test verifying Phase 3 is called at a deep upstream origin. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ze types
- Remove unused `field` import from dataclasses
- Remove unused `source_code_base` parameter from FlawTracer.__init__() (registry handles source code path resolution via its own _SOURCE_BASE)
- Replace `Optional[X]` with `X | None` using `from __future__ import annotations`
- Add clarifying comments for dedup strategy and first-match-wins logic
- Remove dead `mock_analysis` variable and unused `SourceCodeAnalysisResult` import from test_tracer.py
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Appends a JSONL line for each significant event during tracing: phase1_start/done, trace_flaw_start/done, upstream_check, upstream_found, origin_found, phase3_start, trace_complete. Monitor progress with: tail -f events.jsonl Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Format: 2026-04-05T23:40:03Z Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
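The two commits above describe the event log: one JSON object per line, appended as tracing proceeds and stamped with a compact UTC time. Below is a minimal sketch. The `timestamp`/`event` key layout and the free-form extra fields are assumptions, since the PR does not show the actual record schema.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path


def utc_timestamp(now: datetime | None = None) -> str:
    """Compact UTC timestamp without subseconds, e.g. 2026-04-05T23:40:03Z."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def format_event(event: str, now: datetime | None = None, **fields: object) -> str:
    """Serialize one event as a single JSON line (json.dumps adds no newlines)."""
    return json.dumps({"timestamp": utc_timestamp(now), "event": event, **fields})


def log_event(path: Path, event: str, **fields: object) -> None:
    """Append one line per event so `tail -f events.jsonl` shows progress live."""
    with path.open("a", encoding="utf-8") as f:
        f.write(format_event(event, **fields) + "\n")
```

Keeping formatting separate from file I/O makes the record shape easy to test without touching the filesystem.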

Documents what works, what needs fixing (Phase 1 anchoring, loose upstream checks, long evidence quotes), and test run results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Phase 1 prompt now requires the user's specific flaw as the first result, with additional flaws limited to the same problem family. Phase 2 prompt now requires causal mechanism (not just topical overlap) and limits evidence quotes to 200 characters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
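The Phase 2 prompt asks the model to keep evidence quotes to 200 characters. Because an LLM may not always comply, a caller could also enforce the limit after parsing the response. The helper below is a hypothetical sketch of that post-check and is not part of this PR; `clamp_evidence` and `MAX_EVIDENCE_CHARS` are illustrative names.

```python
MAX_EVIDENCE_CHARS = 200


def clamp_evidence(quote: str, limit: int = MAX_EVIDENCE_CHARS) -> str:
    """Collapse whitespace and shorten a quote to at most `limit` characters."""
    quote = " ".join(quote.split())
    if len(quote) <= limit:
        return quote
    cut = quote[: limit - 1]  # leave room for the one-character ellipsis
    if " " in cut:
        cut = cut[: cut.rindex(" ")]  # avoid ending mid-word
    return cut + "…"
```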

Shows "stages/identify_purpose.py" and "assume/identify_purpose.py" instead of "identify_purpose.py" twice. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
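The disambiguation can be sketched as rendering each path as `parent_dir/filename`. The input paths in the usage below are hypothetical; only the `stages/identify_purpose.py` and `assume/identify_purpose.py` outputs come from the commit message.

```python
from pathlib import PurePosixPath


def display_name(path: str) -> str:
    """Render a source path as parent_dir/filename to tell same-named files apart."""
    p = PurePosixPath(path)
    # A bare filename has parent ".", whose .name is "", so fall back to the filename.
    return f"{p.parent.name}/{p.name}" if p.parent.name else p.name
```

For example, `display_name("src/stages/identify_purpose.py")` gives `"stages/identify_purpose.py"`, and `display_name("src/assume/identify_purpose.py")` gives `"assume/identify_purpose.py"`.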

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Phase 3 always blames the prompt, but some flaws stem from inherent domain complexity. Future improvement: classify root causes as prompt-fixable, domain complexity, or missing input data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…xity, missing_input
Phase 3 now categorizes each root cause so suggestions are honest:
- prompt_fixable: the prompt has a gap that can be edited
- domain_complexity: inherently uncertain/contentious, no prompt change resolves it
- missing_input: the user's plan didn't provide enough detail
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
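The three categories map naturally onto a closed set of labels that the LLM's answer is validated against. The sketch below uses a stdlib `Enum` even though the PR's models are Pydantic-based. `RootCauseCategory`, `parse_category`, and `is_auto_fixable` are hypothetical names, and the auto-fix rule mirrors the summary's statement that only prompt_fixable origins feed the self-improve loop.

```python
from enum import Enum


class RootCauseCategory(str, Enum):
    PROMPT_FIXABLE = "prompt_fixable"        # the prompt has a gap that can be edited
    DOMAIN_COMPLEXITY = "domain_complexity"  # inherently uncertain; no prompt edit resolves it
    MISSING_INPUT = "missing_input"          # the user's plan lacked the needed detail


def parse_category(raw: str) -> RootCauseCategory:
    """Validate a label returned by the LLM; unknown labels raise ValueError."""
    return RootCauseCategory(raw.strip().lower())


def is_auto_fixable(category: RootCauseCategory) -> bool:
    """Only prompt_fixable origins are candidates for automatic prompt edits."""
    return category is RootCauseCategory.PROMPT_FIXABLE
```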

…esults
README: document the category field, events.jsonl, updated examples, and typical run stats.
AGENTS: move Phase 3 to fixed, add India census v3 results, update what-works-well.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

README: add Tips section (start from self_audit, trust chains over suggestions, check category, results are non-deterministic) and Limitations section (LLM subjectivity, first-match-wins, static registry, text-only, diagnostic not prescriptive). AGENTS: add non-determinism and registry drift as MEDIUM open issues, add honest assessment section with guidance on what to trust. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tion The flaw tracer's registry.py now builds from extract_dag() at import time instead of a 780-line static listing. The public API is unchanged. Also rename upstream_stages → depends_on in StageInfo and tests to match the extract_dag JSON schema. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- StageInfo → NodeInfo (previous commit)
- STAGES → NODES
- find_stage_by_filename → find_node_by_filename
- TraceEntry.stage → TraceEntry.node
- OriginInfo.stage → OriginInfo.node
- origin_stage → origin_node
- All local variables, comments, docstrings, prompts, and test data
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The heuristic is now applied inline in get_upstream_files() instead of being stored on the dataclass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ject Each node now has implementation.files with role (workflow_node or business_logic) and path, instead of a flat source_files list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each artifact is now {"path": "filename"} instead of a flat string. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace nested "implementation": {"files": [...]} with a flat "source_files": [...] array of {role, path} objects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…nd artifact_path Each input now specifies which upstream node it reads from and which specific artifact file it consumes, instead of a flat list of node names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
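Taken together, these schema commits give each node an `artifacts` list of `{"path": ...}` objects, a flat `source_files` list of `{role, path}` objects, and `inputs` that name an upstream node and the `artifact_path` consumed. Below is a sketch of reading such a node. The `"id"` key and the `"upstream_node"` key are placeholders, because the input field name is truncated in the commit message above. The node and file names are also invented.

```python
import json

# One node in the shape described by these commits ("id" and "upstream_node" are placeholders).
NODE_JSON = """
{
  "id": "make_assumptions",
  "artifacts": [{"path": "003-assumptions.json"}],
  "source_files": [
    {"role": "workflow_node", "path": "stages/make_assumptions.py"},
    {"role": "business_logic", "path": "assume/make_assumptions.py"}
  ],
  "inputs": [
    {"upstream_node": "identify_purpose", "artifact_path": "002-identify_purpose.json"}
  ]
}
"""


def source_paths_by_role(node: dict) -> dict[str, list[str]]:
    """Group source files by role, e.g. for Phase 3 source code analysis."""
    grouped: dict[str, list[str]] = {}
    for f in node["source_files"]:
        grouped.setdefault(f["role"], []).append(f["path"])
    return grouped


def consumed_artifacts(node: dict) -> list[str]:
    """Artifact files this node reads: where an upstream trace looks next."""
    return [inp["artifact_path"] for inp in node["inputs"]]


node = json.loads(NODE_JSON)
```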

Analyzes the current DAG JSON format's strengths for root cause analysis and identifies gaps (claim-level provenance, runtime context, artifact semantics). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…nalysis Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…dule Renames classes (TracedFlaw→TracedProblem, FlawTraceResult→RCAResult, IdentifiedFlaw→IdentifiedProblem, FlawIdentificationResult→ ProblemIdentificationResult), fields, methods, JSON keys, LLM prompts, and markdown report output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove 700-line inline DAG registry (now auto-generated), replace all "flaw" terminology with "problem" in code samples, JSON keys, CLI args, and prose. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Both the fallback path and the upstream-origin path now use len(trace) - 1, fixing an off-by-one that inflated depth when the origin was found after checking upstream inputs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

neoneye and others added 30 commits April 5, 2026 22:52

refactor: use tuples and modern type syntax in flaw_tracer registry

6525dca

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add flaw_tracer Pydantic models and prompt builders

2fa4de3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add flaw_tracer JSON and markdown report generation

5c7dd82

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: sort flaws by depth in markdown report output

bdacb19

feat: add flaw_tracer CLI entry point

e479283

docs: add flaw tracer design spec and implementation plan

831ea6b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into feature/flaw-tracer

83f9488

docs: add flaw_tracer README with usage instructions

6cb35c8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add flaw_tracer README with usage instructions

8b2e6ff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: shorten event timestamp to HH:MM:SS

92936a4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use compact UTC timestamp without subseconds

ffff673

docs: add AGENTS.md with flaw tracer status and known issues

c5c7c15

fix: disambiguate source code filenames with parent directory

2c9b401

docs: update AGENTS.md — mark fixed issues, add test run v2 results

8e20e9e

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into feature/flaw-tracer

e0eaf78

refactor: rename StageInfo to NodeInfo

dffb9af

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor: remove primary_output from NodeInfo

a2b7df5

refactor: replace flat source_files with structured implementation ob…

91d6058

neoneye and others added 10 commits April 8, 2026 01:23

refactor: rename output_files to artifacts with path objects

1f2de49

refactor: flatten implementation.files to top-level source_files array

03a3c1d

Merge remote-tracking branch 'origin/main' into feature/flaw-tracer

657e438

refactor: rename flaw_tracer to rca and FlawTracer to RootCauseAnalyzer

7827d84

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor: rename --flaw to --problem and output files to root_cause_a…

109b7ab

docs: update RCA spec and plan to current terminology

313ac04

neoneye changed the title from "Add flaw tracer for root-cause analysis of plan reports" to "Add RCA (root cause analysis) for plan reports" on Apr 8, 2026

neoneye merged commit 537f273 into main Apr 8, 2026
3 checks passed

neoneye deleted the feature/flaw-tracer branch April 8, 2026 14:12

neoneye merged 40 commits into main from feature/flaw-tracer
