Add RCA (root cause analysis) for plan reports #534
Merged
Static DAG registry mapping all 48 PlanExe pipeline stages to their output files, upstream dependencies, and source code paths. Includes lookup functions (find_stage_by_filename, get_upstream_files, get_source_code_paths) and 14 passing tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
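A minimal sketch of how such a static registry and its lookup functions could look. The two stages, their file names, and the function signatures are assumptions made for illustration; only the names `StageInfo`, `upstream_stages`, `find_stage_by_filename`, `get_upstream_files`, and `get_source_code_paths` come from the commit messages in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StageInfo:
    name: str
    output_files: tuple[str, ...]
    upstream_stages: tuple[str, ...] = ()
    source_code_paths: tuple[str, ...] = ()


# Two made-up stages for illustration; the real registry lists all 48.
STAGES = (
    StageInfo("stage_a", ("a.json",), (), ("stages/stage_a.py",)),
    StageInfo("stage_b", ("b.md",), ("stage_a",), ("stages/stage_b.py",)),
)
_BY_NAME = {stage.name: stage for stage in STAGES}


def find_stage_by_filename(filename: str) -> StageInfo | None:
    """Return the stage that writes `filename`, or None if no stage does."""
    for stage in STAGES:
        if filename in stage.output_files:
            return stage
    return None


def get_upstream_files(stage_name: str) -> list[str]:
    """Collect the output files of every direct upstream stage."""
    files: list[str] = []
    for upstream in _BY_NAME[stage_name].upstream_stages:
        files.extend(_BY_NAME[upstream].output_files)
    return files


def get_source_code_paths(stage_name: str) -> list[str]:
    """Return the source files that implement the stage."""
    return list(_BY_NAME[stage_name].source_code_paths)
```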
FlawTracer orchestrates three-phase flaw tracing through the pipeline DAG:
- Phase 1: LLM-based flaw identification in starting file
- Phase 2: Recursive upstream tracing with deduplication and max depth
- Phase 3: Source code analysis at flaw origin stages

Tests mock the LLM-calling methods to verify tracing logic, deduplication, depth limits, multi-flaw handling, and depth-sorted output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
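The Phase 2 logic (recursive upstream tracing with deduplication and a depth limit) can be sketched roughly as below. This is not the PR's code: the `UPSTREAM` graph, the `trace_upstream` signature, and the `has_problem` callable (a stand-in for the LLM check) are all hypothetical, and the first-match-wins behaviour is assumed from a later commit message.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical DAG: node -> direct upstream nodes.
UPSTREAM = {
    "report": ["plan"],
    "plan": ["assumptions", "purpose"],
    "assumptions": ["purpose"],
    "purpose": [],
}


def trace_upstream(
    node: str,
    has_problem: Callable[[str], bool],
    max_depth: int,
    visited: set[str] | None = None,
    depth: int = 0,
) -> list[str]:
    """Return the chain from `node` to the deepest upstream node with the problem."""
    visited = set() if visited is None else visited
    visited.add(node)
    if depth >= max_depth:
        return [node]  # depth limit reached; treat this node as the origin
    for upstream in UPSTREAM.get(node, []):
        if upstream in visited:
            continue  # deduplication: each node is checked at most once
        visited.add(upstream)
        if has_problem(upstream):
            # First match wins: follow this branch and ignore later siblings.
            return [node] + trace_upstream(
                upstream, has_problem, max_depth, visited, depth + 1
            )
    return [node]  # no upstream node has the problem; it originated here
```

The last element of the returned chain is the presumed origin, which is where Phase 3 (source code analysis) would run.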
Previously, _analyze_source_code was only called when no upstream origin was found (the fallback path). When _trace_upstream successfully identified a deeper origin, Phase 3 was skipped entirely. Now Phase 3 runs whenever an origin stage is known, regardless of how it was determined. Also removes unused imports (json in tracer.py, MagicMock and json in test_tracer.py) and adds a test verifying Phase 3 is called at a deep upstream origin. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ze types
- Remove unused `field` import from dataclasses
- Remove unused `source_code_base` parameter from FlawTracer.__init__() (registry handles source code path resolution via its own _SOURCE_BASE)
- Replace `Optional[X]` with `X | None` using `from __future__ import annotations`
- Add clarifying comments for dedup strategy and first-match-wins logic
- Remove dead `mock_analysis` variable and unused `SourceCodeAnalysisResult` import from test_tracer.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Appends a JSONL line for each significant event during tracing: phase1_start/done, trace_flaw_start/done, upstream_check, upstream_found, origin_found, phase3_start, trace_complete.

Monitor progress with: `tail -f events.jsonl`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
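An append-only JSONL event log of this kind can be sketched in a few lines. The `log_event` helper and the record layout (an `event` key plus arbitrary extra fields) are assumptions; the PR does not show the actual schema.

```python
import json
from pathlib import Path


def log_event(log_path: Path, event: str, **data) -> None:
    """Append one event as a single JSON line (hypothetical record layout)."""
    record = {"event": event, **data}
    # Open in append mode so earlier events are never rewritten, which
    # keeps the file safe to follow with `tail -f` while a trace runs.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```

Calling `log_event(Path("events.jsonl"), "phase1_start")` would add one line to the file; each later call adds exactly one more.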
Format: 2026-04-05T23:40:03Z Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
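Assuming the format above is a UTC timestamp with second precision and a literal `Z` suffix, it can be produced with the standard library as follows. The helper name `format_timestamp` is hypothetical.

```python
from datetime import datetime, timezone


def format_timestamp(moment: datetime) -> str:
    """Render a timezone-aware datetime as e.g. 2026-04-05T23:40:03Z."""
    # Convert to UTC first so the trailing "Z" is always truthful.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For the current time, `format_timestamp(datetime.now(timezone.utc))` gives a string in the same shape.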
Documents what works, what needs fixing (Phase 1 anchoring, loose upstream checks, long evidence quotes), and test run results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1 prompt now requires the user's specific flaw as the first result, with additional flaws limited to the same problem family. Phase 2 prompt now requires causal mechanism (not just topical overlap) and limits evidence quotes to 200 characters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shows "stages/identify_purpose.py" and "assume/identify_purpose.py" instead of "identify_purpose.py" twice. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
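One way to get this disambiguated display is to prefix each file name with its parent directory. The sketch below assumes POSIX-style paths; `display_path` and the `pkg/` prefix in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def display_path(path: str) -> str:
    """Show the parent directory with the file name so equal names stay distinct."""
    p = PurePosixPath(path)
    # A bare file name has parent ".", whose name is the empty string.
    return f"{p.parent.name}/{p.name}" if p.parent.name else p.name
```

With this, `display_path("pkg/stages/identify_purpose.py")` and `display_path("pkg/assume/identify_purpose.py")` no longer render identically.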
Phase 3 always blames the prompt, but some flaws are inherent domain complexity. Future improvement: classify root causes into prompt-fixable, domain complexity, and missing input data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…xity, missing_input
Phase 3 now categorizes each root cause so suggestions are honest:
- prompt_fixable: the prompt has a gap that can be edited
- domain_complexity: inherently uncertain/contentious, no prompt change resolves it
- missing_input: the user's plan didn't provide enough detail

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
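The three categories map naturally onto an enum. This is a sketch, not the PR's types: `RootCauseCategory` and `suggests_prompt_edit` are hypothetical names, and only the three string values come from the commit message.

```python
from enum import Enum


class RootCauseCategory(str, Enum):
    PROMPT_FIXABLE = "prompt_fixable"        # the prompt has a gap that can be edited
    DOMAIN_COMPLEXITY = "domain_complexity"  # inherently uncertain; no prompt change resolves it
    MISSING_INPUT = "missing_input"          # the user's plan lacked enough detail


def suggests_prompt_edit(category: str) -> bool:
    """Only a prompt_fixable root cause warrants a prompt-edit suggestion."""
    # Raises ValueError for an unknown category string.
    return RootCauseCategory(category) is RootCauseCategory.PROMPT_FIXABLE
```

Parsing the category through the enum rejects any value outside these three, which guards against a free-form LLM answer slipping into the report.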
…esults README: document category field, events.jsonl, updated examples and typical run stats. AGENTS: move Phase 3 to fixed, add India census v3 results, update what-works-well. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
README: add Tips section (start from self_audit, trust chains over suggestions, check category, results are non-deterministic) and Limitations section (LLM subjectivity, first-match-wins, static registry, text-only, diagnostic not prescriptive). AGENTS: add non-determinism and registry drift as MEDIUM open issues, add honest assessment section with guidance on what to trust. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion The flaw tracer's registry.py now builds from extract_dag() at import time instead of a 780-line static listing. The public API is unchanged. Also rename upstream_stages → depends_on in StageInfo and tests to match the extract_dag JSON schema. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- StageInfo → NodeInfo (previous commit)
- STAGES → NODES
- find_stage_by_filename → find_node_by_filename
- TraceEntry.stage → TraceEntry.node
- OriginInfo.stage → OriginInfo.node
- origin_stage → origin_node
- All local variables, comments, docstrings, prompts, and test data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The heuristic is now applied inline in get_upstream_files() instead of being stored on the dataclass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ject Each node now has implementation.files with role (workflow_node or business_logic) and path, instead of a flat source_files list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each artifact is now {"path": "filename"} instead of a flat string.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace nested "implementation": {"files": [...]} with a flat
"source_files": [...] array of {role, path} objects.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
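With the flat `source_files` array, collecting a node's source paths is a single comprehension. The example node below (its `id` and both paths) is made up for illustration; only the `source_files`, `role`, and `path` keys and the two role names come from the commit messages.

```python
from __future__ import annotations

# Hypothetical node in the flattened format.
EXAMPLE_NODE = {
    "id": "example_node",
    "source_files": [
        {"role": "workflow_node", "path": "stages/example_node.py"},
        {"role": "business_logic", "path": "example/example_logic.py"},
    ],
}


def source_paths(node: dict, role: str | None = None) -> list[str]:
    """Return the node's source file paths, optionally restricted to one role."""
    return [
        entry["path"]
        for entry in node.get("source_files", [])
        if role is None or entry["role"] == role
    ]
```

Filtering by `role="business_logic"` would let a source code analysis step skip the workflow wrapper and read only the file that holds the prompt logic, if that is how the roles are used.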
…nd artifact_path Each input now specifies which upstream node it reads from and which specific artifact file it consumes, instead of a flat list of node names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Analyzes the current DAG JSON format's strengths for root cause analysis and identifies gaps (claim-level provenance, runtime context, artifact semantics). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nalysis Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dule Renames classes (TracedFlaw → TracedProblem, FlawTraceResult → RCAResult, IdentifiedFlaw → IdentifiedProblem, FlawIdentificationResult → ProblemIdentificationResult), fields, methods, JSON keys, LLM prompts, and markdown report output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove 700-line inline DAG registry (now auto-generated), replace all "flaw" terminology with "problem" in code samples, JSON keys, CLI args, and prose. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both the fallback path and the upstream-origin path now use len(trace) - 1, fixing an off-by-one that inflated depth when the origin was found after checking upstream inputs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
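The depth rule described here is small enough to state directly. In this sketch a trace is assumed to be a list of node names running from the starting file's node to the origin; `trace_depth` is a hypothetical helper name.

```python
def trace_depth(trace: list[str]) -> int:
    """Depth is the number of upstream hops: one less than the trace length."""
    if not trace:
        raise ValueError("a trace must contain at least the starting node")
    return len(trace) - 1
```

A problem that originates in the starting node itself therefore has depth 0, whichever code path found the origin.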
Summary
- `worker_plan_internal/rca/` package: a CLI tool that traces problems in PlanExe reports upstream through the pipeline DAG to find where they originated and classify the root cause
- DAG registry built from `extract_dag.py`: zero maintenance when the pipeline changes
- Root causes are classified as `prompt_fixable`, `domain_complexity`, or `missing_input`
- `prompt_fixable` origins can be fixed automatically by editing the prompt at that node
- JSON (`root_cause_analysis.json`) and markdown (`root_cause_analysis.md`) reports, sorted by trace depth (deepest root cause first)
- Event log (`events.jsonl`) for monitoring progress via `tail -f`

Usage
Test plan
- `--help` works

🤖 Generated with Claude Code