Merged
40 commits
- `304c447` feat: add flaw_tracer registry with full pipeline DAG mapping (neoneye, Apr 5, 2026)
- `6525dca` refactor: use tuples and modern type syntax in flaw_tracer registry (neoneye, Apr 5, 2026)
- `2fa4de3` feat: add flaw_tracer Pydantic models and prompt builders (neoneye, Apr 5, 2026)
- `b1cdb29` feat: add flaw_tracer recursive tracing algorithm (neoneye, Apr 5, 2026)
- `d05f78d` fix: ensure Phase 3 source code analysis runs for upstream-traced flaws (neoneye, Apr 5, 2026)
- `0435abe` refactor: clean up tracer.py — remove unused imports, params, moderni… (neoneye, Apr 5, 2026)
- `5c7dd82` feat: add flaw_tracer JSON and markdown report generation (neoneye, Apr 5, 2026)
- `bdacb19` fix: sort flaws by depth in markdown report output (neoneye, Apr 5, 2026)
- `e479283` feat: add flaw_tracer CLI entry point (neoneye, Apr 5, 2026)
- `831ea6b` docs: add flaw tracer design spec and implementation plan (neoneye, Apr 5, 2026)
- `83f9488` Merge remote-tracking branch 'origin/main' into feature/flaw-tracer (neoneye, Apr 5, 2026)
- `6cb35c8` docs: add flaw_tracer README with usage instructions (neoneye, Apr 5, 2026)
- `8b2e6ff` docs: add flaw_tracer README with usage instructions (neoneye, Apr 5, 2026)
- `5fce9f0` feat: add events.jsonl live event log to flaw tracer (neoneye, Apr 5, 2026)
- `92936a4` fix: shorten event timestamp to HH:MM:SS (neoneye, Apr 5, 2026)
- `ffff673` fix: use compact UTC timestamp without subseconds (neoneye, Apr 5, 2026)
- `c5c7c15` docs: add AGENTS.md with flaw tracer status and known issues (neoneye, Apr 5, 2026)
- `d6c6a0d` fix: anchor Phase 1 to user's flaw and tighten upstream checks (neoneye, Apr 5, 2026)
- `2c9b401` fix: disambiguate source code filenames with parent directory (neoneye, Apr 5, 2026)
- `8e20e9e` docs: update AGENTS.md — mark fixed issues, add test run v2 results (neoneye, Apr 5, 2026)
- `ea93202` docs: add Phase 3 classification limitation and India census v2 results (neoneye, Apr 6, 2026)
- `1404563` feat: classify Phase 3 root causes into prompt_fixable, domain_comple… (neoneye, Apr 6, 2026)
- `264e76c` docs: update README and AGENTS with Phase 3 classification and test r… (neoneye, Apr 6, 2026)
- `db3b788` docs: add tips, limitations, honest assessment, and open issues (neoneye, Apr 6, 2026)
- `e0eaf78` Merge remote-tracking branch 'origin/main' into feature/flaw-tracer (neoneye, Apr 7, 2026)
- `89eba47` refactor: replace hand-maintained registry with extract_dag introspec… (neoneye, Apr 7, 2026)
- `dffb9af` refactor: rename StageInfo to NodeInfo (neoneye, Apr 7, 2026)
- `7e699b4` refactor: rename stage→node throughout flaw_tracer (neoneye, Apr 7, 2026)
- `a2b7df5` refactor: remove primary_output from NodeInfo (neoneye, Apr 7, 2026)
- `91d6058` refactor: replace flat source_files with structured implementation ob… (neoneye, Apr 7, 2026)
- `1f2de49` refactor: rename output_files to artifacts with path objects (neoneye, Apr 7, 2026)
- `03a3c1d` refactor: flatten implementation.files to top-level source_files array (neoneye, Apr 7, 2026)
- `677f7c6` refactor: replace depends_on with inputs array containing from_node a… (neoneye, Apr 7, 2026)
- `657e438` Merge remote-tracking branch 'origin/main' into feature/flaw-tracer (neoneye, Apr 8, 2026)
- `b06c936` docs: add proposal 133 — DAG format insights and RCA strategy (neoneye, Apr 8, 2026)
- `7827d84` refactor: rename flaw_tracer to rca and FlawTracer to RootCauseAnalyzer (neoneye, Apr 8, 2026)
- `109b7ab` refactor: rename --flaw to --problem and output files to root_cause_a… (neoneye, Apr 8, 2026)
- `411629b` refactor: replace "flaw" terminology with "problem" throughout rca mo… (neoneye, Apr 8, 2026)
- `313ac04` docs: update RCA spec and plan to current terminology (neoneye, Apr 8, 2026)
- `557cffc` fix: consistent depth calculation in RCA tracer (neoneye, Apr 8, 2026)
1,498 changes: 1,498 additions & 0 deletions docs/superpowers/plans/2026-04-05-rca.md


286 changes: 286 additions & 0 deletions docs/superpowers/specs/2026-04-05-rca-design.md
# Root Cause Analysis (RCA) for PlanExe Reports

> **Historical note:** This spec was written under the name "flaw tracer". The module
> has been renamed to `rca` (root cause analysis).
> The static DAG registry described here has since been replaced by `extract_dag.py`
> which introspects the Luigi task graph at import time.

## Goal

A CLI tool that takes a PlanExe output directory, a starting file, and a problem description, then recursively traces the problem upstream through the DAG of intermediary files to find where it originated. Produces both JSON and markdown output. Built on PlanExe's existing LLM infrastructure so it can eventually become a pipeline stage.

## Architecture

The tool performs a recursive depth-first search through the pipeline DAG. Starting from a downstream file where a problem is observed, it walks upstream one hop at a time — reading input files, asking an LLM whether the problem or a precursor exists there, and continuing until it reaches a node where the problem exists in the output but not in any inputs. At that origin point, it reads the node's source code to identify the likely cause.

Three LLM prompts drive the analysis: problem identification (once at the start), upstream checking (at each hop), and source code analysis (at each origin). All use Pydantic models for structured output and LLMExecutor for fallback resilience.

## Components

```
worker_plan/worker_plan_internal/rca/
    __init__.py
    __main__.py   — CLI entry point (argparse, LLM setup, orchestration)
    registry.py   — Static DAG mapping: stages, output files, dependencies, source code paths
    tracer.py     — Recursive tracing algorithm
    prompts.py    — Pydantic models and LLM prompt templates
    output.py     — JSON + markdown report generation
```

### `registry.py` — DAG Mapping

A static Python data structure mapping the full pipeline topology. Each entry describes one pipeline stage:

```python
@dataclass
class NodeInfo:
    name: str                     # e.g., "potential_levers"
    output_files: list[str]       # e.g., ["002-9-potential_levers_raw.json", "002-10-potential_levers.json"]
    inputs: list[str]             # e.g., ["setup", "identify_purpose", "plan_type", "extract_constraints"]
    source_code_files: list[str]  # Relative to worker_plan/, e.g., ["worker_plan_internal/plan/stages/potential_levers.py", "worker_plan_internal/lever/identify_potential_levers.py"]
```

The registry covers all ~48 pipeline stages. Key functions:

- `find_node_by_filename(filename: str) -> NodeInfo | None` — Given an output filename, return the stage that produced it.
- `get_upstream_files(stage_name: str, output_dir: Path) -> list[tuple[str, Path]]` — Return `(stage_name, file_path)` pairs for all upstream stages, resolved against the output directory. Skip files that don't exist on disk. When a stage has multiple output files (e.g., both `_raw.json` and `.json`), prefer the clean/processed file since that's what downstream stages consume. If only the raw file exists, use that.
- `get_source_code_paths(stage_name: str) -> list[Path]` — Return absolute paths to source code files for a stage.

The mapping is derived from the Luigi task classes (`requires()` and `output()` methods) but hard-coded for reliability. When the pipeline changes, this file needs updating.

### `prompts.py` — Pydantic Models and Prompt Templates

Three Pydantic models for structured LLM output:

```python
class IdentifiedProblem(BaseModel):
    description: str = Field(description="One-sentence description of the problem")
    evidence: str = Field(description="Direct quote from the file demonstrating the problem")
    severity: Literal["HIGH", "MEDIUM", "LOW"] = Field(
        description="HIGH: fabricated data or missing critical analysis. MEDIUM: weak reasoning or vague claims. LOW: minor gaps."
    )


class ProblemIdentificationResult(BaseModel):
    problems: list[IdentifiedProblem] = Field(description="List of discrete problems found in the file")


class UpstreamCheckResult(BaseModel):
    found: bool = Field(description="True if this file contains the problem or a precursor to it")
    evidence: str | None = Field(description="Direct quote from the file if found, null otherwise")
    explanation: str = Field(description="How this connects to the downstream problem, or why this file is clean")


class SourceCodeAnalysisResult(BaseModel):
    likely_cause: str = Field(description="What in the prompt or logic likely caused the problem")
    relevant_code_section: str = Field(description="The specific code or prompt text responsible")
    suggestion: str = Field(description="How to fix or prevent this problem")

Three prompt-building functions, each returning a `list[ChatMessage]`:

**`build_problem_identification_messages(filename, file_content, user_problem_description)`**

System message:
```
You are analyzing an intermediary file from a project planning pipeline.
The user has identified problems in this output. Identify each discrete problem.
For each problem, provide a short description, a direct quote as evidence, and a severity level.
Only identify real problems — do not flag stylistic preferences or minor formatting issues.
```

User message contains the filename, file content, and the user's problem description.
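A sketch of this builder. The `ChatMessage` dataclass is a stand-in for whatever chat-message type the LLM infrastructure provides (e.g. a llama_index-style message); the exact user-message layout is an assumption:

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    """Stand-in for the real chat message type; only role and content matter here."""
    role: str
    content: str


PROBLEM_IDENTIFICATION_SYSTEM = (
    "You are analyzing an intermediary file from a project planning pipeline.\n"
    "The user has identified problems in this output. Identify each discrete problem.\n"
    "For each problem, provide a short description, a direct quote as evidence, and a severity level.\n"
    "Only identify real problems — do not flag stylistic preferences or minor formatting issues."
)


def build_problem_identification_messages(
    filename: str, file_content: str, user_problem_description: str
) -> list[ChatMessage]:
    # Assumed layout: filename, then the user's description, then the file body.
    user = (
        f"File: {filename}\n\n"
        f"User's problem description:\n{user_problem_description}\n\n"
        f"File content:\n{file_content}"
    )
    return [
        ChatMessage(role="system", content=PROBLEM_IDENTIFICATION_SYSTEM),
        ChatMessage(role="user", content=user),
    ]
```

The other two builders follow the same shape with their own system prompts.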

**`build_upstream_check_messages(problem_description, evidence_quote, upstream_filename, upstream_file_content)`**

System message:
```
You are tracing a problem through a project planning pipeline to find where it originated.
A downstream file contains a problem. You are examining an upstream file that was an input
to the stage that produced the problematic output. Determine if this upstream file contains
the same problem or a precursor to it.
```

User message contains the problem details and the upstream file content.

**`build_source_code_analysis_messages(problem_description, evidence_quote, source_code_contents)`**

System message:
```
A problem was introduced at this pipeline stage. The problem exists in its output but NOT
in any of its inputs. Examine the source code to identify what in the prompt text,
logic, or processing likely caused this problem. Be specific — point to lines or prompt phrases.
```

User message contains the problem details and the concatenated source code.

### `tracer.py` — Recursive Tracing Algorithm

```python
class RootCauseAnalyzer:
    def __init__(self, output_dir: Path, llm_executor: LLMExecutor, source_code_base: Path, max_depth: int = 15, verbose: bool = False):
        ...

    def trace(self, starting_file: str, problem_description: str) -> RCAResult:
        """Main entry point. Returns the complete trace result."""
        ...
```

The `trace` method implements three phases:

**Phase 1 — Identify problems.**
Read the starting file. Build the problem identification prompt with the file content and user's description. Call the LLM via `LLMExecutor.run()` using `llm.as_structured_llm(ProblemIdentificationResult)`. Returns a list of `IdentifiedProblem` objects.

**Phase 2 — Recursive upstream trace.**
For each identified problem, call `_trace_upstream(problem, node_name, current_file, depth)`:

1. Look up the current node's upstream nodes via the registry.
2. For each upstream node, resolve its output files on disk.
3. Read each upstream file. Build the upstream check prompt. Call the LLM.
4. If `found=True`: append to the trace chain and recurse into that node's upstream dependencies.
5. If `found=False`: this branch is clean, stop.
6. If depth reaches `max_depth`: stop and mark trace as incomplete.

**Deduplication:** Track which `(node_name, problem_description)` pairs have already been analyzed. If two problems converge on the same upstream file, reuse the earlier result.

**Multiple upstream branches:** When a node has multiple upstream inputs and the problem is found in more than one, follow all branches. The trace can fork — the JSON output represents this as a list of trace entries per problem (each entry has a node and file), ordered from downstream to upstream.
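The recursion, deduplication, and branch-following described above can be sketched as follows. The LLM upstream check and the registry lookup are abstracted into injected callables, and every name here is a hypothetical simplification of the real `tracer.py`:

```python
from pathlib import Path
from typing import Callable


def trace_upstream(
    problem: str,
    node_name: str,
    depth: int,
    max_depth: int,
    get_upstream_files: Callable[[str], list[tuple[str, Path]]],
    check_upstream: Callable[[str, Path], bool],  # stands in for the LLM upstream-check call
    seen: set[tuple[str, str]],
    chain: list[dict],
) -> None:
    """Depth-first upstream walk, recording each hop where the problem is found."""
    if depth >= max_depth:
        return  # caller marks the trace as incomplete
    for upstream_name, upstream_file in get_upstream_files(node_name):
        key = (upstream_name, problem)
        if key in seen:
            continue  # two problems converged on the same upstream file: reuse earlier result
        seen.add(key)
        if check_upstream(problem, upstream_file):
            # Problem (or a precursor) found upstream: record the hop and recurse.
            chain.append({"node": upstream_name, "file": upstream_file.name})
            trace_upstream(problem, upstream_name, depth + 1, max_depth,
                           get_upstream_files, check_upstream, seen, chain)
        # found=False: this branch is clean, stop descending here.
```

Because the loop continues after each recursion, a problem found in several upstream inputs forks the trace naturally. Origin detection (problem present in a node's output but absent from all its inputs) and the Phase 3 source-code analysis would hang off this walk.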

**Phase 3 — Source code analysis at origin.**
When a problem is found in a node's output but not in any of its inputs, that node is the origin. Read the source code files for that node (via registry). Build the source code analysis prompt. Call the LLM. Attach the result to the problem's origin data.
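A small helper along these lines (hypothetical name, assumed header format) could assemble the concatenated source code for that prompt:

```python
from pathlib import Path


def concat_source_files(paths: list[Path]) -> str:
    """Join source files into one prompt payload, each preceded by a filename header."""
    sections = []
    for path in paths:
        sections.append(f"### {path.name}\n{path.read_text()}")
    return "\n\n".join(sections)
```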

### `output.py` — Report Generation

Two functions:

**`write_json_report(result: RCAResult, output_path: Path)`**

Writes the full trace as JSON:

```json
{
"input": {
"starting_file": "030-report.html",
"problem_description": "...",
"output_dir": "/path/to/output",
"timestamp": "2026-04-05T14:30:00Z"
},
"problems": [
{
"id": "problem_001",
"description": "Budget of CZK 500,000 is unvalidated",
"severity": "HIGH",
"starting_evidence": "quote from starting file...",
"trace": [
{
"node": "executive_summary",
"file": "025-2-executive_summary.md",
"evidence": "...",
"is_origin": false
},
{
"node": "make_assumptions",
"file": "003-5-make_assumptions.md",
"evidence": "...",
"is_origin": true
}
],
"origin": {
"node": "make_assumptions",
"file": "003-5-make_assumptions.md",
"source_code_files": ["stages/make_assumptions.py", "assumption/make_assumptions.py"],
"likely_cause": "The prompt asks the LLM to...",
"suggestion": "Add a validation step that..."
},
"depth": 2
}
],
"summary": {
"total_problems": 3,
"deepest_origin_node": "make_assumptions",
"deepest_origin_depth": 3,
"llm_calls_made": 12
}
}
```

**`write_markdown_report(result: RCAResult, output_path: Path)`**

Writes a human-readable report:

```markdown
# Root Cause Analysis Report

**Input:** 030-report.html
**Problems found:** 3
**Deepest origin:** make_assumptions (depth 3)

---

## Problem 1 (HIGH): Budget of CZK 500,000 is unvalidated

**Trace:** executive_summary -> project_plan -> **make_assumptions** (origin)

| Node | File | Evidence |
|------|------|----------|
| executive_summary | 025-2-executive_summary.md | "The budget is CZK 500,000..." |
| project_plan | 005-2-project_plan.md | "Estimated budget: CZK 500,000..." |
| **make_assumptions** | 003-5-make_assumptions.md | "Assume total budget..." |

**Root cause:** The prompt asks the LLM to generate budget assumptions
without requiring external data sources...

**Suggestion:** Add a validation step that...
```

Problems are sorted by depth (deepest origin first) so the most upstream root cause appears at the top.
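The depth sort can be as simple as the following, assuming each problem record carries the `depth` field shown in the JSON schema:

```python
def sort_problems_by_depth(problems: list[dict]) -> list[dict]:
    """Deepest origin first, so the most upstream root cause leads the report."""
    return sorted(problems, key=lambda p: p.get("depth", 0), reverse=True)
```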

### `__main__.py` — CLI Entry Point

```
python -m worker_plan_internal.rca \
--dir /path/to/output \
--file 030-report.html \
--problem "The budget is CZK 500,000 but this number appears unvalidated..." \
--output-dir /path/to/output \
--max-depth 15 \
--verbose
```

Arguments:
- `--dir` (required): Path to the output directory containing intermediary files.
- `--file` (required): Starting file to analyze, relative to `--dir`.
- `--problem` (required): Text description of the observed problem(s).
- `--output-dir` (optional): Where to write `root_cause_analysis.json` and `root_cause_analysis.md`. Defaults to `--dir`.
- `--max-depth` (optional): Maximum upstream hops per problem. Default 15.
- `--verbose` (optional): Print each LLM call and result to stderr as the trace runs.

Orchestration:
1. Parse arguments.
2. Load model profile via `PlanExeLLMConfig.load()` and create `LLMExecutor` with priority-ordered models from the profile.
3. Create `RootCauseAnalyzer` instance.
4. Call `analyzer.trace(starting_file, problem_description)`.
5. Write JSON and markdown reports via `output.py`.
6. Print summary to stdout.

## LLM Infrastructure Integration

- **LLMExecutor** with `LLMModelFromName.from_names()` for multi-model fallback.
- **Pydantic models** with `llm.as_structured_llm()` for all three prompt types.
- **Model profile** loaded from `PLANEXE_MODEL_PROFILE` environment variable (defaults to baseline).
- **RetryConfig** with defaults (2 retries, exponential backoff) for transient errors.
- **`max_validation_retries=1`** to allow one structured output retry with feedback on parse failure.

## Scope Boundaries

**In scope:**
- CLI tool with `--dir`, `--file`, `--problem`, `--output-dir`, `--max-depth`, `--verbose`.
- Static registry of all ~48 pipeline stages with dependencies and source code paths.
- Recursive depth-first upstream tracing with three LLM prompt types.
- JSON + markdown output sorted by trace depth.
- Source code analysis only at origin stages (lazy evaluation).
- Full file contents sent to LLM (no chunking or summarization).

**Out of scope (future work):**
- Library/module API (CLI first, refactor later).
- Integration as a Luigi pipeline stage.
- Approach B (full reverse-topological sweep).
- Approach C (scout-then-trace optimization).
- Automatic registry generation from Luigi task introspection.
- UI/web integration.
79 changes: 61 additions & 18 deletions worker_plan/worker_plan_internal/extract_dag.py
```diff
@@ -135,29 +135,75 @@ def _detect_implementation_files(cls: type) -> list[str]:
     return files
 
 
-def _extract_source_files(task: luigi.Task) -> list[str]:
-    """Get source files: task's own file + auto-detected implementation files."""
+def _extract_source_files(task: luigi.Task) -> list[dict[str, str]]:
+    """Get source files: workflow node file + auto-detected business logic files."""
     cls = type(task)
+    files: list[dict[str, str]] = []
 
-    # The task's own file
-    result: list[str] = []
+    # The task's own file (workflow node)
     try:
         task_file = Path(inspect.getfile(cls)).resolve()
-        result.append(str(task_file.relative_to(_WORKER_PLAN_DIR)))
+        files.append({
+            "role": "workflow_node",
+            "path": str(task_file.relative_to(_WORKER_PLAN_DIR)),
+        })
     except (TypeError, ValueError, OSError):
         pass
 
-    # Supplement with auto-detected implementation files
-    for f in _detect_implementation_files(cls):
-        if f not in result:
-            result.append(f)
+    # Auto-detected implementation files (business logic)
+    seen = {f["path"] for f in files}
+    for path in _detect_implementation_files(cls):
+        if path not in seen:
+            files.append({
+                "role": "business_logic",
+                "path": path,
+            })
+            seen.add(path)
+
+    return files
+
+
+def _pick_primary_output(filenames: list[str]) -> str:
+    """Pick the most likely file to be read from a node's outputs.
+
+    Preference: .md > .html > non-raw file > first file.
+    """
+    for ext in (".md", ".html"):
+        for f in filenames:
+            if f.endswith(ext):
+                return f
+    non_raw = [f for f in filenames if "_raw" not in f]
+    if non_raw:
+        return non_raw[0]
+    return filenames[0] if filenames else ""
+
+
+def _extract_inputs(upstream_tasks: list[luigi.Task]) -> list[dict[str, str]]:
+    """Build inputs list: for each upstream task, identify the primary artifact it provides."""
+    inputs: list[dict[str, str]] = []
+    seen: set[str] = set()
+
+    for dep in upstream_tasks:
+        node_name = _class_name_to_stage_name(dep.__class__.__name__)
+        if node_name in seen:
+            continue
+        seen.add(node_name)
+
+        output_files = _extract_output_filenames(dep)
+        primary = _pick_primary_output(output_files)
+        if primary:
+            inputs.append({
+                "from_node": node_name,
+                "artifact_path": primary,
+            })
 
-    return result
+    inputs.sort(key=lambda x: x["from_node"])
+    return inputs
 
 
 def _output_sort_key(stage: dict[str, Any]) -> tuple[int, int, str]:
     """Sort key: numeric prefix from the first output filename, then name."""
-    filename = stage["output_files"][0] if stage.get("output_files") else ""
+    filename = stage["artifacts"][0]["path"] if stage.get("artifacts") else ""
     match = re.match(r"(\d+)-?(\d+)?", filename)
     if match:
         major = int(match.group(1))
@@ -197,18 +243,15 @@ def _walk(task: luigi.Task) -> None:
     cls = type(task)
     stage_name = _class_name_to_stage_name(class_name)
     description = cls.description() if hasattr(cls, "description") else ""
-    output_files = _extract_output_filenames(task)
+    artifacts = [{"path": f} for f in _extract_output_filenames(task)]
+    inputs = _extract_inputs(upstream_tasks)
     source_files = _extract_source_files(task)
-    depends_on_names = sorted(set(
-        _class_name_to_stage_name(dep.__class__.__name__)
-        for dep in upstream_tasks
-    ))
 
     stages.append({
        "id": stage_name,
        "description": description,
-       "output_files": output_files,
-       "depends_on": depends_on_names,
+       "artifacts": artifacts,
+       "inputs": inputs,
        "source_files": source_files,
     })
```