Merged
40 commits
- `304c447` feat: add flaw_tracer registry with full pipeline DAG mapping (neoneye, Apr 5, 2026)
- `6525dca` refactor: use tuples and modern type syntax in flaw_tracer registry (neoneye, Apr 5, 2026)
- `2fa4de3` feat: add flaw_tracer Pydantic models and prompt builders (neoneye, Apr 5, 2026)
- `b1cdb29` feat: add flaw_tracer recursive tracing algorithm (neoneye, Apr 5, 2026)
- `d05f78d` fix: ensure Phase 3 source code analysis runs for upstream-traced flaws (neoneye, Apr 5, 2026)
- `0435abe` refactor: clean up tracer.py — remove unused imports, params, moderni… (neoneye, Apr 5, 2026)
- `5c7dd82` feat: add flaw_tracer JSON and markdown report generation (neoneye, Apr 5, 2026)
- `bdacb19` fix: sort flaws by depth in markdown report output (neoneye, Apr 5, 2026)
- `e479283` feat: add flaw_tracer CLI entry point (neoneye, Apr 5, 2026)
- `831ea6b` docs: add flaw tracer design spec and implementation plan (neoneye, Apr 5, 2026)
- `83f9488` Merge remote-tracking branch 'origin/main' into feature/flaw-tracer (neoneye, Apr 5, 2026)
- `6cb35c8` docs: add flaw_tracer README with usage instructions (neoneye, Apr 5, 2026)
- `8b2e6ff` docs: add flaw_tracer README with usage instructions (neoneye, Apr 5, 2026)
- `5fce9f0` feat: add events.jsonl live event log to flaw tracer (neoneye, Apr 5, 2026)
- `92936a4` fix: shorten event timestamp to HH:MM:SS (neoneye, Apr 5, 2026)
- `ffff673` fix: use compact UTC timestamp without subseconds (neoneye, Apr 5, 2026)
- `c5c7c15` docs: add AGENTS.md with flaw tracer status and known issues (neoneye, Apr 5, 2026)
- `d6c6a0d` fix: anchor Phase 1 to user's flaw and tighten upstream checks (neoneye, Apr 5, 2026)
- `2c9b401` fix: disambiguate source code filenames with parent directory (neoneye, Apr 5, 2026)
- `8e20e9e` docs: update AGENTS.md — mark fixed issues, add test run v2 results (neoneye, Apr 5, 2026)
- `ea93202` docs: add Phase 3 classification limitation and India census v2 results (neoneye, Apr 6, 2026)
- `1404563` feat: classify Phase 3 root causes into prompt_fixable, domain_comple… (neoneye, Apr 6, 2026)
- `264e76c` docs: update README and AGENTS with Phase 3 classification and test r… (neoneye, Apr 6, 2026)
- `db3b788` docs: add tips, limitations, honest assessment, and open issues (neoneye, Apr 6, 2026)
- `e0eaf78` Merge remote-tracking branch 'origin/main' into feature/flaw-tracer (neoneye, Apr 7, 2026)
- `89eba47` refactor: replace hand-maintained registry with extract_dag introspec… (neoneye, Apr 7, 2026)
- `dffb9af` refactor: rename StageInfo to NodeInfo (neoneye, Apr 7, 2026)
- `7e699b4` refactor: rename stage→node throughout flaw_tracer (neoneye, Apr 7, 2026)
- `a2b7df5` refactor: remove primary_output from NodeInfo (neoneye, Apr 7, 2026)
- `91d6058` refactor: replace flat source_files with structured implementation ob… (neoneye, Apr 7, 2026)
- `1f2de49` refactor: rename output_files to artifacts with path objects (neoneye, Apr 7, 2026)
- `03a3c1d` refactor: flatten implementation.files to top-level source_files array (neoneye, Apr 7, 2026)
- `677f7c6` refactor: replace depends_on with inputs array containing from_node a… (neoneye, Apr 7, 2026)
- `657e438` Merge remote-tracking branch 'origin/main' into feature/flaw-tracer (neoneye, Apr 8, 2026)
- `b06c936` docs: add proposal 133 — DAG format insights and RCA strategy (neoneye, Apr 8, 2026)
- `7827d84` refactor: rename flaw_tracer to rca and FlawTracer to RootCauseAnalyzer (neoneye, Apr 8, 2026)
- `109b7ab` refactor: rename --flaw to --problem and output files to root_cause_a… (neoneye, Apr 8, 2026)
- `411629b` refactor: replace "flaw" terminology with "problem" throughout rca mo… (neoneye, Apr 8, 2026)
- `313ac04` docs: update RCA spec and plan to current terminology (neoneye, Apr 8, 2026)
- `557cffc` fix: consistent depth calculation in RCA tracer (neoneye, Apr 8, 2026)
1,498 changes: 1,498 additions & 0 deletions docs/superpowers/plans/2026-04-05-rca.md


286 changes: 286 additions & 0 deletions docs/superpowers/specs/2026-04-05-rca-design.md
# Root Cause Analysis (RCA) for PlanExe Reports

> **Historical note:** This spec was written under the name "flaw tracer". The module
> has been renamed to `rca` (root cause analysis).
> The static DAG registry described here has since been replaced by `extract_dag.py`
> which introspects the Luigi task graph at import time.

## Goal

A CLI tool that takes a PlanExe output directory, a starting file, and a problem description, then recursively traces the problem upstream through the DAG of intermediary files to find where it originated. Produces both JSON and markdown output. Built on PlanExe's existing LLM infrastructure so it can eventually become a pipeline stage.

## Architecture

The tool performs a recursive depth-first search through the pipeline DAG. Starting from a downstream file where a problem is observed, it walks upstream one hop at a time — reading input files, asking an LLM whether the problem or a precursor exists there, and continuing until it reaches a node where the problem exists in the output but not in any inputs. At that origin point, it reads the node's source code to identify the likely cause.

Three LLM prompts drive the analysis: problem identification (once at the start), upstream checking (at each hop), and source code analysis (at each origin). All use Pydantic models for structured output and LLMExecutor for fallback resilience.

## Components

```
worker_plan/worker_plan_internal/rca/
    __init__.py
    __main__.py   — CLI entry point (argparse, LLM setup, orchestration)
    registry.py   — Static DAG mapping: stages, output files, dependencies, source code paths
    tracer.py     — Recursive tracing algorithm
    prompts.py    — Pydantic models and LLM prompt templates
    output.py     — JSON + markdown report generation
```

### `registry.py` — DAG Mapping

A static Python data structure mapping the full pipeline topology. Each entry describes one pipeline stage:

```python
@dataclass
class NodeInfo:
    name: str                     # e.g., "potential_levers"
    output_files: list[str]       # e.g., ["002-9-potential_levers_raw.json", "002-10-potential_levers.json"]
    inputs: list[str]             # e.g., ["setup", "identify_purpose", "plan_type", "extract_constraints"]
    source_code_files: list[str]  # Relative to worker_plan/, e.g., ["worker_plan_internal/plan/stages/potential_levers.py", "worker_plan_internal/lever/identify_potential_levers.py"]
```

The registry covers all ~48 pipeline stages. Key functions:

- `find_node_by_filename(filename: str) -> NodeInfo | None` — Given an output filename, return the stage that produced it.
- `get_upstream_files(stage_name: str, output_dir: Path) -> list[tuple[str, Path]]` — Return `(stage_name, file_path)` pairs for all upstream stages, resolved against the output directory. Skip files that don't exist on disk. When a stage has multiple output files (e.g., both `_raw.json` and `.json`), prefer the clean/processed file since that's what downstream stages consume. If only the raw file exists, use that.
- `get_source_code_paths(stage_name: str) -> list[Path]` — Return absolute paths to source code files for a stage.

The mapping is derived from the Luigi task classes (`requires()` and `output()` methods) but hard-coded for reliability. When the pipeline changes, this file needs updating.

### `prompts.py` — Pydantic Models and Prompt Templates

Three Pydantic models for structured LLM output:

```python
class IdentifiedProblem(BaseModel):
    description: str = Field(description="One-sentence description of the problem")
    evidence: str = Field(description="Direct quote from the file demonstrating the problem")
    severity: Literal["HIGH", "MEDIUM", "LOW"] = Field(
        description="HIGH: fabricated data or missing critical analysis. MEDIUM: weak reasoning or vague claims. LOW: minor gaps."
    )


class ProblemIdentificationResult(BaseModel):
    problems: list[IdentifiedProblem] = Field(description="List of discrete problems found in the file")


class UpstreamCheckResult(BaseModel):
    found: bool = Field(description="True if this file contains the problem or a precursor to it")
    evidence: str | None = Field(description="Direct quote from the file if found, null otherwise")
    explanation: str = Field(description="How this connects to the downstream problem, or why this file is clean")


class SourceCodeAnalysisResult(BaseModel):
    likely_cause: str = Field(description="What in the prompt or logic likely caused the problem")
    relevant_code_section: str = Field(description="The specific code or prompt text responsible")
    suggestion: str = Field(description="How to fix or prevent this problem")

Three prompt-building functions, each returning a `list[ChatMessage]`:

**`build_problem_identification_messages(filename, file_content, user_problem_description)`**

System message:
```
You are analyzing an intermediary file from a project planning pipeline.
The user has identified problems in this output. Identify each discrete problem.
For each problem, provide a short description, a direct quote as evidence, and a severity level.
Only identify real problems — do not flag stylistic preferences or minor formatting issues.
```

User message contains the filename, file content, and the user's problem description.
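A sketch of this builder. The `ChatMessage` dataclass is a stand-in for whatever chat-message type the LLM infrastructure provides (e.g. a llama_index-style message); the exact user-message layout is an assumption:

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    """Stand-in for the real chat message type; only role and content matter here."""
    role: str
    content: str


PROBLEM_IDENTIFICATION_SYSTEM = (
    "You are analyzing an intermediary file from a project planning pipeline.\n"
    "The user has identified problems in this output. Identify each discrete problem.\n"
    "For each problem, provide a short description, a direct quote as evidence, and a severity level.\n"
    "Only identify real problems — do not flag stylistic preferences or minor formatting issues."
)


def build_problem_identification_messages(
    filename: str, file_content: str, user_problem_description: str
) -> list[ChatMessage]:
    # Assumed layout: filename, then the user's description, then the file body.
    user = (
        f"File: {filename}\n\n"
        f"User's problem description:\n{user_problem_description}\n\n"
        f"File content:\n{file_content}"
    )
    return [
        ChatMessage(role="system", content=PROBLEM_IDENTIFICATION_SYSTEM),
        ChatMessage(role="user", content=user),
    ]
```

The other two builders follow the same shape with their own system prompts.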

**`build_upstream_check_messages(problem_description, evidence_quote, upstream_filename, upstream_file_content)`**

System message:
```
You are tracing a problem through a project planning pipeline to find where it originated.
A downstream file contains a problem. You are examining an upstream file that was an input
to the stage that produced the problematic output. Determine if this upstream file contains
the same problem or a precursor to it.
```

User message contains the problem details and the upstream file content.

**`build_source_code_analysis_messages(problem_description, evidence_quote, source_code_contents)`**

System message:
```
A problem was introduced at this pipeline stage. The problem exists in its output but NOT
in any of its inputs. Examine the source code to identify what in the prompt text,
logic, or processing likely caused this problem. Be specific — point to lines or prompt phrases.
```

User message contains the problem details and the concatenated source code.

### `tracer.py` — Recursive Tracing Algorithm

```python
class RootCauseAnalyzer:
    def __init__(self, output_dir: Path, llm_executor: LLMExecutor, source_code_base: Path, max_depth: int = 15, verbose: bool = False):
        ...

    def trace(self, starting_file: str, problem_description: str) -> RCAResult:
        """Main entry point. Returns the complete trace result."""
        ...
```

The `trace` method implements three phases:

**Phase 1 — Identify problems.**
Read the starting file. Build the problem identification prompt with the file content and user's description. Call the LLM via `LLMExecutor.run()` using `llm.as_structured_llm(ProblemIdentificationResult)`. Returns a list of `IdentifiedProblem` objects.

**Phase 2 — Recursive upstream trace.**
For each identified problem, call `_trace_upstream(problem, node_name, current_file, depth)`:

1. Look up the current node's upstream nodes via the registry.
2. For each upstream node, resolve its output files on disk.
3. Read each upstream file. Build the upstream check prompt. Call the LLM.
4. If `found=True`: append to the trace chain and recurse into that node's upstream dependencies.
5. If `found=False`: this branch is clean, stop.
6. If depth reaches `max_depth`: stop and mark trace as incomplete.

**Deduplication:** Track which `(node_name, problem_description)` pairs have already been analyzed. If two problems converge on the same upstream file, reuse the earlier result.

**Multiple upstream branches:** When a node has multiple upstream inputs and the problem is found in more than one, follow all branches. The trace can fork — the JSON output represents this as a list of trace entries per problem (each entry has a node and file), ordered from downstream to upstream.
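The recursion, deduplication, and branch-following described above can be sketched as follows. The LLM upstream check and the registry lookup are abstracted into injected callables, and every name here is a hypothetical simplification of the real `tracer.py`:

```python
from pathlib import Path
from typing import Callable


def trace_upstream(
    problem: str,
    node_name: str,
    depth: int,
    max_depth: int,
    get_upstream_files: Callable[[str], list[tuple[str, Path]]],
    check_upstream: Callable[[str, Path], bool],  # stands in for the LLM upstream-check call
    seen: set[tuple[str, str]],
    chain: list[dict],
) -> None:
    """Depth-first upstream walk, recording each hop where the problem is found."""
    if depth >= max_depth:
        return  # caller marks the trace as incomplete
    for upstream_name, upstream_file in get_upstream_files(node_name):
        key = (upstream_name, problem)
        if key in seen:
            continue  # two problems converged on the same upstream file: reuse earlier result
        seen.add(key)
        if check_upstream(problem, upstream_file):
            # Problem (or a precursor) found upstream: record the hop and recurse.
            chain.append({"node": upstream_name, "file": upstream_file.name})
            trace_upstream(problem, upstream_name, depth + 1, max_depth,
                           get_upstream_files, check_upstream, seen, chain)
        # found=False: this branch is clean, stop descending here.
```

Because the loop continues after each recursion, a problem found in several upstream inputs forks the trace naturally. Origin detection (problem present in a node's output but absent from all its inputs) and the Phase 3 source-code analysis would hang off this walk.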

**Phase 3 — Source code analysis at origin.**
When a problem is found in a node's output but not in any of its inputs, that node is the origin. Read the source code files for that node (via registry). Build the source code analysis prompt. Call the LLM. Attach the result to the problem's origin data.
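A small helper along these lines (hypothetical name, assumed header format) could assemble the concatenated source code for that prompt:

```python
from pathlib import Path


def concat_source_files(paths: list[Path]) -> str:
    """Join source files into one prompt payload, each preceded by a filename header."""
    sections = []
    for path in paths:
        sections.append(f"### {path.name}\n{path.read_text()}")
    return "\n\n".join(sections)
```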

### `output.py` — Report Generation

Two functions:

**`write_json_report(result: RCAResult, output_path: Path)`**

Writes the full trace as JSON:

```json
{
"input": {
"starting_file": "030-report.html",
"problem_description": "...",
"output_dir": "/path/to/output",
"timestamp": "2026-04-05T14:30:00Z"
},
"problems": [
{
"id": "problem_001",
"description": "Budget of CZK 500,000 is unvalidated",
"severity": "HIGH",
"starting_evidence": "quote from starting file...",
"trace": [
{
"node": "executive_summary",
"file": "025-2-executive_summary.md",
"evidence": "...",
"is_origin": false
},
{
"node": "make_assumptions",
"file": "003-5-make_assumptions.md",
"evidence": "...",
"is_origin": true
}
],
"origin": {
"node": "make_assumptions",
"file": "003-5-make_assumptions.md",
"source_code_files": ["stages/make_assumptions.py", "assumption/make_assumptions.py"],
"likely_cause": "The prompt asks the LLM to...",
"suggestion": "Add a validation step that..."
},
"depth": 2
}
],
"summary": {
"total_problems": 3,
"deepest_origin_node": "make_assumptions",
"deepest_origin_depth": 3,
"llm_calls_made": 12
}
}
```

**`write_markdown_report(result: RCAResult, output_path: Path)`**

Writes a human-readable report:

```markdown
# Root Cause Analysis Report

**Input:** 030-report.html
**Problems found:** 3
**Deepest origin:** make_assumptions (depth 3)

---

## Problem 1 (HIGH): Budget of CZK 500,000 is unvalidated

**Trace:** executive_summary -> project_plan -> **make_assumptions** (origin)

| Node | File | Evidence |
|------|------|----------|
| executive_summary | 025-2-executive_summary.md | "The budget is CZK 500,000..." |
| project_plan | 005-2-project_plan.md | "Estimated budget: CZK 500,000..." |
| **make_assumptions** | 003-5-make_assumptions.md | "Assume total budget..." |

**Root cause:** The prompt asks the LLM to generate budget assumptions
without requiring external data sources...

**Suggestion:** Add a validation step that...
```

Problems are sorted by depth (deepest origin first) so the most upstream root cause appears at the top.
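The depth sort can be as simple as the following, assuming each problem record carries the `depth` field shown in the JSON schema:

```python
def sort_problems_by_depth(problems: list[dict]) -> list[dict]:
    """Deepest origin first, so the most upstream root cause leads the report."""
    return sorted(problems, key=lambda p: p.get("depth", 0), reverse=True)
```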

### `__main__.py` — CLI Entry Point

```
python -m worker_plan_internal.rca \
--dir /path/to/output \
--file 030-report.html \
--problem "The budget is CZK 500,000 but this number appears unvalidated..." \
--output-dir /path/to/output \
--max-depth 15 \
--verbose
```

Arguments:
- `--dir` (required): Path to the output directory containing intermediary files.
- `--file` (required): Starting file to analyze, relative to `--dir`.
- `--problem` (required): Text description of the observed problem(s).
- `--output-dir` (optional): Where to write `root_cause_analysis.json` and `root_cause_analysis.md`. Defaults to `--dir`.
- `--max-depth` (optional): Maximum upstream hops per problem. Default 15.
- `--verbose` (optional): Print each LLM call and result to stderr as the trace runs.

Orchestration:
1. Parse arguments.
2. Load model profile via `PlanExeLLMConfig.load()` and create `LLMExecutor` with priority-ordered models from the profile.
3. Create `RootCauseAnalyzer` instance.
4. Call `analyzer.trace(starting_file, problem_description)`.
5. Write JSON and markdown reports via `output.py`.
6. Print summary to stdout.

## LLM Infrastructure Integration

- **LLMExecutor** with `LLMModelFromName.from_names()` for multi-model fallback.
- **Pydantic models** with `llm.as_structured_llm()` for all three prompt types.
- **Model profile** loaded from `PLANEXE_MODEL_PROFILE` environment variable (defaults to baseline).
- **RetryConfig** with defaults (2 retries, exponential backoff) for transient errors.
- **`max_validation_retries=1`** to allow one structured output retry with feedback on parse failure.

## Scope Boundaries

**In scope:**
- CLI tool with `--dir`, `--file`, `--problem`, `--output-dir`, `--max-depth`, `--verbose`.
- Static registry of all ~48 pipeline stages with dependencies and source code paths.
- Recursive depth-first upstream tracing with three LLM prompt types.
- JSON + markdown output sorted by trace depth.
- Source code analysis only at origin stages (lazy evaluation).
- Full file contents sent to LLM (no chunking or summarization).

**Out of scope (future work):**
- Library/module API (CLI first, refactor later).
- Integration as a Luigi pipeline stage.
- Approach B (full reverse-topological sweep).
- Approach C (scout-then-trace optimization).
- Automatic registry generation from Luigi task introspection.
- UI/web integration.
79 changes: 61 additions & 18 deletions worker_plan/worker_plan_internal/extract_dag.py
```diff
@@ -135,29 +135,75 @@ def _detect_implementation_files(cls: type) -> list[str]:
     return files
 
 
-def _extract_source_files(task: luigi.Task) -> list[str]:
-    """Get source files: task's own file + auto-detected implementation files."""
+def _extract_source_files(task: luigi.Task) -> list[dict[str, str]]:
+    """Get source files: workflow node file + auto-detected business logic files."""
     cls = type(task)
+    files: list[dict[str, str]] = []
 
-    # The task's own file
-    result: list[str] = []
+    # The task's own file (workflow node)
     try:
         task_file = Path(inspect.getfile(cls)).resolve()
-        result.append(str(task_file.relative_to(_WORKER_PLAN_DIR)))
+        files.append({
+            "role": "workflow_node",
+            "path": str(task_file.relative_to(_WORKER_PLAN_DIR)),
+        })
     except (TypeError, ValueError, OSError):
         pass
 
-    # Supplement with auto-detected implementation files
-    for f in _detect_implementation_files(cls):
-        if f not in result:
-            result.append(f)
+    # Auto-detected implementation files (business logic)
+    seen = {f["path"] for f in files}
+    for path in _detect_implementation_files(cls):
+        if path not in seen:
+            files.append({
+                "role": "business_logic",
+                "path": path,
+            })
+            seen.add(path)
+
+    return files
+
+
+def _pick_primary_output(filenames: list[str]) -> str:
+    """Pick the most likely file to be read from a node's outputs.
+
+    Preference: .md > .html > non-raw file > first file.
+    """
+    for ext in (".md", ".html"):
+        for f in filenames:
+            if f.endswith(ext):
+                return f
+    non_raw = [f for f in filenames if "_raw" not in f]
+    if non_raw:
+        return non_raw[0]
+    return filenames[0] if filenames else ""
+
+
+def _extract_inputs(upstream_tasks: list[luigi.Task]) -> list[dict[str, str]]:
+    """Build inputs list: for each upstream task, identify the primary artifact it provides."""
+    inputs: list[dict[str, str]] = []
+    seen: set[str] = set()
+
+    for dep in upstream_tasks:
+        node_name = _class_name_to_stage_name(dep.__class__.__name__)
+        if node_name in seen:
+            continue
+        seen.add(node_name)
+
+        output_files = _extract_output_filenames(dep)
+        primary = _pick_primary_output(output_files)
+        if primary:
+            inputs.append({
+                "from_node": node_name,
+                "artifact_path": primary,
+            })
 
-    return result
+    inputs.sort(key=lambda x: x["from_node"])
+    return inputs
 
 
 def _output_sort_key(stage: dict[str, Any]) -> tuple[int, int, str]:
     """Sort key: numeric prefix from the first output filename, then name."""
-    filename = stage["output_files"][0] if stage.get("output_files") else ""
+    filename = stage["artifacts"][0]["path"] if stage.get("artifacts") else ""
     match = re.match(r"(\d+)-?(\d+)?", filename)
     if match:
         major = int(match.group(1))
@@ -197,18 +243,15 @@ def _walk(task: luigi.Task) -> None:
     cls = type(task)
     stage_name = _class_name_to_stage_name(class_name)
     description = cls.description() if hasattr(cls, "description") else ""
-    output_files = _extract_output_filenames(task)
+    artifacts = [{"path": f} for f in _extract_output_filenames(task)]
+    inputs = _extract_inputs(upstream_tasks)
     source_files = _extract_source_files(task)
-    depends_on_names = sorted(set(
-        _class_name_to_stage_name(dep.__class__.__name__)
-        for dep in upstream_tasks
-    ))
 
     stages.append({
        "id": stage_name,
        "description": description,
-       "output_files": output_files,
-       "depends_on": depends_on_names,
+       "artifacts": artifacts,
+       "inputs": inputs,
        "source_files": source_files,
     })
```