feat: add DAG extractor that introspects Luigi task graph #537

Merged
neoneye merged 24 commits into main from feature/extract-dag on Apr 7, 2026

@neoneye commented Apr 7, 2026

Summary

  • Adds worker_plan_internal/extract_dag.py that walks FullPlanPipeline.requires()/output() recursively to extract the pipeline DAG as JSON
  • Extracts stage name, output files, primary output, and upstream dependencies for all 70 stages
  • Replaces the need to hand-maintain the DAG mapping; regenerate the JSON whenever pipeline stages change
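
For readers unfamiliar with the approach, here is a minimal sketch of a recursive requires()/output() walk. It is not the code in extract_dag.py: the `Task` and `Target` classes are stand-ins so the example runs without Luigi, the three toy stages and their file names are invented, and the JSON field names (`name`, `output_files`, `depends_on`) are assumptions.

```python
import json


class Target:
    """Stand-in for a Luigi target; only the path matters here."""
    def __init__(self, path):
        self.path = path


class Task:
    """Stand-in for a Luigi task exposing requires() and output()."""
    def requires(self):
        return []

    def output(self):
        return []


def _flatten(value):
    """requires()/output() may return one item, a list, or a dict; normalize to a list."""
    if value is None:
        return []
    if isinstance(value, dict):
        return [item for v in value.values() for item in _flatten(v)]
    if isinstance(value, (list, tuple)):
        return [item for v in value for item in _flatten(v)]
    return [value]


def extract_dag(root):
    """Walk the graph from root and return one dict per stage."""
    stages = {}

    def visit(task):
        name = type(task).__name__
        if name in stages:  # already recorded; shared dependencies are visited once
            return
        upstream = _flatten(task.requires())
        stages[name] = {
            "name": name,
            "output_files": [t.path for t in _flatten(task.output())],
            "depends_on": sorted({type(u).__name__ for u in upstream}),
        }
        for u in upstream:
            visit(u)

    visit(root)
    return list(stages.values())


# Toy pipeline (hypothetical stages and file names, for illustration only).
class Setup(Task):
    def output(self):
        return Target("001-setup.txt")


class IdentifyPurpose(Task):
    def requires(self):
        return Setup()

    def output(self):
        return {"raw": Target("002-purpose_raw.json"), "md": Target("003-purpose.md")}


class Report(Task):
    def requires(self):
        return {"setup": Setup(), "purpose": IdentifyPurpose()}

    def output(self):
        return Target("999-report.html")


print(json.dumps(extract_dag(Report()), indent=2))
```

Keying the visited set by class name assumes each stage class appears once in the pipeline; a pipeline that instantiates the same class with different parameters would need a richer key.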

Usage

cd worker_plan
.venv/bin/python -m worker_plan_internal.extract_dag                    # stdout
.venv/bin/python -m worker_plan_internal.extract_dag --output dag.json  # file
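
The stdout-or-file behaviour shown above could be wired up roughly as follows. This is a sketch, not the module's actual entry point: only the `--output` flag comes from the usage lines, while `build_parser` and `write_dag` are hypothetical helper names.

```python
import argparse
import json
import sys


def build_parser():
    """Parser with a single optional --output flag, mirroring the usage lines."""
    parser = argparse.ArgumentParser(
        prog="extract_dag",
        description="Extract the pipeline DAG as JSON.",
    )
    parser.add_argument(
        "--output",
        default=None,
        help="Path of the JSON file to write; omit to print to stdout.",
    )
    return parser


def write_dag(dag, output_path=None):
    """Serialize dag and send it to stdout or to output_path; return the JSON text."""
    text = json.dumps(dag, indent=2)
    if output_path is None:
        sys.stdout.write(text + "\n")
    else:
        with open(output_path, "w", encoding="utf-8") as handle:
            handle.write(text + "\n")
    return text


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--output", "dag.json"])
print(args.output)
```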

Test plan

  • Run python -m worker_plan_internal.extract_dag and verify 70 stages are extracted
  • Compare output against actual pipeline stages in full_plan_pipeline.py
  • Verify upstream dependencies match requires() in individual task files
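
Part of the test plan can be automated with a small consistency check over the extracted stages. The sketch below assumes each stage is a dict with `name` and `depends_on` keys; those field names and the `check_dag` helper are assumptions, not taken from the PR.

```python
def check_dag(stages, expected_count):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    names = [stage["name"] for stage in stages]

    if len(names) != expected_count:
        problems.append(f"expected {expected_count} stages, found {len(names)}")

    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate stage names: {duplicates}")

    known = set(names)
    for stage in stages:
        for dep in stage.get("depends_on", []):
            if dep not in known:
                problems.append(f"{stage['name']} depends on unknown stage {dep}")

    return problems


# Tiny hand-written example; the real check would load the generated JSON
# and pass expected_count=70.
sample = [
    {"name": "setup", "depends_on": []},
    {"name": "report", "depends_on": ["setup", "missing_stage"]},
]
print(check_dag(sample, expected_count=2))
```

This only catches structural errors. Comparing the dependencies against requires() in the individual task files remains a manual step.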

🤖 Generated with Claude Code

neoneye and others added 24 commits April 7, 2026 14:42
Walks FullPlanPipeline.requires()/output() recursively to extract
stage names, output files, primary outputs, and upstream dependencies
as JSON. Replaces the need to hand-maintain the pipeline DAG mapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… auto-detection

- Add source_files() classmethod to PlanTask (returns task's own file by default,
  subclasses can override to declare additional implementation files)
- Auto-detect implementation imports in extract_dag.py by inspecting module
  namespace for classes/functions from worker_plan_internal.* that aren't
  infrastructure (stages, llm_util, etc.)
- Remove primary_output field (was flaw-tracer-specific, not a Luigi concept)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
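
A rough illustration of the namespace inspection described in this commit: look at every class or function visible in a task module and keep the ones that were imported from elsewhere in the package. The helper name, the fake module, and the dotted module paths below are all hypothetical; only the `worker_plan_internal` package name and the idea of skipping infrastructure modules come from the commit message.

```python
import inspect
import types


def implementation_modules(module, package_prefix, skip_prefixes=()):
    """Return sorted module names that supply classes/functions imported into module."""
    found = set()
    for obj in vars(module).values():
        if not (inspect.isclass(obj) or inspect.isfunction(obj)):
            continue
        origin = getattr(obj, "__module__", None) or ""
        if origin == module.__name__:
            continue  # defined in the task module itself, not imported
        if not origin.startswith(package_prefix):
            continue  # third-party or standard-library import
        if origin.startswith(tuple(skip_prefixes)):
            continue  # infrastructure, not implementation
        found.add(origin)
    return sorted(found)


def _fake_class(name, module_name):
    """Build a class that claims to live in module_name (demo helper)."""
    return type(name, (), {"__module__": module_name})


# Fake task module with one implementation import and one infrastructure import.
task_module = types.ModuleType("worker_plan_internal.plan.stages.example_stage")
task_module.Impl = _fake_class("Impl", "worker_plan_internal.example.impl")
task_module.Helper = _fake_class("Helper", "worker_plan_internal.llm_util.helper")
task_module.Path = _fake_class("Path", "pathlib")

print(implementation_modules(
    task_module,
    package_prefix="worker_plan_internal.",
    skip_prefixes=["worker_plan_internal.plan.stages.", "worker_plan_internal.llm_util."],
))
```
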
No subclasses override source_files(), so keep it as a local function
in extract_dag.py rather than a method on PlanTask.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
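
As a local function, the "task's own file by default" behaviour might look like the sketch below. It assumes the lookup is done with `inspect.getsourcefile`; the PR does not say how extract_dag.py resolves the path, and the demo uses a standard-library class in place of a real task.

```python
import inspect
import json


def source_files(task_cls):
    """Return the files implementing task_cls; by default just the file defining it."""
    try:
        path = inspect.getsourcefile(task_cls)
    except (TypeError, OSError):
        # Built-in classes and classes without retrievable source have no file.
        return []
    return [path] if path else []


# Demo with a standard-library class standing in for a task class.
print(source_files(json.JSONDecoder))
print(source_files(int))
```
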
…s are detected

The broad "worker_plan_internal.plan." prefix was excluding implementation
files like plan/data_collection.py. Narrowed to only skip plan/stages/,
plan/run_plan_pipeline, plan/pipeline_environment, and plan/ping_llm.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
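
The effect of narrowing the prefix can be shown with a plain `str.startswith` filter. The dotted module names below are my translation of the paths in the commit message, and the `is_infrastructure` helper is hypothetical; the actual skip list in extract_dag.py may be spelled differently.

```python
# Before: one broad prefix that also matched implementation modules.
BROAD_SKIP_PREFIXES = ("worker_plan_internal.plan.",)

# After: only the infrastructure modules named in the commit message.
NARROW_SKIP_PREFIXES = (
    "worker_plan_internal.plan.stages.",
    "worker_plan_internal.plan.run_plan_pipeline",
    "worker_plan_internal.plan.pipeline_environment",
    "worker_plan_internal.plan.ping_llm",
)


def is_infrastructure(module_name, skip_prefixes=NARROW_SKIP_PREFIXES):
    """True if module_name should be excluded from the implementation file list."""
    return module_name.startswith(tuple(skip_prefixes))


name = "worker_plan_internal.plan.data_collection"
print(is_infrastructure(name, BROAD_SKIP_PREFIXES))   # wrongly excluded before the fix
print(is_infrastructure(name, NARROW_SKIP_PREFIXES))  # kept after the fix
```
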
- Add description() classmethod to PlanTask, returns first line of docstring
- Add class docstrings to 23 task classes that were missing them
- Include description in extract_dag.py JSON output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
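
A possible shape for a description() classmethod that returns the first docstring line is sketched here. The `PlanTask` class is a stand-in rather than the project's base class, and the example docstring is invented. Reading `cls.__doc__` directly is a deliberate choice in this sketch: unlike `inspect.getdoc`, it does not fall back to a parent class's docstring, so undocumented tasks stay visible.

```python
class PlanTask:
    """Stand-in base class (not the project's PlanTask)."""

    @classmethod
    def description(cls):
        """Return the first non-empty line of the class docstring, or '' if none."""
        doc = cls.__doc__
        if not doc or not doc.strip():
            return ""
        return doc.strip().splitlines()[0].strip()


class PremortemTask(PlanTask):
    """Imagine the project has failed and identify how and why.

    Further detail after the first line is ignored by description().
    """


class UndocumentedTask(PlanTask):
    pass


print(PremortemTask.description())
print(repr(UndocumentedTask.description()))
```
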
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- strategic_decisions_markdown: clarify it summarizes the lever pipeline
- candidate_scenarios: specify aggressive/moderate/conservative scenarios
- filter_documents_to_find: describe what it does, not the problem
- distill_assumptions: explain condensing verbose assumptions into concise ones
- make_assumptions: specify filling info gaps with grounded assumptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- filter_documents_to_create: describe the action, not the problem
- draft_documents_to_find: specify content specs with essential info, risks, scenarios
- draft_documents_to_create: same as above
- focus_on_vital_few_levers: specify ~5 levers rated critical/high/medium
- review_assumptions: specify flagging unreasonable/missing/contradictory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- identify_risks: broad risk register, not just location-dependent
- create_pitch: fix typo, specify pitch structure
- consolidate_assumptions_markdown: list what it actually merges
- identify_purpose: fix typo, explain why classification matters
- data_collection: specify concrete data-gathering areas

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- scenarios_markdown: mention selected + rejected alternatives
- physical_locations: "extract or suggest" instead of "identify/suggest"
- currency_strategy: concise, mention cross-border
- markdown_with_documents: describe structured output
- convert_pitch_to_markdown: specify JSON-to-markdown conversion
- identify_task_dependencies: clarify prerequisites for scheduling
- estimate_task_durations: specify bottom-up with min/max/realistic
- governance_phase6_extra: validation with tough questions
- review_plan: critical review with SMART recommendations
- report: assembles all outputs into final HTML

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- create_wbs_level1: extract project title and top-level phases
- create_wbs_level2: decompose phases into major tasks
- create_wbs_level3: break tasks into detailed subtasks
- wbs_project_level1_and_level2: merge into unified tree
- wbs_project_level1_and_level2_and_level3: complete hierarchy + CSV
- executive_summary: one-pager for decision-makers
- select_scenario: evaluate trade-offs with rationale
- questions_and_answers: anticipate stakeholder questions
- premortem: imagine failure, identify how and why
- self_audit: checklist for gaps, contradictions, unsupported claims

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- start_time: active voice
- setup: describe as loading input, not just data
- expert_review: fix tense, reframe as assembling a panel
- related_resources: list concrete examples
- enrich_team_environment_info: clarify as equipment/facility needs
- project_plan: specify goals, milestones, deliverables, criteria
- governance_phase1_audit: designing governance, not auditing existing
- swot_analysis: mention tailoring by plan purpose
- team_markdown: list what it compiles
- identify_documents: give concrete examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each now reflects its position in the pipeline (brainstormed, triaged,
enriched, vital, scenarios, chosen) instead of repeating "Check X output
for constraint violations."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds schema_version, pipeline_name, and description fields
around the stages array.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
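
The resulting top-level structure might look like the sketch below. The three metadata field names come from the commit message; the `stages` key, the `wrap_stages` helper, and every value shown are assumptions for illustration.

```python
import json


def wrap_stages(stages, pipeline_name, description, schema_version=1):
    """Wrap the stages array in a small metadata envelope."""
    return {
        "schema_version": schema_version,
        "pipeline_name": pipeline_name,
        "description": description,
        "stages": stages,
    }


document = wrap_stages(
    stages=[{"name": "setup", "depends_on": []}],
    pipeline_name="example_pipeline",          # hypothetical value
    description="Example pipeline metadata.",  # hypothetical value
)
print(json.dumps(document, indent=2))
```

A version field at the top level lets consumers of the JSON detect format changes without inspecting individual stages.
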
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@neoneye merged commit 3d73c60 into main on Apr 7, 2026
3 checks passed
@neoneye deleted the feature/extract-dag branch on April 7, 2026 15:03