Skip to content

fix: surface state:modified nodes in summary lineage graph#1192

Open
dtaniwaki wants to merge 2 commits into
DataRecce:mainfrom
dtaniwaki:feat/macro-change-detection
Open

fix: surface state:modified nodes in summary lineage graph#1192
dtaniwaki wants to merge 2 commits into
DataRecce:mainfrom
dtaniwaki:feat/macro-change-detection

Conversation

@dtaniwaki

Copy link
Copy Markdown
Contributor

Thanks for maintaining this project! I'd appreciate a review of this bug fix when you have a chance.

PR checklist

  • Ensure you have added or ran the appropriate tests for your PR.
  • DCO signed

What type of PR is this?

fix

What this PR does / why we need it:

When a dbt macro or project config changes, state:modified correctly returns the affected downstream nodes — even if their SQL body (and thus file checksum) is unchanged. However, generate_markdown_summary was building the lineage graph from checksums alone, never passing the lineage_diff.diff that the adapter had already computed. As a result, nodes changed only by macros or configs were silently invisible in the PR summary comment.

The fix is a one-line change in generate_markdown_summary: pass lineage_diff.diff to _build_lineage_graph. A small supporting mechanism (apply_diff / _forced_change_status) lets the graph override the checksum-based status for nodes that state:modified flagged but whose SQL didn't change.

Which issue(s) this PR fixes:

N/A

Special notes for your reviewer:

The diff dict was already being populated correctly by the adapter (it calls state:modified and maps each node to a NodeDiff). The bug was purely in the summary layer not consuming it.

Does this PR introduce a user-facing change?:

Nodes modified by macro or config changes (with no SQL body change) now correctly appear as changed in the Recce PR summary comment lineage graph.

@dtaniwaki dtaniwaki changed the title feat: detect macro and config changes in lineage diff and summary fix: surface state:modified nodes in summary lineage graph Mar 9, 2026
@codecov

codecov Bot commented Mar 9, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.80952% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
recce/summary.py 94.11% 1 Missing ⚠️
Files with missing lines Coverage Δ
tests/test_summary.py 100.00% <100.00%> (ø)
recce/summary.py 67.76% <94.11%> (+1.79%) ⬆️

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the PR summary lineage graph so it can surface state:modified nodes even when their SQL checksum hasn’t changed (e.g., macro/config-only changes), by plumbing the adapter-computed lineage_diff.diff into the summary graph builder and allowing that diff to override checksum-derived status.

Changes:

  • Pass lineage_diff.diff into _build_lineage_graph from generate_markdown_summary.
  • Add a Node.apply_diff() override mechanism (_forced_change_status) so externally computed diffs can drive change_status.
  • Add unit tests covering apply_diff behavior and _build_lineage_graph(..., diff=...).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
recce/summary.py Allows lineage graph node status to be overridden by adapter diff and wires diff into markdown summary generation.
tests/test_summary.py Adds tests ensuring diff-driven “modified” nodes appear in the graph and in “What’s Changed”.

Comment thread tests/test_summary.py Outdated
@gcko

gcko commented Mar 10, 2026

Copy link
Copy Markdown
Contributor

Code Review — PR 1192

Summary

Fixes generate_markdown_summary to pass the adapter's pre-computed lineage_diff.diff into _build_lineage_graph, so nodes flagged by state:modified (e.g., macro/config-only changes) are correctly surfaced in the PR summary lineage graph. Clean, minimal fix with good test coverage.

Findings

[Warning] "Code" label is misleading for macro/config-only changes

File: recce/summary.py:148-149
Issue: When a node is force-marked as modified via apply_diff (because a macro or project config changed, not the SQL body), _what_changed() labels it as "Code" changed. This is technically inaccurate — the node's raw SQL didn't change; a macro or config dependency did. Users may look at the node's SQL diff and see no difference, which could be confusing.
Suggestion: Consider adding a distinct label like "Macro" or "Config" (would require extending NodeDiff to carry the reason). Not blocking since the current behavior is a clear improvement over nodes being completely invisible, and this can be addressed as a follow-up.

[Warning] Pre-existing shared mutable class-level defaults in LineageGraph

File: recce/summary.py:260-262
Issue: LineageGraph defines nodes and edges as class-level mutable dicts (nodes: Dict[str, Node] = {}). Without an __init__ that creates instance-level copies, all LineageGraph instances share the same dict. This is pre-existing (not introduced by this PR), but the new tests in TestBuildLineageGraphWithDiff.test_diff_marks_state_modified_nodes create two graphs sequentially — the second graph silently mutates state from the first. The assertions happen to pass because they run in sequence, but this is fragile and could cause test pollution if test ordering changes or tests run in parallel.
Suggestion: Not blocking for this PR since it's pre-existing. Worth a follow-up to add __init__ to LineageGraph that initializes instance-level dicts.

Verification

  • Tests: 26/26 passed (tests/test_summary.py)
  • Lint: flake8 clean
  • New test coverage: TestNodeApplyDiff (2 tests), TestWhatChanged (1 test), TestBuildLineageGraphWithDiff (3 tests) — all exercise the new code paths including the backward-compatibility check with real manifests.

Verdict

No critical issues found. The core fix (passing lineage_diff.diff to _build_lineage_graph) is correct and well-tested. The warnings above are non-blocking suggestions for follow-up.

@gcko

gcko commented Mar 10, 2026

Copy link
Copy Markdown
Contributor

@dtaniwaki thanks for the PR! Can you please resolve the conflict in tests/test_summary.py and look at addressing the copilot and findings above?

If you feel the issues are not valid, please say as such - and address them as necessary.

@gcko gcko self-requested a review March 10, 2026 01:46

@gcko gcko left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my above comment #1192 (comment)

@gcko gcko added the bug Something isn't working label Mar 10, 2026
@dtaniwaki dtaniwaki requested review from Copilot and gcko March 11, 2026 14:06

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

Comment thread recce/summary.py
Comment on lines 321 to +356
@@ -336,6 +346,14 @@ def _build_lineage_graph(base, current) -> LineageGraph:
node = graph.nodes[node_id]
node.update_data(node_data, "current")

# Apply externally computed diff (e.g., from state:modified or macro detection).
# This allows nodes whose SQL checksum didn't change (e.g., macro-affected nodes)
# to be surfaced as modified in the graph.
if diff:
for node_id, node_diff in diff.items():
if node_id in graph.nodes:
graph.nodes[node_id].apply_diff(node_diff)

Copilot AI Mar 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LineageGraph.nodes and LineageGraph.edges are mutable class attributes, so _build_lineage_graph() mutates shared state across calls. With the new apply_diff() / _forced_change_status override, a node marked modified in one summary can remain modified in later summaries even when the next call’s diff doesn’t include it (and graphs can also retain stale nodes/edges from previous builds). Make nodes/edges instance attributes (e.g., add __init__ that sets self.nodes = {} and self.edges = {}), or otherwise ensure each _build_lineage_graph() call starts from a fresh graph state.

Copilot uses AI. Check for mistakes.
@gcko

gcko commented May 8, 2026

Copy link
Copy Markdown
Contributor

Hey @dtaniwaki , thanks for the update on the PR however I see a few issues:

  1. Check and resolve (either fix or explain why it isn't needed) the copilot review comment
  2. DCO failed as your last commit is missing Signed-off-by: Daisuke Taniwaki <daisuketaniwaki@gmail.com>
  3. Python tests are failing due to linting, ./tests/test_summary.py:191:5: E303 too many blank lines (2). In the root of the repository, you can first ensure your code will pass by running make format && make flake8.
  4. There is a merge conflict with the uv.lock file that needs to be fixed (pull the latest uv.lock file)

Once that is addressed we should be good to go

@gcko gcko left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above comment

Signed-off-by: Daisuke Taniwaki <daisuketaniwaki@gmail.com>
@dtaniwaki dtaniwaki force-pushed the feat/macro-change-detection branch from 4a6a388 to 4ad2699 Compare May 13, 2026 02:03
@dtaniwaki dtaniwaki requested a review from gcko May 13, 2026 02:18

@gcko gcko left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: APPROVE — no blocking issues

The core fix is correct, minimal, well-targeted, and regression-safe. Plumbing lineage_diff.diff into _build_lineage_graph (plus the apply_diff / _forced_change_status override on Node) is exactly what surfaces macro/config-only state:modified nodes that were previously invisible in the summary lineage graph, and it does so additively — checksum-detected changes absent from diff still fall back to the original logic, the new diff parameter is optional with a None default, and the if node_id in graph.nodes guard prevents KeyError for package-filtered/absent nodes. The change also clears both prior-cycle blockers: Copilot's LineageGraph shared-mutable-class-attr concern (now a proper __init__) and the reversed base/current test-argument order. Verified against the real producer — the dbt adapter's get_lineage_diff populates diff via select_nodes(select="state:modified") — so the premise holds. Patch coverage is 98.8%.

All remaining findings are LOW/NIT and non-blocking follow-ups.

Non-blocking follow-ups

  • LOWrecce/summary.py:163-164 (Node._what_changed): a node force-marked modified via apply_diff (macro/config change, SQL body unchanged) is labeled "Code" changed. A user opening that node's SQL diff sees no difference, which is misleading. Non-blocking — invisible nodes were strictly worse — but worth a follow-up to extend NodeDiff with a reason and emit "Macro"/"Config". (Flagged in the prior cycle by the claude review; still open in the current diff.)
  • LOWrecce/summary.py:78-80 (Node.apply_diff): the method consumes only node_diff.change_status and discards NodeDiff.change (the category breaking/non-breaking signal and per-column columns map). Acceptable for the summary graph's current needs, but it means breaking/schema signaling cannot flow through this path later without reworking the override.
  • LOWrecce/summary.py:82-85 (Node.change_status): the property returns _forced_change_status before the structural data_from (added/removed) check, so the diff's status is trusted over the node's actual base/current presence. Currently safe — both the dbt adapter and the base adapter build diff consistently with node presence — but an inconsistent diff would silently drive wrong edge rendering in get_edge_str. Defensive observation, not a live bug.
  • NITtests/test_summary.py:326-328 (test_no_diff_preserves_existing_behavior): the test compares _build_lineage_graph(base, curr) against _build_lineage_graph(base, curr, None), which are identical by the default-argument definition, making it near-tautological. It never asserts modified_set is non-empty, nor that the with-diff path is a superset of the no-diff path. Weak regression guard; consider asserting concrete expected nodes or comparing the diff-applied path against a known fixture.
  • NITrecce/summary.py:277 (LineageGraph.checks): remains a class-level attribute (checks: List[CheckSummary] = None) outside the new __init__. It's an immutable None default reassigned per instance, so no shared-state bug, but for consistency with the nodes/edges fix it could move into __init__.

Nice work — approving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants