api: emit canonical heading path on treewalk citations (HAL-70)#41
Conversation
Citations carried only {start_page, end_page, section_ids, quote} — no
structural heading path. The bench's path_correct@1 reads a title_path it
could only reconstruct from the parser's chunk titles, which are the wrong
vocabulary, so the metric was structurally 0%.
buildTreeWalkCitations now resolves the document's LLM TOC tree (via the
strategy's TOC provider) into a section-ID -> heading-path map
(tree.BuildHeadingPaths, HAL-109) and attaches the primary section's
heading path as title_path on each citation. Degrades cleanly: no TOC
persisted -> field omitted, prior behaviour unchanged.
Part of HAL-70.
GetTOC errors were all silently treated as 'no heading paths'. Now retrieval.ErrNoTOC (the expected missing-TOC case) stays silent while any other error is logged at debug for observability — still degrading gracefully. Per Sourcery review on #40.
Reviewer's GuideAdds canonical heading paths to TreeWalk citations by resolving section IDs to logical heading paths via the document TOC, wiring this through the citation builder with robust degradation when TOC is unavailable and covering behavior with targeted tests. Sequence diagram for building TreeWalk citations with heading pathssequenceDiagram
participant Client
participant Deps
participant TP as TOCProvider
participant Tree as TreePackage
Client->>Deps: buildTreeWalkCitations(ctx, tree, res)
Deps->>Deps: headingPathsForDoc(ctx, tree)
alt TOC available
Deps->>TP: GetTOC(ctx, tree.DocumentID)
TP-->>Deps: rawTOC
Deps->>Tree: BuildHeadingPaths(tree.Root, nodes)
Tree-->>Deps: headingPaths
Deps-->>Deps: attach title_path from headingPaths[sectionIDs[0]]
Deps-->>Client: citations with title_path
else no TOC or error
Deps-->>Deps: headingPathsForDoc returns nil
Deps-->>Client: citations without title_path
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Warning Review limit reached
More reviews will be available in 31 minutes and 50 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="internal/api/treewalk_citations_test.go" line_range="63" />
<code_context>
+// a citation must carry the canonical heading path of its primary
+// section, resolved from the TOC — the field the bench's
+// path_correct@1 metric reads.
+func TestBuildTreeWalkCitations_EmitsHeadingPath(t *testing.T) {
+ d := depsWithTOC(citationTestTOC(t), nil)
+ tr := buildTreeWalkTestTree()
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test where the primary section has no corresponding heading path or an empty path to ensure title_path is omitted in that case.
Since the code branches on `if hp := headingPaths[sectionIDs[0]]; len(hp) > 0 { ... }`, it’s important to also cover the cases where (a) there is no entry for the primary section ID, and (b) the entry exists but is empty. Please add a test using a TOC that produces either a missing or empty heading path for the primary section and assert that `title_path` is omitted while `section_ids` and pages stay unchanged. This will guard against regressions where an empty or missing path incorrectly yields a `title_path`.
Suggested implementation:
```golang
// TestBuildTreeWalkCitations_EmitsHeadingPath is the HAL-70 regression:
// a citation must carry the canonical heading path of its primary
// section, resolved from the TOC — the field the bench's
// path_correct@1 metric reads.
func TestBuildTreeWalkCitations_EmitsHeadingPath(t *testing.T) {
d := depsWithTOC(citationTestTOC(t), nil)
tr := buildTreeWalkTestTree()
res := &retrieval.Result{CitedPages: [][2]int{{1, 2}}}
cites := d.buildTreeWalkCitations(context.Background(), tr, res, "how do I install?", "")
if len(cites) != 1 {
t.Fatalf("want 1 citation, got %d: %v", len(cites), cites)
}
c := cites[0]
// Primary section for pages 1-2 is the leaf sec_a1 (Install).
ids, _ := c["section_ids"].([]tree.SectionID)
}
// When the TOC does not provide a heading path for the primary section,
// title_path must be omitted while section_ids and pages remain unchanged.
func TestBuildTreeWalkCitations_OmitsHeadingPathWhenMissingOrEmpty(t *testing.T) {
tr := buildTreeWalkTestTree()
res := &retrieval.Result{CitedPages: [][2]int{{1, 2}}}
// Baseline with full heading paths.
dWithPaths := depsWithTOC(citationTestTOC(t), nil)
baseline := dWithPaths.buildTreeWalkCitations(context.Background(), tr, res, "how do I install?", "")
if len(baseline) != 1 {
t.Fatalf("baseline: want 1 citation, got %d: %v", len(baseline), baseline)
}
baseCitation := baseline[0]
baseSectionIDs, _ := baseCitation["section_ids"].([]tree.SectionID)
basePages, _ := baseCitation["pages"].([][2]int)
// TOC variant with no heading paths for any section: this exercises the
// "missing entry" / "empty slice" branch, i.e. len(hp) == 0.
tocNoPaths := citationTestTOC(t)
tocNoPaths.HeadingPaths = map[tree.SectionID][]string{}
dNoPaths := depsWithTOC(tocNoPaths, nil)
cites := dNoPaths.buildTreeWalkCitations(context.Background(), tr, res, "how do I install?", "")
if len(cites) != 1 {
t.Fatalf("no-paths: want 1 citation, got %d: %v", len(cites), cites)
}
c := cites[0]
// title_path must be omitted when there is no non-empty heading path
// for the primary section.
if _, ok := c["title_path"]; ok {
t.Fatalf("expected title_path to be omitted when primary heading path is missing/empty")
}
// section_ids must be preserved.
ids, _ := c["section_ids"].([]tree.SectionID)
if !reflect.DeepEqual(ids, baseSectionIDs) {
t.Fatalf("section_ids changed when heading path missing:\n got %#v\n want %#v", ids, baseSectionIDs)
}
// pages must be preserved.
pages, _ := c["pages"].([][2]int)
if !reflect.DeepEqual(pages, basePages) {
t.Fatalf("pages changed when heading path missing:\n got %#v\n want %#v", pages, basePages)
}
}
```
1. Ensure `internal/api/treewalk_citations_test.go` imports `reflect` if it does not already:
- Add `reflect` to the existing import block: `import (... "reflect" ...)`.
2. The field `HeadingPaths` on the TOC value returned by `citationTestTOC(t)` is assumed to exist and to have type `map[tree.SectionID][]string`. If the actual field or type differs, adjust `tocNoPaths.HeadingPaths = map[tree.SectionID][]string{}` accordingly so that, for the primary section ID, either there is no map entry or the entry is an empty slice, making the `len(hp) > 0` branch false.
3. If the citation map uses a different key than `"pages"` or a different type than `[][2]int`, adapt the casts and the key name to match the existing tests in this file.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| // a citation must carry the canonical heading path of its primary | ||
| // section, resolved from the TOC — the field the bench's | ||
| // path_correct@1 metric reads. | ||
| func TestBuildTreeWalkCitations_EmitsHeadingPath(t *testing.T) { |
There was a problem hiding this comment.
suggestion (testing): Consider adding a test where the primary section has no corresponding heading path or an empty path to ensure title_path is omitted in that case.
Since the code branches on if hp := headingPaths[sectionIDs[0]]; len(hp) > 0 { ... }, it’s important to also cover the cases where (a) there is no entry for the primary section ID, and (b) the entry exists but is empty. Please add a test using a TOC that produces either a missing or empty heading path for the primary section and assert that title_path is omitted while section_ids and pages stay unchanged. This will guard against regressions where an empty or missing path incorrectly yields a title_path.
Suggested implementation:
// TestBuildTreeWalkCitations_EmitsHeadingPath is the HAL-70 regression:
// a citation must carry the canonical heading path of its primary
// section, resolved from the TOC — the field the bench's
// path_correct@1 metric reads.
func TestBuildTreeWalkCitations_EmitsHeadingPath(t *testing.T) {
d := depsWithTOC(citationTestTOC(t), nil)
tr := buildTreeWalkTestTree()
res := &retrieval.Result{CitedPages: [][2]int{{1, 2}}}
cites := d.buildTreeWalkCitations(context.Background(), tr, res, "how do I install?", "")
if len(cites) != 1 {
t.Fatalf("want 1 citation, got %d: %v", len(cites), cites)
}
c := cites[0]
// Primary section for pages 1-2 is the leaf sec_a1 (Install).
ids, _ := c["section_ids"].([]tree.SectionID)
}
// When the TOC does not provide a heading path for the primary section,
// title_path must be omitted while section_ids and pages remain unchanged.
func TestBuildTreeWalkCitations_OmitsHeadingPathWhenMissingOrEmpty(t *testing.T) {
tr := buildTreeWalkTestTree()
res := &retrieval.Result{CitedPages: [][2]int{{1, 2}}}
// Baseline with full heading paths.
dWithPaths := depsWithTOC(citationTestTOC(t), nil)
baseline := dWithPaths.buildTreeWalkCitations(context.Background(), tr, res, "how do I install?", "")
if len(baseline) != 1 {
t.Fatalf("baseline: want 1 citation, got %d: %v", len(baseline), baseline)
}
baseCitation := baseline[0]
baseSectionIDs, _ := baseCitation["section_ids"].([]tree.SectionID)
basePages, _ := baseCitation["pages"].([][2]int)
// TOC variant with no heading paths for any section: this exercises the
// "missing entry" / "empty slice" branch, i.e. len(hp) == 0.
tocNoPaths := citationTestTOC(t)
tocNoPaths.HeadingPaths = map[tree.SectionID][]string{}
dNoPaths := depsWithTOC(tocNoPaths, nil)
cites := dNoPaths.buildTreeWalkCitations(context.Background(), tr, res, "how do I install?", "")
if len(cites) != 1 {
t.Fatalf("no-paths: want 1 citation, got %d: %v", len(cites), cites)
}
c := cites[0]
// title_path must be omitted when there is no non-empty heading path
// for the primary section.
if _, ok := c["title_path"]; ok {
t.Fatalf("expected title_path to be omitted when primary heading path is missing/empty")
}
// section_ids must be preserved.
ids, _ := c["section_ids"].([]tree.SectionID)
if !reflect.DeepEqual(ids, baseSectionIDs) {
t.Fatalf("section_ids changed when heading path missing:\n got %#v\n want %#v", ids, baseSectionIDs)
}
// pages must be preserved.
pages, _ := c["pages"].([][2]int)
if !reflect.DeepEqual(pages, basePages) {
t.Fatalf("pages changed when heading path missing:\n got %#v\n want %#v", pages, basePages)
}
}- Ensure
internal/api/treewalk_citations_test.goimportsreflectif it does not already:- Add
reflectto the existing import block:import (... "reflect" ...).
- Add
- The field
HeadingPathson the TOC value returned bycitationTestTOC(t)is assumed to exist and to have typemap[tree.SectionID][]string. If the actual field or type differs, adjusttocNoPaths.HeadingPaths = map[tree.SectionID][]string{}accordingly so that, for the primary section ID, either there is no map entry or the entry is an empty slice, making thelen(hp) > 0branch false. - If the citation map uses a different key than
"pages"or a different type than[][2]int, adapt the casts and the key name to match the existing tests in this file.
What
The engine now emits a real structural heading path (
title_path) on every TreeWalk citation — the fix forpath_correct@1 = 0%(root-cause analysis on HAL-70).Before: a citation was
{start_page, end_page, section_ids, quote}; the "structure-aware citation" differentiator was not in the response payload. The bench reverse-engineered a path from the parser's chunk titles (wrong vocabulary vs. the gold's logical headings), so the metric was structurally 0% whilespan_in_top1was 20% (content sometimes right, label always wrong).How
buildTreeWalkCitationsresolves the document's LLM TOC tree (via the strategy'sTOCprovider) into a section-ID → heading-path map withtree.BuildHeadingPaths(HAL-109, now onmain), then attaches the primary section's heading path astitle_pathon each citation. Both the non-streaming and SSE paths build citations through this one chokepoint.headingPathsForDocdegrades to nil on every failure mode (no provider,ErrNoTOC, fetch error, unparseable TOC) —title_pathis simply omitted and the citation is byte-for-byte what it was before.retrieval.ErrNoTOCis silent (expected); other GetTOC errors log at debug for observability (per review).Tests
TestBuildTreeWalkCitations_EmitsHeadingPathand..._NoTOCOmitsHeadingPath.go build/test/gofmt/go vetclean.Note
Supersedes #40, which GitHub auto-closed when its base branch (the merged HAL-109 PR #39) was deleted. Rebased onto
main; carries only the two treewalk commits. The Sourcery review fixes from #40 are included.Closes HAL-70
Summary by Sourcery
Emit canonical heading paths on TreeWalk citations using the document TOC while preserving previous behavior when TOC data is unavailable.
Enhancements:
Tests: