Skip to content

fix(knowledge): guard TreeKnowledge.find_path against malformed parent links#80

Open
saivedant169 wants to merge 1 commit into
LLMQuant:masterfrom
saivedant169:fix/tree-find-path-malformed-parent
Open

fix(knowledge): guard TreeKnowledge.find_path against malformed parent links#80
saivedant169 wants to merge 1 commit into
LLMQuant:masterfrom
saivedant169:fix/tree-find-path-malformed-parent

Conversation

@saivedant169
Copy link
Copy Markdown

What

TreeKnowledge.find_path could raise KeyError or loop forever on malformed parent links.

Why

The walk-up loop followed parent_id without checking the target stayed in nodes:

  • a parent_id pointing at a node not in the map raised KeyError
  • a cycle in the parent links (a points to b, b points to a) looped forever

TreeNode.parent_id is UUID | None with no referential validator, and a Paper (a TreeKnowledge) is populated by Agent(output_type=Paper), so the parent links come from model output and carry no guarantee. Both cases are reachable from normal extraction.

Fix

Stop the walk when the chain leaves the node map or revisits a node, tracking visited ids to break cycles. The path still ends at the requested node and returns the best-effort partial path for a broken chain instead of raising or hanging. The happy path is unchanged.

I went with a partial path rather than [] on a broken chain so callers still get the resolvable ancestors. Happy to switch to [] if you prefer a stricter contract.

Tests

  • test_find_path_dangling_parent: parent_id points outside the map, asserts no KeyError
  • test_find_path_cyclic_parent: cyclic parent links, asserts it terminates

I confirmed both fail on the current code first (KeyError and an infinite loop) and pass with the fix.

python -m unittest discover -s tests/knowledge is green (66 tests). ruff check and ruff format --check pass on both files.

…t links

find_path walked parent_id pointers without checking they stayed inside
the node map. A parent_id pointing at a missing node raised KeyError, and
a cycle in the parent links looped forever. Node data can come straight
from an LLM (Paper is built via Agent(output_type=Paper)), so parent_id
has no referential guarantee and both cases are reachable.

Stop the walk when the chain leaves the node map or revisits a node, and
return the best-effort partial path ending at the requested node. The
happy path is unchanged. Adds tests for the dangling and cyclic cases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant