fix(knowledge): guard TreeKnowledge.find_path against malformed parent links#80
Open
saivedant169 wants to merge 1 commit into
Open
Conversation
…t links find_path walked parent_id pointers without checking they stayed inside the node map. A parent_id pointing at a missing node raised KeyError, and a cycle in the parent links looped forever. Node data can come straight from an LLM (Paper is built via Agent(output_type=Paper)), so parent_id has no referential guarantee and both cases are reachable. Stop the walk when the chain leaves the node map or revisits a node, and return the best-effort partial path ending at the requested node. The happy path is unchanged. Adds tests for the dangling and cyclic cases.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
TreeKnowledge.find_pathcould raiseKeyErroror loop forever on malformed parent links.Why
The walk-up loop followed
parent_idwithout checking the target stayed innodes:parent_idpointing at a node not in the map raisedKeyErrorTreeNode.parent_idisUUID | Nonewith no referential validator, and aPaper(aTreeKnowledge) is populated byAgent(output_type=Paper), so the parent links come from model output and carry no guarantee. Both cases are reachable from normal extraction.Fix
Stop the walk when the chain leaves the node map or revisits a node, tracking visited ids to break cycles. The path still ends at the requested node and returns the best-effort partial path for a broken chain instead of raising or hanging. The happy path is unchanged.
I went with a partial path rather than
[]on a broken chain so callers still get the resolvable ancestors. Happy to switch to[]if you prefer a stricter contract.Tests
test_find_path_dangling_parent: parent_id points outside the map, asserts noKeyErrortest_find_path_cyclic_parent: cyclic parent links, asserts it terminatesI confirmed both fail on the current code first (KeyError and an infinite loop) and pass with the fix.
python -m unittest discover -s tests/knowledgeis green (66 tests).ruff checkandruff format --checkpass on both files.