test: add 57 unit tests for FrameTracer and DivergenceDetector #209

Open
acailic wants to merge 4 commits into main from feat/issue-208-frame-tracer-divergence-tests

acailic (Owner) commented Jun 6, 2026

Summary

Coverage added

FrameTracer (agent_debugger_sdk/core/frame_tracer.py):

  • TokenUsage: arithmetic, serialization
  • ExceptionInfo: to_dict with/without traceback
  • FrameEvent: field defaults, serialization, exception capture
  • FrameLifetimeTrace: construction, empty case
  • build_frame_tree: empty, single root, parent-child, multi-root wrap
  • get_frame_by_id, get_frames_at_depth, filter_frames_by_name
  • get_cost_breakdown: grouping, error count, empty trace
  • FrameCaptureContext: add/enter/exit frame, build_trace, token/duration sums
  • set_frame_context / get_frame_context: global context roundtrip
  • capture_function_call: no-context passthrough, frame capture, exception capture, kwarg form

DivergenceDetector (agent_debugger_sdk/core/divergence_detector.py):

  • DivergenceType / DivergenceSeverity: enum string values
  • DivergencePoint: to_dict minimal and with timestamp
  • SessionComparison: defaults, to_dict
  • detect_divergences: empty inputs, session ID extraction, identical traces, count divergence, summary keys, score bounds
  • compare_session_structures: key presence, high similarity for identical events
  • analyze_temporal_divergence: empty inputs, zero divergence, duration difference, key presence
  • analyze_behavioral_divergence: empty inputs, decision/tool counts, key presence, score bounds

Test plan

  • pytest -q tests/test_frame_tracer_divergence.py → 57 passed
  • ruff check . → all checks passed

🤖 Generated with Claude Code

acailic and others added 3 commits June 5, 2026 14:48
Makes `evidence` an optional keyword argument (default `None`, treated as
`[]`) in `RecordingMixin.record_decision`. All existing callers already
pass evidence explicitly so this is non-breaking.

Also adds lightweight drift-event collection to `record_decision` and
wires `_drift_events`/`_drift_compare_index` onto `TraceContext.restore`
so the previously-skipped drift-emission test now passes.

Closes #205

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ison fixes

- Add `*` after `chosen_action` in `record_decision` to make `evidence`
  and remaining params keyword-only, preventing accidental positional use
  and protecting existing positional callers
- Use clamped `event.confidence` instead of raw `confidence` in drift
  event_dict to match what is actually persisted
- Add `action` alias alongside `chosen_action` in drift event_dict so
  baselines using either key are matched
- Advance `_drift_compare_index` to the next decision event in the
  baseline (skipping non-decision events) to prevent index misalignment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Covers agent_debugger_sdk/core/frame_tracer.py and divergence_detector.py
which previously had zero test coverage despite being 600+ line modules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 6, 2026 20:53
Copilot AI left a comment

Pull request overview

Adds missing unit test coverage for two previously untested SDK research modules (FrameTracer + DivergenceDetector), and adjusts drift-tracking plumbing used during checkpoint restore/replay.

Changes:

  • Add tests/test_frame_tracer_divergence.py with 57 unit tests covering frame_tracer.py and divergence_detector.py.
  • Update the replay-depth integration test to match the current post-checkpoint filtering on restore and the /traces API response shape.
  • Extend drift tracking state on restore and add drift detection/collection during record_decision.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test_replay_depth_l3.py | Updates drift replay test to use timestamp filtering + /traces response shape and asserts drift is collected on context. |
| tests/test_frame_tracer_divergence.py | New test suite providing coverage for FrameTracer + DivergenceDetector public helpers and serialization. |
| agent_debugger_sdk/core/recorders.py | Makes evidence optional, adds drift detection during decision recording, and collects drift events. |
| agent_debugger_sdk/core/context/trace_context.py | Initializes _drift_events and _drift_compare_index on restore to support drift tracking. |

Comment on lines 97 to +103
     async def record_decision(
         self,
         reasoning: str,
         confidence: float,
-        evidence: list[dict[str, Any]],
         chosen_action: str,
+        *,
+        evidence: list[dict[str, Any]] | None = None,
Owner Author

The * placement after chosen_action is intentional — it makes evidence and all subsequent params keyword-only while keeping reasoning, confidence, and chosen_action as positional. This matches the fix applied in PR #207 (commit f380508) and is the goal of this change: callers can now omit evidence without reordering. Callers using the old signature (reasoning, confidence, evidence_list, chosen_action) would break, but that signature was already changed in PR #207; this PR carries the same convention.
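The effect of the bare `*` described above can be illustrated with a minimal sketch. The function below is a hypothetical stand-in that copies only the parameter list shown in the diff; it is not the SDK's `RecordingMixin.record_decision`, and the returned dict and the `{"source": "doc"}` evidence item are assumptions made for illustration.

```python
from __future__ import annotations

import asyncio
from typing import Any


async def record_decision(
    reasoning: str,
    confidence: float,
    chosen_action: str,
    *,
    evidence: list[dict[str, Any]] | None = None,
) -> dict[str, Any]:
    """Hypothetical stand-in mirroring only the signature from the diff."""
    return {
        "reasoning": reasoning,
        "confidence": confidence,
        "chosen_action": chosen_action,
        # None is normalized to a fresh list so no list is shared across calls.
        "evidence": [] if evidence is None else list(evidence),
    }


def positional_evidence_rejected() -> bool:
    """Return True if passing evidence as a fourth positional argument fails."""
    try:
        # Argument binding happens at call time, before any coroutine is created.
        record_decision("why", 0.9, "search", [{"source": "doc"}])
    except TypeError:
        return True
    return False
```

Under these assumptions, `asyncio.run(record_decision("why", 0.9, "search"))` succeeds with an empty evidence list, while a fourth positional argument raises `TypeError` instead of being silently bound to the wrong parameter.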

Comment on lines +130 to +145
drift_index = getattr(self, "_drift_compare_index", 0)
event_dict = {
    "event_type": "decision",
    "data": {
        "chosen_action": chosen_action,
        "action": chosen_action,
        "confidence": event.confidence,
    },
}
drift = drift_detector.compare(event_dict, drift_index)
# Advance to the next decision event in the baseline, skipping non-decision events
next_index = drift_index + 1
original_events = getattr(drift_detector, "original_events", [])
while next_index < len(original_events) and original_events[next_index].get("event_type") != "decision":
    next_index += 1
self._drift_compare_index = next_index
Owner Author

Fixed in acdacd5. Before calling compare(), we now advance drift_index forward past any non-decision events in original_events. This ensures the comparison always targets an actual decision event position, preventing silent missed drift when non-decision events appear before the first (or any subsequent) decision in the baseline.
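The skip-ahead step described in this reply could look roughly like the sketch below. It is not the code from acdacd5: `next_decision_index` is a hypothetical helper, and the `tool_call` / `tool_result` event types in the sample baseline are assumptions. Only the `event_type == "decision"` check comes from the diff excerpt.

```python
from typing import Any


def next_decision_index(events: list[dict[str, Any]], start: int) -> int:
    """Return the first index >= start whose event_type is "decision".

    Returns len(events) when no decision event remains.
    """
    index = start
    while index < len(events) and events[index].get("event_type") != "decision":
        index += 1
    return index


# Hypothetical baseline: non-decision events surround the two decisions.
baseline = [
    {"event_type": "tool_call"},
    {"event_type": "decision"},
    {"event_type": "tool_result"},
    {"event_type": "decision"},
]

# Index to compare the first recorded decision against.
first = next_decision_index(baseline, 0)
# Index to store for the following record_decision call.
second = next_decision_index(baseline, first + 1)
```

Applying the same helper before the comparison and again afterwards keeps the stored index on a decision event, or at `len(events)` once the baseline has no decisions left.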

Comment on lines +127 to +131
# Detect drift against the original execution if a detector is active
drift_detector = getattr(self, "_drift_detector", None)
if drift_detector is not None:
    drift_index = getattr(self, "_drift_compare_index", 0)
    event_dict = {
Owner Author

Acknowledged. The SDK behavior changes (evidence keyword-only + drift detection in record_decision) are prerequisites for the tests to exercise correctly — the tests validate this runtime behavior. The PR description has been noted; if desired, the title can be updated to reflect the dual scope (runtime fix + test coverage).

…ift compare

Previously drift_compare_index could point at a non-decision event at
the start of the baseline (e.g. after restore), causing compare() to
silently miss the first decision's drift. Now advance past non-decision
events before comparing, then advance to the next decision for the
subsequent call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

test: add unit tests for FrameTracer and DivergenceDetector (zero coverage)
