fix: track all tool invocations in ToolMetrics instead of overwriting #1892

Open
giulio-leone wants to merge 1 commit into strands-agents:main from
giulio-leone:fix/tool-metrics-all-invocations

@giulio-leone
Contributor

Problem

ToolMetrics.add_call() overwrites self.tool with the latest invocation on every call (line 142: self.tool = tool). When users inspect AgentResult.metrics.tool_metrics, they can only see the last call's toolUseId and input — all previous invocations are lost.

This means to_dict() / get_summary() only expose the final tool call's data, making it impossible to audit which inputs were passed across multiple invocations of the same tool.

Closes #301

Solution

  • Added invocations: list[ToolUse] field to ToolMetrics that accumulates every call
  • Updated add_call() to append each tool invocation to the list
  • Updated to_dict() to expose an invocations array with all tool_use_ids and input_params
  • Backwards compatible: self.tool is still updated to the latest invocation — existing code reading metrics.tool continues to work
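
For readers who want the shape of the change without opening the diff, here is a minimal sketch of the accumulate-and-keep-latest pattern described above. It is not the code in `src/strands/telemetry/metrics.py`: `ToolMetricsSketch` and the simplified `ToolUse` type are stand-ins assumed for illustration, and the real `add_call()` may take more arguments than shown here.

```python
from dataclasses import dataclass, field
from typing import Any, TypedDict


class ToolUse(TypedDict):
    """Simplified stand-in for the SDK's ToolUse type (assumed shape)."""

    toolUseId: str
    name: str
    input: dict[str, Any]


@dataclass
class ToolMetricsSketch:
    """Hypothetical, trimmed-down model of the metrics object in this PR."""

    tool: ToolUse
    call_count: int = 0
    # New field: every invocation is kept, in call order.
    invocations: list[ToolUse] = field(default_factory=list)

    def add_call(self, tool: ToolUse) -> None:
        # Still point .tool at the latest call (backwards compatibility).
        self.tool = tool
        self.call_count += 1
        # Append instead of relying on .tool alone, so history is not lost.
        self.invocations.append(tool)

    def to_dict(self) -> dict[str, Any]:
        return {
            "tool_use_id": self.tool["toolUseId"],
            "call_count": self.call_count,
            "invocations": [
                {"tool_use_id": t["toolUseId"], "input_params": t["input"]}
                for t in self.invocations
            ],
        }
```

The design choice is to add a field rather than change the meaning of `.tool`, so existing readers of `.tool` see no difference while new readers can walk `invocations`.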

Changes

  • src/strands/telemetry/metrics.py: Added invocations field, updated add_call() and to_dict()
  • tests/strands/telemetry/test_metrics.py: Updated existing tests + 2 new tests for invocation tracking

Testing

  • All 1888 tests pass (1866 core + 22 telemetry)
  • New tests verify:
    • Multiple invocations are tracked with correct inputs
    • get_summary() includes full invocations array
    • Backwards compatibility (.tool still points to latest)

@giulio-leone
Contributor Author

Friendly ping: this fixes ToolMetrics to track all tool invocations instead of overwriting previous entries, so the metrics stay accurate when the same tool is called repeatedly.

@giulio-leone giulio-leone force-pushed the fix/tool-metrics-all-invocations branch from 29496bc to ccf5b79 Compare March 19, 2026 11:04
@github-actions github-actions bot added size/s and removed size/s labels Mar 19, 2026
@giulio-leone
Contributor Author

Rebased this branch onto the latest main and force-pushed it.

Local validation I ran:

  • .venv/bin/python -m pytest tests/strands/telemetry/test_metrics.py::test_tool_metrics_tracks_all_invocations tests/strands/telemetry/test_metrics.py::test_tool_metrics_invocations_in_summary -q -> 2 passed

Real branch-vs-main proof (using the repo venv and explicit PYTHONPATH=<checkout>/src so the imported module path is unambiguous):

  • main imported sdk-python-main/src/strands/telemetry/metrics.py
    • call_count=3
    • latest_tool_use_id=id3
    • has_invocations_attr=False
    • invocation_count=0
    • summary_invocations=[]
  • this branch imported sdk-python/src/strands/telemetry/metrics.py
    • call_count=3
    • latest_tool_use_id=id3
    • has_invocations_attr=True
    • invocation_count=3
    • summary_invocations=[{"input_params":{"city":"Berlin"},"tool_use_id":"id1"},{"input_params":{"city":"Paris"},"tool_use_id":"id2"},{"input_params":{"city":"Rome"},"tool_use_id":"id3"}]

So the rebased branch preserves backwards compatibility (tool still points to the latest invocation) while fixing the real telemetry gap: repeated calls to the same tool now retain the full invocation history instead of silently overwriting it.
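
The "imported module path is unambiguous" step can be checked with a small helper along these lines. This is a sketch, not the script used for the proof above: `module_origin` is a hypothetical name, and the example resolves a standard-library module so it runs without the SDK installed. With `PYTHONPATH=<checkout>/src` set, the name of interest would be `strands.telemetry.metrics`.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return None
    return spec.origin if spec is not None else None


# Stdlib module used so the sketch runs anywhere; swap in the module
# under test when comparing two checkouts.
print(module_origin("json"))
```

Running this once per checkout and comparing the printed paths confirms that each run exercised the intended source tree.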

Previously, ToolMetrics.add_call() overwrote self.tool with the latest
invocation, losing all previous tool inputs and IDs. Users inspecting
AgentResult.metrics.tool_metrics could only see the last call's data.

Changes:
- Add 'invocations' list field to ToolMetrics that accumulates every call
- Update add_call() to append each tool invocation to the list
- Update to_dict() / get_summary() to expose invocations array
- Keep self.tool update for backwards compatibility

Closes strands-agents#301
@giulio-leone
Contributor Author

Refreshed this branch onto current main and revalidated it on the rebased head.

Repo-native gate (run twice with no code changes between passes):

  • uv run hatch run test-format
  • uv run hatch run test-lint
  • uv run hatch run test -- tests/strands/telemetry/test_metrics.py -q

Results on refreshed head d3c02e483ba1b987c1222c9f1a46e16325ebdb23:

  • ruff check: pass
  • mypy ./src: pass (Success: no issues found in 140 source files)
  • tests/strands/telemetry/test_metrics.py: pass (22 passed)

For a direct branch-vs-main comparison on the exact source trees, I ran the same temporary script, proof_1892_tool_metrics_invocations.py, in both checkouts. It asserts three things:

  1. ToolMetrics retains the full invocation history across repeated calls.
  2. EventLoopMetrics.get_summary() exposes the full invocation list.
  3. Backwards compatibility is preserved because .tool still points at the latest invocation.

Proof result:

  • Refreshed branch: 3 passed
  • Current main (fd8168a531c140a0082a3c6412a577fe81db21f0 at verification time): 2 failed, 1 passed

The current main failures are the exact telemetry gap this PR fixes:

  • ToolMetrics still has no invocations history (getattr(..., "invocations", None) is None after 3 calls).
  • EventLoopMetrics.get_summary() still omits the invocations array entirely (KeyError: 'invocations').
  • The compatibility control still passes on both trees, so .tool still tracks the latest invocation exactly as before.
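
The three observations above can be reproduced against a stand-in for the pre-fix behavior. The sketch below is illustrative only: `LegacyToolMetrics` and `probe` are hypothetical names, the class is a simplified model of the overwrite-only behavior rather than the code on main, and the probe reads a flat `to_dict()` rather than the nested structure returned by `EventLoopMetrics.get_summary()`.

```python
from typing import Any


class LegacyToolMetrics:
    """Hypothetical model of the pre-fix behavior: only the latest call is kept."""

    def __init__(self, tool: dict[str, Any]) -> None:
        self.tool = tool
        self.call_count = 0

    def add_call(self, tool: dict[str, Any]) -> None:
        self.tool = tool  # overwrites the previous invocation
        self.call_count += 1

    def to_dict(self) -> dict[str, Any]:
        return {"tool_use_id": self.tool["toolUseId"], "call_count": self.call_count}


def probe(metrics: Any) -> dict[str, Any]:
    """Collect the three observations without assuming the attribute exists."""
    history = getattr(metrics, "invocations", None)
    summary = metrics.to_dict()
    return {
        "has_history": history is not None,
        "summary_has_invocations": "invocations" in summary,
        "latest_tool_use_id": metrics.tool["toolUseId"],
    }


calls = [
    {"toolUseId": "id1", "input": {"city": "Berlin"}},
    {"toolUseId": "id2", "input": {"city": "Paris"}},
    {"toolUseId": "id3", "input": {"city": "Rome"}},
]
legacy = LegacyToolMetrics(calls[0])
for call in calls:
    legacy.add_call(call)
print(probe(legacy))
```

Using `getattr` with a default and an `in` check keeps the probe usable on both trees, which is why the same script can report failures on one and passes on the other.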

@giulio-leone giulio-leone force-pushed the fix/tool-metrics-all-invocations branch from ccf5b79 to d3c02e4 Compare March 21, 2026 19:18
@github-actions github-actions bot added size/s and removed size/s labels Mar 21, 2026

Development

Successfully merging this pull request may close these issues.

[BUG] AgentResult.metrics.tool_metrics skips previous tool inputs for repeatedly invoked tools
