Skip to content

Fix: laminar trace timeline to account for idle wait time#415

Open
Rainhunter13 wants to merge 4 commits intoOpenHands:mainfrom
lmnr-ai:fix_laminar_evals_timeline
Open

Fix: laminar trace timeline to account for idle wait time#415
Rainhunter13 wants to merge 4 commits intoOpenHands:mainfrom
lmnr-ai:fix_laminar_evals_timeline

Conversation

@Rainhunter13
Copy link
Contributor

@Rainhunter13 Rainhunter13 commented Feb 13, 2026

Fix evaluation datapoint trace timeline when number of tasks exceed concurrent workers.

Previously, the root "Evaluation" span was created in the parent process alongside the datapoint, before a worker picked up the task. When more tasks were queued than available workers, the span included idle wait time, resulting in inflated durations and misleading timelines in Laminar:

image

Now, the "Evaluation" span is created in the child process when execution actually begins. The span's trace ID is then linked back to the Laminar datapoint via update_datapoint, so the timeline accurately reflects real execution time.
Requires lmnr>=0.7.41 for update_datapoint trace ID support.

*The diff of evaluation.py looks a bit off due a new try except block + linter auto changes, might be better to look with "Hide whitespaces" option

@simonrosenberg simonrosenberg self-requested a review February 13, 2026 18:15
Copy link
Collaborator

@simonrosenberg simonrosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@simonrosenberg
Copy link
Collaborator

@OpenHands please resolve conflicts

@simonrosenberg
Copy link
Collaborator

@Rainhunter13 is this PR still relevant?

@openhands-ai
Copy link

openhands-ai bot commented Feb 27, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

@Rainhunter13
Copy link
Contributor Author

@simonrosenberg yes, that issue still exists during your evals so I think would be beneficial to merge this

Copy link
Collaborator

I resolved the merge conflicts by merging main into this branch. However, since I don't have push access to the fork and maintainer_can_modify is disabled on this PR, I've pushed the resolved branch to origin/fix_laminar_evals_timeline on the upstream repo.

To update this PR, you can either:

  1. Enable "Allow edits from maintainers" on this PR, and I can push the changes
  2. Pull the resolved branch from upstream and push to your fork:
    git fetch https://github.com/OpenHands/benchmarks.git fix_laminar_evals_timeline
    git checkout fix_laminar_evals_timeline
    git push origin fix_laminar_evals_timeline --force
  3. Close this PR and I can open a new one from the upstream branch

@openhands-ai
Copy link

openhands-ai bot commented Feb 27, 2026

Summary

I successfully resolved the merge conflicts in PR #415 and pushed the changes to the upstream repository.

Changes Made:

  1. pyproject.toml: Resolved dependency conflict

    • Kept lmnr>=0.7.41 (required for the PR's update_datapoint feature)
    • Added the macOS exclusion for multi-swe-bench from main
  2. benchmarks/utils/evaluation.py: Resolved code structure conflict

    • Kept the PR's approach: creating eval_span inside the child process
    • Preserved the try/finally blocks for proper span cleanup
    • Maintained the update_datapoint_trace_id call to link datapoints with traces

Verification:

✅ All pre-commit checks passed (Ruff format, Ruff lint, PEP8, Pyright)

Push Status:

  • Changes pushed to origin/fix_laminar_evals_timeline on upstream (OpenHands/benchmarks)
  • Could not push directly to the fork due to permission restrictions (maintainer_can_modify is disabled)
  • Left a comment on PR Fix: laminar trace timeline to account for idle wait time #415 with instructions for the PR author to sync from upstream

The resolved branch is available at OpenHands/benchmarks:fix_laminar_evals_timeline and ready for the PR author to pull into their fork.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants