Fix: laminar trace timeline to account for idle wait time by Rainhunter13 · Pull Request #415 · OpenHands/benchmarks

Rainhunter13 · 2026-02-13T15:40:12Z

Fix evaluation datapoint trace timeline when number of tasks exceed concurrent workers.

Previously, the root "Evaluation" span was created in the parent process alongside the datapoint, before a worker picked up the task. When more tasks were queued than available workers, the span included idle wait time, resulting in inflated durations and misleading timelines in Laminar:

Now, the "Evaluation" span is created in the child process when execution actually begins. The span's trace ID is then linked back to the Laminar datapoint via update_datapoint, so the timeline accurately reflects real execution time.
Requires lmnr>=0.7.41 for update_datapoint trace ID support.

*The diff of evaluation.py looks a bit off due a new try except block + linter auto changes, might be better to look with "Hide whitespaces" option

simonrosenberg

LGTM, thanks!

simonrosenberg · 2026-02-27T16:28:36Z

@OpenHands please resolve conflicts

simonrosenberg · 2026-02-27T16:28:50Z

@Rainhunter13 is this PR still relevant?

openhands-ai · 2026-02-27T16:28:54Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

Rainhunter13 · 2026-02-27T16:30:33Z

@simonrosenberg yes, that issue still exists during your evals so I think would be beneficial to merge this

simonrosenberg · 2026-02-27T16:33:58Z

I resolved the merge conflicts by merging main into this branch. However, since I don't have push access to the fork and maintainer_can_modify is disabled on this PR, I've pushed the resolved branch to origin/fix_laminar_evals_timeline on the upstream repo.

To update this PR, you can either:

Enable "Allow edits from maintainers" on this PR, and I can push the changes

Pull the resolved branch from upstream and push to your fork:

git fetch https://github.com/OpenHands/benchmarks.git fix_laminar_evals_timeline
git checkout fix_laminar_evals_timeline
git push origin fix_laminar_evals_timeline --force

Close this PR and I can open a new one from the upstream branch

openhands-ai · 2026-02-27T16:34:57Z

Summary

I successfully resolved the merge conflicts in PR #415 and pushed the changes to the upstream repository.

Changes Made:

pyproject.toml: Resolved dependency conflict
- Kept lmnr>=0.7.41 (required for the PR's update_datapoint feature)
- Added the macOS exclusion for multi-swe-bench from main
benchmarks/utils/evaluation.py: Resolved code structure conflict
- Kept the PR's approach: creating eval_span inside the child process
- Preserved the try/finally blocks for proper span cleanup
- Maintained the update_datapoint_trace_id call to link datapoints with traces

Verification:

✅ All pre-commit checks passed (Ruff format, Ruff lint, PEP8, Pyright)

Push Status:

Changes pushed to origin/fix_laminar_evals_timeline on upstream (OpenHands/benchmarks)
Could not push directly to the fork due to permission restrictions (maintainer_can_modify is disabled)
Left a comment on PR Fix: laminar trace timeline to account for idle wait time #415 with instructions for the PR author to sync from upstream

The resolved branch is available at OpenHands/benchmarks:fix_laminar_evals_timeline and ready for the PR author to pull into their fork.

Rainhunter13 added 4 commits February 13, 2026 14:39

start evaluation span in child process upon task execution

797a14d

chore

74934ec

always end span

b5acc80

fix tests

72037af

simonrosenberg self-requested a review February 13, 2026 18:15

simonrosenberg approved these changes Feb 13, 2026

View reviewed changes

simonrosenberg mentioned this pull request Feb 27, 2026

Fix: laminar trace timeline to account for idle wait time #464

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: laminar trace timeline to account for idle wait time#415

Fix: laminar trace timeline to account for idle wait time#415
Rainhunter13 wants to merge 4 commits intoOpenHands:mainfrom
lmnr-ai:fix_laminar_evals_timeline

Rainhunter13 commented Feb 13, 2026 •

edited

Loading

Uh oh!

simonrosenberg left a comment

Uh oh!

simonrosenberg commented Feb 27, 2026

Uh oh!

simonrosenberg commented Feb 27, 2026

Uh oh!

openhands-ai bot commented Feb 27, 2026

Uh oh!

Rainhunter13 commented Feb 27, 2026

Uh oh!

simonrosenberg commented Feb 27, 2026

Uh oh!

openhands-ai bot commented Feb 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Rainhunter13 commented Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simonrosenberg left a comment

Choose a reason for hiding this comment

Uh oh!

simonrosenberg commented Feb 27, 2026

Uh oh!

simonrosenberg commented Feb 27, 2026

Uh oh!

openhands-ai bot commented Feb 27, 2026

Uh oh!

Rainhunter13 commented Feb 27, 2026

Uh oh!

simonrosenberg commented Feb 27, 2026

Uh oh!

openhands-ai bot commented Feb 27, 2026

Summary

Changes Made:

Verification:

Push Status:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Rainhunter13 commented Feb 13, 2026 •

edited

Loading