Skip to content

Log RL metrics per environment#3446

Merged
yobibyte merged 6 commits intoNVIDIA:mainfrom
yobibyte:vitalyk/perenvlog
Feb 18, 2026
Merged

Log RL metrics per environment#3446
yobibyte merged 6 commits intoNVIDIA:mainfrom
yobibyte:vitalyk/perenvlog

Conversation

@yobibyte
Copy link
Copy Markdown
Contributor

@yobibyte yobibyte commented Feb 16, 2026

This PR makes 2 main things:

  • Adds per-environment wandb logging for rl runs, e.g. we can now track average rollout length per environment.
  • Refactors the wandb logic. We now also do not store the mean/min/max, but compute them on the fly. I factor out prep_wandb_metrics() function and test some logic of it by a unit test.

@yobibyte yobibyte requested a review from a team as a code owner February 16, 2026 23:31
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Feb 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ko3n1g ko3n1g requested a review from a team February 16, 2026 23:32
@yobibyte yobibyte changed the title Vitalyk/perenvlog Log RL metrics per environment Feb 16, 2026
Copy link
Copy Markdown
Member

@Phlip79 Phlip79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding great tests!

@tdene
Copy link
Copy Markdown
Contributor

tdene commented Feb 17, 2026

/ok to test 51e0005

Comment thread megatron/rl/rl_utils.py
Comment thread megatron/rl/rl_utils.py Outdated
env_idx = [i for i, eidx in enumerate(group_stats.env_ids) if eidx == env_id]

# Advantages are flattened, we need to be more careful with those.
group_turn_counts = [sum(nt) for nt in num_turns]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be inside the for loop

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved out.

Comment thread megatron/rl/rl_utils.py Outdated
Comment thread megatron/rl/rl_utils.py Outdated
rewards: list[float]
rewards: list[list[float]] # inner list is for a group
env_ids: list[int] # same length as len(rewards)
turn_lens: list[list[int]] #
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread megatron/rl/rl_utils.py Outdated
Comment thread megatron/rl/rl_utils.py
Comment on lines +1197 to +1199
for g in rollouts:
if g[0].env_id not in example_groups:
example_groups[g[0].env_id] = g
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RewardOnlyAgent says env_id: str | None = None. But in practice, all of our environments do set an env_id, so in practice it's not a problem.

Still though, we should either None-check here, or we should enforce env_id as required in RewardOnlyAgent.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know why we had None there. I added env_id to Countdown which was the only None-id env probably because we forget. Maybe @ArEsKay3 remembers why we need None there.

@yobibyte yobibyte requested review from a team as code owners February 18, 2026 09:47
return_log_probs = bool(req.get("logprobs", False))
top_n_logprobs = int(req.get("top_logprobs", 0)) if return_log_probs else 0
skip_prompt_log_probs = bool(req.get("skip_prompt_log_probs", False))
skip_prompt_log_probs = bool(req.get("skip_prompt_log_probs", True))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this diff; it's already been fixed in main, and having this diff adds extra reviewer burden on other teams.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@tdene tdene removed the request for review from a team February 18, 2026 12:12
@yobibyte yobibyte enabled auto-merge February 18, 2026 12:35
@yobibyte
Copy link
Copy Markdown
Contributor Author

/ok to test a8abff4

@svcnvidia-nemo-ci
Copy link
Copy Markdown

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/22144219623

Merged via the queue into NVIDIA:main with commit f1908bc Feb 18, 2026
46 of 48 checks passed
@yobibyte yobibyte deleted the vitalyk/perenvlog branch February 18, 2026 15:24
BoxiangW pushed a commit to BoxiangW/Megatron-LM that referenced this pull request Mar 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants