Deepcopy env_extras dicts for each sample when preparing generator input#1189
Deepcopy env_extras dicts for each sample when preparing generator input#1189SumanthRH merged 2 commits intoNovaSky-AI:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
The pull request addresses a potential bug where env_extras dictionaries were not deep-copied, leading to unintended modifications across samples. The change correctly implements copy.deepcopy to ensure each sample receives an independent copy of env_extras, preventing side effects. This is a good fix for data integrity.
| env_extras = [ | ||
| copy.deepcopy(prompt["env_extras"]) | ||
| for prompt in prompts | ||
| for _ in range(n_samples_per_prompt) |
There was a problem hiding this comment.
This change correctly uses copy.deepcopy to ensure that each env_extras dictionary is an independent copy. This prevents unintended modifications to other samples when one sample's env_extras is altered, addressing the bug described in the PR description. This is a critical fix for data integrity.
copy.deepcopy(prompt["env_extras"])
for prompt in prompts
for _ in range(n_samples_per_prompt)There was a problem hiding this comment.
Devin Review found 1 new potential issue.
🐛 1 issue in files not directly in the diff
🐛 Deepcopy fix not applied to duplicate prepare_generator_input in skyrl-train package (skyrl-train/skyrl_train/generators/utils.py:382)
The env_extras shared-reference bug that this PR fixes in skyrl/train/generators/utils.py also exists in the duplicate file skyrl-train/skyrl_train/generators/utils.py, which was not updated. The skyrl-train copy is actively used by skyrl_train/trainer.py:211, skyrl_train/evaluate.py:64, skyrl_train/fully_async_trainer.py:542, and other callers.
Root Cause
The repository maintains two near-identical copies of prepare_generator_input: one in skyrl/train/generators/utils.py (fixed by this PR) and one in skyrl-train/skyrl_train/generators/utils.py (not fixed). The only differences between the files are import paths and this copy.deepcopy change.
At skyrl-train/skyrl_train/generators/utils.py:382, the code still reads:
env_extras = [prompt["env_extras"] for prompt in prompts for _ in range(n_samples_per_prompt)]This means all n_samples_per_prompt entries for the same prompt share the same dictionary reference. If any downstream code mutates one sample's env_extras, it silently corrupts all other samples for that prompt.
Impact: Users of the skyrl-train package (which includes the main trainer, evaluator, and async trainer) still have the shared-reference bug that this PR intends to fix.
View 2 additional findings in Devin Review.
When preparing
GeneratorInput, the sameenv_extrasis used for each sample per prompt. Since the dictionaries are not copied, this can lead to bugs where changing the dictionary for one sample changes them for all samples in that group (for the same prompt).This PR makes a deep copy of the
env_extrasfor each sample.