Description
Bug report
Bug Description
When running the RL post-training script (train_rl.py) from the current main branch, it immediately crashes with a TypeError when initializing RolloutConfig.
It appears that recent commits added new vLLM-specific arguments (like rollout_vllm_hf_config_path and rollout_vllm_additional_config) to the rollout instantiation in MaxText. However, the tunix package (specifically base_rollout.RolloutConfig) does not accept these arguments, leading to an initialization crash.
Point of Failure
The crash occurs here in train_rl.py:
https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/trainers/post_train/rl/train_rl.py#L563
This creates a version mismatch: a git clone pulls the bleeding-edge MaxText, which passes these kwargs, while the installed tunix dependency rejects them.
Temporary Workaround
To get the training loop to compile, I had to monkey-patch base_rollout.RolloutConfig.__init__ in memory to filter out any kwargs starting with rollout_vllm_ before loading MaxText.
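For reference, a minimal sketch of that patch. The helper name (patch_rollout_config) and the exact tunix module layout are my own; adjust the import path to wherever base_rollout lives in your installed tunix version:

```python
# Hedged sketch: wrap RolloutConfig.__init__ so that unknown rollout_vllm_*
# kwargs sent by bleeding-edge MaxText are silently dropped instead of
# raising a TypeError. Apply this before importing/running train_rl.
import functools


def patch_rollout_config(base_rollout_module):
  """Monkey-patch RolloutConfig in the given module to ignore rollout_vllm_* kwargs."""
  original_init = base_rollout_module.RolloutConfig.__init__

  @functools.wraps(original_init)
  def patched_init(self, *args, **kwargs):
    # Filter out the vLLM-specific kwargs the installed tunix does not accept.
    filtered = {k: v for k, v in kwargs.items() if not k.startswith("rollout_vllm_")}
    original_init(self, *args, **filtered)

  base_rollout_module.RolloutConfig.__init__ = patched_init
```

With this applied, the RolloutConfig call at the point of failure succeeds and the rest of the GRPO loop runs, at the cost of the new vLLM options being ignored until tunix catches up.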
Logs/Output
Error Trace
Traceback (most recent call last):
File "/workspace/train.py", line 122, in <module>
rl_train(trainer_config, sampler_config, trainer_devices, sampler_devices)
File "/usr/local/lib/python3.12/site-packages/maxtext/trainers/post_train/rl/train_rl.py", line 552, in rl_train
rollout_config=base_rollout.RolloutConfig(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: RolloutConfig.__init__() got an unexpected keyword argument 'rollout_vllm_hf_config_path'

(Note: if you bypass the first argument, the exact same error is thrown for rollout_vllm_additional_config.)
Environment Information
Hardware: TPU v5e (2x4 slice)
Model: Llama 3.1 8B (GRPO)
MaxText Version: main (latest)
Additional Context
No response