Skip to content

[Bug] Tunix GRPO VllmRollout.get_per_token_logps API mismatch and JAX concatenate type error #3363

@karajendran

Description

@karajendran

Bug report

When running a full end-to-end GRPO training pipeline using VllmRollout on a TPU v5e-8 slice, the rollout engine crashes during the first micro-batch step due to two consecutive API mismatches between tunix/rl/rl_cluster.py and tunix/rl/rollout/vllm_rollout.py.

Issue 1: Unexpected Keyword Argument completion_mask
RLCluster.get_old_per_token_logps passes a completion_mask argument down to the rollout engine, but the VllmRollout.get_per_token_logps method signature does not accept it.

Traceback:

File "/usr/local/lib/python3.12/site-packages/tunix/rl/grpo/grpo_learner.py", line 267, in _generate_and_compute_advantage
    old_per_token_logps = self.rl_cluster.get_old_per_token_logps(...)
TypeError: VllmRollout.get_per_token_logps() got an unexpected keyword argument 'completion_mask'

Issue 2: JAX concatenate Strict Type Error
If the completion_mask issue is bypassed, the pipeline immediately crashes on the output. VllmRollout returns native Python lists, but rl_cluster.py passes them directly to jnp.concatenate(). Under strict JAX versions (like 0.4.25), this throws a TypeError because it expects array-like objects, not lists.

Traceback:

Python
File "/usr/local/lib/python3.12/site-packages/tunix/rl/rl_cluster.py", line 993, in get_old_per_token_logps
    per_token_logps = jnp.concatenate(outs, axis=0)
TypeError: concatenate requires ndarray or scalar arguments, got <class 'list'> at position 0.

Logs/Output

No response

Environment Information

Environment:

Hardware: GKE TPU v5e-8 slice

JAX Version: 0.4.25 (Pinned to maintain mesh stability with with_sharding_constraint)

vLLM Version: 0.17.0rc1

Algorithm: GRPO

Additional Context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions