Conversion scripts fail on standard TPU VMs for large models #3418

@zzxslp

Bug report

The official model conversion instructions fail on a standard TPU-v5p VM for large models such as Qwen3-235B, with CPU OOM errors when sharding the MoE layers. Since CPU memory on GCP is fixed (400GB), I wonder if we can improve the script to work around this issue, or is the doc outdated?
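
For a rough sense of scale, here is a back-of-envelope sketch of the host memory needed just to hold the weights. It assumes the parameter counts implied by the model names (235B and 30B) and assumes the conversion materializes every weight in host RAM at once, which may not match what the script actually does. The helper names are hypothetical; the 400GB figure is the one quoted above.

```python
# Back-of-envelope estimate of host RAM needed to hold a checkpoint's weights.
# Assumptions (not taken from the conversion script):
#   - parameter counts are the nominal ones from the model names
#   - all weights are resident in host memory at the same time
#   - no extra copies (e.g. source + converted tensors) are counted

BYTES_PER_PARAM = {"bfloat16": 2, "float32": 4}
HOST_RAM_GB = 400  # CPU memory of the TPU VM, as reported in this issue


def weights_gb(num_params: float, dtype: str) -> float:
    """Size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_in_host_ram(num_params: float, dtype: str,
                     host_ram_gb: float = HOST_RAM_GB) -> bool:
    """True if the raw weights alone fit in the given host RAM."""
    return weights_gb(num_params, dtype) <= host_ram_gb


if __name__ == "__main__":
    for name, n in [("235B", 235e9), ("30B", 30e9)]:
        for dtype in BYTES_PER_PARAM:
            size = weights_gb(n, dtype)
            ok = fits_in_host_ram(n, dtype)
            print(f"{name} {dtype}: {size:.0f} GB, fits in {HOST_RAM_GB} GB: {ok}")
```

Under these assumptions the 235B weights alone exceed 400GB even in bfloat16, before counting any temporary copies made during sharding, while the 30B weights fit comfortably.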

Also, during model conversion the sharding step with the default simulated_cpu_devices_count=16 is very slow, even for a 30B model.

Logs/Output

No response

Environment Information

No response

Additional Context

No response

Labels: bug
