[BUG] Confusing machine names in Jenkins CI #3780

@crypdick

Add Link

My PR for a new tutorial: #3763

Describe the bug

The Jenkins CI has a few confusing hard-coded behaviors. The linux.16xlarge.nvidia.gpu label in Jenkins appears to be a legacy label that means "needs multi-GPU"; it doesn't describe the hardware that is actually used.

get_files_to_run.py has a hard-coded check for this exact key and routes anything tagged with it to shard 0. Shard 0 maps to WORKER_ID=1 in the matrix (also confusing), and shard 1's runner is a linux.g5.12xlarge.nvidia.gpu, not a 16xlarge.
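To make the routing easier to follow, here is a minimal sketch of the behavior as I understand it. It is not copied from get_files_to_run.py: the names LEGACY_MULTI_GPU_LABEL, pick_shard, and worker_id_for_shard are hypothetical, and the shard-to-WORKER_ID offset is only my assumption, generalized from the single case I observed (shard 0 running as WORKER_ID=1).

```python
# Hypothetical sketch of the routing described in this issue.
# None of these names are taken from get_files_to_run.py.

# The label is matched as an exact string; it does not describe the runner hardware.
LEGACY_MULTI_GPU_LABEL = "linux.16xlarge.nvidia.gpu"
MULTI_GPU_SHARD = 0


def pick_shard(needs, default_shard):
    """Return the shard index for a file given its 'needs' label."""
    if needs == LEGACY_MULTI_GPU_LABEL:
        # Hard-coded special case: the legacy label always lands on shard 0.
        return MULTI_GPU_SHARD
    return default_shard


def worker_id_for_shard(shard):
    """Map a 0-based shard index to a WORKER_ID.

    Assumption: the offset of one is inferred from shard 0 -> WORKER_ID=1.
    """
    return shard + 1


if __name__ == "__main__":
    shard = pick_shard("linux.16xlarge.nvidia.gpu", default_shard=5)
    print(shard, worker_id_for_shard(shard))
```

The point of the sketch is that the label acts purely as a routing key, so its name carries no information about the machine the job ends up on.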

I'm reporting this because it caused some confusion while I was getting CI to pass for my PR. If it were up to me, the key would be '4-gpu' instead of linux.16xlarge.nvidia.gpu.

Describe your environment

PyTorch tutorial CI
