Unable to run tests #19

@IgorZhiltsoff

I tried running

./tools/dist_test_map.sh ./projects/configs/hrmapnet/hrmapnet_maptrv2_nusc_r50_110ep.py ./ckpts/hrmapnet_maptrv2_nuscenes_ep110.pth 1

(the checkpoint was downloaded from the link in the repo's README)

and got the following error:

Traceback (most recent call last):
  File "./tools/test.py", line 264, in <module>
    main()
  File "./tools/test.py", line 229, in main
    model = MMDistributedDataParallel(
  File "/root/miniconda3/envs/smth/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 496, in __init__
    dist._verify_model_across_ranks(self.process_group, parameters)
RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:911, unhandled system error, NCCL version 2.7.8
ncclSystemError: System call (socket, malloc, munmap, etc) failed.

Any ideas how to fix this?


My package versions differ from the ones specified in the installation guide. Specifically, I:

  1. Downgraded av2 to minimum,
  2. Downgraded numpy to 1.23.0,
  3. Installed gcc-multilib,
  4. Upgraded gcc to 7 (https://anaconda.org/gouarin/gcc-7),
  5. Upgraded networkx to 3.1
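
To make the version differences listed above easier to compare against the installation guide, the installed versions could be dumped with a short script. This is only a minimal sketch and not part of the repo: the `collect_versions` helper and the package list are assumptions chosen for illustration (the packages named in this report plus torch), so adjust them to whatever the guide pins.

```python
from importlib import metadata

# Assumed list: the packages mentioned in this report, plus torch.
PACKAGES = ["torch", "numpy", "networkx", "av2"]


def collect_versions(packages=PACKAGES):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print one "name: version" line per package for pasting into the issue.
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Run inside the same conda environment used for the test command, the output can be pasted here as-is.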
