CausalDriveBench Docker Operations

Reference for all Docker-related commands, services, and configuration.

Prerequisites

Docker Engine and NVIDIA runtime

# Docker Engine >= 24.x
docker --version

# NVIDIA Container Toolkit (GPU passthrough); the deprecated nvidia-docker2 package is not needed
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU access inside containers
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

Environment file

All docker compose commands read from dev.env in the model directory:

cp evaluation/models/<model>/dev.env.example \
   evaluation/models/<model>/dev.env
# Fill in: HF_TOKEN, WEIGHTS_DIR, RAW_DATA_DIR, CAUSAL_BENCH_DIR, OUTPUT_DIR

Run all commands from the model directory:

cd evaluation/models/<model>/
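
Before running any service, it can help to confirm that every variable listed above is actually filled in. The following is a minimal sketch, not part of the repository: it assumes dev.env uses plain KEY=VALUE lines and is read from the current (model) directory. The helper names `parse_env` and `missing_keys` are hypothetical.

```python
from pathlib import Path

# Keys named in the "Fill in" comment of the environment-file step.
REQUIRED_KEYS = ("HF_TOKEN", "WEIGHTS_DIR", "RAW_DATA_DIR", "CAUSAL_BENCH_DIR", "OUTPUT_DIR")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> list:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("dev.env")  # assumes the current directory is the model directory
    if env_path.exists():
        missing = missing_keys(env_path.read_text())
        print("missing:", ", ".join(missing) if missing else "none")
    else:
        print("dev.env not found; copy dev.env.example first")
```

The parser is deliberately simpler than Docker Compose's own env-file handling (no variable interpolation, no multi-line values), so treat it as a quick pre-flight check only.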

The Five Services

graph LR
    WD[weight-downloader<br/><i>profile: setup</i>]
    OR[original<br/><i>profile: original</i>]
    INF[inference<br/><i>default</i>]
    PP[postprocess<br/><i>profile: postprocess</i>]
    JUP[jupyter<br/><i>profile: jupyter</i>]

    WD -->|weights ready| INF
    INF -->|outputs.jsonl| PP
    INF -.->|same image| OR
    INF -.->|same image| JUP

| Service | Profile | Purpose |
|---|---|---|
| `weight-downloader` | `setup` | Download model weights from HuggingFace (one-shot) |
| `original` | `original` | Run vendor's sample inference script (sanity check) |
| `inference` | (default) | Run benchmark evaluation pipeline |
| `postprocess` | `postprocess` | Parse outputs, compute accuracy metrics |
| `jupyter` | `jupyter` | Jupyter notebook for interactive development |

Common Workflows

Initial setup

# 1. Build the image
docker compose build

# 2. Download weights (one-time step per model)
docker compose --profile setup run --rm weight-downloader

Benchmark evaluation

# Quick test — single scene
MODE=single SCENE=nuscenes-scene-0001 docker compose run --rm inference

# Small subset
MODE=subset SUBSET_SIZE=10 docker compose run --rm inference

# Full dataset
docker compose run --rm inference
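
To make the difference between the three invocations concrete, here is a rough sketch of how MODE, SCENE, and SUBSET_SIZE could map to a scene list. This is an illustration under assumptions, not the pipeline's actual logic: the function names, the default mode name "full", and the "first N scenes" subset rule are all hypothetical.

```python
import os


def select_scenes(all_scenes, mode="full", scene=None, subset_size=None):
    """Pick the scenes for a run. HYPOTHETICAL selection rules, for illustration only."""
    if mode == "single":
        if scene not in all_scenes:
            raise ValueError(f"unknown scene: {scene!r}")
        return [scene]
    if mode == "subset":
        if subset_size is None:
            raise ValueError("SUBSET_SIZE is required when MODE=subset")
        # ASSUMPTION: a subset is the first N scenes; the real pipeline may sample differently.
        return list(all_scenes[: int(subset_size)])
    # Any other mode (including the assumed default "full") runs everything.
    return list(all_scenes)


def select_from_env(all_scenes, env=None):
    """Read MODE / SCENE / SUBSET_SIZE from the environment, as the commands above set them."""
    env = os.environ if env is None else env
    return select_scenes(
        all_scenes,
        mode=env.get("MODE", "full"),
        scene=env.get("SCENE"),
        subset_size=env.get("SUBSET_SIZE"),
    )
```

With no variables set, the sketch returns every scene, which mirrors the bare `docker compose run --rm inference` command running the full dataset.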

Post-processing

# Use the same MODE / SUBSET_SIZE / SCENE values as the inference run
MODE=single SCENE=nuscenes-scene-0001 \
docker compose --profile postprocess run --rm postprocess
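
As a rough picture of what an accuracy computation over outputs.jsonl might look like, the sketch below scores exact-match accuracy. The field names `prediction` and `answer` are assumptions made for this example; the real output schema and the metrics computed by the postprocess service may differ.

```python
import json


def accuracy_from_jsonl(lines):
    """Exact-match accuracy over JSONL records.

    ASSUMPTION: each record carries "prediction" and "answer" fields.
    These names are hypothetical and not taken from the benchmark.
    """
    total = 0
    correct = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        total += 1
        # Compare case-insensitively, ignoring surrounding whitespace.
        predicted = str(record["prediction"]).strip().lower()
        expected = str(record["answer"]).strip().lower()
        if predicted == expected:
            correct += 1
    return correct / total if total else 0.0
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines.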

Sanity check (model's own script)

docker compose --profile original run --rm original

Interactive development

# Start Jupyter
docker compose --profile jupyter up jupyter
# Open: http://localhost:8888

# Or drop into a shell
docker compose run --rm --entrypoint bash inference

Build Options

Force a clean rebuild (no layer cache):

docker compose build --no-cache

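Pass build args to override defaults. The environment-variable form below only takes effect if docker-compose.yml forwards PYTHON_VERSION under build.args; otherwise pass --build-arg PYTHON_VERSION=3.11 to docker compose build: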

PYTHON_VERSION=3.11 docker compose build

GPU Configuration

All GPUs (default)

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

Specific GPU count

devices:
  - driver: nvidia
    count: 2
    capabilities: [gpu]

Specific GPU IDs

devices:
  - driver: nvidia
    device_ids: ["0", "1"]
    capabilities: [gpu]

Multi-GPU with accelerate

In dev.env:

CUDA_VISIBLE_DEVICES=0,1

In inference.py:

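import torch
from transformers import AutoModel

# device_map="auto" lets accelerate place the model across all visible GPUs
self.model = AutoModel.from_pretrained(
    weights_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)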

Logs

# Live logs from a running service
docker compose logs -f inference

# Logs from the last run (only while its container still exists; run --rm removes it on exit)
docker compose logs inference

Troubleshooting

"CUDA out of memory"

Set BATCH_SIZE=1 in dev.env.
Use torch_dtype=torch.bfloat16 when loading the model.
Add torch.cuda.empty_cache() between batches in run_single.
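
A back-of-the-envelope estimate shows why the dtype matters. The sketch below computes the memory needed for the weights alone; activations, KV cache, and framework overhead come on top. The 7-billion parameter count is a hypothetical example, not a figure from this benchmark.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory for model weights only, in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical model size
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: {weight_memory_gib(params, dtype):.2f} GiB")
```

Loading in bfloat16 halves the weight footprint relative to float32, which is often the difference between fitting on one GPU and not.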

"Could not select device driver"

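The NVIDIA Container Toolkit is missing or not registered with Docker. Install it, register the runtime, and restart the daemon:

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker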

"No module named 'evaluation'"

The repo root must be importable: either the Dockerfile sets PYTHONPATH to include it, or each script adds it to sys.path. The template copies the package into the image:

COPY evaluation/ /workspace/repo/evaluation/

and each script adds the repo root to sys.path at startup:

sys.path.insert(0, str(Path(__file__).resolve().parents[3]))
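
To see why `parents[3]` points at the repo root, the sketch below walks up from a script path. The script location is an assumption for illustration (a model named `my_model` under the `/workspace/repo` prefix used by the COPY line); adjust the index if your script sits at a different depth.

```python
from pathlib import PurePosixPath

# ASSUMPTION: hypothetical in-container location of a model's script.
script = PurePosixPath("/workspace/repo/evaluation/models/my_model/inference.py")


def repo_root(script_path: PurePosixPath, levels: int = 3) -> PurePosixPath:
    """Return the ancestor `levels` steps above the script's own directory."""
    return script_path.parents[levels]


if __name__ == "__main__":
    # parents[0] is the script's directory; each higher index goes one level up.
    for depth in range(4):
        print(depth, script.parents[depth])
```

For this layout, `parents[0]` is the model directory, `parents[1]` is `models`, `parents[2]` is `evaluation`, and `parents[3]` is `/workspace/repo`, the directory that must be on `sys.path` for `import evaluation` to resolve.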

Container exits immediately

Run interactively to see the error:

docker compose run --rm --entrypoint bash inference
# Inside:
python inference.py --mode single --scene nuscenes-scene-0001

Resume after partial run

ModelInference.run_from_jsonl() automatically skips (sample_id, question_id) pairs already written to outputs.jsonl. Re-run the same command to resume.
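
The skip-on-resume behavior can be pictured roughly as follows. This is a sketch of the general technique, not the actual `run_from_jsonl()` implementation: the helper names `completed_pairs` and `pending` are hypothetical, and it assumes each output line is a JSON object with `sample_id` and `question_id` fields.

```python
import json
from pathlib import Path


def completed_pairs(outputs_path):
    """Collect (sample_id, question_id) pairs already present in the outputs file."""
    done = set()
    path = Path(outputs_path)
    if not path.exists():
        return done  # first run: nothing to skip
    with path.open() as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
                done.add((record["sample_id"], record["question_id"]))
            except (json.JSONDecodeError, KeyError, TypeError):
                # An interrupted run can leave a truncated last line;
                # ignore it so that item is processed again.
                continue
    return done


def pending(items, done):
    """Keep only items whose (sample_id, question_id) pair has not been written yet."""
    return [item for item in items if (item["sample_id"], item["question_id"]) not in done]
```

Because completed pairs are read back from the file itself, no separate checkpoint is needed; appending each result as soon as it is produced keeps the resume point accurate.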

Weight Download Patterns

HuggingFace Hub (via docker compose)

docker compose --profile setup run --rm weight-downloader

HuggingFace Hub (manual)

docker compose run --rm --entrypoint "" inference \
    huggingface-cli download "$MODEL_REPO" --local-dir /workspace/weights

Direct URL

docker compose run --rm --entrypoint "" inference \
    wget -q --show-progress -O /workspace/weights/model.bin \
        https://example.com/model.bin

AWS S3

docker compose run --rm --entrypoint "" inference \
    aws s3 cp s3://my-bucket/weights/ /workspace/weights/ --recursive
