Flash CLI Troubleshooting Guide

Solutions to common Flash CLI problems, organized by command and error type.

Installation Issues

Command Not Found: flash

Problem: Bash cannot find the flash command

Symptoms:

$ flash --version
bash: flash: command not found

Solutions:

1. Install with uv (recommended):

# Install uv if needed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Verify installation
uv run flash --version

2. Install with pip (alternative):

pip install runpod-flash

# Verify installation
flash --version

3. Check PATH:

# Find where flash is installed
which flash

# If not in PATH, add pip bin directory
export PATH="$PATH:$HOME/.local/bin"

# Add to shell profile for persistence
echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
source ~/.bashrc

4. Use virtual environment:

# Create and activate venv
python -m venv .venv
source .venv/bin/activate  # macOS/Linux
# .venv\Scripts\activate  # Windows

# Install in venv
pip install runpod-flash
flash --version

5. Check Python version:

python --version  # Should be 3.10+

# If too old, install newer Python
# macOS: brew install python@3.11
# Ubuntu: sudo apt install python3.11
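
The PATH, virtual environment, and Python version checks above can also be run from a single script. The following is a minimal sketch using only the standard library; it is not part of Flash, and the function name `installation_report` is hypothetical.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this guide


def installation_report(command="flash", min_python=MIN_PYTHON):
    """Return a dict describing the local installation state."""
    return {
        # Absolute path of the executable, or None if it is not on PATH
        "command_path": shutil.which(command),
        # True when the running interpreter is new enough
        "python_ok": sys.version_info >= min_python,
        # Inside a venv, sys.prefix differs from sys.base_prefix
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in installation_report().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter you use for `pip install`, since a different interpreter may report a different PATH and virtual environment state.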

Import Error After Installation

Problem: Flash imports fail even after installation

Symptoms:

$ flash run
Traceback (most recent call last):
  ImportError: cannot import name 'remote' from 'runpod_flash'

Solutions:

1. Reinstall Flash:

pip uninstall runpod-flash
pip install runpod-flash

# Or upgrade to latest
pip install --upgrade runpod-flash

2. Check for conflicting packages:

pip list | grep runpod

# Uninstall all runpod packages
pip uninstall runpod runpod-flash runpod-python

# Reinstall Flash only
pip install runpod-flash

3. Fresh virtual environment:

# Remove old venv
rm -rf .venv

# Create new
python -m venv .venv
source .venv/bin/activate
pip install runpod-flash


flash init Problems

Directory Already Exists

Problem: Cannot initialize project because directory exists

Symptoms:

$ flash init my-api
Error: Directory 'my-api' already exists. Use --force to overwrite.

Solutions:

1. Use different name:

flash init my-api-v2

2. Initialize in existing directory:

cd my-api
flash init .

3. Force overwrite:

flash init my-api --force
# Warning: This overwrites existing files

4. Remove existing directory:

# Backup first if needed
mv my-api my-api.backup

# Then initialize
flash init my-api

Permission Denied

Problem: Cannot create project directory due to permissions

Symptoms:

$ flash init my-api
Error: Permission denied: '/path/to/directory'

Solutions:

1. Check directory permissions:

ls -la /path/to/directory

# Fix permissions
chmod u+w /path/to/directory

2. Create in user-owned directory:

cd ~
flash init my-api

# Or in current directory
mkdir my-api && cd my-api
flash init .

3. Don't use sudo:

# Wrong: Creates files owned by root
sudo flash init my-api

# Right: Creates files owned by you
flash init my-api


flash run Issues

Port Already in Use

Problem: Cannot start server because port is occupied

Symptoms:

$ flash run
ERROR: [Errno 48] error while attempting to bind on address ('127.0.0.1', 8888): address already in use

Solutions:

1. Use different port:

flash run --port 9000

2. Find and kill process using port:

# Find process
lsof -ti:8888

# Kill process
lsof -ti:8888 | xargs kill -9

# Or manually
lsof -i:8888  # Shows PID
kill <pid>

3. Use environment variable:

export FLASH_PORT=9000
flash run
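
Before picking a replacement port, it can help to confirm which ports are actually free. This is a small standard-library sketch, independent of Flash; the helper names `port_is_free` and `first_free_port` are hypothetical, and 8888 is taken from the error message above.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True


def first_free_port(start=8888, attempts=20, host="127.0.0.1"):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    return None


if __name__ == "__main__":
    port = first_free_port()
    print(f"Try: flash run --port {port}" if port else "No free port found")
```

The check is a snapshot: another process can still claim the port between the check and the server start.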

Module Not Found Error

Problem: Python cannot find required modules

Symptoms:

$ flash run
ModuleNotFoundError: No module named 'fastapi'

Solutions:

1. Install dependencies:

pip install -e .

2. Check virtual environment:

# Verify venv is activated
which python  # Should point to .venv/bin/python

# If not, activate it
source .venv/bin/activate

3. Install missing package:

pip install fastapi uvicorn

4. Check pyproject.toml:

[project]
dependencies = [
    "runpod-flash>=1.0.0",
    "fastapi>=0.100.0",
    "uvicorn>=0.23.0",
]
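
To see which of the listed dependencies the active interpreter cannot find, a short standard-library check may be quicker than reading a traceback. This is an illustrative sketch; `missing_modules` is a hypothetical helper, and the import names are assumed to match the packages in the example above (`runpod_flash` appears in the import error shown earlier in this guide).

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent package
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_modules(["runpod_flash", "fastapi", "uvicorn"]))
```

An empty list means every module is importable from the current environment; anything listed points at a package to install or a virtual environment that is not activated.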

Hot Reload Not Working

Problem: Code changes don't trigger server restart

Symptoms:

  • Save file
  • No server restart message
  • Changes not reflected in API

Solutions:

1. Check reload is enabled:

# Reload is default, but verify:
flash run  # Should show "StatReload" in output

2. Manually restart:

# Press Ctrl+C to stop
# Run again
flash run

3. Check file watching:

# Ensure files aren't ignored
cat .gitignore  # Uvicorn respects .gitignore

# Move files if needed
mv ignored_dir/worker.py workers/worker.py

4. Disable and re-enable reload:

# Try without reload
flash run --no-reload

# Then with reload
flash run

Cannot Access from Network

Problem: Server not accessible from other devices on network

Symptoms:

  • http://localhost:8888 works on dev machine
  • http://192.168.1.100:8888 doesn't work from phone

Solutions:

1. Bind to 0.0.0.0:

flash run --host 0.0.0.0

2. Check firewall:

# macOS: System Preferences → Security & Privacy → Firewall
# Add Python to allowed apps

# Linux (ufw):
sudo ufw allow 8888

# Linux (firewalld):
sudo firewall-cmd --add-port=8888/tcp --permanent
sudo firewall-cmd --reload

3. Find your IP address:

# macOS
ifconfig | grep "inet "

# Linux
ip addr show

# Use this IP from other devices
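
As a cross-platform alternative to `ifconfig` and `ip addr show`, the address the operating system would use for outbound traffic can be queried from Python. This is a best-effort sketch; `local_ip` is a hypothetical helper, and the probe address 192.0.2.1 is a reserved documentation address chosen only to select a route.

```python
import socket


def local_ip(probe_host="192.0.2.1", probe_port=80):
    """Best-effort guess at this machine's LAN address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            # connect() on a UDP socket sends no packets; it only asks the
            # OS which local interface would be used to reach the address.
            sock.connect((probe_host, probe_port))
            return sock.getsockname()[0]
        except OSError:
            return "127.0.0.1"  # no usable route, fall back to loopback


if __name__ == "__main__":
    print(f"http://{local_ip()}:8888")
```

On machines with several network interfaces or a VPN, the result may not be the address other devices on the local network can reach.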


flash build Failures

Archive Size Exceeds Limit

Problem: Build archive exceeds 500MB deployment limit

Symptoms:

$ flash build
ERROR: Archive size (523MB) exceeds 500MB limit
Deployment will fail. Reduce archive size.

Solutions:

1. Identify large packages:

# After build, check package sizes
du -sh .build/lib/* | sort -h | tail -20

Common large packages:

156M    torch
89M     torchvision
45M     transformers
23M     opencv-python
18M     scipy

2. Exclude packages in base image:

flash build --exclude torch,torchvision,torchaudio

Runpod base image includes:

  • torch, torchvision, torchaudio (PyTorch stack)
  • transformers (Hugging Face)
  • tensorflow, keras (TensorFlow stack)
  • jax, jaxlib (JAX)
  • opencv-python (OpenCV)
  • numpy, scipy, pandas (Scientific computing)
  • pillow (Image processing)

Check Runpod documentation for complete list.

3. Use --no-deps:

flash build --no-deps --exclude torch,torchvision

Only installs direct dependencies, not transitive ones.

4. Remove unnecessary dependencies:

# Edit pyproject.toml
[project]
dependencies = [
    "runpod-flash>=1.0.0",
    # Remove packages not used at runtime:
    # "pytest",        # Testing only
    # "black",         # Development only
    # "pandas",        # If not needed for inference
]

Then rebuild:

flash build

5. Check .flashignore:

# Add large files not needed at runtime
echo "tests/" >> .flashignore
echo "docs/" >> .flashignore
echo "*.md" >> .flashignore
echo "data/" >> .flashignore
echo "models/*.onnx" >> .flashignore  # If using PyTorch versions

Verification:

# After changes, check size
flash build
ls -lh artifact.tar.gz
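
The `du` and `ls` checks above can be approximated in Python, which is handy on systems without those tools. This is a sketch under stated assumptions: the paths `.build/lib` and `artifact.tar.gz` come from this guide, the helper names are hypothetical, and the 500MB limit is assumed to mean 500 * 1024 * 1024 bytes.

```python
import os

LIMIT_BYTES = 500 * 1024 * 1024  # assumption: the 500MB limit is in MiB


def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_entries(lib_dir, top=20):
    """Return (size_bytes, name) pairs for the largest entries in lib_dir."""
    sizes = []
    for entry in os.scandir(lib_dir):
        if entry.is_dir(follow_symlinks=False):
            size = dir_size(entry.path)
        else:
            size = entry.stat(follow_symlinks=False).st_size
        sizes.append((size, entry.name))
    return sorted(sizes, reverse=True)[:top]


def within_limit(archive_path, limit=LIMIT_BYTES):
    """True if the archive file is at or below the limit."""
    return os.path.getsize(archive_path) <= limit


if __name__ == "__main__":
    if os.path.isdir(".build/lib"):
        for size, name in largest_entries(".build/lib"):
            print(f"{size / 1e6:8.1f} MB  {name}")
    if os.path.isfile("artifact.tar.gz"):
        print("within limit:", within_limit("artifact.tar.gz"))
```

The sizes reported are uncompressed, so they indicate which packages to exclude rather than the exact archive size.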

Dependency Installation Failed

Problem: pip cannot install a required package

Symptoms:

$ flash build
ERROR: Could not find a version that satisfies the requirement your-package>=1.0.0

Solutions:

1. Check package name:

# Fix typo in pyproject.toml
[project]
dependencies = [
    "scikit-learn>=1.0.0",  # Not "sklearn"
]

2. Check version constraints:

# Relax version constraint
dependencies = [
    "your-package>=0.5.0",  # Was >=1.0.0
]

3. Test installation locally:

# Quote the requirement so the shell does not treat ">" as a redirect
pip install "your-package>=1.0.0"

# If this fails locally, it will fail in the build too

4. Check Python version compatibility:

# Package may require newer Python
[project]
requires-python = ">=3.10"  # Check if package needs 3.11+

Manylinux Compatibility Error

Problem: Package has no Linux-compatible wheel

Symptoms:

$ flash build
ERROR: Package 'your-package' has no compatible wheels for manylinux2014_x86_64

Solutions:

1. Find alternative package:

# Some packages have Linux alternatives
# Example: Use 'python-magic' instead of 'pymagic'

2. Build from source (if build dependencies available):

# Some packages can build from source
# May require additional system packages in base image

3. Contact package maintainer:

  • Report issue on package GitHub
  • Request manylinux wheels

4. Use pure Python alternative:

  • Find package with no C extensions
  • Slower but more compatible

Build Fails with Import Error

Problem: Build process fails when importing application code

Symptoms:

$ flash build
ERROR: Cannot import module 'main'
ImportError: No module named 'your_dependency'

Solutions:

1. Add missing dependency:

[project]
dependencies = [
    "your-dependency>=1.0.0",
]

2. Check circular imports:

# Avoid circular imports
# Bad: module A imports B, B imports A

# Good: Restructure to avoid cycle

3. Check sys.path issues:

# Don't modify sys.path in application code
# Let Flash handle paths

Permission Denied Writing to .build/

Problem: Cannot write to build directory

Symptoms:

$ flash build
ERROR: Permission denied: '.build/lib'

Solutions:

1. Remove .build directory:

rm -rf .build
flash build

2. Fix permissions:

chmod -R u+w .build
flash build

3. Don't run with sudo:

# Wrong: Creates root-owned files
sudo flash build

# Right:
flash build


flash deploy Errors

Missing API Key

Problem: Runpod API key not configured

Symptoms:

$ flash deploy
Error: RUNPOD_API_KEY environment variable not set

Solutions:

1. Use flash login (recommended):

# If you installed Flash via uv (recommended)
uv run flash login

# Or, if flash is installed globally
flash login

2. Set environment variable:

export RUNPOD_API_KEY=your-key-here

# Verify
echo $RUNPOD_API_KEY

3. Add to .env file (for local CLI use):

echo "RUNPOD_API_KEY=your-key-here" >> .env

# Loaded into os.environ for CLI commands

4. Get API key:

  1. Visit https://runpod.io/console/user/settings
  2. Click "API Keys"
  3. Create new key or copy existing
  4. Set environment variable

5. Make persistent (bash/zsh):

echo 'export RUNPOD_API_KEY=your-key-here' >> ~/.bashrc
source ~/.bashrc
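
To illustrate what "loaded into os.environ" means for the .env option, here is a minimal sketch of a KEY=VALUE loader. It is not Flash's actual loader, whose parsing rules are not described here; `load_env_file` is a hypothetical helper that handles only simple lines, comments, optional quotes, and an optional `export` prefix.

```python
import os


def load_env_file(path=".env", override=False):
    """Load KEY=VALUE lines from path into os.environ; return loaded keys."""
    loaded = []
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            key = key.strip()
            if key.startswith("export "):
                key = key[len("export "):].strip()
            value = value.strip().strip("'\"")  # drop surrounding quotes
            if override or key not in os.environ:
                os.environ[key] = value
                loaded.append(key)
    return loaded


if __name__ == "__main__":
    if os.path.isfile(".env"):
        print("loaded:", load_env_file())
    print("RUNPOD_API_KEY set:", bool(os.environ.get("RUNPOD_API_KEY")))
```

With `override=False`, a value already exported in the shell wins over the file, which is one common reason an edited .env appears to have no effect.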

Environment Not Found

Problem: Specified environment doesn't exist

Symptoms:

$ flash deploy --env production
Error: Environment 'production' not found

Solutions:

1. List available environments:

flash env list

2. Create environment:

flash env create production
flash deploy --env production

3. Check spelling:

# Case-sensitive
flash deploy --env Production  # Wrong
flash deploy --env production  # Right

4. Deploy without --env:

# Auto-selects if only one environment
flash deploy

Upload Failed

Problem: Cannot upload artifact to Runpod

Symptoms:

$ flash deploy --env production
Uploading artifact...
ERROR: Upload failed: Connection timeout

Solutions:

1. Check internet connection:

ping runpod.io

# Test API connectivity
curl -I https://api.runpod.io

2. Retry deployment:

flash deploy --env production

3. Check firewall:

  • Ensure HTTPS outbound traffic allowed
  • Check corporate firewall/proxy settings

4. Reduce archive size:

# Smaller files upload faster
flash deploy --env production --exclude torch,torchvision

5. Check file size:

ls -lh artifact.tar.gz
# Very large files may timeout
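
Retrying by hand works, but transient timeouts are often easier to ride out with an automatic retry and a growing delay. The following is a generic sketch, not a Flash feature; `retry` and `deploy` are hypothetical helpers, and `deploy` simply shells out to the `flash deploy --env production` command used in this section.

```python
import subprocess
import time


def retry(action, attempts=3, delay=5.0, backoff=2.0, sleep=time.sleep):
    """Call action() until it returns True or attempts run out."""
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            sleep(delay)
            delay *= backoff  # wait longer after each failure
    return False


def deploy():
    """Run the deploy command from this guide; True on exit code 0."""
    result = subprocess.run(["flash", "deploy", "--env", "production"])
    return result.returncode == 0
```

Calling `retry(deploy, attempts=3, delay=30)` would attempt the deployment up to three times, waiting 30 and then 60 seconds between attempts. This assumes the command exits with a non-zero code on failure, and `subprocess.run` raises `FileNotFoundError` if `flash` is not on PATH.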

Endpoint Creation Failed (Insufficient GPUs)

Problem: Runpod has no available GPUs of requested type

Symptoms:

$ flash deploy --env production
Creating endpoints...
ERROR: Failed to create endpoint: Insufficient GPU availability

Solutions:

1. Switch to a commonly-available GPU type:

# before (specific GPU)
@Endpoint(name="worker", gpu=GpuType.NVIDIA_A100_80GB_PCIe)

# after (widely available)
@Endpoint(name="worker", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090)

Redeploy:

flash deploy --env production

2. Use GpuGroup for maximum flexibility:

# Accepts any GPU in the group
gpu=GpuGroup.ADA_24
# or any available GPU at all
gpu=GpuGroup.ANY

3. Wait and retry:

# GPUs may become available
sleep 300  # Wait 5 minutes
flash deploy --env production

4. Check the Runpod status page for outages or capacity incidents.

Authentication Failed

Problem: API key is invalid or lacks permissions

Symptoms:

$ flash deploy --env production
ERROR: Authentication failed: Invalid API key

Solutions:

1. Verify API key:

echo $RUNPOD_API_KEY
# Should show your key (starts with a letter, contains alphanumeric)

2. Generate new key:

  1. Visit https://runpod.io/console/user/settings
  2. Revoke old key
  3. Create new key
  4. Update environment variable

3. Check key permissions:

  • Ensure key has serverless permissions
  • Some keys are read-only

4. Update environment variable:

export RUNPOD_API_KEY=new-key-here
flash deploy --env production

Deployment Succeeds But Endpoints Don't Respond

Problem: Endpoints created but return errors or timeouts

Symptoms:

# Deployment succeeds
$ flash deploy --env production
✓ Deployment successful!

# But testing fails
$ curl -X POST https://endpoint.runpod.io/run ...
ERROR: 500 Internal Server Error

Solutions:

1. Check Runpod console logs:

  1. Visit https://runpod.io/console/serverless
  2. Click on endpoint
  3. View "Logs" tab
  4. Look for error messages

2. Test with preview first:

flash deploy --preview
# Test locally before deploying

3. Common runtime errors:

A. Import errors:

ModuleNotFoundError: No module named 'your_module'

Fix:

# Add to pyproject.toml
dependencies = ["your-module>=1.0.0"]

B. File not found:

FileNotFoundError: 'models/model.pt'

Fix:

# Ensure file in git
git add models/model.pt

# Or check .flashignore doesn't exclude it

C. GPU not available:

RuntimeError: CUDA not available

Fix:

# ensure GPU specified in Endpoint
@Endpoint(name="worker", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090)

4. Redeploy after fixing:

flash deploy --env production


Environment Management

Cannot Delete Environment (Has Endpoints)

Problem: Environment has active endpoints and cannot be deleted

Symptoms:

$ flash env delete staging
Error: Environment has 3 active endpoints
Delete endpoints first or use --force

Solutions:

1. Delete endpoints first:

# List endpoints
flash undeploy list

# Delete individually
flash undeploy staging-endpoint-1
flash undeploy staging-endpoint-2

# Or delete all
flash undeploy --all --force

# Then delete environment
flash env delete staging

2. Force delete (deletes endpoints too):

flash env delete staging --force

Cannot Create Environment (Name Exists)

Problem: Environment name already in use

Symptoms:

$ flash env create production
Error: Environment 'production' already exists

Solutions:

1. Use different name:

flash env create production-v2

2. Delete existing:

flash env delete production
flash env create production

3. Check existing environments:

flash env list


API Key Problems

API Key Not Recognized

Problem: Flash cannot find or read API key

Symptoms:

$ flash deploy
Error: RUNPOD_API_KEY not set

Even after setting the variable.

Solutions:

1. Check variable is exported:

# Wrong: Not exported
RUNPOD_API_KEY=your-key

# Right: Exported
export RUNPOD_API_KEY=your-key

2. Check in same terminal:

# Set variable
export RUNPOD_API_KEY=your-key

# Use in same terminal session
flash deploy

3. Use .env file (for local CLI use):

# Create .env in project root
echo "RUNPOD_API_KEY=your-key" > .env

# Loaded into os.environ for CLI commands
flash deploy

4. Check for typos:

# Variable name is case-sensitive
RUNPOD_API_KEY  # Correct
runpod_api_key  # Wrong
Runpod_Api_Key  # Wrong
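
A child process only sees variables that were exported, so running a small script is a direct way to see what a CLI process would see. This sketch is not part of Flash; `diagnose_api_key` is a hypothetical helper that reports whether the variable is present, empty, or spelled with the wrong letter case, without printing the key itself.

```python
import os

EXPECTED = "RUNPOD_API_KEY"


def diagnose_api_key(environ=None, expected=EXPECTED):
    """Return a short diagnosis string for the API key variable."""
    environ = os.environ if environ is None else environ
    value = environ.get(expected)
    if value:
        return "ok"
    if value == "":
        return "set but empty"
    # Look for the same name in a different letter case
    for name in environ:
        if name != expected and name.upper() == expected:
            return f"misspelled as {name}"
    return "not set"


if __name__ == "__main__":
    print(diagnose_api_key())
```

If this prints "not set" right after you assigned the variable in the same terminal, the assignment was most likely made without `export`.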

API Key Has Expired

Problem: API key no longer valid

Symptoms:

$ flash deploy
ERROR: Authentication failed: API key expired

Solutions:

1. Re-authenticate with flash login (recommended):

flash login

2. Or generate a new key manually:

  1. Visit https://runpod.io/console/user/settings
  2. Revoke expired key
  3. Create new key
  4. Update environment variable or .env file

3. Update in all locations:

# Update environment variable
export RUNPOD_API_KEY=new-key

# Or update .env file (for local CLI use)
# Note: ">" replaces the whole file; edit it by hand if it holds other variables
echo "RUNPOD_API_KEY=new-key" > .env

# Update CI/CD secrets
# GitHub: Settings → Secrets → Update RUNPOD_API_KEY


Network and Connectivity

Cannot Reach Runpod API

Problem: Network cannot connect to Runpod services

Symptoms:

$ flash deploy
ERROR: Connection failed: Network unreachable

Solutions:

1. Check internet connection:

ping google.com
ping runpod.io

2. Test API endpoint:

curl -I https://api.runpod.io
# Should return 200 OK or similar

3. Check firewall/proxy:

  • Ensure HTTPS (443) outbound allowed
  • Check corporate proxy settings
  • Try from different network (mobile hotspot)

4. Check DNS resolution:

nslookup runpod.io
# Should return IP address

5. Try again later:

# May be temporary network issue
sleep 60
flash deploy
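
The DNS and HTTPS checks above can be separated in code, which helps tell a name-resolution failure from a blocked port. This is a standard-library sketch; `can_resolve` and `can_connect` are hypothetical helpers, and the host names to pass in are the ones used in this section (`runpod.io`, `api.runpod.io`).

```python
import socket


def can_resolve(host):
    """True if DNS (or the hosts file) returns at least one address."""
    try:
        return bool(socket.getaddrinfo(host, None))
    except socket.gaierror:
        return False


def can_connect(host, port=443, timeout=5.0):
    """True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_resolve("api.runpod.io")` returning True while `can_connect("api.runpod.io")` returns False suggests a firewall or proxy blocking outbound port 443 rather than a DNS problem. A successful TCP connection does not by itself prove the API is healthy.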

Slow Upload/Download

Problem: Artifact upload or deployment is very slow

Symptoms:

  • Upload progress bar stuck at low percentage
  • Deployment takes > 10 minutes

Solutions:

1. Check internet speed:

# Use speedtest-cli or online speed test
speedtest-cli

# If slow, consider:
# - Wired connection instead of WiFi
# - Different network

2. Reduce archive size:

# Smaller files upload faster
flash build --exclude torch,torchvision,torchaudio

# Check size
ls -lh artifact.tar.gz

3. Try at different time:

  • Network congestion varies by time
  • Try off-peak hours


General Debugging Tips

Enable Verbose Logging

# Some commands support -v or --verbose
# Check command help
flash <command> --help

Check Version

flash --version
# Ensure you have latest version

# Update if needed
pip install --upgrade runpod-flash

Clean State

# Remove build artifacts
rm -rf .build artifact.tar.gz

# Remove cache
pip cache purge

# Fresh virtual environment
rm -rf .venv
python -m venv .venv
source .venv/bin/activate
pip install runpod-flash

Get Help

Command-specific help:

flash <command> --help


Diagnostic Checklist

When troubleshooting any issue:

  • Flash installed and in PATH (flash --version)
  • Python version >= 3.10 (python --version)
  • Virtual environment activated (which python)
  • Dependencies installed (pip list)
  • API key set (echo $RUNPOD_API_KEY)
  • Internet connectivity (ping runpod.io)
  • Sufficient disk space (df -h)
  • No permission issues (ls -la)
  • Recent Flash version (pip install --upgrade runpod-flash)
  • Checked logs and error messages
  • Tested locally first (flash run)
  • Reviewed documentation
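
The disk space and permission items in the checklist can be checked without `df` or `ls`. This is a small standard-library sketch; `free_gb` and `is_writable` are hypothetical helpers, and no particular free-space threshold is implied by Flash.

```python
import os
import shutil


def free_gb(path="."):
    """Free disk space, in GB, on the filesystem containing path."""
    return shutil.disk_usage(path).free / 1e9


def is_writable(path="."):
    """True if the current user may create files in the directory."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    print(f"free space: {free_gb():.1f} GB")
    print("project dir writable:", is_writable())
```

Run it from the project root, since that is where the build directory and the artifact are written.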

Emergency Recovery

Broken Production Deployment

Immediate actions:

# 1. Undeploy broken version
flash undeploy production-endpoint --force

# 2. Checkout last known good version
git log --oneline  # Find commit hash
git checkout <good-commit-hash>

# 3. Redeploy
flash deploy --env production

# 4. Verify
curl -X POST https://production-endpoint/run ...

# 5. Return to main branch
git checkout main

# 6. Fix issue properly
# ... make changes ...
flash run  # Test locally
flash deploy --env staging  # Test in staging
flash deploy --env production  # Redeploy to production

Lost Configuration

Recover from Runpod console:

  1. Visit https://runpod.io/console/serverless
  2. Note endpoint configurations
  3. Recreate local configuration
  4. Redeploy to sync

Complete Reset

Start fresh:

# 1. Remove all deployments
flash undeploy --all --force

# 2. Delete all environments
flash env list
flash env delete dev
flash env delete staging
flash env delete production

# 3. Clean local state
rm -rf .build artifact.tar.gz .runpod/

# 4. Fresh virtual environment
rm -rf .venv
python -m venv .venv
source .venv/bin/activate
pip install runpod-flash

# 5. Reinstall dependencies
pip install -e .

# 6. Test locally
flash run

# 7. Redeploy
flash env create production
flash deploy --env production

Preventive Measures

Avoid issues before they happen:

  1. Always test locally first:

    flash run
    # Test all endpoints
  2. Use preview before deploying:

    flash deploy --preview
  3. Deploy to staging first:

    flash deploy --env staging
    # Test thoroughly
    flash deploy --env production
  4. Monitor builds:

    # Check size after build
    flash build
    ls -lh artifact.tar.gz  # Should be < 500MB
  5. Keep dependencies minimal:

    # Only include runtime dependencies
    [project]
    dependencies = [
        "runpod-flash>=1.0.0",
        # ... only what you need
    ]
  6. Document working configuration:

    • Save working pyproject.toml
    • Note GPU types that work
    • Document exclusion patterns
  7. Use version control:

    git add .
    git commit -m "Working deployment"
    # Easy to rollback if needed
  8. Regular cleanup:

    # Weekly
    flash undeploy --cleanup-stale

Quick Reference

Most Common Issues:

| Issue | Quick Fix |
| --- | --- |
| Command not found | pip install runpod-flash |
| Port in use | flash run --port 9000 |
| Build too large | flash build --exclude torch,torchvision |
| Missing API key | export RUNPOD_API_KEY=your-key |
| Environment not found | flash env create <name> |
| Module not found | pip install -e . |
| Upload failed | Retry or reduce size |
| GPU unavailable | Use gpu=GpuGroup.ANY or gpu=GpuType.ANY |

Diagnostic Commands:

flash --version              # Check Flash version
flash <command> --help       # Command-specific help
flash env list               # List environments
flash undeploy list          # List endpoints
pip list                     # Check installed packages
echo $RUNPOD_API_KEY        # Verify API key
du -sh .build/lib/* | sort -h | tail -10  # Check package sizes
