CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Purpose

This repository contains scripts for managing self-hosted infrastructure services for LLM development:

  • MLflow: Tracks LLM experiments and manages model versions
  • Forgejo: Hosts Git repositories and provides continuous integration
  • Private & Secure: All services are open source and keep data on the local system rather than sending it to cloud-based services

Repository Structure

The repository currently contains:

  • README.md - Comprehensive documentation with setup and usage instructions
  • LICENSE - MIT License
  • config.env.example - Example configuration file with directory paths
  • scripts/setup.sh - Installation script for Forgejo and MLflow
  • scripts/start_services.sh - Service startup script for MLflow and Forgejo
  • scripts/stop_services.sh - Graceful service shutdown script
  • scripts/status_services.sh - Service status checker with URLs and diagnostics
  • scripts/configure_shell.sh - Shell configuration with aliases and enhanced prompts
  • scripts/backup_data.sh - Data backup utility
  • scripts/cleanup_mlflow.sh - MLflow process cleanup utility
  • examples/ - Directory containing demo scripts and usage examples
    • examples/gpu_check.sh - SLURM job script demonstrating GPU checking and MLflow integration
    • examples/README.md - Documentation for example scripts

Infrastructure Scripts

Core service-management scripts:

  • scripts/setup.sh - Installation and initial configuration
  • scripts/start_services.sh - Service startup with port detection
  • scripts/stop_services.sh - Graceful service shutdown
  • scripts/status_services.sh - Check service status and display URLs

Supporting scripts and examples:

  • scripts/backup_data.sh - Backup and recovery functionality
  • scripts/cleanup_mlflow.sh - Process management and cleanup
  • scripts/configure_shell.sh - Shell integration and user experience
  • examples/gpu_check.sh - HPC integration example with SLURM and GPU checking

Future enhancements:

  • Docker/containerization scripts (if needed)
  • Additional example scripts for different ML workflows

Development Considerations

  • Security: All scripts should maintain localhost-only access and data privacy
  • Data Privacy: Ensure no sensitive data is exposed or logged
  • Service Management: Scripts should handle service lifecycle reliably
  • Configuration: Support for environment-specific configurations
  • Logging: Implement appropriate logging for troubleshooting without exposing sensitive information
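The Logging point above can be illustrated with a small helper that timestamps messages and masks obvious secrets before they reach a log file. This is a minimal sketch, not code from the repository's scripts. The `log` and `redact` names, and the key names treated as secrets, are assumptions to adapt.

```bash
#!/usr/bin/env bash
# Hypothetical logging helpers; the secret key names below are assumptions.

# Mask the value that follows "password=", "token: ", "secret=", etc.
redact() {
  sed -E 's/(password|passwd|token|secret|api_key)([=:][[:space:]]*)[^[:space:]]+/\1\2[REDACTED]/g'
}

# Usage: log LEVEL message...  (prints a timestamped, redacted line to stdout)
log() {
  local level=$1
  shift
  printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*" | redact
}
```

For example, `log INFO "token=abc123 started" >> "$LOG_FILE"` writes `... [INFO] token=[REDACTED] started`. `LOG_FILE` is a placeholder. Pattern-based redaction only catches the formats it knows about, so it complements, rather than replaces, not logging secrets in the first place.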

Target Environment

  • Private/secure environment
  • Internal deployment (no external cloud services)
  • Open source tools only
  • Possibly a containerized deployment (Docker/Kubernetes), although the current scripts install a native Forgejo binary and a Python virtual environment

Setup and Installation

Initial Setup

  1. Configure installation paths:

    cp config.env.example config.env
    # Edit config.env with your preferred installation directories
  2. Run the setup script:

    ./scripts/setup.sh
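A script that consumes config.env could load it and fail early when a required setting is missing. This is a minimal sketch. `BASE_DIR` is the only variable it assumes, matching the `[BASE_DIR]` placeholder used later in this file. The real config.env.example may define other names, and `load_config` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical loader: export every variable defined in config.env.
load_config() {
  local file=${1:-config.env}
  if [ ! -f "$file" ]; then
    echo "error: $file not found (copy config.env.example first)" >&2
    return 1
  fi
  # "." on a bare name may search PATH, so make relative names explicit.
  case "$file" in */*) ;; *) file="./$file" ;; esac
  set -a          # auto-export every variable assigned while sourcing
  . "$file"
  set +a
  if [ -z "${BASE_DIR:-}" ]; then
    echo "error: BASE_DIR is not set in $file" >&2
    return 1
  fi
}
```

Using `set -a` means child processes such as the service binaries also see the settings, without repeating `export` in config.env.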

The setup script will:

  • Download and install Forgejo binary to the configured location
  • Create a Python virtual environment and install MLflow
  • Create necessary directories for data, logs, and artifacts
  • Generate basic configuration files
  • Create an activation script for easy environment setup
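The contents of the generated activate.sh are not described here. As a rough sketch of the last step, setup could write a small file that exports the base directory, puts the Forgejo binary on `PATH`, and activates the MLflow virtual environment. The paths `$BASE_DIR/forgejo/bin` and `$BASE_DIR/mlflow/venv` and the function name `write_activate_script` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical generator for $BASE_DIR/activate.sh (paths are assumptions).
write_activate_script() {
  local base_dir=$1
  # Unquoted EOF: $base_dir expands now; \$PATH is left for activation time.
  cat > "$base_dir/activate.sh" <<EOF
# Generated by setup; source this file, do not execute it.
export BASE_DIR="$base_dir"
export PATH="$base_dir/forgejo/bin:\$PATH"
if [ -f "$base_dir/mlflow/venv/bin/activate" ]; then
  . "$base_dir/mlflow/venv/bin/activate"
fi
EOF
}
```

The venv activation is guarded so that sourcing the file still works before MLflow has been installed.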

Configuration Structure

  • config.env: User-specific configuration (created from config.env.example)
  • Base directory: All services installed under a configurable base directory
  • Forgejo: Binary, data, and logs in separate subdirectories
  • MLflow: Python virtual environment with tracking database and artifact storage
  • Logs: Centralized logging directory
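The layout above can be sketched as a single function that creates everything under one base directory. The subdirectory names are hypothetical, so check setup.sh for the real ones. The virtual environment and the MLflow tracking database would be created later by `python3 -m venv` and by MLflow itself.

```bash
#!/usr/bin/env bash
# Hypothetical directory layout under a single base directory.
create_layout() {
  local base_dir=$1
  mkdir -p \
    "$base_dir/forgejo/bin" \
    "$base_dir/forgejo/data" \
    "$base_dir/forgejo/log" \
    "$base_dir/mlflow/artifacts" \
    "$base_dir/logs"
  # Owner-only access fits the "keep data internal" goal.
  chmod 700 "$base_dir"
}
```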

Common Operations

Starting Services

  1. Activate the environment:

    source [BASE_DIR]/activate.sh
  2. Start services:

    ./scripts/start_services.sh

The start script:

  • Automatically finds available ports starting from defaults (3000 for Forgejo, 5000 for MLflow)
  • Displays service URLs on startup
  • Saves port information to .forgejo.port and .mlflow.port files
  • Saves process IDs to .forgejo.pid and .mlflow.pid files for reliable shutdown
  • Writes logs to the configured log directories
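Port detection and the .port/.pid bookkeeping described above could look roughly like the sketch below. It is not taken from start_services.sh. It treats a port as free when nothing on 127.0.0.1 accepts a connection, using bash's `/dev/tcp` redirection, so it needs bash. The `start_mlflow` helper and its log path are hypothetical; `mlflow server --host --port` are standard MLflow options.

```bash
#!/usr/bin/env bash
# Requires bash: /dev/tcp/HOST/PORT is a bash redirection feature.

# Succeeds if something on 127.0.0.1 accepts a TCP connection on port $1.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print the first port >= $1 with no listener, trying at most $2 ports (default 100).
find_free_port() {
  local port=$1 tries=${2:-100}
  while [ "$tries" -gt 0 ]; do
    if ! port_in_use "$port"; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
    tries=$((tries - 1))
  done
  return 1
}

# Hypothetical launcher: bind MLflow to localhost only and record port and PID.
start_mlflow() {
  local port
  port=$(find_free_port 5000) || { echo "no free port found" >&2; return 1; }
  nohup mlflow server --host 127.0.0.1 --port "$port" \
    >>"$BASE_DIR/logs/mlflow.log" 2>&1 &
  echo "$!" > .mlflow.pid
  echo "$port" > .mlflow.port
  echo "MLflow: http://127.0.0.1:$port"
}
```

Another program can still grab the port between the check and the bind. For a single-user development host, that race is usually acceptable.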

Prerequisites

  • Python 3.7+ for MLflow virtual environment
  • curl for downloading Forgejo binary
  • openssl for generating security keys
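A quick prerequisite check covering the three tools listed above might look like this. The helper names are hypothetical. The Python version test delegates to the interpreter itself rather than parsing `--version` output.

```bash
#!/usr/bin/env bash
# Print each named command that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# True if python3 is at least 3.7.
python_ok() {
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 7) else 1)' 2>/dev/null
}

check_prereqs() {
  local missing
  missing=$(missing_tools python3 curl openssl)
  if [ -n "$missing" ]; then
    # Intentionally unquoted: one message per missing tool.
    printf 'missing prerequisite: %s\n' $missing >&2
    return 1
  fi
  python_ok || { echo "python3 >= 3.7 is required" >&2; return 1; }
  echo "all prerequisites found"
}
```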

Stopping Services

  • Run ./scripts/stop_services.sh to gracefully shut down both services
  • Uses saved PID files for reliable process identification
  • Attempts graceful shutdown with SIGTERM first
  • Falls back to SIGKILL if processes don't respond within 10 seconds
  • Includes fallback process search if PID files are missing
  • Removes the PID and port files and displays the final status
  • Preserves log files for troubleshooting
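The SIGTERM-then-SIGKILL sequence above can be sketched for a single PID file as follows. This is not the code in stop_services.sh. It omits the fallback process search and assumes the `.<service>.pid` / `.<service>.port` naming given earlier. `stop_from_pidfile` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Stop the process recorded in PIDFILE: SIGTERM, then SIGKILL after TIMEOUT seconds.
stop_from_pidfile() {
  local pidfile=$1 timeout=${2:-10} pid
  if [ ! -f "$pidfile" ]; then
    echo "no PID file: $pidfile" >&2
    return 1
  fi
  pid=$(cat "$pidfile")
  if kill -0 "$pid" 2>/dev/null; then
    kill -TERM "$pid"
    # Poll once per second until the process exits or the timeout runs out.
    while [ "$timeout" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
      sleep 1
      timeout=$((timeout - 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
      echo "PID $pid did not exit after SIGTERM; sending SIGKILL" >&2
      kill -KILL "$pid"
    fi
  fi
  # Remove state files, e.g. .mlflow.pid and the matching .mlflow.port.
  rm -f "$pidfile" "${pidfile%.pid}.port"
}
```

Usage: `stop_from_pidfile .mlflow.pid` (10-second grace period) or `stop_from_pidfile .forgejo.pid 30`.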

Checking Service Status

  • Run ./scripts/status_services.sh to check current service status
  • Displays process status with PIDs for running services
  • Shows clickable URLs for accessing web interfaces
  • Tests service responsiveness with curl (if available)
  • Identifies orphaned processes (running without PID files)
  • Provides quick access URLs and log file locations
  • Includes service management command reference
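The status checks above reduce to a few primitives: classify a service from its PID file, probe its URL, and look for matching processes that the PID file does not know about. This sketch uses hypothetical function names. The `pgrep -f` pattern you pass (for example `"mlflow server"`) is an assumption about how the processes appear in `ps`.

```bash
#!/usr/bin/env bash
# Print "running", "stale" (PID file but dead process), or "stopped" (no PID file).
service_state() {
  local pidfile=$1 pid
  if [ ! -f "$pidfile" ]; then
    echo stopped
  elif pid=$(cat "$pidfile") && kill -0 "$pid" 2>/dev/null; then
    echo running
  else
    echo stale
  fi
}

# Print the HTTP status code of a URL (curl reports 000 if nothing answered).
probe_url() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# Print PIDs whose command line matches PATTERN but that PIDFILE does not record.
find_orphans() {
  local pattern=$1 pidfile=$2 recorded=
  [ -f "$pidfile" ] && recorded=$(cat "$pidfile")
  pgrep -f "$pattern" | grep -vx "${recorded:-none}" || true
}
```

For example, `probe_url "http://127.0.0.1:$(cat .mlflow.port)/"` should print `200` once MLflow is serving.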

Forgejo Setup

  • Initial Setup: On your first visit to the Forgejo web interface, complete the installation wizard to create the admin user
  • Security: Registration is disabled by default for enhanced security
  • Access: Log in with the admin credentials at the Forgejo URL displayed by the status script
  • Single User: Most installations only need the admin user for LLM development work
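To confirm that a Forgejo install still matches the privacy settings described in this file, you can read its app.ini. `HTTP_ADDR` under `[server]` and `DISABLE_REGISTRATION` under `[service]` are standard Forgejo configuration keys. Where setup.sh places app.ini is not stated here, so the path is left to the caller. `ini_get` and `check_forgejo_privacy` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Usage: ini_get FILE SECTION KEY -> prints the value, or nothing if absent.
ini_get() {
  awk -v section="$2" -v key="$3" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)                 # trim leading blanks
      sub(/[ \t]+$/, "", line)                 # trim trailing blanks
    }
    substr(line, 1, 1) == "[" {                # section header, e.g. [server]
      in_section = (line == "[" section "]")
      next
    }
    in_section && (eq = index(line, "=")) > 0 {
      k = substr(line, 1, eq - 1)
      sub(/[ \t]+$/, "", k)
      if (k == key) {
        v = substr(line, eq + 1)
        sub(/^[ \t]+/, "", v)
        print v
        exit
      }
    }
  ' "$1"
}

# Succeeds only if Forgejo listens on localhost and self-registration is off.
check_forgejo_privacy() {
  local ini=$1
  [ "$(ini_get "$ini" server HTTP_ADDR)" = "127.0.0.1" ] \
    || { echo "HTTP_ADDR is not 127.0.0.1" >&2; return 1; }
  [ "$(ini_get "$ini" service DISABLE_REGISTRATION)" = "true" ] \
    || { echo "registration is not disabled" >&2; return 1; }
}
```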

Other Operations (documentation to be added)

  • Backup and recovery procedures using scripts/backup_data.sh
  • Configuration updates
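What scripts/backup_data.sh actually archives is not described here. As a minimal sketch, a backup can be a timestamped tarball of the data directories under the base directory. The directory names `forgejo/data` and `mlflow` and the function name `backup_data` are hypothetical. Stopping the services first gives a consistent copy of any file-based databases.

```bash
#!/usr/bin/env bash
# Usage: backup_data BASE_DIR DEST_DIR -> prints the path of the new archive.
backup_data() {
  local base_dir=$1 dest=$2 stamp archive
  stamp=$(date +%Y%m%d-%H%M%S)
  archive="$dest/llm-infra-backup-$stamp.tar.gz"
  mkdir -p "$dest"
  # -C stores paths relative to BASE_DIR, so a restore can target any base dir.
  tar -czf "$archive" -C "$base_dir" forgejo/data mlflow && echo "$archive"
}
```

Restoring is the reverse, for example `tar -xzf "$archive" -C "$BASE_DIR"` with the services stopped. Keep the destination outside `BASE_DIR` so that backups are not archived into later backups.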