This document describes how PyLet works and why it's designed this way.
┌──────────────┐ poke ┌──────────────────┐
│ Controller │──────────────>│ Scheduler │
│ (FastAPI) │ │ (in-process) │
└──────┬───────┘ └────────┬─────────┘
│ │
│ ┌──────────────┐ │
└────────>│ SQLite │<──────┘
│ (WAL) │
└──────────────┘
^
│ heartbeat (long-poll)
┌───────────────┴───────────────┐
│ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ Worker │ │ Worker │ │ Worker │
└─────────┘ └─────────┘ └─────────┘
Head node: Runs the controller (FastAPI server) and scheduler. Single source of truth via SQLite.
Workers: Connect to head, receive desired state via heartbeat, reconcile local processes.
PyLet has exactly one concept: the instance, a process with a resource allocation.
An instance has:
- A command to run
- Resource requirements (CPU, GPU, memory)
- A lifecycle (PENDING → ASSIGNED → RUNNING → COMPLETED/FAILED)
- An optional endpoint (host:port) for service discovery
That's it. No pods, replicas, services, deployments, or jobs. Higher-level systems compose instances via labels and application logic.
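The whole model fits in a handful of fields. A minimal sketch of the concept as a dataclass (field names like `cpus` and `memory_mb` are illustrative, not PyLet's actual schema):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Status(str, Enum):
    PENDING = "PENDING"
    ASSIGNED = "ASSIGNED"
    RUNNING = "RUNNING"
    UNKNOWN = "UNKNOWN"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"

@dataclass
class Instance:
    command: str                     # the process to run
    cpus: float = 1.0                # resource requirements
    gpus: int = 0
    memory_mb: int = 512
    labels: dict = field(default_factory=dict)  # for higher-level composition
    status: Status = Status.PENDING  # lifecycle state
    endpoint: Optional[str] = None   # host:port, set once the service binds

inst = Instance(command="python serve.py", gpus=1, labels={"app": "llm"})
```

Everything a higher-level system needs (replicas, services, jobs) is built by creating many such instances and filtering on `labels`.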
PyLet provides precise GPU control because research workloads need it. These features emerged from real use cases (ServerlessLLM, etc.).
- Physical GPU indices (`gpu_indices`): Request specific GPUs by index. Exposed via `CUDA_VISIBLE_DEVICES`.
- GPU sharing (`exclusive`): When `false`, GPUs aren't reserved exclusively. Enables daemons (e.g., model storage servers) to coexist with inference instances.
- Worker placement (`target_worker`): Target a specific worker (e.g., where a model is cached).
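As a hedged illustration, a submission combining all three controls might look like the following; the field names match the features above, but the exact request shape and the daemon command are assumptions:

```python
# Hypothetical request body for a model-storage daemon that shares GPUs.
request = {
    "command": "python -m storage_server",   # illustrative daemon command
    "gpu_indices": [0, 1],        # pin to physical GPUs 0 and 1
    "exclusive": False,           # share those GPUs with inference instances
    "target_worker": "worker-a",  # run where the model is already cached
}

# On the worker, pinning maps directly onto CUDA_VISIBLE_DEVICES:
env = {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in request["gpu_indices"])}
```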
Workers don't receive commands. They receive desired state and reconcile:
Desired (from head): [instance_a@attempt=2, instance_b@attempt=1]
Actual (local): [instance_a@attempt=1, instance_c@attempt=1]
Reconcile:
- instance_a@attempt=1: stale attempt → kill
- instance_b@attempt=1: not running → start
- instance_c@attempt=1: not desired → kill
This is declarative: head says "what should be", worker figures out "how to get there".
Benefits:
- Crash recovery: worker restarts, receives desired state, reconciles
- Network partition: stale workers don't affect correctness (attempt fencing)
- Simplicity: no command queue, no ack/retry logic
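The reconcile step above can be sketched as a pure function over `{instance_id: attempt}` maps (names and return shape are illustrative). Note that killing a stale attempt and starting the current one fall out of the same comparison:

```python
def reconcile(desired: dict[str, int], actual: dict[str, int]):
    """Compare desired vs actual state; return (to_kill, to_start)."""
    to_kill, to_start = [], []
    for iid, attempt in actual.items():
        # Running something not desired, or at a stale attempt -> kill.
        if desired.get(iid) != attempt:
            to_kill.append((iid, attempt))
    for iid, attempt in desired.items():
        # Desired but not running at the current attempt -> start.
        if actual.get(iid) != attempt:
            to_start.append((iid, attempt))
    return to_kill, to_start

kill, start = reconcile(
    desired={"instance_a": 2, "instance_b": 1},
    actual={"instance_a": 1, "instance_c": 1},
)
# kill: instance_a@1 (stale attempt), instance_c@1 (not desired)
# start: instance_a@2 (current attempt), instance_b@1 (not running)
```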
PENDING ──[assign]──> ASSIGNED ──[start]──> RUNNING ──[exit]──> COMPLETED
│ │ │ │
│ │ │ FAILED
│ │ │
│ └─[worker offline]────┴──> UNKNOWN
│ │
└──[cancel]──────────────────────────────────> CANCELLED
| State | Meaning |
|---|---|
| PENDING | Waiting for worker assignment |
| ASSIGNED | Worker selected, process not yet started |
| RUNNING | Process is running |
| UNKNOWN | Worker went offline, outcome unknown |
| COMPLETED | Process exited with code 0 |
| FAILED | Process exited with code != 0 |
| CANCELLED | User cancelled the instance |
Valid transitions (see `schemas.py:VALID_TRANSITIONS`):
PENDING -> ASSIGNED, CANCELLED
ASSIGNED -> RUNNING, UNKNOWN, FAILED, CANCELLED
RUNNING -> COMPLETED, FAILED, UNKNOWN, CANCELLED
UNKNOWN -> RUNNING, COMPLETED, FAILED, CANCELLED
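The table above can be sketched as a dict keyed by source state, in the spirit of `schemas.py:VALID_TRANSITIONS` (the actual code shape may differ):

```python
# Terminal states (COMPLETED, FAILED, CANCELLED) have no outgoing edges.
VALID_TRANSITIONS = {
    "PENDING":  {"ASSIGNED", "CANCELLED"},
    "ASSIGNED": {"RUNNING", "UNKNOWN", "FAILED", "CANCELLED"},
    "RUNNING":  {"COMPLETED", "FAILED", "UNKNOWN", "CANCELLED"},
    "UNKNOWN":  {"RUNNING", "COMPLETED", "FAILED", "CANCELLED"},
}

def can_transition(src: str, dst: str) -> bool:
    return dst in VALID_TRANSITIONS.get(src, set())

assert can_transition("RUNNING", "UNKNOWN")        # worker went offline
assert can_transition("UNKNOWN", "RUNNING")        # worker came back
assert not can_transition("COMPLETED", "RUNNING")  # terminal, no way back
```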
Cancellation uses a timestamp model (like Kubernetes `deletionTimestamp`):
- User requests cancel → `cancellation_requested_at` is set
- Instance excluded from desired state
- Worker sees absence, sends SIGTERM
- Grace period (default 30s)
- SIGKILL if still running
- Worker reports CANCELLED
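The SIGTERM → grace period → SIGKILL sequence is standard process management; a minimal sketch of the worker-side kill path (function name is illustrative):

```python
import signal
import subprocess

def stop_process(proc: subprocess.Popen, grace_s: float = 30.0) -> int:
    """SIGTERM, wait up to grace_s seconds, then SIGKILL. Returns exit code."""
    proc.send_signal(signal.SIGTERM)       # polite shutdown request
    try:
        return proc.wait(timeout=grace_s)  # give it the grace period
    except subprocess.TimeoutExpired:
        proc.kill()                        # SIGKILL: no more waiting
        return proc.wait()

# A well-behaved process exits on SIGTERM well inside the grace period.
p = subprocess.Popen(["sleep", "60"])
code = stop_process(p, grace_s=5.0)  # on POSIX, code == -signal.SIGTERM
```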
Workers use generation-based long-polling:
- Worker sends heartbeat with `last_seen_gen` and instance reports
- Controller processes reports (with attempt fencing)
- Controller waits for generation change or timeout (30s)
- Controller returns new `gen` and `desired_instances`
Cancel-and-reissue: When local state changes (process starts/exits), worker cancels in-flight heartbeat and issues a new one immediately.
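A simplified sketch of the controller-side generation wait, using an `asyncio.Event` in place of whatever primitive PyLet actually uses:

```python
import asyncio

class GenState:
    """Tracks the desired-state generation and lets heartbeats wait on it."""
    def __init__(self) -> None:
        self.gen = 0
        self._changed = asyncio.Event()

    def bump(self) -> None:
        # Any change to desired state bumps the generation and wakes waiters.
        self.gen += 1
        self._changed.set()
        self._changed = asyncio.Event()  # fresh event for the next change

    async def wait_for_change(self, last_seen: int, timeout: float = 30.0) -> int:
        # Block until the generation moves past last_seen, or time out.
        # The timeout doubles as the liveness check mentioned above.
        if self.gen <= last_seen:
            try:
                await asyncio.wait_for(self._changed.wait(), timeout)
            except asyncio.TimeoutError:
                pass
        return self.gen

async def demo() -> int:
    st = GenState()
    waiter = asyncio.create_task(st.wait_for_change(last_seen=0, timeout=5))
    await asyncio.sleep(0.05)  # let the heartbeat start waiting
    st.bump()                  # e.g. a new instance was submitted
    return await waiter

gen = asyncio.run(demo())
```

The heartbeat returns immediately on a change and after at most `timeout` seconds otherwise, which is what makes the long-poll double as a liveness signal.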
Each instance has an attempt counter that increments on each assignment:
Instance assigned to worker A (attempt=1)
Network partition
Instance reassigned to worker B (attempt=2)
Worker A reconnects, reports for attempt=1
Controller ignores (stale attempt)
Only reports matching the current attempt can mutate state. This prevents:
- Stale reports from affecting current state
- Duplicate execution after partition
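A minimal sketch of the fencing check on the controller (dict-based for brevity; the real code presumably works on Pydantic models):

```python
def apply_report(instance: dict, report: dict) -> bool:
    """Apply a worker report only if it carries the current attempt."""
    if report["attempt"] != instance["current_attempt"]:
        return False  # stale: e.g. worker A reconnecting after a partition
    instance["status"] = report["status"]
    return True

inst = {"id": "instance_a", "current_attempt": 2, "status": "RUNNING"}
# Worker A reconnects and reports for attempt 1: ignored.
assert not apply_report(inst, {"attempt": 1, "status": "FAILED"})
assert inst["status"] == "RUNNING"
# Worker B reports for the current attempt: applied.
assert apply_report(inst, {"attempt": 2, "status": "COMPLETED"})
```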
These must always hold:
- Allocated resources on a worker ≤ the worker's total resources
- An instance has at most one worker executing it
- Reports with attempt != current_attempt are ignored
| Status | assigned_to | endpoint | Process exists? | Resources held? |
|---|---|---|---|---|
| PENDING | None | None | No | No |
| ASSIGNED | set | None | No | Yes (reserved) |
| RUNNING | set | set* | Yes | Yes |
| UNKNOWN | set | maybe | Unknown | Yes |
| COMPLETED | set | stale | No | No |
| FAILED | set | stale | No | No |
| CANCELLED | set/None | None | No | No |
*Set once the instance binds to a port.

`endpoint != None` implies the instance is or was RUNNING. The endpoint may be stale after completion.
| File | Purpose |
|---|---|
| `controller.py` | Core scheduling and state management |
| `worker.py` | Process management and reconciliation |
| `schemas.py` | Pydantic models, state transitions |
| `db.py` | SQLite persistence layer |
| `server.py` | FastAPI HTTP endpoints |
| `client.py` | Async HTTP client |
All state under ~/.pylet/:
| Path | Contents |
|---|---|
| `~/.pylet/pylet.db` | SQLite database (WAL mode) |
| `~/.pylet/run/` | Worker local state files |
| `~/.pylet/logs/` | Instance log files |
Why SQLite:
- Single file, no external dependencies
- WAL mode handles concurrent reads
- Good enough for ~100 nodes (target scale)
- State survives head restart

Why long-polling:
- Workers get updates immediately (no polling delay)
- Natural distributed rate limiting
- Timeout = liveness check built-in

Why reconciliation instead of commands:
- Crash recovery is automatic
- No command queue to manage
- Network partitions don't cause duplicate execution
- Simpler than command/ack protocols

Why a single head (no consensus):
- Dramatically simpler than consensus
- Single source of truth, no split-brain
- Good enough for target scale (~100 nodes)
- Can always run head on reliable hardware
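Enabling WAL is a one-line pragma at connection time; a minimal sketch (a temp path stands in for `~/.pylet/pylet.db`):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pylet.db")
conn = sqlite3.connect(path)
# WAL mode: readers no longer block the single writer, and vice versa.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.close()
```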
Instance stdout/stderr is captured using a sidecar pattern:
- Worker wraps each command: `(cmd) 2>&1 | python3 -m pylet.log_sidecar <log_dir> <instance_id>`
- Sidecar writes to rotating log files in `~/.pylet/logs/`
- Worker runs an HTTP server (port 15599) for log retrieval
- Head proxies log requests to workers via `/instances/{id}/logs`
Why sidecar? The log capture process survives even if the instance crashes, ensuring logs aren't lost. The pipe pattern also allows log rotation without instance cooperation.
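A sketch of what such a sidecar's core loop could look like, using stdlib rotation; the real `pylet.log_sidecar`'s arguments and rotation policy are assumptions here, and the real process would read its lines from `sys.stdin`:

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler
from typing import Iterable

def pump(lines: Iterable[str], log_path: str,
         max_bytes: int = 10 * 1024 * 1024, backups: int = 3) -> None:
    """Append each line of the instance's output to a rotating log file."""
    handler = RotatingFileHandler(log_path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(message)s"))
    log = logging.Logger("pylet-sidecar")  # standalone logger, no global state
    log.setLevel(logging.INFO)
    log.addHandler(handler)
    for line in lines:
        log.info(line.rstrip("\n"))  # EOF on the pipe ends the loop cleanly
    handler.close()

path = os.path.join(tempfile.mkdtemp(), "inst-1.log")
pump(["starting up\n", "listening on :8000\n"], path)
```

Because the sidecar only sees a pipe, it keeps draining buffered output even after the instance dies, and rotation happens entirely on its side of the pipe.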