OpenAgentsInc/psionic
Psionic

Psionic is a Rust-native ML and inference stack.

It owns the machine-facing execution substrate behind local inference, serving, training, distributed execution, artifact truth, and clustered compute. The project is broader than one app or one benchmark lane. It is the crate family that OpenAgents uses for inference, training, cluster bring-up, and execution evidence.

Psionic should be read hardware-first. It owns the admitted hardware strategy for each lane: backend family, residency mode, topology, serving or training role, and the capability, refusal, and evidence surfaces that higher layers consume. Upstream systems such as llama.cpp, vLLM, SGLang, MLX, and other reference repos are inputs for specific layers or hardware classes, not the identity of the shipped Psionic stack.

The training side also carries one bounded gemma4:e4b CUDA adapter-SFT trainer above the shared adapter substrate: LM-head-only final-hidden-state supervision, frozen-base semantics, typed export, exact checkpoint resume, served-base plus tokenizer compatibility checks, and explicit refusal truth for the wider Gemma regions that remain out of scope. That bounded lane also closes the first trainer-to-serving refresh seam: typed Gemma checkpoints plus exported adapter artifacts can be revalidated into the live CUDA mesh lane without a process restart, the active served revision is surfaced in response provenance, stale or mismatched revisions fail closed, and operators can roll back to the last known-good promoted revision. The lane is also eval-first: it binds one canonical held-out eval pack, one four-split dataset contract, one short baseline sweep against the untuned base, one overlap and decontamination gate, one canned promoted-checkpoint vibe-review packet, and one promotion decision that refuses held-out regressions or failed operator review.
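The fail-closed revision behavior of that refresh seam can be sketched in a few lines. Everything below (Revision, ServingState, admit, rollback) is hypothetical illustration of the described contract, not the trainer's actual API:

```rust
// Hypothetical sketch of the fail-closed revision seam: a request is
// admitted only when its adapter revision matches the active promoted
// revision; anything stale or mismatched is refused, and operators can
// roll back to the last known-good promoted revision.
#[derive(PartialEq, Debug, Clone)]
pub struct Revision(pub u64);

pub struct ServingState {
    pub active: Revision,
    pub last_known_good: Revision,
}

impl ServingState {
    pub fn admit(&self, requested: &Revision) -> Result<(), &'static str> {
        if *requested == self.active {
            Ok(())
        } else {
            Err("refused: stale or mismatched adapter revision")
        }
    }

    pub fn rollback(&mut self) {
        self.active = self.last_known_good.clone();
    }
}

fn main() {
    let mut state = ServingState { active: Revision(3), last_known_good: Revision(2) };
    assert!(state.admit(&Revision(3)).is_ok());
    assert!(state.admit(&Revision(1)).is_err()); // fails closed
    state.rollback();
    assert_eq!(state.active, Revision(2)); // back to last known-good
    println!("active revision after rollback: {:?}", state.active);
}
```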

Start Here

Main Tracks

Psion Training Shortcut

If you want the current top Psion training lane instead of guessing among benchmark-adjacent lanes, run:

./TRAIN

That command now targets the actual Psion pretraining lane and materializes the retained launch, status, preflight, checkpoint, dashboard, alert, and closeout surfaces under ~/scratch/psion_actual_pretraining_runs/<run_id>.

Use:

./TRAIN --dry-run
./TRAIN resume --run-root <path>
./TRAIN status --run-root <path>

for plan inspection and operator follow-up on the actual lane.

The older bounded reference pilot still exists as the smoke/reference lane:

./TRAIN --lane reference_pilot --dry-run
./TRAIN --lane reference_pilot --mode local_reference

Tassadar Training Shortcut

If you want the current default Tassadar training lane instead of guessing among older bounded benchmark lanes, run:

./TRAIN_TASSADAR

That command now launches the bounded trace-bound article-transformer weight-production lane, which produces the retained tassadar-article-transformer-trace-bound-trained-v0 family under fixtures/tassadar/runs/tassadar_article_transformer_weight_production_v1.

The lane contract lives in docs/TASSADAR_DEFAULT_TRAIN_LANE.md.

The operator launcher lives in docs/TASSADAR_TRAIN_LAUNCHER.md.

The bounded default-lane rehearsal lives in docs/TASSADAR_DEFAULT_TRAIN_REHEARSAL.md.

Tassadar Executor Lane

Executor-class research and runtime work for exact computation starts with docs/ROADMAP_TASSADAR.md.

Local GPT-OSS Inference

Psionic ships a dedicated local GPT-OSS server in crates/psionic-serve/src/bin/psionic-gpt-oss-server.rs. It exposes:

  • GET /health
  • GET /v1/models
  • POST /v1/chat/completions
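The endpoint paths imply an OpenAI-compatible chat schema; a minimal std-only sketch of assembling the request body, assuming the standard model/messages/role/content field names (the server, not this sketch, defines the accepted fields):

```rust
// Assemble the JSON body for POST /v1/chat/completions by hand.
// Assumption: inputs contain no characters needing JSON escaping;
// a real client would use a JSON library such as serde_json.
fn chat_body(model: &str, system: &str, user: &str) -> String {
    format!(
        r#"{{"model":"{model}","messages":[{{"role":"system","content":"{system}"}},{{"role":"user","content":"{user}"}}]}}"#
    )
}

fn main() {
    let body = chat_body(
        "gpt-oss-20b-mxfp4.gguf",
        "You are ChatGPT.",
        "Why does HTTPS matter?",
    );
    assert!(body.contains("\"role\":\"system\""));
    println!("{body}");
}
```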

Build it:

cargo build -p psionic-serve --bin psionic-gpt-oss-server --release

Run it on a Linux NVIDIA host:

./target/release/psionic-gpt-oss-server \
  -m /path/to/gpt-oss-20b-mxfp4.gguf \
  --backend cuda \
  --host 127.0.0.1 \
  --port 8080 \
  -c 4096 \
  -ngl 999

Run it on Apple Silicon:

./target/release/psionic-gpt-oss-server \
  -m /path/to/gpt-oss-20b-mxfp4.gguf \
  --backend metal \
  --metal-mode native \
  --host 127.0.0.1 \
  --port 8080 \
  -c 1024 \
  -ngl 4
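The two invocations differ only in their backend-specific flags. Assuming llama.cpp-style flag semantics (-c context length, -ngl layers offloaded to the GPU), picking the flag set per host class can be sketched with a hypothetical helper; the values mirror the two examples above and are illustrative defaults, not tuned recommendations:

```rust
// Hypothetical per-host flag selection mirroring the two example
// invocations above. Flag meanings are assumed from llama.cpp
// conventions: -c is context length, -ngl is GPU-offloaded layers.
fn backend_flags(host: &str) -> Vec<&'static str> {
    match host {
        "linux-nvidia" => vec!["--backend", "cuda", "-c", "4096", "-ngl", "999"],
        "apple-silicon" => vec![
            "--backend", "metal", "--metal-mode", "native", "-c", "1024", "-ngl", "4",
        ],
        _ => vec!["--backend", "cpu"],
    }
}

fn main() {
    assert_eq!(backend_flags("linux-nvidia")[1], "cuda");
    assert!(backend_flags("apple-silicon").contains(&"native"));
    println!("{:?}", backend_flags("linux-nvidia"));
}
```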

Call it:

curl -s http://127.0.0.1:8080/v1/models | jq

curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{
    "model": "gpt-oss-20b-mxfp4.gguf",
    "messages": [
      {"role": "system", "content": "You are ChatGPT."},
      {"role": "user", "content": "Why does HTTPS matter?"}
    ]
  }' | jq

Benchmark it against local llama.cpp:

scripts/benchmark-gpt-oss-vs-llama.sh \
  --psionic-backend cuda \
  --model /path/to/gpt-oss-20b-mxfp4.gguf \
  --llama-bin /path/to/llama-server \
  --json-out /tmp/psionic-gpt-oss-bench

More detail lives in docs/GPT_OSS_LOCAL_SERVING.md.

Installable Mesh Lanes

Psionic also ships crates/psionic-serve/src/bin/psionic-mesh-lane.rs as the supported service-mode entrypoint for durable inference-mesh nodes.

It materializes one lane root with config, file-backed node identity, durable network state, logs, model paths, and generated launchd / systemd service artifacts. openagents and probe integrate against that Psionic-owned service binary and its management surfaces directly; the supported pooled inference path does not depend on any separate mesh sidecar runtime. The full operator runbook lives in docs/MESH_LANE_SERVICE_MODE.md.
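As a rough illustration of that materialized lane root, a hypothetical layout helper; the directory names below are invented for the sketch, and the binary's actual layout is defined in docs/MESH_LANE_SERVICE_MODE.md:

```rust
use std::path::{Path, PathBuf};

// Hypothetical lane-root layout mirroring the surfaces named in the
// text: config, node identity, network state, logs, model paths, and
// generated service artifacts. Directory names are illustrative only.
fn lane_paths(root: &Path) -> Vec<PathBuf> {
    ["config", "identity", "net-state", "logs", "models", "service"]
        .iter()
        .map(|d| root.join(d))
        .collect()
}

fn main() {
    let paths = lane_paths(Path::new("/var/lib/psionic-mesh"));
    assert_eq!(paths.len(), 6);
    assert!(paths[0].ends_with("config"));
    println!("{paths:?}");
}
```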

GPT-OSS Benchmark Proof

The current benchmark harness is scripts/benchmark-gpt-oss-vs-llama.sh. It uses the explicit GPT-OSS system/developer/user request contract, checks visible output equality, and records prompt-cache-hit throughput.

The closed benchmark proof referenced publicly here is:

  • OpenAgents issue comment: openagents#3248 comment 4028968842
  • exact reported result on that host:
    • Psionic prompt_cache_hit: 172.84 tok/s
    • llama.cpp prompt_cache_hit: 160.98 tok/s
    • prompt_cache_hit_visible_output_match=true
    • visible output: HTTPS protects users by encrypting traffic, preventing tampering, and confirming they are connected to the right website.
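Those two prompt_cache_hit figures put the Psionic server roughly 7% ahead of llama.cpp on that host; the arithmetic:

```rust
// Relative speedup of two throughput figures (tok/s).
fn speedup(a: f64, b: f64) -> f64 {
    a / b
}

fn main() {
    // Reported prompt_cache_hit throughputs from the closed benchmark proof.
    let ratio = speedup(172.84, 160.98);
    println!("Psionic is {:.2}% faster", (ratio - 1.0) * 100.0);
}
```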

That proof is grounded in the shipped server binary, the shipped benchmark script, and the explicit hardware-validation posture in docs/HARDWARE_VALIDATION_MATRIX.md.

Project Shape

The main crate families are:

  • framework core: psionic-core, psionic-ir, psionic-compiler, psionic-runtime
  • backends: psionic-backend-cpu, psionic-backend-cuda, psionic-backend-metal
  • serving and provider surfaces: psionic-serve, psionic-provider, psionic-router
  • cluster and distributed execution: psionic-cluster, psionic-collectives, psionic-distributed, psionic-net
  • training, eval, and optimizer substrate: psionic-train, psionic-data, psionic-eval, psionic-adapters, psionic-optimize

Use docs/WORKSPACE_MAP.md for the full doc index, crate map, and subsystem entrypoints.
