fula-ai-training

Off-device infrastructure for the Blox AI model improvement loop. Two siblings live in this repo:

Transcript intake server (server/): Phase 20 of the parent plan. Receives opt-in anonymized troubleshooting transcripts uploaded by the Blox phone app.
Training pipeline (corpus/, labeller/, training/, compilation/, publish/): Phase 19 sub-plan. Turns intake transcripts into new .rkllm model files and bumps the device-side /etc/fula/ai-manifest.json so the fleet upgrades.

This README covers the training pipeline. The intake server has its own contract in server/README.md (TLS posture, schema, PII scanner).

Privacy contract (inherited from Phase 20)

Transcripts in corpus/raw/ are already anonymized on the phone before upload (peerIds, IPs, paths, SSIDs, BSSIDs, and timestamps are stripped per Phase 21's anonymizeTranscript.ts). The intake server runs a defense-in-depth PII scanner before persisting. The training pipeline inherits this contract: it never reattaches identifiers, and it never runs the model on raw transcripts before the labeller has reviewed them.

If you find PII in corpus/raw/, that's an intake-server failure — fix the scanner and re-evaluate every affected transcript before training on it.
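
When re-evaluating affected transcripts, a quick local check can help decide which files to quarantine. The following is only a minimal sketch: the pattern set and the `find_pii` helper are hypothetical, and the intake server's actual scanner rules (documented in server/README.md) may differ. The IPv4 pattern will also flag dotted version strings, so treat hits as candidates for review rather than confirmed leaks.

```python
import re

# Hypothetical patterns for illustration only; the real scanner lives in server/.
PII_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "mac_or_bssid": re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b"),
    "unix_home_path": re.compile(r"/(?:home|Users)/[^/\s]+"),
}


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs for every suspected identifier."""
    hits = []
    for name, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

A transcript whose text yields a non-empty list would be held back from labelling until the scanner is fixed and the transcript is re-checked.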

The 9 sub-phases

Sub-phase	Module	Purpose
19.1	`corpus/sync_corpus.py`	Pull raw transcripts from intake-server object storage; idempotent + schema-validating
19.2	`labeller/app.py`	Tiny FastAPI tool: developer reviews transcripts + flags verdict/action correctness
19.3	`corpus/test_set/`	Hand-authored canonical scenarios that the eval gate runs against
19.4	`training/lora_train.py`	PEFT-driven LoRA fine-tune on labelled transcripts
19.5	`training/eval_held_out.py`	Hard-gate scorer (≥85% verdict / ≥90% root-cause / ≥80% action / ZERO whitelist violations)
19.6	`compilation/rkllm_convert.sh`	Wrapper around Rockchip's RKLLM-Toolkit; converts merged HF weights to W8A8 `.rkllm`
19.7	`compilation/lab_canary.sh`	A/B test on the lab device before any CDN upload
19.8	`publish/update_manifest.py`	Bump `current`/`rollback` entries in `ai-manifest.json`; uploads model + manifest to CDN
19.9	`OPERATIONS.md`	Triage runbook: how to interpret eval reports, roll back, add test scenarios
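
The hard gate in sub-phase 19.5 can be pictured as a small threshold check. This is a sketch under stated assumptions, not the contents of `training/eval_held_out.py`: the `EvalReport` shape and the function names are hypothetical, and only the four thresholds come from the table above.

```python
from dataclasses import dataclass

# Thresholds taken from the 19.5 row of the table.
MIN_VERDICT_ACC = 0.85
MIN_ROOT_CAUSE_ACC = 0.90
MIN_ACTION_ACC = 0.80


@dataclass
class EvalReport:
    """Hypothetical report shape; accuracies are fractions in [0, 1]."""
    verdict_acc: float
    root_cause_acc: float
    action_acc: float
    whitelist_violations: int


def gate_failures(report: EvalReport) -> list[str]:
    """List every threshold the report misses; an empty list means the gate passes."""
    failures = []
    if report.verdict_acc < MIN_VERDICT_ACC:
        failures.append("verdict accuracy below 85%")
    if report.root_cause_acc < MIN_ROOT_CAUSE_ACC:
        failures.append("root-cause accuracy below 90%")
    if report.action_acc < MIN_ACTION_ACC:
        failures.append("action accuracy below 80%")
    if report.whitelist_violations > 0:
        failures.append("whitelist violations present")
    return failures


def exit_code(report: EvalReport) -> int:
    """Non-zero on any failure, so a shell script or CI job stops the cycle."""
    return 1 if gate_failures(report) else 0
```

Collecting all failures before exiting, rather than stopping at the first, gives the operator a complete picture when triaging a failed run.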

Cadence + operational ingredients

A full cycle (transcripts → new published model) requires:

Ingredient	Where it lives	Who provides
≥500 labelled transcripts	`corpus/labelled/`	accumulates from intake-server over weeks; manual labelling pass via `labeller/app.py`
Base Qwen 2.5 3B Instruct weights	Hugging Face: `Qwen/Qwen2.5-3B-Instruct`	downloaded by `training/lora_train.py`
GPU box with CUDA 11.8+ and ≥24 GB VRAM	external	operator
Rockchip RKLLM-Toolkit	installed on the GPU box; pinned version in `compilation/pin_rkllm_toolkit.md`	operator (Rockchip developer account)
Lab RK3588 device for A/B canary	`pi@192.168.68.107`	shared lab device
CDN upload credentials	env vars / config — see `publish/README.md`	operator

Target cycle time: 4-8 hours of operator time per cycle once the pipeline is operational. Expected cadence: 1-2 cycles per month after the first 500 transcripts accumulate.

Quick start (once you have a labelled corpus)

# On the GPU box
pip install -e ".[train,eval]"

# 1. Sync transcripts from intake server's S3 bucket to corpus/raw/
python -m corpus.sync_corpus --bucket s3://fula-ai-training-prod-transcripts

# 2. (Manual) Label transcripts via the labeller UI
uvicorn labeller.app:app --host 127.0.0.1 --port 8765
# Open http://127.0.0.1:8765/ in a browser and label each transcript.

# 3. Fine-tune LoRA on the labelled set
python -m training.lora_train --config training/configs/qwen3b_lora.yaml

# 4. Evaluate against held-out test scenarios (HARD GATE — exits non-zero on failure)
python -m training.eval_held_out --adapter training/output/<date>/adapter

# 5. Merge LoRA + convert to RKLLM W8A8
python -m compilation.merge_lora --adapter training/output/<date>/adapter
bash compilation/rkllm_convert.sh training/output/<date>/merged

# 6. A/B canary on the lab device (deploys to :test tag first)
bash compilation/lab_canary.sh training/output/<date>/merged.rkllm

# 7. After canary passes (manual sign-off), publish + bump manifest
python -m publish.update_manifest \
    --new-current-model-version "$(date +%Y-%m-%d)" \
    --new-current-url "https://functionyard.fx.land/qwen-3b-$(date +%Y-%m-%d).rkllm" \
    --new-current-sha256 "$(sha256sum training/output/<date>/merged.rkllm | awk '{print $1}')" \
    --new-current-size-bytes "$(stat -c%s training/output/<date>/merged.rkllm)"
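
For readers who want to see what step 7 amounts to, here is a rough Python sketch of computing the checksum and size and then rotating `current` into `rollback`. It is not the code in `publish/update_manifest.py`: the helper names and the entry field names (`model_version`, `url`, `sha256`, `size_bytes`) are assumptions inferred from the CLI flags above, and the real manifest schema may differ.

```python
import hashlib
import os


def describe_model(path: str, version: str, url: str) -> dict:
    """Build a manifest entry, hashing the file in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "model_version": version,   # assumed field name
        "url": url,                 # assumed field name
        "sha256": digest.hexdigest(),
        "size_bytes": os.path.getsize(path),
    }


def bump_manifest(manifest: dict, new_entry: dict) -> dict:
    """Return a copy where the old `current` becomes `rollback` and new_entry is `current`."""
    updated = dict(manifest)
    if "current" in manifest:
        updated["rollback"] = manifest["current"]
    updated["current"] = new_entry
    return updated
```

Returning a new dict instead of mutating the input keeps the previous manifest available for comparison until the upload succeeds.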

What this pipeline does NOT do

On-device fine-tuning: RK3588 is inference-only; LoRA happens off-device.
Continuous fine-tuning without human-in-the-loop: every cycle goes through a manual labelling pass.
Auto-promotion: a failing eval gate blocks the cycle, but a passing one is only advisory until a human reviews the report. Final publication is always an operator-driven step.
Cross-org transcript sharing: each org's transcripts stay in that org's bucket; the pipeline doesn't aggregate across deployments.

Tests

pip install -e ".[dev,eval]"
pytest -ra -q

Server tests live in server/tests/; pipeline tests live in tests/. Both run from one pytest invocation.
