
feat(p2p): route federation with shared clusterrouting policy (load + VRAM) #10124

Open
localai-bot wants to merge 2 commits into master from feat/p2p-federation-clusterrouting


@localai-bot
Collaborator

What

Phase 2 of unified cluster routing: the p2p federation server now selects peers with the shared pkg/clusterrouting policy (least in-flight, then most free VRAM) instead of its ad-hoc least-used counter, and each node gossips its free GPU VRAM so that the VRAM tier has data to rank on.

Why

Phase 1 (#10123) extracted the replica-selection policy into the dependency-light pkg/clusterrouting. This wires the p2p federation side onto that same policy, so both of LocalAI's distributed transports (NATS distributed mode and p2p federation) now make routing decisions through one implementation. VRAM-awareness means that among equally-loaded peers, requests prefer the one with more free GPU headroom.
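The two-tier ordering described above (fewest in-flight requests first, then most free VRAM) can be sketched as follows. This is a minimal illustration, not the pkg/clusterrouting code: the `candidate` type and `pickBest` function are hypothetical stand-ins for `clusterrouting.ReplicaCandidate` and `PickBestReplica`, whose real fields and signatures are not shown in this PR description. Keeping the first candidate when both tiers tie is also an assumption of the sketch.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for a routing candidate.
// It is NOT the actual clusterrouting.ReplicaCandidate type.
type candidate struct {
	ID       string
	InFlight int    // load signal: requests currently attributed to the peer
	FreeVRAM uint64 // bytes of free GPU memory as gossiped by the peer
}

// pickBest returns the candidate with the fewest in-flight requests,
// breaking ties by the most free VRAM. The bool is false for empty input.
// On a full tie (same load, same VRAM) the earliest candidate is kept;
// that choice is an assumption of this sketch.
func pickBest(cands []candidate) (candidate, bool) {
	if len(cands) == 0 {
		return candidate{}, false
	}
	best := cands[0]
	for _, c := range cands[1:] {
		if c.InFlight < best.InFlight ||
			(c.InFlight == best.InFlight && c.FreeVRAM > best.FreeVRAM) {
			best = c
		}
	}
	return best, true
}

func main() {
	peers := []candidate{
		{ID: "peer-a", InFlight: 2, FreeVRAM: 24 << 30},
		{ID: "peer-b", InFlight: 1, FreeVRAM: 4 << 30},
		{ID: "peer-c", InFlight: 1, FreeVRAM: 8 << 30},
	}
	if best, ok := pickBest(peers); ok {
		// peer-a has the most VRAM but also the most load, so it loses;
		// peer-b and peer-c tie on load and peer-c has more free VRAM.
		fmt.Println(best.ID)
	}
}
```

Load stays the primary key, so a peer with plenty of free VRAM but more in-flight requests is never preferred over a less loaded one.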

How

  • Gossip free VRAM (commit 1): add AvailableVRAM uint64 to the gossiped schema.NodeData, populated in the edgevpn announce loop from xsysinfo.GetGPUAggregateInfo().FreeVRAM (the same source the NATS worker heartbeat uses). Also split NodeData.IsOnline() into a clock-injectable IsOnlineAt(now) (with a NodeOnlineWindow const) for testability.
  • Candidate-based selection (commit 2): replace FederatedServer.SelectLeastUsedServer with SelectBestServer, which maps the online peers into clusterrouting.ReplicaCandidates and calls the Phase 1 PickBestReplica. The existing per-peer request counter remains the load signal; VRAM breaks ties (previously arbitrary map-iteration order).

Behavior

  • In-flight primary ordering is preserved (no load-balancing regression); the only change is the tiebreak among equally loaded peers: the peer with the most free VRAM now wins deterministically, where previously the winner depended on arbitrary map-iteration order.
  • Backward-compatible gossip: NodeData is JSON-marshaled into the edgevpn ledger, so adding a uint64 field is safe in a mixed-version swarm (old peers ignore the unknown key; new peers read zero from old peers, which the policy treats as the lowest VRAM tier).
  • New Ginkgo tests for core/schema (online-window boundary, VRAM field) and the first tests for core/p2p (candidate building + policy ranking). go test -race ./core/p2p/ ./core/schema/ is green; golangci-lint: 0 issues.

Scope note

Federation is L4 (it never parses the request), so model-aware routing is not achievable here and is deferred to Phase 3, which introduces the L7 HTTP-terminating proxy (also enabling prefix-cache affinity by reusing the existing prefixcache.ExtractChain).


Assisted-by: Claude Code:claude-opus-4-8

mudler added 2 commits June 1, 2026 07:54
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
… VRAM)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
