Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 18 additions & 0 deletions .changeset/p1-control-plane.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
---
'@objectstack/rest': patch
'@objectstack/service-cluster-redis': patch
---

perf(rest): cache hostname→environment resolution; document cluster pub/sub durability (P1-4, P1-5)

- **rest (P1-4):** `resolveByHostname()` ran on every unscoped request — a
control-plane lookup (typically a DB query) in the hot path. `RestServer` now
caches `hostname → environmentId` in-memory with a 30s TTL across all three
resolution sites, caching negative results too so unknown hosts don't hammer the
registry. Registry errors are not cached, so a transient blip self-heals.
- **service-cluster-redis (P1-5):** recorded the durability contract for
`metadata.changed` in `pubsub.ts`. Redis pub/sub is at-most-once **by design**;
the event is a cache-invalidation hint only — the durable source of truth is the
transactional `sys_metadata` (+ `sys_metadata_history`) write, so a missed event
causes a stale cache until the next reload, never data loss. No code change to
the delivery semantics; risk accepted and documented.
11 changes: 9 additions & 2 deletions docs/launch-readiness.md
Original file line number Diff line number Diff line change
Expand Up @@ -144,7 +144,7 @@ fix or acceptance.**
- **Risk:** `resolveByHostname()` runs on every unscoped request → control-plane
latency spike under load; silent fallback to default project masks it.
- **Action:** Add an in-memory TTL cache (~30s) for `hostname → environmentId`.
- **Owner:** _______ · Verify · Sign-off ☐ · Notes: _______
- **Owner:** _______ · Verify ✅ (confirmed real @ `main`) · Sign-off ☐ · Notes: Fixed — `RestServer.resolveHostnameCached()` caches hostname→env (positive **and** negative) for 30s across all 3 call sites; +3 tests. Awaiting human sign-off.

### P1-5 — Cluster pub/sub is fire-and-forget (metadata-changed)
- **Area:** `service-cluster-redis` — `src/pubsub.ts:~75–90`
Expand All @@ -153,7 +153,14 @@ fix or acceptance.**
- **Action:** Acceptable for non-critical events if documented; ensure schema
mutations re-sync on error boundaries (history exists in `sys_metadata_history`).
Record the durability contract.
- **Owner:** _______ · Verify ☐ · Sign-off ☐ · Notes: _______
- **Durability contract (recorded):** Redis pub/sub is **at-most-once** by design
(already noted in `pubsub.ts`). `metadata.changed` is a **cache-invalidation hint
only** — the durable source of truth is the transactional write to `sys_metadata`
(+ `sys_metadata_history`). A node that misses the event serves its cached schema
until the next reload and **loses no data** (self-heals on reload/restart against
the DB). Documented in `pubsub.ts publish()`. **Accept** for v1: no exactly-once
state may flow through this channel — durable state uses an outbox.
- **Owner:** _______ · Verify ✅ (by design — contract recorded) · Sign-off ☐ · Notes: No code fix needed; risk **accepted** with rationale + code comment. Awaiting human sign-off.

---

Expand Down
31 changes: 28 additions & 3 deletions packages/rest/src/rest-server.ts
Original file line number Diff line number Diff line change
Expand Up @@ -437,6 +437,15 @@ export class RestServer {
private routeManager: RouteManager;
private kernelManager?: RestKernelManager;
private envRegistry?: RestEnvRegistry;
/**
* Short-TTL cache for `hostname → environmentId` (P1-4). `resolveByHostname`
* is a control-plane lookup (typically a DB query) that otherwise runs on
* *every* unscoped request; caching it — including negative results, so
* unknown hosts don't hammer the registry — removes that per-request cost.
* The TTL is short so a newly-bound hostname becomes routable quickly.
*/
private readonly hostnameCache = new Map<string, { value: { environmentId: string } | null; expiresAt: number }>();
private readonly hostnameCacheTtlMs = 30_000;
private defaultEnvironmentIdProvider?: () => string | undefined;
private authServiceProvider?: (environmentId?: string) => Promise<any | undefined>;
private objectQLProvider?: (environmentId?: string) => Promise<any | undefined>;
Expand Down Expand Up @@ -501,13 +510,29 @@ export class RestServer {
* (and any other client) speak a single, uniform URL family without
* duplicating route logic for the platform surface.
*/
/**
* Cached wrapper around `envRegistry.resolveByHostname` (P1-4). Returns the
* cached result while fresh; on a miss it queries the registry and caches the
* outcome (positive *and* negative) for {@link hostnameCacheTtlMs}. Registry
* errors are not cached so a transient control-plane blip self-heals on the
* next request.
*/
private async resolveHostnameCached(host: string): Promise<{ environmentId: string } | null | undefined> {
const now = Date.now();
const hit = this.hostnameCache.get(host);
if (hit && hit.expiresAt > now) return hit.value;
const result = (await this.envRegistry!.resolveByHostname(host)) ?? null;
this.hostnameCache.set(host, { value: result, expiresAt: now + this.hostnameCacheTtlMs });
return result;
}

private async resolveProtocol(environmentId?: string, req?: any): Promise<ObjectStackProtocol> {
if (environmentId === 'platform') return this.protocol;
if (!environmentId && req && this.envRegistry && this.kernelManager) {
const host = this.extractHostname(req);
if (host) {
try {
const result = await this.envRegistry.resolveByHostname(host);
const result = await this.resolveHostnameCached(host);
if (result?.environmentId) environmentId = result.environmentId;
} catch {
// fall through to next strategy
Expand Down Expand Up @@ -567,7 +592,7 @@ export class RestServer {
const host = this.extractHostname(req);
if (host) {
try {
const result = await this.envRegistry.resolveByHostname(host);
const result = await this.resolveHostnameCached(host);
if (result?.environmentId) environmentId = result.environmentId;
} catch { /* fall through */ }
}
Expand Down Expand Up @@ -644,7 +669,7 @@ export class RestServer {
const host = this.extractHostname(req);
if (host) {
try {
const result = await this.envRegistry.resolveByHostname(host);
const result = await this.resolveHostnameCached(host);
if (result?.environmentId) environmentId = result.environmentId;
} catch { /* fall through */ }
}
Expand Down
37 changes: 37 additions & 0 deletions packages/rest/src/rest.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1324,6 +1324,43 @@ describe('RestServer.resolveProtocol', () => {
expect(f.kernelManager.getOrCreate).toHaveBeenCalledWith('proj_a');
});

it('caches hostname→env resolution within the TTL and refreshes after it (P1-4)', async () => {
const projectKernel = makeKernel('proj_a');
const resolveByHostname = vi.fn().mockResolvedValue({ environmentId: 'proj_a' });
const f = makeFixture({
envRegistry: { resolveByHostname, resolveById: vi.fn() },
kernels: { proj_a: projectKernel },
});
let now = 1_000_000;
const spy = vi.spyOn(Date, 'now').mockImplementation(() => now);
try {
const req = { headers: { host: 'a.example.com' } };
await (f.rest as any).resolveProtocol(undefined, req);
await (f.rest as any).resolveProtocol(undefined, req);
await (f.rest as any).resolveProtocol(undefined, req);
expect(resolveByHostname).toHaveBeenCalledTimes(1); // 2nd/3rd served from cache

now += 31_000; // past the 30s TTL
await (f.rest as any).resolveProtocol(undefined, req);
expect(resolveByHostname).toHaveBeenCalledTimes(2); // refreshed
} finally {
spy.mockRestore();
}
});

it('caches a negative result so unknown hosts do not hammer the registry (P1-4)', async () => {
const resolveByHostname = vi.fn().mockResolvedValue(null);
const f = makeFixture({
envRegistry: { resolveByHostname, resolveById: vi.fn() },
defaultProvider: () => 'proj_local',
kernels: { proj_local: makeKernel('proj_local') },
});
const req = { headers: { host: 'unknown.example.com' } };
await (f.rest as any).resolveProtocol(undefined, req);
await (f.rest as any).resolveProtocol(undefined, req);
expect(resolveByHostname).toHaveBeenCalledTimes(1); // negative result cached
});

it('routes via X-Environment-Id header when hostname resolution fails', async () => {
const projectKernel = makeKernel('proj_b');
const resolveById = vi.fn().mockResolvedValue({ /* truthy driver */ });
Expand Down
14 changes: 14 additions & 0 deletions packages/services/service-cluster-redis/src/pubsub.ts
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,20 @@ export class RedisPubSub implements IPubSub {
});
}

/**
* Durability contract (P1-5): this awaits the Redis `PUBLISH` command (so the
* message left this node), but Redis pub/sub is **at-most-once** — there is no
* delivery guarantee to subscribers and no replay for a node that was down or
* slow at publish time. This is acceptable **only** for events that are pure
* cache-invalidation hints, never the source of truth.
*
* In particular `metadata.changed` is such a hint: the durable record of every
* metadata mutation is the transactional write to `sys_metadata`
* (+ `sys_metadata_history`). A subscriber that misses the event keeps serving
* its cached schema until its next reload and **loses no data** — it self-heals
* on restart / reload against the DB. Do not route any state that must be
* delivered exactly-once through this channel; use a durable outbox instead.
*/
async publish<T = unknown>(
channel: string,
payload: T,
Expand Down