Optional cross-worker lock for shared adaptor repo by stuartc · Pull Request #1416 · OpenFn/kit

stuartc · 2026-05-19T13:38:26Z

Short Description

Adds an optional filesystem lock so multiple ws-workers can safely share a single adaptor repo directory (e.g. an NFS mount or k8s PVC), without two workers racing on the same npm install.

Fixes #1414

Implementation Details

When the repo directory is shared between worker pods, two workers can hit handleInstall for the same adaptor at the same time and corrupt each other's node_modules. This PR adds a new opt-in flag, --repo-lock / WORKER_REPO_LOCK, that wraps the engine's autoinstall handlers with cross-process coordination.

How it works:

A per-adaptor lockfile under <repoDir>/.locks/<alias>.lock is acquired via proper-lockfile before any install runs. Different adaptors don't block each other.
After a successful install, a sentinel file is written under <repoDir>/.sentinels/<alias>.done. handleIsInstalled requires BOTH the sentinel AND node_modules/<alias>/package.json to be present, so a half-finished install left by a crashed worker is correctly re-attempted.
The cache-hit path (handleIsInstalled returns true) stays lock-free, so adaptors that are already installed pay no locking overhead.
If installFn throws, the sentinel isn't written, and the next worker re-runs the install.
The retry ceiling (6 min) sits above the stale-lock window (5 min), so when a pod dies mid-install its lock expires before waiting workers give up. Cold-start of N pods against an empty repo is recoverable, not fatal.
Off by default. --repo-lock requires --repo-dir; if it's set without one we warn and continue without the lock.
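To make the sentinel rule concrete, here is a minimal TypeScript sketch of the installed check and the sentinel write. It assumes only the directory layout described above; the helper names (lockPath, sentinelPath, pkgPath, isInstalled, markInstalled) are hypothetical and not taken from the PR.

```typescript
import { access, mkdir, writeFile } from 'node:fs/promises';
import * as path from 'node:path';

// Hypothetical path helpers mirroring <repoDir>/.locks, .sentinels and node_modules.
export const lockPath = (repoDir: string, alias: string) =>
  path.join(repoDir, '.locks', `${alias}.lock`);
export const sentinelPath = (repoDir: string, alias: string) =>
  path.join(repoDir, '.sentinels', `${alias}.done`);
export const pkgPath = (repoDir: string, alias: string) =>
  path.join(repoDir, 'node_modules', alias, 'package.json');

const fileExists = async (p: string): Promise<boolean> => {
  try {
    await access(p);
    return true;
  } catch {
    return false;
  }
};

// Installed only if BOTH the sentinel and package.json exist; the two checks run in parallel.
export async function isInstalled(repoDir: string, alias: string): Promise<boolean> {
  const [hasSentinel, hasPkg] = await Promise.all([
    fileExists(sentinelPath(repoDir, alias)),
    fileExists(pkgPath(repoDir, alias)),
  ]);
  return hasSentinel && hasPkg;
}

// Call only after the install resolves; if the install throws, no sentinel is written.
export async function markInstalled(repoDir: string, alias: string): Promise<void> {
  const file = sentinelPath(repoDir, alias);
  // recursive mkdir also handles aliases that contain a '/'
  await mkdir(path.dirname(file), { recursive: true });
  await writeFile(file, new Date().toISOString());
}
```

Because isInstalled needs both files, an install that crashed after npm wrote package.json but before markInstalled ran is reported as not installed and gets retried, which is the recovery behaviour described above.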

New dependency: proper-lockfile@4.1.2 (plus its transitive deps graceful-fs, retry and signal-exit). All were checked for CVEs and came back clean.

QA Notes

The feature is gated behind WORKER_REPO_LOCK=true. With the flag off, behaviour is identical to before this PR — please confirm that path is unchanged.
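The gating rules (off unless enabled, and a warning plus fallback when --repo-dir is missing) can be sketched as a small pure function. resolveRepoLock and the RepoLockArgs shape are hypothetical, and letting the CLI flag take precedence over the environment variable is an assumption rather than something the PR states.

```typescript
// Hypothetical parsed-args shape; the worker's real option names may differ.
interface RepoLockArgs {
  repoLock?: boolean; // --repo-lock
  repoDir?: string; // --repo-dir
}

export function resolveRepoLock(
  args: RepoLockArgs,
  env: Record<string, string | undefined>,
  warn: (msg: string) => void,
): boolean {
  // Assumed precedence: an explicit CLI flag wins, else WORKER_REPO_LOCK=true.
  const requested = args.repoLock ?? env.WORKER_REPO_LOCK === 'true';
  if (!requested) return false; // off by default: install path unchanged
  if (!args.repoDir) {
    // Per the PR description: warn and carry on without the lock.
    warn('--repo-lock requires --repo-dir; continuing without the repo lock');
    return false;
  }
  return true;
}
```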

With the flag on, the interesting cases to exercise:

Two workers pointing at the same --repo-dir starting a run that needs the same adaptor at the same time — only one npm install should actually run; the other should see the sentinel and skip.
A worker killed mid-install (SIGKILL) — the next worker should recover after the stale window (5 min) rather than getting stuck forever.
Different adaptors should install in parallel, not serialise.
Without --repo-lock, multi-worker behaviour should be unchanged.

There's a multi-process test suite using child_process.fork that exercises all of the above deterministically. Run it from packages/ws-worker with: pnpm test test/util/repo-lock-multiprocess.test.ts
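The fork-based approach can be illustrated with a stripped-down race test. This is not the PR's suite: the child script and file names are made up, the "install" is a 50 ms sleep, and instead of proper-lockfile the lock is an atomic mkdir (the primitive proper-lockfile itself builds on). Each child checks the sentinel without locking, competes for the lock, re-checks under it, and reports whether it installed or skipped.

```typescript
import { fork } from 'node:child_process';
import { mkdtempSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import * as path from 'node:path';

// Child worker, written as plain CommonJS so it runs without a TS loader.
const CHILD_SRC = `
const fs = require('fs');
const path = require('path');
const repoDir = process.argv[2];
const lockDir = path.join(repoDir, 'adaptor.lock');
const sentinel = path.join(repoDir, 'adaptor.done');
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
const done = (msg) => process.send(msg, () => process.disconnect());

(async () => {
  for (;;) {
    if (fs.existsSync(sentinel)) return done('skipped'); // lock-free cache hit
    try {
      fs.mkdirSync(lockDir); // atomic: only one process can create it
    } catch (err) {
      if (err.code !== 'EEXIST') throw err;
      await sleep(10); // someone else holds the lock; poll again
      continue;
    }
    try {
      if (fs.existsSync(sentinel)) return done('skipped'); // finished while we waited
      await sleep(50); // stand-in for npm install
      fs.writeFileSync(sentinel, 'ok');
      return done('installed');
    } finally {
      fs.rmdirSync(lockDir); // release the lock
    }
  }
})();
`;

// Forks `workers` children against one fresh repo dir and collects their reports.
export async function raceInstall(workers: number): Promise<string[]> {
  const repoDir = mkdtempSync(path.join(tmpdir(), 'repo-race-'));
  const childFile = path.join(repoDir, 'child.cjs');
  writeFileSync(childFile, CHILD_SRC);
  return Promise.all(
    Array.from({ length: workers }, () =>
      new Promise<string>((resolve, reject) => {
        const child = fork(childFile, [repoDir], { execArgv: [] });
        child.once('message', (msg) => resolve(String(msg)));
        child.once('error', reject);
        child.once('exit', (code) => {
          if (code !== 0) reject(new Error(`child exited with code ${code}`));
        });
      }),
    ),
  );
}
```

With four children, exactly one should report installed and the other three skipped, which is the first QA case expressed as an assertion.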

K8s deployments using this need clocks kept in sync across nodes (NTP/chrony), because stale-lock detection is mtime-based.
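The clock requirement follows from how mtime-based staleness works. Below is a minimal sketch of the general technique, not proper-lockfile's implementation; looksStale and isLockStale are hypothetical names, and it assumes the lock's mtime reflects the holder's clock while "now" comes from the checking worker's clock.

```typescript
import { stat } from 'node:fs/promises';

// A lock counts as abandoned once its mtime is older than staleMs
// according to the *checking* worker's clock.
export function looksStale(lockMtimeMs: number, checkerNowMs: number, staleMs: number): boolean {
  return checkerNowMs - lockMtimeMs > staleMs;
}

// Reads the lock file's mtime; on a shared mount this may have been set
// from another node's clock.
export async function isLockStale(
  lockFile: string,
  staleMs: number,
  now: number = Date.now(),
): Promise<boolean> {
  const { mtimeMs } = await stat(lockFile);
  return looksStale(mtimeMs, now, staleMs);
}
```

With a 5 minute window, a checker whose clock runs 6 minutes fast sees a lock refreshed 10 seconds ago as already stale and could take it from a live holder; a checker running slow waits longer than intended before recovering a dead one.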

AI Usage

I have used Claude Code
I have used another model
I have not used AI

You can read more details in our Responsible AI Policy.

Add `--repo-lock` / `WORKER_REPO_LOCK` to coordinate adaptor installs across multiple workers sharing one repo directory (e.g. an NFS mount or k8s PVC). Uses proper-lockfile per-adaptor plus a sentinel cache, so the cache-hit path stays lock-free. Off by default; requires --repo-dir. Lock retry ceiling (6 min) is set above STALE_MS (5 min) so a dead holder's stale window expires before waiters give up, making cold-start of N pods against an empty repo recoverable rather than fatal.
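The retry-versus-stale relationship can be checked numerically. This sketch sums the back-off delays using the retry package's exponential formula with randomisation disabled, min(round(minTimeout × factor^attempt), maxTimeout); the option values used below are hypothetical, not the ones the PR configures.

```typescript
// Option shape modelled on the retry package's timeout options (randomize off).
interface RetryOptions {
  retries: number;
  factor: number;
  minTimeout: number;
  maxTimeout: number;
}

// Total time a waiter spends backing off before it gives up on the lock.
export function retryBudgetMs({ retries, factor, minTimeout, maxTimeout }: RetryOptions): number {
  let total = 0;
  for (let attempt = 0; attempt < retries; attempt++) {
    total += Math.min(Math.round(minTimeout * Math.pow(factor, attempt)), maxTimeout);
  }
  return total;
}

export const STALE_MS = 5 * 60_000;

// The invariant from the commit message: waiters must outlast a dead holder's stale window.
export function outlastsStaleWindow(opts: RetryOptions, staleMs: number = STALE_MS): boolean {
  return retryBudgetMs(opts) > staleMs;
}
```

For example, 72 retries at a flat 5 s adds up to 360,000 ms (6 min), which clears a 300,000 ms stale window; 60 retries would only tie it, and a tie is not enough because the waiter may give up at the same moment the lock becomes stealable.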

Drop redundant pre-check in ensureLockTarget (wx already throws EEXIST), parallelise fileExists pairs in handleIsInstalled and the post-lock double-check, collapse trivial if/return, drop unused default export, and fix the worker test harness' mode list comment plus an any-typed logger.
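The ensureLockTarget change relies on the 'wx' flag: opening with create-exclusive semantics fails with EEXIST when the file already exists, so a separate existence pre-check is redundant (and racy between workers). A minimal sketch, assuming the lock target is a plain empty file that must exist before proper-lockfile's lock() is called (its default realpath option requires the file to exist); this signature is illustrative rather than the PR's.

```typescript
import { mkdir, writeFile } from 'node:fs/promises';
import * as path from 'node:path';

export async function ensureLockTarget(file: string): Promise<void> {
  await mkdir(path.dirname(file), { recursive: true });
  try {
    // 'wx' = create exclusively; throws EEXIST instead of truncating an existing file.
    await writeFile(file, '', { flag: 'wx' });
  } catch (err) {
    // Another worker (or an earlier run) already created it: nothing to do.
    if ((err as NodeJS.ErrnoException).code !== 'EEXIST') throw err;
  }
}
```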

josephjclark · 2026-05-20T10:29:13Z

OK @stuartc, having looked a bit more closely, this implementation MUST move into the engine.

The engine already has an autoinstall.ts. You should just extend that to use the lockfile stuff before calling the runtime's install. I think you probably want the lock and install logic in two different files, but that's up to you.

What is the sentinel stuff all about?

josephjclark · 2026-05-20T10:07:37Z

+  const locksDir = path.join(repoDir, '.locks');
+  const sentinelsDir = path.join(repoDir, '.sentinels');
+
+  const ensureDirs = (async () => {


This is weird. Why not remove the wrapper and just await the mkdir calls?
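One way to read the suggestion: make the setup function async and await both mkdir calls directly, instead of holding an immediately-invoked async wrapper as a pending promise. The name createRepoLock and its return shape are hypothetical; only the .locks and .sentinels directory names come from the diff.

```typescript
import { mkdir } from 'node:fs/promises';
import * as path from 'node:path';

// Hypothetical setup function: awaits directory creation up front.
export async function createRepoLock(repoDir: string) {
  const locksDir = path.join(repoDir, '.locks');
  const sentinelsDir = path.join(repoDir, '.sentinels');
  // recursive: true does not throw when the directory already exists,
  // so several workers can run this concurrently.
  await Promise.all([
    mkdir(locksDir, { recursive: true }),
    mkdir(sentinelsDir, { recursive: true }),
  ]);
  return { locksDir, sentinelsDir };
}
```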

josephjclark · 2026-05-20T10:30:32Z

+    logger.debug(`acquired install lock for ${specifier}`);
+
+    try {
+      const [hasSentinel, hasPkg] = await Promise.all([


As we're going to move this to the engine, the engine already has an isInstalled function, which determines whether an adaptor is installed or not.

Rather than implement that test twice (as we're doing here), I'd like to try and reuse that logic if possible.
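One way to avoid implementing the installed check twice is to have the lock wrapper take the engine's existing check as a dependency instead of re-deriving it. A minimal sketch: installWithLock, LockedInstallDeps and createMemoryLock are hypothetical, the real isInstalled/install signatures in the engine may differ, and the in-memory lock only stands in for a cross-process lock (such as one built on proper-lockfile) so the example runs on its own.

```typescript
type Release = () => Promise<void>;
type AcquireLock = (key: string) => Promise<Release>;

// Hypothetical dependency shape; the engine's real functions may differ.
export interface LockedInstallDeps {
  isInstalled: (specifier: string) => Promise<boolean>;
  install: (specifier: string) => Promise<void>;
  acquire: AcquireLock;
}

export async function installWithLock(
  specifier: string,
  { isInstalled, install, acquire }: LockedInstallDeps,
): Promise<'cached' | 'installed-by-peer' | 'installed'> {
  // Fast path: no lock taken when the adaptor is already there.
  if (await isInstalled(specifier)) return 'cached';

  const release = await acquire(specifier);
  try {
    // Re-check under the lock: another worker may have finished while we waited.
    if (await isInstalled(specifier)) return 'installed-by-peer';
    await install(specifier);
    return 'installed';
  } finally {
    await release();
  }
}

// In-process stand-in for a cross-process lock: callers with the same key queue up.
export function createMemoryLock(): AcquireLock {
  const tails = new Map<string, Promise<void>>();
  return async (key) => {
    const prev = tails.get(key) ?? Promise.resolve();
    let release: () => void = () => {};
    const held = new Promise<void>((resolve) => {
      release = () => resolve();
    });
    tails.set(key, prev.then(() => held));
    await prev; // wait for the previous holder of this key
    return async () => release();
  };
}
```

The second isInstalled call under the lock is what lets a waiting worker skip once the holder finishes. Whether the engine's isInstalled also catches a half-finished install (the case the sentinel covers) would need confirming.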

stuartc added 2 commits May 19, 2026 15:20

taylordowns2000 added this to Core May 19, 2026

github-project-automation Bot moved this to New Issues in Core May 19, 2026

josephjclark reviewed May 19, 2026


Comment thread packages/ws-worker/src/start.ts

josephjclark reviewed May 19, 2026


Comment thread packages/ws-worker/src/util/repo-lock.ts

josephjclark reviewed May 20, 2026

stuartc wants to merge 2 commits into main from 1414-distributed-adaptor-install-lock

3 participants

Conversation

stuartc commented May 19, 2026

Short Description

Implementation Details

QA Notes

AI Usage

Uh oh!

Uh oh!

Uh oh!

josephjclark commented May 20, 2026

Uh oh!

josephjclark May 20, 2026

Choose a reason for hiding this comment

Uh oh!

josephjclark May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants