Run the encoders in-process #29

Merged: jeromekelleher merged 2 commits into sgkit-dev:main from jeromekelleher:drop-process on May 18, 2026

@jeromekelleher
Member

Drops the separate multiprocessing.Process encoder-server and the AF_UNIX wire protocol. The VczReader and per-fh BedEncoder / BgenEncoder instances now live in the FUSE handler process. encoder.read, encoder.close, and reader teardown run on worker threads via trio.to_thread.run_sync so the pyfuse3 trio task stays responsive.

The 30s per-read timeout and 2s aclose timeout are preserved — a slow read still surfaces EIO to the kernel rather than blocking the consumer indefinitely. On read timeout the worker thread is abandoned (abandon_on_cancel=True) and the handle is marked dead; aclose drains the abandoned thread via a threading.Event before closing the encoder, or logs a warning and leaks if the encoder is permanently wedged.

The pyfuse3 mount Operations layer is largely untouched — it depends on a Protocol (renamed EncoderClientProto -> EncoderHostProto) that the new EncoderHost / StreamHandle satisfy. The accompanying tests (test_encoder_ops, test_plink_apps, test_bgen_apps) track the rename.
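
To show why depending on a Protocol keeps the Operations layer insulated from this kind of swap, here is a minimal structural-typing sketch. `HostProto` only stands in for `EncoderHostProto`; its three methods, `InMemoryHost`, and `serve_read` are hypothetical and are not taken from the biofuse source.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HostProto(Protocol):
    """Stand-in for EncoderHostProto; these three methods are assumptions."""

    async def open_stream(self, path: str) -> int: ...
    async def read(self, fh: int, offset: int, size: int) -> bytes: ...
    async def aclose(self, fh: int) -> None: ...


class InMemoryHost:
    """Satisfies HostProto structurally, without inheriting from it."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files
        self._open: dict[int, bytes] = {}
        self._next_fh = 1

    async def open_stream(self, path: str) -> int:
        fh = self._next_fh
        self._next_fh += 1
        self._open[fh] = self._files[path]
        return fh

    async def read(self, fh: int, offset: int, size: int) -> bytes:
        return self._open[fh][offset : offset + size]

    async def aclose(self, fh: int) -> None:
        del self._open[fh]


async def serve_read(host: HostProto, path: str, offset: int, size: int) -> bytes:
    """An Operations-style caller: it uses only the protocol's methods."""
    fh = await host.open_stream(path)
    try:
        return await host.read(fh, offset, size)
    finally:
        await host.aclose(fh)
```

Because `serve_read` is typed against the protocol rather than a concrete class, an out-of-process client and an in-process host are interchangeable from the caller's point of view as long as both expose the same methods.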

Net diff: -2335 LOC across the deleted encoder_{client,server,protocol} modules and their dedicated test files.

The glibc-arena fragmentation tuning called out in notes/memory_rss_investigation.md (MALLOC_ARENA_MAX, malloc_trim) is deferred to a separate change.

For both plink and bgen, mount the fixture VCZ via biofuse and verify
the first 100 MB of the streaming file matches the bytes produced by
BedEncoder / BgenEncoder run directly in-process.
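
A capped, chunked comparison of this kind might look like the sketch below. `compare_prefix` is a hypothetical helper, not the test's actual code; it assumes both arguments are buffered binary streams whose `read(n)` returns fewer than `n` bytes only at end of file.

```python
import io


def compare_prefix(mounted, reference, cap, chunk_size=1 << 20):
    """Compare up to `cap` bytes of two binary streams, chunk by chunk.

    Returns (bytes_compared, mismatch_offset); mismatch_offset is None when
    every compared byte matched.
    """
    compared = 0
    while compared < cap:
        want = min(chunk_size, cap - compared)
        a = mounted.read(want)
        b = reference.read(want)
        if a != b:
            # Locate the first differing byte, or the point where one stream ended.
            common = min(len(a), len(b))
            diff = next((i for i in range(common) if a[i] != b[i]), common)
            return compared + diff, compared + diff
        if not a:
            break  # both streams reached EOF before the cap
        compared += len(a)
    return compared, None


if __name__ == "__main__":
    payload = bytes(range(256)) * 10
    print(compare_prefix(io.BytesIO(payload), io.BytesIO(payload), cap=1000))
```

With a 100 MB cap (104857600 bytes), a file smaller than the cap is compared in full and a larger one is compared up to the cap, which is consistent with the byte counts in the bulk-data rows of the report.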
@jeromekelleher
Member Author

Here's the fs tests report for this change:

biofuse fs_tests report

  • Run started: 2026-05-18T08:22:39.973267+00:00
  • Host: claude-worker1 (Linux-6.8.0-111-generic-x86_64-with-glibc2.39)
  • Python: 3.11.15
  • biofuse commit: 9da47a49592531be35e8a83c1b6861dc68a00d12
  • External tool versions:
    • fio: fio-3.36
    • stress-ng: stress-ng, version 0.17.06 (gcc 13.2.0, x86_64 Linux 6.8.0-111-generic)
    • fusermount3: fusermount3 version: 3.14.0
    • git: git version 2.43.0

Overall: PASS

91 / 91 checks passed across 8 runners.

Per-runner summary

| Runner | Status | Passed | Failed | Duration | Notes |
| --- | --- | --- | --- | --- | --- |
| posix | PASS | 51 | 0 | 10.35s | 51/51 POSIX checks passed |
| bulk-data | PASS | 2 | 0 | 14.49s | bulk-data cross-validation: encoder vs mount, cap=100 MB |
| pjdfstest | PASS | 16 | 0 | 187.82s | pjdfstest@03eb257 on read-only mount (16 groups; results informational; see per-group logs for failure samples) |
| fio | PASS | 10 | 0 | 208.14s | fixture=medium jobs=7 |
| fsx | PASS | 3 | 0 | 27.46s | fsx read-only cross-validation: 50000 ops × 3 seeds |
| stress-ng | PASS | 3 | 0 | 65.67s | open/read loops + optional stress-ng background load |
| lifecycle | PASS | 3 | 0 | 257.45s | 50 mount/unmount cycles; mean 5.15s |
| active-under-stress | PASS | 3 | 0 | 36.45s | background=fio-multithread.fio duration=30s probes=3 |

Runner: posix

| Check | Status | Duration | Detail |
| --- | --- | --- | --- |
| open: O_RDONLY succeeds | PASS | 0.002s | |
| open: O_RDONLY O_NONBLOCK O_CLOEXEC accepted | PASS | | |
| open: O_WRONLY rejected with EROFS or EACCES | PASS | 0.000s | |
| open: O_RDWR rejected with EROFS or EACCES | PASS | 0.000s | |
| open: O_APPEND rejected with EROFS or EACCES | PASS | 0.000s | |
| open: O_CREAT for new file rejected | PASS | 0.000s | |
| open: O_DIRECTORY on regular file -> ENOTDIR | PASS | 0.000s | |
| open: O_DIRECTORY on mountpoint -> ok | PASS | 0.000s | |
| open: nonexistent path -> ENOENT | PASS | 0.000s | |
| read: full file via os.read matches backing | PASS | 0.018s | |
| pread: random offsets match backing | PASS | 4.102s | |
| pread: at EOF returns empty | PASS | 0.000s | |
| pread: spanning EOF returns trailing bytes only | PASS | 0.000s | |
| readv / preadv: bytes match backing | PASS | 0.000s | |
| lseek: SEEK_SET / SEEK_CUR / SEEK_END | PASS | 0.001s | |
| lseek: negative offset -> EINVAL | PASS | 0.000s | |
| lseek: past EOF + read -> 0 bytes | PASS | 0.000s | |
| stat == lstat == fstat for regular files | PASS | 0.000s | |
| stat: st_mode is S_IFREG with no write bits | PASS | 0.000s | |
| stat: st_size matches reads | PASS | 0.000s | |
| stat: st_dev consistent across files in mount | PASS | 0.000s | |
| stat: st_ino unique per file | PASS | 0.000s | |
| statvfs: ST_RDONLY flag set on mount | PASS | 0.001s | |
| access: F_OK true for existing files | PASS | 0.000s | |
| access: R_OK true for existing files | PASS | 0.000s | |
| access: W_OK false on read-only mount | PASS | 0.000s | |
| access: F_OK false for missing file | PASS | 0.000s | |
| readdir: listdir matches backing names | PASS | 0.000s | |
| scandir: entries match listdir | PASS | 0.000s | |
| scandir: each entry is_file() and not is_dir() | PASS | 0.000s | |
| openat / fstatat: relative resolution from dirfd | PASS | 0.000s | |
| dup / dup2: independent offsets | PASS | 0.000s | |
| fcntl: F_GETFL reports O_RDONLY | PASS | 0.000s | |
| mmap: PROT_READ MAP_PRIVATE returns matching bytes | PASS | 0.080s | |
| mmap: MAP_SHARED PROT_WRITE rejected | PASS | 0.000s | |
| path: trailing slash on regular file -> ENOTDIR | PASS | 0.000s | |
| path: redundant ./// segments resolve | PASS | 0.000s | |
| chdir + relative open works | PASS | 0.000s | |
| mutate: write rejected | PASS | 0.000s | |
| mutate: unlink rejected | PASS | 0.000s | |
| mutate: rename rejected | PASS | 0.000s | |
| mutate: mkdir rejected | PASS | 0.000s | |
| mutate: symlink rejected | PASS | 0.000s | |
| mutate: link rejected | PASS | 0.000s | |
| mutate: chmod rejected | PASS | 0.000s | |
| mutate: chown rejected (only if non-root) | PASS | 0.000s | |
| mutate: utime rejected | PASS | 0.000s | |
| mutate: truncate rejected | PASS | 0.000s | |
| xattr: getxattr returns ENOTSUP/ENODATA | PASS | 0.000s | |
| xattr: setxattr rejected | PASS | 0.000s | |
| fd churn: 1000 open/close cycles, no fd leak | PASS | 0.088s | |

Runner: bulk-data

| Check | Status | Duration | Detail |
| --- | --- | --- | --- |
| bulk-data:plink | PASS | 5.451s | compared 8952503 bytes (encoder total_size=8952503) |
| bulk-data:bgen | PASS | 9.006s | compared 104857600 bytes (encoder total_size=117100422) |

Runner: pjdfstest

| Check | Status | Duration | Detail |
| --- | --- | --- | --- |
| pjdfstest:open | PASS | 5.183s | ok=11 not_ok=312 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:granular | PASS | 0.055s | ok=7 not_ok=0 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:chflags | PASS | 0.097s | ok=14 not_ok=0 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:chmod | PASS | 14.093s | ok=2 not_ok=305 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:chown | PASS | 82.211s | ok=2 not_ok=1495 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:ftruncate | PASS | 3.371s | ok=3 not_ok=86 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:link | PASS | 12.235s | ok=16 not_ok=343 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:mkdir | PASS | 2.615s | ok=3 not_ok=115 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:mkfifo | PASS | 2.457s | ok=3 not_ok=117 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:mknod | PASS | 5.025s | ok=1 not_ok=185 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:rename | PASS | 26.748s | ok=6 not_ok=4851 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:rmdir | PASS | 2.797s | ok=4 not_ok=141 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:symlink | PASS | 2.568s | ok=3 not_ok=92 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:truncate | PASS | 3.593s | ok=3 not_ok=81 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:unlink | PASS | 19.556s | ok=3 not_ok=437 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |
| pjdfstest:utimensat | PASS | 1.573s | ok=2 not_ok=120 timeouts=0 (read-only FS: high not_ok is expected; see log for samples) |

Runner: fio

| Check | Status | Duration | Detail |
| --- | --- | --- | --- |
| fio:seq-read | PASS | 30.340s | errors=0 io=396.4 MB runtime=30017ms throughput=13.2 MB/s |
| fio:rand-read | PASS | 60.544s | errors=0 io=28.7 MB runtime=30063ms throughput=1.0 MB/s |
| fio:mmap-read | PASS | 30.941s | errors=11 io=136.0 MB runtime=30705ms throughput=4.4 MB/s (informational) |
| fio:mmap-read:concurrent | PASS | 0.000s | records=100 fhs=17 max_overlap=12 |
| fio:parallel-seq-read | PASS | 30.534s | errors=0 io=70.2 MB runtime=30300ms throughput=2.3 MB/s |
| fio:parallel-seq-read:concurrent | PASS | 0.000s | records=556 fhs=21 max_overlap=18 |
| fio:multithread | PASS | 30.305s | errors=11 io=28.2 MB runtime=30070ms throughput=0.9 MB/s (informational) |
| fio:multithread:concurrent | PASS | 0.000s | records=530 fhs=31 max_overlap=16 |
| fio:static-stress-bim | PASS | 10.253s | errors=0 io=224.3 MB runtime=10010ms throughput=22.4 MB/s |
| fio:static-stress-fam | PASS | 10.246s | errors=0 io=361.9 MB runtime=10008ms throughput=36.2 MB/s |

Runner: fsx

| Check | Status | Duration | Detail |
| --- | --- | --- | --- |
| fsx:seed-7 | PASS | 8.796s | completed=50000/50000 mismatches=0 short_reads=0 |
| fsx:seed-23 | PASS | 5.869s | completed=50000/50000 mismatches=0 short_reads=0 |
| fsx:seed-101 | PASS | 5.852s | completed=50000/50000 mismatches=0 short_reads=0 |

Runner: stress-ng

| Check | Status | Duration | Detail |
| --- | --- | --- | --- |
| open-loop:4p:30s | PASS | 30.044s | workers=4 ops=183 errors=0 |
| open-loop:16p:30s | PASS | 30.123s | workers=16 ops=19856 errors=0 |
| stress-ng:background-load | PASS | 0.000s | rc=0 failed=None completed=None |

Runner: lifecycle

| Check | Status | Duration | Detail |
| --- | --- | --- | --- |
| lifecycle:cycles_complete | PASS | 257.420s | completed 50/50; mean=5.15s p99=5.59s max=5.59s |
| lifecycle:no_orphan_mounts | PASS | 0.000s | orphan fuse.biofuse mounts at /home/ubuntu/agents-work/sgkit-dev/biofuse/fs_tests/results/20260518T080912Z/lifecycle/mnt: 0 |
| lifecycle:max_cycle_within_budget | PASS | 0.000s | max cycle 5.59s vs budget 30.0s |

Runner: active-under-stress

| Check | Status | Duration | Detail |
| --- | --- | --- | --- |
| liveness:readdir | PASS | 0.000s | attempts=59 ok=59 timeouts=0 errors=0 max_latency=41.3ms |
| liveness:static-read:.bim | PASS | 0.000s | attempts=59 ok=59 timeouts=0 errors=0 max_latency=15.0ms |
| liveness:static-read:.fam | PASS | 0.000s | attempts=59 ok=59 timeouts=0 errors=0 max_latency=20.1ms |

@jeromekelleher jeromekelleher merged commit eaa5e72 into sgkit-dev:main May 18, 2026
5 checks passed
@jeromekelleher jeromekelleher deleted the drop-process branch May 18, 2026 08:28