Conversation
foot fails at startup under elfuse with "failed to create keyboard repeat timer FD: Inappropriate ioctl for device" because timerfd_create(CLOCK_BOOTTIME, TFD_CLOEXEC | TFD_NONBLOCK) returns -1.

Two root causes:
- The clockid allow-list only accepted CLOCK_REALTIME and CLOCK_MONOTONIC, so CLOCK_BOOTTIME (7) returned -EINVAL. CLOCK_BOOTTIME has no suspend-aware equivalent on macOS; treating BOOTTIME as MONOTONIC matches the existing translate_clockid mapping in src/syscall/time.c.
- TFD_NONBLOCK was applied via fd_set_nonblock(kq), which issues fcntl(F_SETFL, O_NONBLOCK) on the kqueue host fd. macOS rejects that with ENOTTY (errno 25), so every timerfd_create(..., TFD_NONBLOCK) call failed regardless of clockid. The "Inappropriate ioctl for device" string that foot logs is the glibc strerror text for ENOTTY leaking through.

The non-blocking flag now lives in fd_table[gfd].linux_flags alongside the existing CLOEXEC bit, and timerfd_read consults that field after snapshotting it under fd_lock (order 3) before acquiring sfd_lock (order 5a). The lock-order snapshot matches the documented discipline in src/syscall/internal.h and the eventfd_dup_fd pattern.

Tests cover create succeeding with CLOCK_BOOTTIME+TFD_NONBLOCK and an armed-but-unfired non-blocking read returning EAGAIN through the shadow rather than the unarmed-timer EAGAIN path. Verified against an ARM64 Linux host that read-before-fire on a non-blocking timerfd returns errno=EAGAIN, matching the elfuse behavior.

Close #82
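The two fixes described above can be sketched as a single argument check that never touches the kqueue host fd. This is a minimal illustration, not the elfuse code: `sketch_timerfd_check` and `struct sketch_timerfd_params` are hypothetical names, and the numeric constants are the generic Linux ABI values.

```c
#include <stdint.h>

/* Linux ABI constants as seen by the guest (generic values). */
#define LINUX_CLOCK_REALTIME   0
#define LINUX_CLOCK_MONOTONIC  1
#define LINUX_CLOCK_BOOTTIME   7
#define LINUX_TFD_NONBLOCK     04000     /* same bit as O_NONBLOCK */
#define LINUX_TFD_CLOEXEC      02000000  /* same bit as O_CLOEXEC */
#define LINUX_EINVAL           22

/* Hypothetical result type: what a create path would record. */
struct sketch_timerfd_params {
    int clockid;           /* clock the timer actually runs on */
    uint32_t linux_flags;  /* shadow flags kept in the fd table */
};

/* Validate guest arguments. Returns 0 or a negative Linux errno. */
static int sketch_timerfd_check(int clockid, int flags,
                                struct sketch_timerfd_params *out)
{
    const int known = LINUX_TFD_NONBLOCK | LINUX_TFD_CLOEXEC;

    if (flags & ~known)
        return -LINUX_EINVAL;

    switch (clockid) {
    case LINUX_CLOCK_REALTIME:
    case LINUX_CLOCK_MONOTONIC:
        out->clockid = clockid;
        break;
    case LINUX_CLOCK_BOOTTIME:
        /* Folded into MONOTONIC, as the commit message describes. */
        out->clockid = LINUX_CLOCK_MONOTONIC;
        break;
    default:
        return -LINUX_EINVAL;
    }

    /* NONBLOCK and CLOEXEC are only recorded in the shadow flags;
     * no fcntl(F_SETFL) is issued on the host kqueue descriptor. */
    out->linux_flags = (uint32_t)(flags & known);
    return 0;
}
```

Under these assumptions, the call that foot makes, `sketch_timerfd_check(LINUX_CLOCK_BOOTTIME, LINUX_TFD_CLOEXEC | LINUX_TFD_NONBLOCK, &p)`, returns 0 and leaves the non-blocking bit in `p.linux_flags` for the read path to consult.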
Without an FD_TIMERFD branch, fcntl(timerfd, F_GETFL) routed to the kqueue host fd and surfaced macOS-side flags, while F_SETFL hit the same ENOTTY rejection that broke the create path. Wire both branches through fd_table[fd].linux_flags so the shadow is the source of truth.

F_GETFL returns O_RDWR plus the writable bits Linux honors on a timerfd inode: O_APPEND, O_NONBLOCK, and O_NOATIME (Linux's full SETFL_MASK minus O_ASYNC, which timerfd_fops drops because it lacks ->fasync, and minus O_DIRECT). O_RDWR is hard-coded because Linux opens the inode O_RDWR via anon_inode_getfd in fs/timerfd.c, and is also stamped into linux_flags at create time so the access mode is visible to other consumers.

F_SETFL accepts O_APPEND, O_NONBLOCK, and O_NOATIME, silently drops access mode / CLOEXEC / non-writable bits matching how Linux F_SETFL treats them, and returns -EINVAL on O_DIRECT mirroring Linux's vfs_set_direct_io_flags rejecting it on an inode without FMODE_CAN_ODIRECT.

dup(2) preservation: the install_fd_alias_metadata_atomic preserved mask now includes LINUX_O_NONBLOCK, and fuse_dup_fd does the same. Without this a duplicated non-blocking fd of any kind that stores NONBLOCK in linux_flags rather than on the host fd (FUSE files, now also timerfds) silently reverted to blocking.

abi.h adds LINUX_O_ASYNC=0x2000 and LINUX_O_NOATIME=0x40000 (octal 020000 and 01000000 from asm-generic/fcntl.h). The translate helpers now use the LINUX_O_ASYNC symbol instead of the 0x2000 inline literal.

Tests cover the full fcntl coherence surface: O_RDWR access mode is visible, accepted-plus-stray F_SETFL persists O_APPEND while dropping O_WRONLY and O_CLOEXEC, O_DIRECT returns EINVAL, O_NOATIME round-trips, and dup preserves both NONBLOCK and the access mode.
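The F_GETFL / F_SETFL rules described above can be sketched as two pure functions over the shadow flags. The function names and `SKETCH_TIMERFD_SETFL_MASK` are hypothetical, and the `LINUX_O_DIRECT` value is an assumption taken from asm-generic/fcntl.h (some architectures define it differently); the other constants follow the values quoted in the commit message or the generic Linux ABI.

```c
#include <stdint.h>

#define LINUX_O_WRONLY    01
#define LINUX_O_RDWR      02
#define LINUX_O_APPEND    02000
#define LINUX_O_NONBLOCK  04000
#define LINUX_O_DIRECT    040000    /* assumption: asm-generic value */
#define LINUX_O_NOATIME   01000000  /* 0x40000 */
#define LINUX_O_CLOEXEC   02000000
#define LINUX_EINVAL      22

/* Status bits a timerfd lets F_SETFL change. */
#define SKETCH_TIMERFD_SETFL_MASK \
    (LINUX_O_APPEND | LINUX_O_NONBLOCK | LINUX_O_NOATIME)

/* F_GETFL: fixed O_RDWR access mode plus the writable status bits. */
static uint32_t sketch_timerfd_getfl(uint32_t linux_flags)
{
    return LINUX_O_RDWR | (linux_flags & SKETCH_TIMERFD_SETFL_MASK);
}

/* F_SETFL: reject O_DIRECT, replace only the writable status bits,
 * and silently ignore everything else in arg. */
static int sketch_timerfd_setfl(uint32_t *linux_flags, uint32_t arg)
{
    if (arg & LINUX_O_DIRECT)
        return -LINUX_EINVAL;

    /* Access mode, CLOEXEC and other bits outside the mask survive. */
    uint32_t kept = *linux_flags & ~(uint32_t)SKETCH_TIMERFD_SETFL_MASK;
    *linux_flags = kept | (arg & SKETCH_TIMERFD_SETFL_MASK);
    return 0;
}
```

Keeping the two helpers free of locking and table access makes the bit rules easy to test in isolation; in a real fd table the read-modify-write in the setter would still need to run under the table lock.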
The FUSE F_SETFL branch's preserved mask omitted LINUX_O_ACCMODE, while the assignment OR'd in arg bits outside the preserved set. As a result, fcntl(fuse_rdwr_fd, F_SETFL, 0) silently turned an O_RDWR FUSE shadow into O_RDONLY, and a subsequent fcntl(fd, F_GETFL) reported the wrong access mode -- breaking the Linux contract that F_SETFL cannot change the access mode. Add LINUX_O_ACCMODE to both the preserved mask and the strip applied to the incoming arg, matching how Linux generic_setfl() preserves the access mode bits outside its SETFL_MASK.
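A minimal sketch of the corrected mask arithmetic, assuming a preserved set of access mode plus CLOEXEC. The helper name and `SKETCH_PRESERVED` are hypothetical; the real preserved mask in the FUSE branch may contain more bits.

```c
#include <stdint.h>

#define LINUX_O_WRONLY    01
#define LINUX_O_RDWR      02
#define LINUX_O_ACCMODE   03
#define LINUX_O_NONBLOCK  04000
#define LINUX_O_CLOEXEC   02000000

/* Assumed preserved set: bits F_SETFL must never change. */
#define SKETCH_PRESERVED (LINUX_O_ACCMODE | LINUX_O_CLOEXEC)

/* Compute the new shadow flags for fcntl(fd, F_SETFL, arg).
 * The same mask is used twice: to keep the current preserved bits
 * and to strip them from the incoming arg. Omitting ACCMODE from
 * either side is what let F_SETFL rewrite the access mode. */
static uint32_t sketch_fuse_setfl(uint32_t cur, uint32_t arg)
{
    uint32_t preserved = cur & SKETCH_PRESERVED;
    uint32_t incoming = arg & ~(uint32_t)SKETCH_PRESERVED;
    return preserved | incoming;
}
```

With this arithmetic, `sketch_fuse_setfl(LINUX_O_RDWR, 0)` keeps `LINUX_O_RDWR`, which is the case the commit message identifies as previously broken.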
Several callers wrote fd_table[gfd].linux_flags under different locks or none at all, so a concurrent fcntl(F_SETFL/F_SETFD) on fd_lock could race a creator's bare assignment. sys_fcntl read the slot's type and flags outside fd_lock and mutated them without revalidation, so a close+reopen between the read and the write could update an unrelated fd. This commit unifies both concerns under fd_lock.

fd_publish_linux_flags helper

New fdtable helper takes fd_lock around a single linux_flags write. Replaces bare assignments in sys_timerfd_create, sys_eventfd, sys_signalfd, sys_inotify_init1, the FUSE dev mount, and fuse_open. fuse_dup_fd takes fd_lock once for both the source read and the destination write so the preserved-flags snapshot stays consistent with a racing F_SETFL on either fd. The result: every write to fd_table[*].linux_flags is now serialized on the same lock, with no fuse_lock<->fd_lock nesting introduced.

sys_fcntl snapshot-then-revalidate

sys_fcntl now takes a single fd_snapshot at entry and uses it for F_GETFD, F_GETFL, and F_DUPFD source reads. F_SETFD and the FUSE / timerfd F_SETFL writers reacquire fd_lock and revalidate against fd_snap.generation before mutating linux_flags. fd_alloc bumps a monotonic generation counter per slot reuse, so close+reopen between snapshot and lock is caught and returns EBADF rather than mutating an unrelated fd. The timerfd F_SETFL O_DIRECT EINVAL check moves inside the lock so a stale-snapshot race cannot report EINVAL based on a fd that is no longer a timerfd; the revalidation returns EBADF first instead.

A new test exercises the cross-cutting fd_lock RMW: F_SETFL stamps the writable status bits, then F_SETFD toggles CLOEXEC, and F_GETFL must still surface the status bits unperturbed.
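The snapshot-then-revalidate pattern can be sketched with a small table guarded by one mutex. Everything here is illustrative: the `sketch_*` names, the table size, and the struct layout are assumptions, not the elfuse definitions.

```c
#include <pthread.h>
#include <stdint.h>

#define LINUX_EBADF  9
#define SKETCH_NFDS  8

struct sketch_slot {
    int in_use;
    uint32_t linux_flags;
    uint64_t generation;  /* bumped every time the slot is (re)allocated */
};

struct sketch_snapshot {
    int fd;
    uint32_t linux_flags;
    uint64_t generation;
};

static struct sketch_slot sketch_table[SKETCH_NFDS];
static pthread_mutex_t sketch_fd_lock = PTHREAD_MUTEX_INITIALIZER;

static int sketch_fd_valid(int fd) { return fd >= 0 && fd < SKETCH_NFDS; }

/* Allocate a slot and publish its flags under the lock. */
static int sketch_alloc(int fd, uint32_t flags)
{
    if (!sketch_fd_valid(fd))
        return -LINUX_EBADF;
    pthread_mutex_lock(&sketch_fd_lock);
    sketch_table[fd].in_use = 1;
    sketch_table[fd].linux_flags = flags;
    sketch_table[fd].generation++;
    pthread_mutex_unlock(&sketch_fd_lock);
    return 0;
}

static void sketch_close(int fd)
{
    if (!sketch_fd_valid(fd))
        return;
    pthread_mutex_lock(&sketch_fd_lock);
    sketch_table[fd].in_use = 0;
    pthread_mutex_unlock(&sketch_fd_lock);
}

/* Copy the slot state once, at syscall entry. */
static int sketch_snapshot_fd(int fd, struct sketch_snapshot *snap)
{
    int rc = -LINUX_EBADF;
    if (!sketch_fd_valid(fd))
        return rc;
    pthread_mutex_lock(&sketch_fd_lock);
    if (sketch_table[fd].in_use) {
        snap->fd = fd;
        snap->linux_flags = sketch_table[fd].linux_flags;
        snap->generation = sketch_table[fd].generation;
        rc = 0;
    }
    pthread_mutex_unlock(&sketch_fd_lock);
    return rc;
}

/* Read-modify-write under the lock, but only if the slot is still the
 * one that was snapshotted; otherwise report EBADF. */
static int sketch_update_flags(const struct sketch_snapshot *snap,
                               uint32_t clear, uint32_t set)
{
    int rc = -LINUX_EBADF;
    pthread_mutex_lock(&sketch_fd_lock);
    struct sketch_slot *s = &sketch_table[snap->fd];
    if (s->in_use && s->generation == snap->generation) {
        s->linux_flags = (s->linux_flags & ~clear) | set;
        rc = 0;
    }
    pthread_mutex_unlock(&sketch_fd_lock);
    return rc;
}
```

Because both the generation check and the flag update happen in one critical section, an F_SETFL followed by an F_SETFD cannot lose each other's bits, and a slot that was closed and reopened in between is detected rather than silently modified.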
@doanbaotrung, please confirm whether this PR helps.
Dear @jserv, I've just built the code from this branch and tried to run the application. The timerfd issue is gone. It works now. Thanks,
sys_timerfd_create is known to be incomplete:
The non-blocking flag now lives in fd_table[gfd].linux_flags alongside the existing CLOEXEC bit, and timerfd_read consults that field after snapshotting it under fd_lock (order 3) before acquiring sfd_lock (order 5a). The lock-order snapshot matches the documented discipline in src/syscall/internal.h and the eventfd_dup_fd pattern.
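The lock-order discipline in that paragraph can be sketched as follows: copy the flag under the outer lock, release it, and only then take the inner lock, so the two are never held together. The `sketch_*` names, the two globals standing in for the fd table entry and the timer state, and the return convention are all hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

#define LINUX_O_NONBLOCK  04000
#define LINUX_EAGAIN      11

static pthread_mutex_t sketch_fd_lock  = PTHREAD_MUTEX_INITIALIZER; /* order 3  */
static pthread_mutex_t sketch_sfd_lock = PTHREAD_MUTEX_INITIALIZER; /* order 5a */

static uint32_t sketch_linux_flags;  /* stands in for fd_table[gfd].linux_flags */
static uint64_t sketch_expirations;  /* stands in for the timer's fire count */

/* Returns 1 and stores the expiration count when the timer has fired,
 * -LINUX_EAGAIN for a non-blocking fd with nothing pending, and 0 when
 * a blocking caller would have to wait (waiting is omitted here). */
static int sketch_timerfd_read(uint64_t *count)
{
    /* Step 1: snapshot the flag under fd_lock, then drop the lock. */
    pthread_mutex_lock(&sketch_fd_lock);
    int nonblock = (sketch_linux_flags & LINUX_O_NONBLOCK) != 0;
    pthread_mutex_unlock(&sketch_fd_lock);

    /* Step 2: inspect timer state under sfd_lock only. */
    int rc;
    pthread_mutex_lock(&sketch_sfd_lock);
    if (sketch_expirations > 0) {
        *count = sketch_expirations;
        sketch_expirations = 0;  /* a read consumes the pending count */
        rc = 1;
    } else {
        rc = nonblock ? -LINUX_EAGAIN : 0;
    }
    pthread_mutex_unlock(&sketch_sfd_lock);
    return rc;
}
```

The snapshot can be stale by the time the second lock is taken, which is acceptable in this sketch: a concurrent F_SETFL simply takes effect on the next read.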
Close #82
Summary by cubic
Adds support for CLOCK_BOOTTIME and makes timerfd non-blocking and fcntl behavior match Linux on macOS. Also unifies linux_flags handling so reads, dup, and FUSE behave correctly.

New Features
- CLOCK_BOOTTIME in timerfd_create (mapped to MONOTONIC).

Bug Fixes
- O_NONBLOCK in fd_table[].linux_flags; timerfd_read snapshots it.
- fcntl(F_GETFL) for timerfd returns O_RDWR|{O_APPEND,O_NONBLOCK,O_NOATIME}, and F_SETFL updates those bits; O_DIRECT now returns -EINVAL.
- linux_flags writes via fd_publish_linux_flags; sys_fcntl snapshots and revalidates before F_SETFD/F_SETFL updates under fd_lock.
- O_NONBLOCK and access mode across dup (fs and FUSE); FUSE F_SETFL now preserves access mode.
- LINUX_O_ASYNC and LINUX_O_NOATIME; translate helpers use these symbols.
- F_GETFL/F_SETFL semantics, and dup flag preservation.

Written for commit 0a2199c. Summary will update on new commits.