Summary
Closing a guest fd does not remove that fd from any epoll instance's interest
table. Each epoll instance keeps its own epoll_reg_t regs[FD_TABLE_SIZE]
indexed by guest fd number (src/syscall/poll.c:640-642), but sys_close()
(src/syscall/fs.c:423), like every other close path that funnels through
fd_cleanup_entry() (dup2-over-existing, the execve CLOEXEC sweep), only tears
down the fd_table[fd] slot. It never touches the regs[fd] entry of the epoll
instances that registered that fd, so regs[fd].active stays true after the
fd is gone.
On Linux, closing the last descriptor that refers to an open file description
auto-removes it from every epoll interest list. elfuse leaves the registration
behind.
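The drift described above can be pictured with a toy model. This is a minimal sketch, not elfuse code: every `model_*` name is hypothetical, and the structures keep only the fields needed to show that a close which touches nothing but the fd_table slot leaves `regs[fd].active` set.

```c
#include <assert.h>
#include <stdbool.h>

#define FD_TABLE_SIZE 1024 /* size quoted in this issue */

/* Hypothetical, simplified stand-ins for the real elfuse structures. */
typedef struct { bool in_use; } model_fd_entry_t;
typedef struct { bool active; } model_epoll_reg_t;
typedef struct { model_epoll_reg_t regs[FD_TABLE_SIZE]; } model_epoll_inst_t;

static model_fd_entry_t model_fd_table[FD_TABLE_SIZE];

/* Models EPOLL_CTL_ADD: mark the guest fd as registered in this instance. */
static void model_epoll_add(model_epoll_inst_t *inst, int fd)
{
    assert(fd >= 0 && fd < FD_TABLE_SIZE);
    inst->regs[fd].active = true;
}

/* Models the close path described above: only the fd_table slot is torn down. */
static void model_close(int fd)
{
    assert(fd >= 0 && fd < FD_TABLE_SIZE);
    model_fd_table[fd].in_use = false;
    /* No epoll instance is visited here, so regs[fd] keeps its old value. */
}

/* True when a registration has outlived its fd, i.e. the state is stale. */
static bool model_is_stale(const model_epoll_inst_t *inst, int fd)
{
    assert(fd >= 0 && fd < FD_TABLE_SIZE);
    return inst->regs[fd].active && !model_fd_table[fd].in_use;
}
```

In this model, registering fd 5 and then calling `model_close(5)` makes `model_is_stale()` return true, which is the state this issue is about.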
This was flagged by jserv in the PR #73 review as an adjacent, pre-existing
issue, explicitly to be filed separately:
> Adjacent (pre-existing, but more observable after this change): sys_close
> does not clear inst->regs[fd].active for epoll instances holding the fd, so
> a closed-then-reopened guest fd still looks active to epoll. Worth filing
> separately.
Root cause
There is no reverse index from a target fd to the set of epoll instances that
registered it. Epoll instances are reachable only through their own epfd in fd_table[epfd].dir. A close has the fd number in hand but no cheap way to find
"which epoll instances watch this fd," so it does nothing — the registration
tables drift out of sync with reality and only get corrected lazily, the next
time epoll_ctl() happens to touch that exact (epfd, fd) pair.
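To make the "no cheap way" point concrete, here is a rough sketch of what an eager cleanup could look like when no reverse index exists: the close path has to walk every fd_table slot looking for epoll instances. This is one possible shape, not a proposed patch. The `epoll` pointer field and all `model_*` names are assumptions for illustration; the real code reaches the instance through fd_table[epfd].dir.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FD_TABLE_SIZE 1024

typedef struct { bool active; } model_epoll_reg_t;
typedef struct { model_epoll_reg_t regs[FD_TABLE_SIZE]; } model_epoll_inst_t;

/* Hypothetical fd_table entry: epoll is non-NULL only when the slot is an epfd. */
typedef struct {
    bool in_use;
    model_epoll_inst_t *epoll;
} model_fd_entry_t;

static model_fd_entry_t model_fd_table[FD_TABLE_SIZE];

/*
 * Without a reverse index, finding the watchers of `fd` means visiting every
 * slot. Returns the number of registrations that were dropped.
 */
static int model_epoll_forget_fd(int fd)
{
    int dropped = 0;

    assert(fd >= 0 && fd < FD_TABLE_SIZE);
    for (int epfd = 0; epfd < FD_TABLE_SIZE; epfd++) {
        model_epoll_inst_t *inst = model_fd_table[epfd].epoll;

        if (!model_fd_table[epfd].in_use || inst == NULL)
            continue; /* not an open epoll instance */
        if (inst->regs[fd].active) {
            inst->regs[fd].active = false;
            dropped++;
        }
    }
    return dropped;
}

/* Close that clears epoll state first, then releases the fd_table slot. */
static int model_close_eager(int fd)
{
    int dropped = model_epoll_forget_fd(fd);

    model_fd_table[fd].in_use = false;
    model_fd_table[fd].epoll = NULL;
    return dropped;
}
```

The scan costs one pass over FD_TABLE_SIZE slots per close. Whether that is acceptable, or whether a per-fd watcher list is preferable, is a design decision this sketch does not settle.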
Current mitigation (PR #73) and why it is not a fix
PR #73 added an fd_entry_t.generation stamp and a cross-call ABA guard in sys_epoll_ctl() (src/syscall/poll.c:743). At the head of every epoll_ctl(), a registration whose stamped generation no longer matches fd_table[fd].generation is treated as stale and dropped, so DEL/MOD report ENOENT and ADD starts fresh after a close+reopen of the same fd number. This
makes the observable close+reopen ABA correct and is covered by tests/test-epoll-aba.c.
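The guard can be pictured roughly as follows. This is a simplified sketch, not the code at src/syscall/poll.c:743: the `model_*` names are hypothetical, and bumping the generation on every open of a slot is an assumption about how the stamp advances.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FD_TABLE_SIZE 1024

/* Hypothetical stand-ins; only the fields this sketch needs. */
typedef struct { bool in_use; uint64_t generation; } model_fd_entry_t;
typedef struct { bool active; uint64_t generation; } model_epoll_reg_t;

static model_fd_entry_t model_fd_table[FD_TABLE_SIZE];

/* Models opening a slot: each reuse gets a new generation (assumed policy). */
static void model_open(int fd)
{
    assert(fd >= 0 && fd < FD_TABLE_SIZE);
    model_fd_table[fd].in_use = true;
    model_fd_table[fd].generation++;
}

/* Models EPOLL_CTL_ADD: stamp the registration with the fd's generation. */
static void model_reg_add(model_epoll_reg_t *regs, int fd)
{
    assert(fd >= 0 && fd < FD_TABLE_SIZE);
    regs[fd].active = true;
    regs[fd].generation = model_fd_table[fd].generation;
}

/*
 * The guard: a registration whose stamp no longer matches is dropped.
 * Returns whether a live registration remains for `fd`.
 */
static bool model_reg_validate(model_epoll_reg_t *regs, int fd)
{
    assert(fd >= 0 && fd < FD_TABLE_SIZE);
    if (regs[fd].active &&
        regs[fd].generation != model_fd_table[fd].generation)
        regs[fd].active = false;
    return regs[fd].active;
}
```

Note that in this model the stale entry is corrected only when `model_reg_validate()` runs. Between the close and the next validation, `regs[fd].active` is still true.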
But the guard is a point-of-use patch at the symptom layer, not a fix for
the root cause:
- The registration table still lies: regs[fd].active == true for a fd that
  is closed (and possibly reopened as a different file). Correctness now depends
  on every reader re-deriving truth from generation.
- sys_epoll_ctl() does consult generation. sys_epoll_pwait() does not.
  The readiness loop maps kevents[i].udata (the guest fd number) straight back
  into inst->regs[gfd].active / .oneshot_armed with no generation check
  (src/syscall/poll.c:1010-1012 and 1046-1051). Today this is masked because
  a closed host fd's knote is already gone from kqueue, so no event carries the
  stale udata, but that is an indirect kqueue side effect, not an invariant
  the code enforces. Any future change that re-adds a host fd, or a path that
  reuses a udata value, resurfaces the stale entry as a wrong/mismatched-data
  readiness report.
- The guard is therefore load-bearing: remove or bypass the generation check
  in any one epoll read path and the original wrong-knote / wrong-data bug
  returns. That is fragile for a correctness-critical path.
- Stale entries also simply linger for the epoll instance's whole lifetime
  (regs[] is FD_TABLE_SIZE == 1024 entries per instance), never reflecting that
  the fd is gone.
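The gap between the two read paths listed above can be illustrated with a toy readiness loop. This is a sketch under assumptions, not the code in sys_epoll_pwait(): the `model_*` types are hypothetical and `udata` is reduced to a plain int holding the guest fd number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FD_TABLE_SIZE 1024

/* Hypothetical, simplified stand-ins; only the fields this sketch needs. */
typedef struct { uint64_t generation; } model_fd_entry_t;
typedef struct { bool active; uint64_t generation; } model_epoll_reg_t;
typedef struct { int udata; } model_kevent_t; /* udata carries the guest fd */

static model_fd_entry_t model_fd_table[FD_TABLE_SIZE];

/* Mirrors the unchecked pattern: trust regs[gfd].active as it stands. */
static int model_count_ready_unchecked(const model_epoll_reg_t *regs,
                                       const model_kevent_t *kev, int n)
{
    int ready = 0;

    assert(regs && kev);
    for (int i = 0; i < n; i++) {
        int gfd = kev[i].udata;

        if (gfd >= 0 && gfd < FD_TABLE_SIZE && regs[gfd].active)
            ready++;
    }
    return ready;
}

/* Same loop, but a registration only counts if its stamp is still current. */
static int model_count_ready_checked(const model_epoll_reg_t *regs,
                                     const model_kevent_t *kev, int n)
{
    int ready = 0;

    assert(regs && kev);
    for (int i = 0; i < n; i++) {
        int gfd = kev[i].udata;

        if (gfd >= 0 && gfd < FD_TABLE_SIZE && regs[gfd].active &&
            regs[gfd].generation == model_fd_table[gfd].generation)
            ready++;
    }
    return ready;
}
```

If an event ever arrives carrying the udata of a stale registration, the unchecked loop reports it and the checked loop does not. Clearing the registration at close time would remove the need for either reader to carry the check.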
Out of scope (related, separate divergence)
elfuse keys epoll state on the guest fd number, while Linux keys on the open
file description. So dup()-then-close-the-original (registration should
survive via the surviving descriptor) is a distinct modeling gap that neither
the generation guard nor eager-cleanup-on-close addresses. Track separately;
do not conflate with this issue.
Reproduction (state-level; observable symptom is currently masked by the guard)
1. epfd = epoll_create1(); fd = eventfd().
2. epoll_ctl(epfd, EPOLL_CTL_ADD, fd, …) → regs[fd].active = true.
3. close(fd).
4. Inspect the instance: regs[fd].active is still true (Linux: the
   registration is gone).
With the PR #73 guard in place the observable DEL/MOD/ADD behavior after a
reopen is correct (see test-epoll-aba.c); this issue is about the underlying
state being wrong and the guard being the only thing standing between that wrong
state and a user-visible bug.
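The state in step 4 is internal and cannot be read from a guest program, but the guest-visible side of the same sequence can be sketched. The function below is a hypothetical helper, not taken from tests/test-epoll-aba.c. It runs the steps above, reopens an eventfd so the fd number is reused, and returns the errno from EPOLL_CTL_MOD. On Linux this is expected to be ENOENT, because the registration went away with the close.

```c
#include <assert.h> /* for the caller-side check mentioned below */
#include <errno.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Returns the errno from EPOLL_CTL_MOD issued after close + reopen of the
 * same fd number, 0 if MOD unexpectedly succeeded, or -1 on a setup failure
 * (including the reopened fd not landing on the same number).
 */
static int repro_mod_after_reopen(void)
{
    struct epoll_event ev = { .events = EPOLLIN };
    int epfd, fd, fd2;
    int result = -1;

    epfd = epoll_create1(0);                 /* step 1 */
    if (epfd < 0)
        return -1;
    fd = eventfd(0, 0);
    if (fd < 0)
        goto out;

    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0) { /* step 2 */
        close(fd);
        goto out;
    }

    close(fd);                               /* step 3 */

    fd2 = eventfd(0, 0);                     /* lowest free number, normally == fd */
    if (fd2 < 0)
        goto out;
    if (fd2 == fd) {
        ev.data.fd = fd2;
        result = (epoll_ctl(epfd, EPOLL_CTL_MOD, fd2, &ev) == 0) ? 0 : errno;
    }
    close(fd2);
out:
    close(epfd);
    return result;
}
```

A caller could check `assert(repro_mod_after_reopen() == ENOENT);`. Per the description above, this is expected to hold under elfuse as well while the PR #73 guard is in place, so the program does not expose the stale regs[fd] entry by itself.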
References
- the generation guard and test-epoll-aba.c.
- src/syscall/poll.c:743 - cross-call ABA guard (consults generation).
- src/syscall/poll.c:1010-1012, 1046-1051 - sys_epoll_pwait reads regs[].active / .oneshot_armed without a generation check.
- src/syscall/poll.c:640-642 - per-instance regs[FD_TABLE_SIZE] table.
- src/syscall/fs.c:423 (sys_close), src/syscall/fdtable.c:457 (fd_cleanup_entry) - close teardown that does not touch epoll state.