We're seeing persistent WriteFailedException: Failed to write request to socket [broken pipe] errors at a ~1-2% error rate across all Lambda instances. Bref recovers by restarting FPM, but the current invocation returns a 500.
The error pattern on a single Lambda instance:
Request N completes normally (e.g., 220ms)
Request N+1 arrives ~6ms later
WriteFailedException immediately — the FPM socket is already dead
Bref restarts FPM, next request works
Errors also spontaneously drop to near-zero (~0.01%) for 7-14 hours without any deployment, then climb back to ~1-2%. This suggests state corruption that self-heals through Lambda instance recycling.
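The broken-pipe condition itself can be demonstrated outside Bref: writing to a stream socket whose peer has already closed fails with EPIPE, which is what the FastCGI client surfaces as "Failed to write request to socket [broken pipe]". A minimal sketch using a plain socket pair (not the actual FastCGI client, and only an illustration of the failure mode, not its cause):

```php
<?php
// A local stand-in for the dead FPM socket: one end of a Unix socket
// pair is closed, then the other end tries to write.
[$master, $worker] = stream_socket_pair(
    STREAM_PF_UNIX,
    STREAM_SOCK_STREAM,
    STREAM_IPPROTO_IP
);

fclose($worker); // the peer (stand-in for FPM) dies between requests

// The very next write on the surviving side hits the broken pipe.
$bytes = @fwrite($master, "request payload");
var_dump($bytes); // false on PHP >= 7.4 once the pipe is broken
```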
How to reproduce
We haven't found a reliable reproduction — it's intermittent and only manifests in production under real traffic. It has persisted since our initial Bref migration and across PHP 8.1 and 8.4.
Environment: bref/php-84-fpm:2 (pm=static, max_children=1, log_limit=8192)
What we've ruled out
Redis/Valkey — Predis runs in child only, master never touches it
Large responses — no correlation between response size and errors
stderr log_limit — max log message is ~2KB, well under the 8192 limit
Cold starts — orders of magnitude fewer than error count
PHP version — reproduced on both 8.1 and 8.4
opcache.jit = disable, verified from image
Questions
We noticed the catch block in FpmHandler.php (line 162) calls $this->stop() before capturing
proc_get_status($this->fpm), so the exit code and termination signal are lost. Would you accept a PR adding
this logging on the failure path?
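A sketch of what that logging could look like, with a short-lived `php -r "exit(7);"` standing in for php-fpm. The property and method names (`$this->fpm`, `$this->stop()`) come from the issue text; everything else here is illustrative, not the actual FpmHandler.php code:

```php
<?php
// Capture proc_get_status() BEFORE tearing the process down, so the
// exit code and termination signal survive into the logs.
$descriptors = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
$fpm = proc_open('php -r "exit(7);"', $descriptors, $pipes);

// Wait until the child has exited so the status is final. The loop also
// guarantees the real exit code is captured on the first non-running
// observation, which matters on PHP < 8.0 where later calls return -1.
do {
    usleep(10_000);
    $status = proc_get_status($fpm);
} while ($status['running']);

// Log first: once the handle is closed, this information is unrecoverable.
error_log(sprintf(
    'FPM exited: exitcode=%d signaled=%s termsig=%d',
    $status['exitcode'],
    var_export($status['signaled'], true),
    $status['termsig']
));

foreach ($pipes as $pipe) {
    fclose($pipe);
}
proc_close($fpm); // the equivalent of $this->stop() runs only after logging
```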