Persistent broken pipe errors (~1-2% of requests) #2077

**Description**

We're seeing persistent `WriteFailedException: Failed to write request to socket [broken pipe]` errors on roughly 1-2% of requests across all Lambda instances. Bref recovers by restarting FPM, but the invocation that hit the error still returns a 500.

The error pattern on a single Lambda instance:

1. Request N completes normally (e.g., 220ms)
2. Request N+1 arrives ~6ms later
3. `WriteFailedException` immediately — the FPM socket is already dead
4. Bref restarts FPM, next request works
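
To illustrate step 3, here is a minimal sketch of how a write fails with a broken pipe when the peer of a Unix socket is already gone. It does not use Bref or its FastCGI client: `writeToDeadPeer()` is a hypothetical helper, and the socket pair merely stands in for the handler-to-FPM connection. It assumes PHP 7.4+ on Linux.

```php
<?php
// Hypothetical illustration, not Bref code: write to a Unix socket whose
// peer has already gone away, as the FPM side appears to have in step 3.
function writeToDeadPeer(): array
{
    [$client, $server] = stream_socket_pair(
        STREAM_PF_UNIX,
        STREAM_SOCK_STREAM,
        STREAM_IPPROTO_IP
    );

    // Simulate the FPM end of the socket dying between two requests.
    fclose($server);

    // Capture the notice PHP raises for the failed write instead of printing it.
    $message = null;
    set_error_handler(function (int $severity, string $text) use (&$message): bool {
        $message = $text;
        return true;
    });

    $written = fwrite($client, "request N+1");

    restore_error_handler();
    fclose($client);

    return ['written' => $written, 'message' => $message];
}

var_dump(writeToDeadPeer());
```

The write fails on the first attempt rather than after a timeout, which is consistent with the error appearing immediately when request N+1 arrives.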

Errors also spontaneously drop to near-zero (~0.01%) for 7-14 hours without any deployment, then climb back to ~1-2%. This suggests state corruption that self-heals through Lambda instance recycling.

**How to reproduce**

We haven't found a reliable reproduction — it's intermittent and only manifests in production under real traffic. It has persisted since our initial Bref migration and across PHP 8.1 and 8.4.

- **Bref**: 2.4.18 (Docker images, not layers)
- **Docker image**: `bref/php-84-fpm:2`
- **PHP**: 8.4.18
- **AWS service**: Lambda (via API Gateway HTTP API)
- **Lambda config**: 1024 MB memory, 28s timeout
- **FPM config**: Bref defaults (`pm=static`, `max_children=1`, `log_limit=8192`)
- **Extensions**: redis (Predis pure PHP client), intl, opcache, pcntl, posix, pdo_mysql
- **Framework**: Laravel 10

**What we've ruled out**

- SIGPIPE (#1854) — zero matches in CloudWatch
- JIT segfaults (#842) — `opcache.jit = disable`, verified from image
- OOM — memory well within Lambda limits
- Excimer profiler — removed from Dockerfile entirely, errors persist
- Sentry tracing/profiling — fully disabled, errors persist
- Redis/Valkey — Predis runs in child only, master never touches it
- Large responses — no correlation between response size and errors
- stderr log_limit — max log message is ~2KB, well under the 8192 limit
- Cold starts — orders of magnitude fewer than error count
- PHP version — reproduced on both 8.1 and 8.4

**Questions**

- We noticed that the catch block in [FpmHandler.php](https://github.com/brefphp/bref/blob/master/src/FpmRuntime/FpmHandler.php#L162) (line 162) calls `$this->stop()` before capturing `proc_get_status($this->fpm)`, so the exit code and termination signal are lost. Would you accept a PR that logs both on the failure path?
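
As a rough sketch of what we have in mind (not a patch against `FpmHandler`), the snippet below reads `proc_get_status()` before the process handle is torn down and turns the result into a log line. `describeExit()` and `waitForExit()` are hypothetical helpers, and the `sh` child processes are stand-ins for the FPM master. It assumes PHP 7.4+ on a POSIX system.

```php
<?php
// Hypothetical helper: summarise a proc_get_status() result for logging.
function describeExit(array $status): string
{
    if ($status['running']) {
        return 'still running';
    }
    if ($status['signaled']) {
        return sprintf('killed by signal %d', $status['termsig']);
    }
    return sprintf('exited with code %d', $status['exitcode']);
}

// Hypothetical helper: poll until the child has exited and return the status
// from the call that first observed the exit. Before PHP 8.3 only that first
// call reports the real exit code, so the array is kept rather than re-queried.
function waitForExit($process, float $timeoutSeconds = 5.0): array
{
    $deadline = microtime(true) + $timeoutSeconds;
    do {
        $status = proc_get_status($process);
        if ($status['running']) {
            usleep(10000);
        }
    } while ($status['running'] && microtime(true) < $deadline);

    return $status;
}

// Stand-in for FPM dying from a signal: a shell that kills itself.
$killed = proc_open(['sh', '-c', 'kill -9 $$'], [], $pipes);
$killedSummary = describeExit(waitForExit($killed));
proc_close($killed); // teardown happens only after the status was captured

// Stand-in for FPM exiting on its own with a non-zero code.
$exited = proc_open(['sh', '-c', 'exit 3'], [], $pipes);
$exitedSummary = describeExit(waitForExit($exited));
proc_close($exited);

echo $killedSummary, PHP_EOL, $exitedSummary, PHP_EOL;
```

In the handler itself, the equivalent change would be to call `proc_get_status($this->fpm)` and log the summary before `$this->stop()` runs, so the failure path records whether FPM exited or was killed.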

