We use php-fpm-exporter to export Prometheus metrics about PHP-FPM's internal state, such as the number of active vs idle workers.
However, for php-fpm to serve its own status information (which the exporter scrapes), at least one worker process must be idle. This is often a problem during saturation events: we lose PHP monitoring data entirely for the affected interval, on either a small number of appservers or the whole fleet.
There are a couple of options for getting around this:
- Create a dedicated statistics worker. We'd get a full export of all of php-fpm's current metrics, but this would also involve major modifications to the php-fpm source.
- Have workers report their idle/busy status via some out-of-band mechanism, in addition to the existing one.
It turns out the second option is pretty easy to implement. P11187 is a patch that simply updates each worker's argv with 'busy' or 'idle' whenever the state changes.
(The per-process state can differ from the Status: "Processes active: 0, idle 8" line that php-fpm already reports to systemd, because the latter is only updated once every 10 seconds.)
Changing argv is implemented via PHP's own emulation of BSD's setproctitle() -- in this case, a memset() followed by a few strncpy() calls, all smallish. That is very likely fast enough to do on every request.
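To illustrate why this is cheap, here is a rough Python model of the clear-then-bounded-copy idea. It is only a sketch of the technique, not PHP's actual code: the function name, the 32-byte region size, and the title strings are all made up for illustration.

```python
def set_title(argv_area: bytearray, title: bytes) -> None:
    """Overwrite a fixed-size argv region in place (model of a clobber-argv
    setproctitle() emulation; not PHP's real implementation)."""
    size = len(argv_area)
    # memset(): clear the whole region so no bytes of the old title survive.
    argv_area[:] = bytes(size)
    # strncpy()-style bounded copy: keep one byte free for the terminating NUL.
    n = max(0, min(len(title), size - 1))
    argv_area[:n] = title[:n]


# Hypothetical 32-byte argv region and title strings, for illustration only.
area = bytearray(32)
set_title(area, b"php-fpm: pool www busy")
set_title(area, b"php-fpm: pool www idle")
print(area.rstrip(b"\0").decode())
```

The work per state change is bounded by the size of the argv region, which is a small constant for a given process.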
Looking at history, fpm_requests.c doesn't change too often (a few times a year), so maintaining this patch on our own seems quite feasible -- but that being said, it still seems like a fine candidate for upstreaming, assuming it works out well for us.
From there, we could write a trivial exporter that scans php-fpm process argvs and exports a state gauge.
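A minimal sketch of what such an exporter could look like, assuming Linux /proc and assuming the patched workers end up with a title of the form "php-fpm: pool NAME busy" or "php-fpm: pool NAME idle". That title format and the metric name phpfpm_worker_state are guesses made for illustration; neither is taken from P11187.

```python
import os
from collections import Counter


def read_cmdline(pid, proc_root="/proc"):
    """Return a process's argv as one space-joined string ('' if unreadable)."""
    try:
        with open(os.path.join(proc_root, pid, "cmdline"), "rb") as f:
            raw = f.read()
    except OSError:
        return ""  # process exited, or we lack permission
    return raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()


def classify(cmdline):
    """Return 'busy' or 'idle' for a worker title, else None.

    Assumed (hypothetical) title format: 'php-fpm: pool NAME busy|idle'.
    """
    if not cmdline.startswith("php-fpm: pool "):
        return None
    words = cmdline.split()
    # Require a pool name before the state word, so a pool that happens to
    # be named 'busy' is not miscounted.
    if len(words) >= 4 and words[-1] in ("busy", "idle"):
        return words[-1]
    return None


def count_states(proc_root="/proc"):
    """Count busy and idle workers by scanning every numeric /proc entry."""
    counts = Counter(busy=0, idle=0)
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return counts  # no /proc available on this system
    for entry in entries:
        if entry.isdigit():
            state = classify(read_cmdline(entry, proc_root))
            if state:
                counts[state] += 1
    return counts


def render(counts):
    """Render the counts in the Prometheus text exposition format."""
    lines = ["# TYPE phpfpm_worker_state gauge"]  # hypothetical metric name
    for state in ("busy", "idle"):
        lines.append('phpfpm_worker_state{state="%s"} %d' % (state, counts[state]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render(count_states()), end="")
```

Because this only reads /proc, it keeps working when every worker is busy, which is exactly the case where the existing status endpoint goes dark. The output could be served over HTTP or written to a file for a textfile-style collector; which one fits better is left open here.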