Since roughly 03:40 UTC, mw1304 has been emitting TIMEOUT OCCURRED memcached errors at a rate of roughly 400 events per minute.
It also does not show up in https://grafana.wikimedia.org/d/000000377/host-overview, so the host itself may be broken in some way.
Mentioned in SAL (#wikimedia-operations) [2021-03-30T07:37:21Z] <elukey> restart-php7.2-fpm on mw1304, jobrunner completely overwhelmed by ffmpeg/transcode jobs (not publishing metrics, erroring out for memcached timeouts) - T278734
The timeout errors have vanished. No idea why the job runner would overrun a single host with video transcoding jobs, though.
The root cause has not been addressed, but flushing the stuck PHP transcode jobs has made the server responsive again.
The same issue happened again later and is now tracked in an incident document.