It seems that the service-runner worker sometimes misses its heartbeat to the master, which causes the master to kill it. I can reproduce this on staging fairly consistently, if I send it a a lot of events.
We tried to roll out mediawiki.api-request to group1 wikis today, but the frequent worker restarts caused errors, and also caused k8s' operational latency to increase.