The outcome of T253673 is to go with idea 3: rolling restarts.
Work:
- Patch Scap to always do the rolling restart (instead of only occasionally, as it does now based on unreliable opcache thresholds) -- https://gerrit.wikimedia.org/r/631776
- Patch Scap to implement an emergency flag that performs this restart without preserving live server capacity, for cases where a bad patch has already taken down a large part of the site. Tracked by T243009: Add option in Scap to restart php-fpm for emergency deployments, and skip depooling/pooling servers
- Deploy updated Scap to Beta Cluster -- https://integration.wikimedia.org/ci/job/scap-beta-deb/118/
- Measure timing before and after, and report on task.
- Package Scap for the production APT repository.
- Deploy updated Scap to production.
- Measure timing before and after, and report on task.
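
To make the difference between the normal rolling restart and the emergency flag concrete, here is a minimal Python sketch. It is not Scap's actual code: `rolling_restart`, `batch_size`, the `emergency` parameter, and the `depool`/`restart`/`pool` callbacks are hypothetical names for illustration, and it assumes servers are handled in fixed-size batches.

```python
from typing import Callable, List, Sequence, Tuple

Action = Tuple[str, str]  # (step name, host)


def batches(hosts: Sequence[str], size: int) -> List[List[str]]:
    """Split hosts into consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(hosts[i:i + size]) for i in range(0, len(hosts), size)]


def rolling_restart(
    hosts: Sequence[str],
    batch_size: int,
    depool: Callable[[str], None],
    restart: Callable[[str], None],
    pool: Callable[[str], None],
    emergency: bool = False,
) -> List[Action]:
    """Restart php-fpm on every host; return the ordered list of steps taken.

    Normal mode: depool one batch, restart it, repool it, then move on, so
    the hosts outside the current batch keep serving traffic.
    Emergency mode (hypothetical flag): skip depooling/pooling and restart
    every host directly, trading capacity guarantees for speed.
    """
    log: List[Action] = []

    if emergency:
        for host in hosts:
            restart(host)
            log.append(("restart", host))
        return log

    for group in batches(hosts, batch_size):
        for host in group:
            depool(host)
            log.append(("depool", host))
        for host in group:
            restart(host)
            log.append(("restart", host))
        for host in group:
            pool(host)
            log.append(("pool", host))
    return log
```

In normal mode at most `batch_size` servers are out of rotation at any time. In emergency mode nothing is depooled, which only makes sense when the site is already largely down.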