Scap offers the option of restarting the HHVM processes after the sync step. It correctly issues the depool commands, but then it fails to actually restart the service:
sudo -u mwdeploy -n -- scap hhvm-restart on mw1230.eqiad.wmnet returned : Failed to restart hhvm.service: The name org.freedesktop.PolicyKit1 was not provided by any .service files See system logs and 'systemctl status hhvm.service' for details.
Failing to restart HHVM can cause (and has caused) serious problems. The solution here is either to remove the restart option or to fix it properly.
For the time being, to prevent failures in the short term, the option should be removed altogether and made inaccessible to deployers. In the long run, it could be fixed to properly restart the service. Note also that Scap went through the whole list of targets even though the restart failed on every single one of them. This should not be the case: if 10% (or a similar, configurable percentage) of hosts fail to restart, the process should stop. Finally, when Scap failed to restart HHVM, it left the servers depooled. Instead, it should check whether the service is up and, if it is, repool the server.
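To make the desired long-term behaviour concrete, here is a minimal sketch in Python of a restart loop with a failure threshold and a repool step. It is not Scap's actual code: `rolling_restart`, `RestartAborted`, and the `depool`, `restart`, `is_healthy` and `repool` callables are hypothetical names introduced only for illustration, and the 10% default simply mirrors the figure suggested above.

```python
from typing import Callable, Iterable, List


class RestartAborted(Exception):
    """Raised when the share of failed restarts reaches the threshold."""

    def __init__(self, failed: List[str]) -> None:
        super().__init__(
            "aborting: restart failed on %d host(s): %s"
            % (len(failed), ", ".join(failed))
        )
        self.failed = failed


def rolling_restart(
    hosts: Iterable[str],
    depool: Callable[[str], None],     # hypothetical: takes a host out of rotation
    restart: Callable[[str], bool],    # hypothetical: True if the restart succeeded
    is_healthy: Callable[[str], bool], # hypothetical: True if the service is up
    repool: Callable[[str], None],     # hypothetical: puts a host back in rotation
    max_failure_ratio: float = 0.10,   # assumed default, meant to be configurable
) -> List[str]:
    """Restart hosts one at a time; return the hosts whose restart failed.

    Stops early with RestartAborted once the failed share of all hosts
    reaches max_failure_ratio.
    """
    host_list = list(hosts)
    failed: List[str] = []

    for host in host_list:
        depool(host)

        if not restart(host):
            failed.append(host)

        # Never leave a working server depooled: if the service is up,
        # repool the host whether or not the restart itself succeeded.
        if is_healthy(host):
            repool(host)

        # Stop instead of walking the whole target list while restarts fail.
        if failed and len(failed) / len(host_list) >= max_failure_ratio:
            raise RestartAborted(failed)

    return failed
```

With 20 targets and the default threshold, a run in which every restart fails would stop after the second host instead of touching all 20, and both hosts would be repooled as long as the old HHVM process was still serving.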