T163944: Timeouts on CODFW caused a major issue during today's deployment, but only the CODFW nodes failed. As best I can tell, uwsgi was not picking up the new code when it restarted, but celery was. This mismatch between the uwsgi processes and the celery processes resulted in a bunch of errors. For some reason, every now and then a request would make it through. As far as I could tell, all of the CODFW nodes had fully updated code in the /srv/deployment/ores/deploy/ directory. I also confirmed that the uwsgi processes were being restarted, and that there wasn't some old, weird version of ORES installed in /srv/deployment/ores/venv/.
This task is done when we figure out why the CODFW nodes did not successfully pick up the new code during the deployment.
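
One way to narrow this down might be to record which code version each uwsgi and celery process reports on each node, then flag the nodes where the two disagree. The sketch below only shows the comparison step. How the versions get collected is left open, and the function name, node names, and version strings are all hypothetical; nothing here is taken from the ORES codebase or the deployment tooling.

```python
from collections import defaultdict


def find_version_mismatches(reports):
    """Return the nodes whose services report more than one code version.

    `reports` is an iterable of (node, service, version) tuples.
    This is a hypothetical format; collecting the data is out of scope here.
    """
    # node -> service -> set of versions seen for that service
    by_node = defaultdict(lambda: defaultdict(set))
    for node, service, version in reports:
        by_node[node][service].add(version)

    mismatched = {}
    for node, services in by_node.items():
        # Union of every version seen on this node, across all services
        all_versions = set().union(*services.values())
        if len(all_versions) > 1:
            mismatched[node] = {
                service: sorted(versions)
                for service, versions in services.items()
            }
    return mismatched


# Made-up example data: node-a has uwsgi on old code, celery on new code.
reports = [
    ("node-a", "uwsgi", "old"),
    ("node-a", "celery", "new"),
    ("node-b", "uwsgi", "new"),
    ("node-b", "celery", "new"),
]
print(find_version_mismatches(reports))
```

If only the CODFW nodes show up in the result, that would be consistent with the uwsgi/celery mismatch described above, though it would not by itself explain why uwsgi kept the old code.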