After a change to InitialiseSettings.php, for at least 5 hours HHVM was still running with outdated code.
It is unknown how long this would have continued, but it stands to reason that if it lasted 5 hours, the affected servers simply missed the change and were not going to detect it later. It was worked around by performing another deployment, which appears to have been picked up consistently by at least the servers that missed the first one.
We should figure out:
- What caused it?
- How can we avoid it in the future?
- How can we detect it?
The detection is imho most important because the rest can be worked around, and there might in fact be more than one scenario/cause that could lead to stale HHVM byte code. A generic detection is more important than fixing or avoiding this particular case.
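To make the idea of a generic detection a bit more concrete, here is a rough sketch in Python. It assumes things we don't have today: that each deployment can compute a marker for what it shipped (here a SHA-256 of the file contents), and that every app server can report the marker as seen by the running HHVM code. The function names, the host names and the way the markers would be collected are all hypothetical. The sketch only shows the comparison step.

```python
import hashlib
from typing import Dict, List, Optional


def content_digest(data: bytes) -> str:
    """Return a hex SHA-256 digest of the deployed file contents."""
    return hashlib.sha256(data).hexdigest()


def find_stale_hosts(expected: str, reported: Dict[str, Optional[str]]) -> List[str]:
    """Return the hosts whose running code reports a marker other than `expected`.

    A host that reported nothing (None) is counted as stale as well, because
    we cannot confirm that it picked up the change.
    """
    return sorted(host for host, marker in reported.items() if marker != expected)


if __name__ == "__main__":
    # Hypothetical data: in practice `reported` would be collected from each
    # server by asking the running code (not the file on disk) for its marker.
    expected = content_digest(b"<?php /* new settings */")
    reported = {
        "host-a": expected,
        "host-b": content_digest(b"<?php /* old settings */"),
        "host-c": None,
    }
    print(find_stale_hosts(expected, reported))
```

The important design point is that the marker has to come from the executing code rather than from the file system, otherwise a server with an up-to-date file but stale byte code would look healthy. Such a check would flag a stale server regardless of which cache layer is responsible.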
Note that it would be wrong to assume that yesterday's deployment was exceptional in not reaching all HHVM instances. It's quite likely that the secondary fix-up sync also didn't reach all instances. The only thing we know is that the subset of instances that didn't get it the first time did get it the second time.
Lastly, we should also keep an eye out for the possibility that this isn't caused by HHVM's byte code cache, but by a bug in our own application-layer caches, because for the specific case of wmf-config settings we do cache those in some way as well.