It looks like HHVM has a small memory leak, which results in appservers OOMing approximately every two weeks. This usually happens in clusters as well, probably because appservers usually get (re)started at about the same time.
In turn, this causes spikes of 500s, as happened today and as can be seen with.
@Joe currently thinks that this is not the same bug as T103886. From my (limited) understanding of that bug, it doesn't seem to be the same either: memory usage increases slowly but steadily, and it doesn't look correlated with deploy times.
This causes mini-outages every so often, hence prioritizing this to UBN.