So, the backend purging queues in esams are way behind. On the one node I'm staring at the most, there are currently about 87 million backlogged purge requests, which is probably somewhere in the ballpark of 10 hours of lag time. The backlog is in the local daemon on the esams hosts themselves, so this isn't a network issue with delivering the purges to the hosts over the WAN; the likely culprit is the ATS daemon consuming them slowly.
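For a rough sanity check on those numbers, here is a minimal sketch of the arithmetic, assuming lag can be approximated as backlog divided by the rate at which the consumer drains the queue. The function names are made up for illustration, and the resulting rate is only what the two figures above imply, not something measured on the host.

```python
SECONDS_PER_HOUR = 3600


def implied_drain_rate(backlog: int, lag_hours: float) -> float:
    """Purges per second implied by a given backlog and estimated lag."""
    return backlog / (lag_hours * SECONDS_PER_HOUR)


def estimated_lag_hours(backlog: int, drain_per_sec: float) -> float:
    """Hours needed to work through the backlog at a steady drain rate."""
    return backlog / drain_per_sec / SECONDS_PER_HOUR


backlog = 87_000_000  # backlogged purge requests on the one node
rate = implied_drain_rate(backlog, lag_hours=10)
print(round(rate))  # 2417 purges/sec implied by the figures above
```

Put differently, 87 million entries against 10 hours of lag corresponds to the queue draining at roughly 2,400 purges per second, under that assumption.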
This is text@eqiad GETs vs PURGEs over the past week. You can see the GETs have the usual organic pattern, and the PURGEs are fairly spiky, as we normally see.
{F31726687}
Whereas with text@esams, we see a curious pattern in the PURGE traffic. It looks like it's being load-limited and somewhat recovering when organic traffic is low overnight:
{F31726689}
<edited here>: Those purge rates are from the frontend varnish. The frontend varnish PURGE queue doesn't receive entries until they've traversed the backend one, which is backlogged, so that's why we still see the effect there.
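To illustrate why a backend backlog shows up in the frontend numbers, here is a toy two-stage model. It is only a sketch under made-up assumptions: the per-tick arrival and capacity figures are invented, and `simulate` is a hypothetical helper, not anything from the actual purge daemon. The backend forwards at most its capacity per tick, so the frontend sees the backend's throughput rather than the real arrival rate.

```python
def simulate(arrivals, backend_capacity):
    """Return (frontend_seen, backlog_trace), one entry per tick.

    The frontend only receives what the backend managed to process
    in that tick; everything else waits in the backend backlog.
    """
    backlog = 0
    frontend_seen, backlog_trace = [], []
    for arrived, capacity in zip(arrivals, backend_capacity):
        backlog += arrived
        forwarded = min(backlog, capacity)  # backend is the bottleneck
        backlog -= forwarded
        frontend_seen.append(forwarded)
        backlog_trace.append(backlog)
    return frontend_seen, backlog_trace


# Invented numbers: steady purge arrivals, backend capacity that is
# low while organic traffic is high and higher overnight.
arrivals = [100, 100, 100, 100, 100, 100]
capacity = [60, 60, 60, 150, 150, 150]

seen, backlog = simulate(arrivals, capacity)
print(seen)     # [60, 60, 60, 150, 150, 120]
print(backlog)  # [40, 80, 120, 70, 20, 0]
```

In this toy run the frontend rate is capped while the backend is busy and then rises above the arrival rate as the backlog drains, which is the same general shape as the load-limited then recovering pattern described above.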