On March 4 at 13:54 UTC, Icinga alerted: PROBLEM - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds. A rolling restart of wdqs-blazegraph, which took a few minutes to complete, resolved the issue.
Looking at the Grafana dashboards, it seems that only wdqs1004 and wdqs1005 were affected (see the banned requests and lag graphs), from ~13:45 UTC to ~14:05 UTC.
My best guess is that this is related to specific user-generated load that evaded throttling, but I have not found the specific problematic requests. I don't have a great idea of how to prevent this from happening again, but I'm open to suggestions.
Note that this again raises the question of what SLO we want for WDQS (T199228). Since we don't have a great way to ensure this never happens again, we should manage expectations.
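
To put the SLO question in concrete terms, here is a rough sketch of how much downtime various availability targets would allow, compared with this incident. The targets (99%, 99.5%, 99.9%) and the 30-day window are hypothetical placeholders, not proposals from T199228. Counting the ~20 minutes between ~13:45 and ~14:05 UTC as a full outage is a worst-case assumption, since only two hosts appear to have been affected.

```python
def downtime_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for an availability target."""
    return (1.0 - slo_target) * window_days * 24 * 60


# ~13:45 to ~14:05 UTC per the Grafana graphs.
# Assumption: counted as a full outage, which overstates the impact.
INCIDENT_MINUTES = 20


def budget_report(targets=(0.99, 0.995, 0.999), window_days=30):
    """Return (target, budget in minutes, fraction of budget used by the incident)."""
    rows = []
    for target in targets:
        budget = downtime_budget_minutes(target, window_days)
        rows.append((target, budget, INCIDENT_MINUTES / budget))
    return rows


if __name__ == "__main__":
    # Targets here are illustrative only, not agreed values.
    for target, budget, used in budget_report():
        print(f"{target:.1%} target: {budget:.1f} min budget, incident uses {used:.0%}")
```

Under these assumptions, a 99.9% target over 30 days allows about 43 minutes of downtime, so a single incident like this one would consume nearly half of the budget.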