We got the following alert:
PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/data/css/mobile/site (Get site-specific CSS) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
This is consistent with https://grafana.wikimedia.org/d/5CmeRcnMz/mobileapps?panelId=20&fullscreen&orgId=1&refresh=5m&from=now-1h&to=now&var-dc=codfw%20prometheus%2Fk8s&var-service=mobileapps&var-container_name=All showing a suspiciously flat line for --domain_v1_data_css_mobile_site. The quantiles aren't much better (https://grafana.wikimedia.org/d/5CmeRcnMz/mobileapps?panelId=37&fullscreen&orgId=1&refresh=5m&from=now-1h&to=now&var-dc=codfw%20prometheus%2Fk8s&var-service=mobileapps&var-container_name=All): they are again suspiciously flat.
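To illustrate why a flat line points at a timeout, here is a minimal sketch in Python. It assumes the recorded latency is capped at the client timeout, so once every request exceeds it, all quantiles collapse onto the same value. `TIMEOUT_S`, the simulated latencies and the helper names are hypothetical and not taken from the mobileapps configuration or the dashboards.

```python
import math
import random

TIMEOUT_S = 5.0  # hypothetical client timeout, NOT the real mobileapps value


def observed_latency(true_latency_s, timeout_s=TIMEOUT_S):
    """Latency as the caller records it: never more than the timeout."""
    return min(true_latency_s, timeout_s)


def percentile(values, pct):
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


random.seed(0)
# Hypothetical stuck endpoint: every request takes longer than the timeout.
true_latencies = [random.uniform(6.0, 30.0) for _ in range(1000)]
recorded = [observed_latency(t) for t in true_latencies]

p50 = percentile(recorded, 50)
p99 = percentile(recorded, 99)
# Both quantiles sit exactly on the timeout, which plots as a flat line.
print(p50, p99)
```

A healthy endpoint would show p50 and p99 moving independently, so identical, constant quantiles suggest that the timeout is being measured rather than the service.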
This is indicative of a timeout of some sort, but it is still unclear why it is happening. The migration has been paused and the percentage of traffic reaching k8s has been dropped back to 10%. Even if those requests fail, restbase will retry; the retry will probably be sent to an scb host, which will reply, and the response will then be cached.
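As a rough back-of-the-envelope check of the "probably" above, the sketch below estimates how likely a request is to reach an scb host. It assumes each attempt is routed independently with a 10% chance of landing on k8s, and that restbase makes exactly one retry; neither assumption is confirmed here, and the function name is hypothetical.

```python
K8S_SHARE = 0.10  # from this task: 10% of traffic reaches k8s
ATTEMPTS = 2      # assumption: the original request plus a single retry


def p_some_attempt_hits_scb(k8s_share, attempts):
    """Probability that at least one attempt is routed to scb,
    assuming every attempt is routed independently (an assumption)."""
    return 1 - k8s_share ** attempts


p_scb = p_some_attempt_hits_scb(K8S_SHARE, ATTEMPTS)
# Under these assumptions roughly 0.99 of requests reach scb at least once.
print(round(p_scb, 4))
```

If the assumptions hold, only about 1 in 100 requests would fail on both attempts, which fits the limited user-facing impact described above.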