Try scrolling through https://maps.wikimedia.org/ on a mid zoom level like 12, tiles load noticeably slowly. Times over 0.5s are typical, worst I've seen is 35 seconds. This is very different from before, so I suspect that caches migration to upload cluster (T164608) is the cause.
Description
Related Objects
- Mentioned Here
- T164608: Merge cache_maps into cache_upload functionally
Event Timeline
Are you comparing cache hits to cache misses? From where? What was the timing like before?
Another thought - could we be maxing out parallel connections to the kartotherian machines? We've always had a max_connections of 1000 (per varnish backend, to kartotherian), but the number of varnish backends has multiplied with this change, which could have multiplied the overall limit on connections opened from varnish to kartotherian, and then hit some limit causing further connections to stall?
I can't seem to reproduce the problem from my browser. Looking at the varnish response time stats, I don't see much variation over the last 30 days. The 95%-ile seem to be between 150ms and 400ms, the 99%-ile between 1.5s and 3.5s, with a few peaks up to 4s. Visually, not much has changed.
@MaxSem how did you do those measurements?