Starting around 10 November, the rendering time for Special:Homepage became much spikier.
We should investigate what caused it and fix it if possible.
kostajh, Nov 24 2021, 10:30 AM
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| statsd: Instrument local search task suggester | mediawiki/extensions/GrowthExperiments | master | +71 -6 |
There was no train deployment that week, and no relevant backport or config patches were deployed that day.
Judging from the Server Admin Log (https://sal.toolforge.org/production?p=0&q=deploy1002&d=2021-11-11), all CirrusSearch traffic was routed to codfw; see https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-10_cirrussearch_commonsfile_outage. If search traffic is still routed through codfw (I think it is), that might partly explain the increased variability in rendering speed.
The CacheDecorator is supposed to insulate us from search latency, but perhaps there is an uncached call to the search backend somewhere in the rendering of Special:Homepage.
Traffic is still being routed to codfw. Presumably the variations in rendering speed are CacheDecorator misses.
CacheDecorator::filter(), which is called after cache hits, also goes through CirrusSearch (although in theory it should be much faster than the other searches).
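To illustrate why cache hits may not fully hide search latency, here is a minimal Python sketch of the cache-decorator pattern described above. It is not the GrowthExperiments `CacheDecorator` (which is PHP); the class `CachingSuggester` and the `search` / `still_valid` callables are hypothetical stand-ins, assuming that a miss runs the full search and a hit still makes one backend round trip to filter the cached results.

```python
from __future__ import annotations

from typing import Callable


class CachingSuggester:
    """Hypothetical cache decorator around a search-backed suggester.

    Miss: run the full (expensive) search and cache it.
    Hit: reuse cached results, but still call a backend-backed filter,
    so backend latency leaks into the hit path too.
    """

    def __init__(self, search: Callable[[str], list[str]],
                 still_valid: Callable[[list[str]], list[str]]):
        self._search = search            # expensive full search (miss path)
        self._still_valid = still_valid  # cheaper backend check (hit path)
        self._cache: dict[str, list[str]] = {}
        self.backend_calls = 0           # counts every round trip to the backend

    def suggest(self, key: str) -> list[str]:
        if key not in self._cache:
            self.backend_calls += 1
            self._cache[key] = self._search(key)
            return self._cache[key]
        # Cache hit: one more backend round trip to drop stale results.
        self.backend_calls += 1
        return self._still_valid(self._cache[key])
```

In this model both calls to `suggest("copyedit")` touch the backend, so a slower backend (e.g. cross-datacenter routing) would show up even on a warm cache, just with a smaller cost per hit.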
Maybe we should instrument search speed like we do for the API etc.?
Change 741908 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):
[mediawiki/extensions/GrowthExperiments@master] statsd: Instrument local search task suggester
Good idea; done in the above patch. I propose that we merge the patch, then wait to see how the graphs look after traffic is switched back to eqiad.
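As a rough illustration of what "instrumenting with statsd" means, here is a minimal Python sketch of a timer that emits a plaintext statsd timing packet (`<name>:<value>|ms`). It is not what change 741908 does (MediaWiki uses its own PHP stats interface); the helper names `format_timing`, `udp_sender` and `timed`, and the localhost:8125 default, are assumptions for illustration only.

```python
import socket
import time
from contextlib import contextmanager
from typing import Callable, Iterator


def format_timing(metric: str, elapsed_ms: float) -> bytes:
    """Build a plaintext statsd timer packet: '<name>:<value>|ms'."""
    return f"{metric}:{elapsed_ms:.0f}|ms".encode("ascii")


def udp_sender(host: str = "127.0.0.1", port: int = 8125) -> Callable[[bytes], None]:
    """Return a function sending one UDP datagram per packet (fire-and-forget)."""
    def send(packet: bytes) -> None:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(packet, (host, port))
    return send


@contextmanager
def timed(metric: str, send: Callable[[bytes], None]) -> Iterator[None]:
    """Time the wrapped block and emit its duration, even if the block raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        send(format_timing(metric, elapsed_ms))
```

In use, the search call would sit inside `with timed("<hypothetical.metric.name>", udp_sender()):`, so each query's latency becomes a timer series that can be graphed next to the Special:Homepage rendering time. Injecting `send` keeps the timing logic testable without a statsd daemon.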
Looking at CacheDecorator::filter(), I suspect the spikiness occurs when the user has more than one task type selected, because then we need to make an additional 2-3 queries to Elasticsearch.
Moving back to In Progress for the actual investigation (once some data has been collected).
Change 741908 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] statsd: Instrument local search task suggester
CirrusSearch went back to eqiad today around 2021-11-29T20:00 UTC. So far the Special:Homepage graph looks a little better, but it's too early to say.
Yes, it definitely looks better; thanks for having a look. Leaving this task in progress until the instrumentation patch is in production and the relevant panels have been added to the dashboard.