
Investigate rendering speed variations starting around 10 November
Open, Medium, Public

Description

Starting around 10 November, the rendering speed for Special:Homepage becomes much spikier:

[Attachments: two image.png screenshots of the Special:Homepage rendering-speed graph, showing the increased spikiness after 10 November]

We should investigate what caused it and fix if possible.

Event Timeline

kostajh triaged this task as Medium priority. (Wed, Nov 24, 10:30 AM)
kostajh created this task.
kostajh renamed this task from Investigate rendering speed variations starting around 11 November to Investigate rendering speed variations starting around 10 November. (Wed, Nov 24, 11:59 AM)
kostajh updated the task description.

There was no train that week, nor any relevant backport/config patches that day:

From looking at https://sal.toolforge.org/production?p=0&q=deploy1002&d=2021-11-11, it looks like all CirrusSearch traffic was routed to codfw: https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-10_cirrussearch_commonsfile_outage. If search traffic is still routed through codfw (I think it is) that might be part of the reason why we see increased variability in the rendering speed.

The CacheDecorator is supposed to minimize how much we are affected by search speed, but perhaps there is some uncached call to the search backend somewhere in the rendering of Special:Homepage.

Traffic is still being routed to codfw. Presumably these are CacheDecorator misses that are showing up in the rendering speed variations.
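The cache-decorator pattern described above can be sketched roughly as follows. This is a hypothetical illustration in Python, not the actual GrowthExperiments `CacheDecorator` API: a decorator wraps the slow search-backed suggester and serves cached results, so only cache misses reach the backend and show up as slow, variable render times.

```python
# Hypothetical sketch of a cache-decorator pattern (illustrative names,
# not the actual GrowthExperiments code): cache hits avoid the search
# backend entirely; only misses pay the variable backend latency.

class SearchTaskSuggester:
    """Stand-in for the ElasticSearch-backed task suggester."""
    def __init__(self):
        self.backend_calls = 0

    def suggest(self, user_id):
        self.backend_calls += 1  # each call here hits the search backend
        return [f"task-{user_id}-{i}" for i in range(3)]

class CacheDecorator:
    """Serves cached suggestions; only misses fall through to the backend."""
    def __init__(self, suggester):
        self.suggester = suggester
        self.cache = {}

    def suggest(self, user_id):
        if user_id not in self.cache:  # miss: slow, latency depends on backend
            self.cache[user_id] = self.suggester.suggest(user_id)
        return self.cache[user_id]     # hit: fast, no backend call

backend = SearchTaskSuggester()
cached = CacheDecorator(backend)
cached.suggest(42)
cached.suggest(42)
print(backend.backend_calls)  # 1 -- the second call was a cache hit
```

Under this model, the spiky renders would be the requests that miss the cache (or that bypass it), which is consistent with the codfw routing amplifying the misses' latency.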

CacheDecorator::filter(), called after cache hits, also goes through CirrusSearch (although in theory it should be much faster than the other searches).
Maybe we should profile search speed like we do with the API etc.?

Change 741908 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] statsd: Instrument local search task suggester

https://gerrit.wikimedia.org/r/741908

> CacheDecorator::filter(), called after cache hits, also goes through CirrusSearch (although in theory it should be much faster than the other searches).
> Maybe we should profile search speed like we do with the API etc.?

Good idea; done in the above patch. I propose that we merge the patch, then wait to see how the graphs look after traffic is switched back to eqiad.
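The kind of timing instrumentation the patch adds might look roughly like this. This is a hedged Python sketch with hypothetical names (the actual patch is PHP and uses MediaWiki's statsd integration): wrap the search call, measure wall-clock duration, and report it under a metric key so the latency distribution becomes visible in graphs.

```python
import time

# Hypothetical statsd-style timing sketch (illustrative, not the actual
# patch): record how long each search call takes under a metric key.

class StatsdClient:
    """Minimal stand-in for a statsd client: collects timing samples."""
    def __init__(self):
        self.timings = {}

    def timing(self, key, ms):
        self.timings.setdefault(key, []).append(ms)

def instrumented_search(statsd, query):
    start = time.monotonic()
    result = [query.upper()]  # stand-in for the actual CirrusSearch call
    elapsed_ms = (time.monotonic() - start) * 1000
    # hypothetical metric key, chosen for illustration only
    statsd.timing("growthexperiments.tasksuggester.search", elapsed_ms)
    return result

client = StatsdClient()
instrumented_search(client, "copyedit")
print(len(client.timings["growthexperiments.tasksuggester.search"]))  # 1
```

With samples like these flowing to statsd, the percentiles of search latency can be graphed alongside the Special:Homepage render times to confirm or rule out the correlation.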

Looking at CacheDecorator::filter(), the spikiness probably occurs when the user has more than one task type selected, because then we'll need to make an additional 2-3 queries to ElasticSearch.
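The effect of those extra per-task-type queries on tail latency can be illustrated with a toy simulation (invented numbers, not measured data): when a render needs several sequential backend queries, the total is their sum, so both the mean and the spread grow with the number of queries, which is one plausible mechanism for the spikiness.

```python
import random

# Illustrative simulation (toy latency model, not measured data):
# rendering with k backend queries takes the sum of k latencies,
# so the tail (P95) grows as more task types require more queries.

random.seed(0)

def query_latency():
    # toy model: a fixed floor plus an exponentially distributed tail, in ms
    return 20 + random.expovariate(1 / 30)

def render_time(num_queries):
    return sum(query_latency() for _ in range(num_queries))

one = [render_time(1) for _ in range(10_000)]
three = [render_time(3) for _ in range(10_000)]

def p95(xs):
    return sorted(xs)[int(len(xs) * 0.95)]

print(p95(three) > p95(one))  # True: more queries, fatter tail
```

This is only a sketch of the mechanism; the statsd data from the patch above is what would actually confirm whether multi-task-type requests drive the spikes.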

Moving back to In Progress for the actual investigation (once some data has been collected).

Change 741908 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] statsd: Instrument local search task suggester

https://gerrit.wikimedia.org/r/741908

> From looking at https://sal.toolforge.org/production?p=0&q=deploy1002&d=2021-11-11, it looks like all CirrusSearch traffic was routed to codfw: https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-10_cirrussearch_commonsfile_outage. If search traffic is still routed through codfw (I think it is) that might be part of the reason why we see increased variability in the rendering speed.

CirrusSearch went back to eqiad today around 2021-11-29T20:00 UTC. So far the Special:Homepage graph looks a little better, but it's too early to say.