Reduce the load of CirrusSearch update jobs on MW jobrunners
Closed, Resolved (Public)

Description

Looking at the flame graphs of a jobrunner, it appears that CirrusSearch jobs are consuming most of the jobrunner resources.

A few ideas to improve the situation:

  • verify that ContentHandler::getParserOutputForIndexing() does not request rendering of the HTML output on wikidata
    • generate-html is set to false when rendering the output.
  • disable the saneitizer for one week and assess the impact
    • if the impact is large, consider lowering the number of parses by creating a dedicated profile for wikis like commons and increasing reindex_after_loops from 8 to e.g. 16.
  • verify that running the jobs for both eqiad & codfw re-uses the parser output (no double parse)
    • eqiad and codfw writes are done in the same job and they re-use the same documents
      • the above statement is wrong: the parser output is actually accessed twice, but I believe that \MediaWiki\Page\ParserOutputAccess::$localCache is being used to avoid a double parse
  • consider using memcache (~6-hour TTL) to hold the indexed content so it can be re-used by subsequent ElasticaWrite jobs running for cloudelastic
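
As an illustration of the last idea, here is a minimal sketch of a "get or build, then keep for ~6 hours" cache. It is written in Python for brevity and is not the CirrusSearch or WANObjectCache implementation; `TTLCache`, `doc_cache_key`, and the key layout are hypothetical names chosen for this example, and an in-process dict stands in for memcached.

```python
import time
from typing import Any, Callable, Dict, Tuple

SIX_HOURS = 6 * 60 * 60  # TTL in seconds, matching the "~6 hours" suggested above


class TTLCache:
    """Tiny in-process stand-in for a shared cache (hypothetical, for illustration)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_with_set_callback(
        self, key: str, ttl_seconds: float, build: Callable[[], Any]
    ) -> Any:
        """Return the cached value, or call build() once and store its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive build
        value = build()
        self._entries[key] = (now + ttl_seconds, value)
        return value


def doc_cache_key(wiki: str, page_id: int, revision_id: int) -> str:
    """Assumed key layout: including the revision avoids serving stale content."""
    return f"cirrus-doc:{wiki}:{page_id}:{revision_id}"


if __name__ == "__main__":
    cache = TTLCache()
    key = doc_cache_key("commonswiki", 42, 1001)
    # The first call builds the document; the second call re-uses it.
    first = cache.get_with_set_callback(key, SIX_HOURS, lambda: {"text": "parsed"})
    second = cache.get_with_set_callback(key, SIX_HOURS, lambda: {"text": "re-parsed"})
    print(first is second)
```

Keying on the revision id is a design assumption made here so that a new edit never hits an outdated entry; the real patch may key its entries differently.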

AC:

  • reduce by X% the impact of CirrusSearch jobs on jobrunners
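
To get a rough feel for how much the reindex_after_loops change mentioned in the description could contribute to X, the sketch below works through the arithmetic. It assumes that forced reindexes are spread evenly so that each page is re-rendered once every reindex_after_loops loops, and it ignores reindexes triggered by detected discrepancies; the page count is a made-up figure, not a measurement.

```python
def forced_reindexes_per_loop(total_pages: int, reindex_after_loops: int) -> float:
    """Pages re-rendered per saneitizer loop under the even-spread assumption."""
    if reindex_after_loops < 1:
        raise ValueError("reindex_after_loops must be at least 1")
    return total_pages / reindex_after_loops


def relative_reduction(old_loops: int, new_loops: int) -> float:
    """Fraction of forced reindexes saved when moving from old_loops to new_loops."""
    if old_loops < 1 or new_loops < 1:
        raise ValueError("loop counts must be at least 1")
    return 1 - old_loops / new_loops


if __name__ == "__main__":
    pages = 1_000_000  # hypothetical wiki size, for illustration only
    print(forced_reindexes_per_loop(pages, 8))   # per-loop parses at the current setting
    print(forced_reindexes_per_loop(pages, 16))  # per-loop parses at the proposed setting
    print(relative_reduction(8, 16))             # share of forced parses avoided
```

Under these assumptions, doubling the setting halves the forced parses, but the effect on overall jobrunner load depends on what share of that load the saneitizer represents, which is the figure the week-long experiment was meant to establish.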

Event Timeline

Change 920785 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/CirrusSearch@master] Add WANCache to ParserOutputPageProperties::finalize

https://gerrit.wikimedia.org/r/920785

I don't know the internals of CirrusSearch very well, so the patch might be super super wrong. Sorry if I missed something super obvious.

Change 920785 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Add WANCache to ParserOutputPageProperties::finalize

https://gerrit.wikimedia.org/r/920785

Change 924568 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/CirrusSearch@wmf/1.41.0-wmf.11] Add WANCache to ParserOutputPageProperties::finalize

https://gerrit.wikimedia.org/r/924568

Change 924569 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/CirrusSearch@wmf/1.41.0-wmf.10] Add WANCache to ParserOutputPageProperties::finalize

https://gerrit.wikimedia.org/r/924569

Change 924568 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@wmf/1.41.0-wmf.11] Add WANCache to ParserOutputPageProperties::finalize

https://gerrit.wikimedia.org/r/924568

Mentioned in SAL (#wikimedia-operations) [2023-05-30T20:30:32Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:924568|Add WANCache to ParserOutputPageProperties::finalize (T336698)]]

Mentioned in SAL (#wikimedia-operations) [2023-05-30T20:32:00Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:924568|Add WANCache to ParserOutputPageProperties::finalize (T336698)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-05-30T20:39:59Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:924568|Add WANCache to ParserOutputPageProperties::finalize (T336698)]] (duration: 09m 27s)

Change 924569 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@wmf/1.41.0-wmf.10] Add WANCache to ParserOutputPageProperties::finalize

https://gerrit.wikimedia.org/r/924569

Mentioned in SAL (#wikimedia-operations) [2023-05-30T20:57:02Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:924569|Add WANCache to ParserOutputPageProperties::finalize (T336698)]]

Mentioned in SAL (#wikimedia-operations) [2023-05-30T20:58:36Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:924569|Add WANCache to ParserOutputPageProperties::finalize (T336698)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet

@Ladsgroup the impact is impressive, thanks!
I'm tempted to skip the "disable the saneitizer for one week and assess the impact" idea and consider the improvements you've made via the use of memcache good enough to achieve the desired reduction in jobrunner load.
Tentatively closing, but please feel free to re-open if you still want us to investigate the impact of the saneitizer on commons.

Thanks. I honestly would like to have some visibility into the load of the saneitizer. If we could find a way to make it show up in the flamegraphs (e.g. by using a dedicated class), that'd be more than enough for me.

Mostly to be able to make informed decisions, e.g. if job runners are about to fall apart, we could save x% by turning off saneitizer jobs. Or we might realize that they are only a small portion of the load and can safely be ignored, etc.

Change 924904 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Help measure the impact of saneitizer jobs

https://gerrit.wikimedia.org/r/924904

Sure, I added a small patch that should create a separate branch in the flamegraphs when these jobs are processed.
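
To illustrate why a dedicated class helps here, the Python sketch below shows the general mechanism: a sampling profiler groups samples by call stack, so routing saneitizer work through its own class puts a distinct frame on the stack while the underlying work stays shared. This is not the merged patch; `RegularUpdate`, `SaneitizerUpdate`, and `build_document` are hypothetical names used only for this example.

```python
import inspect
from typing import List


def current_stack() -> List[str]:
    """Return the caller's stack, outermost first, as 'Class::method' labels."""
    labels = []
    # context=0 skips source lookups; [1:] drops current_stack itself.
    for info in inspect.stack(context=0)[1:]:
        owner = info.frame.f_locals.get("self")
        prefix = f"{type(owner).__name__}::" if owner is not None else ""
        labels.append(prefix + info.function)
    return list(reversed(labels))


def build_document(page_id: int) -> List[str]:
    """Shared work; returns the stack a sampling profiler would see at this point."""
    return current_stack()


class RegularUpdate:
    def run(self, page_id: int) -> List[str]:
        return build_document(page_id)


class SaneitizerUpdate:
    """Dedicated entry point: same work, but a distinct frame on the stack."""

    def run(self, page_id: int) -> List[str]:
        return build_document(page_id)


if __name__ == "__main__":
    # Only the last two frames are of interest: entry point, then shared work.
    print(RegularUpdate().run(1)[-2:])
    print(SaneitizerUpdate().run(1)[-2:])
```

Because the two entry points differ only in their class, samples taken inside `build_document` can be attributed to one branch or the other without changing the work itself.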

Change 924904 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Help measure the impact of saneitizer jobs

https://gerrit.wikimedia.org/r/924904

Change 926860 had a related patch set uploaded (by Ladsgroup; author: DCausse):

[mediawiki/extensions/CirrusSearch@wmf/1.41.0-wmf.11] Help measure the impact of saneitizer jobs

https://gerrit.wikimedia.org/r/926860

Change 926860 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@wmf/1.41.0-wmf.11] Help measure the impact of saneitizer jobs

https://gerrit.wikimedia.org/r/926860

Mentioned in SAL (#wikimedia-operations) [2023-06-05T22:03:50Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:926860|Help measure the impact of saneitizer jobs (T336698)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-05T22:05:30Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:926860|Help measure the impact of saneitizer jobs (T336698)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-05T22:13:39Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:926860|Help measure the impact of saneitizer jobs (T336698)]] (duration: 09m 48s)