Improve caching in CI tests
Closed, Resolved · Public

Description

For example, browser tests can be much faster if we change their caching from CACHE_DB to CACHE_ACCEL or 'hash'.

Event Timeline

Change 516475 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/core@master] Set cache types to APC/APCu/WinCache in DevelopmentSettings.php

https://gerrit.wikimedia.org/r/516475

^ This patch reduces the time for Selenium tests from 1:18 to 0:50. The number of calls to SqlBagOStuff drops drastically:

amsa@amsa-Latitude-7480:~/Downloads$ grep SqlBag mw-debug-www.log | wc -l
1623
amsa@amsa-Latitude-7480:~/Downloads$ grep SqlBag mw-debug-www.log.1 | wc -l
113

The number of DB queries drops by about 1,000:

amsa@amsa-Latitude-7480:~/Downloads$ grep DBQuery mw-debug-www.log | wc -l
24814
amsa@amsa-Latitude-7480:~/Downloads$ grep DBQuery mw-debug-www.log.1 | wc -l
23775
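
The before/after counts above could be gathered with a small helper rather than by hand. This is a minimal sketch: the function name `compare_counts` is hypothetical, and it assumes the first log is the "before" run and the second the "after" run, as in the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: count lines matching a pattern in a before/after pair
# of debug logs and print both counts plus the difference.
compare_counts() {
    pattern=$1
    before_log=$2
    after_log=$3
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so "|| true" keeps the helper usable under "set -e".
    before=$(grep -c -- "$pattern" "$before_log" || true)
    after=$(grep -c -- "$pattern" "$after_log" || true)
    echo "$pattern: before=$before after=$after delta=$((before - after))"
}

# Example (not run here):
#   compare_counts SqlBag  mw-debug-www.log mw-debug-www.log.1
#   compare_counts DBQuery mw-debug-www.log mw-debug-www.log.1
```

With the counts quoted above, the deltas would be 1623 - 113 = 1510 for SqlBag and 24814 - 23775 = 1039 for DBQuery.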

Change 516475 merged by jenkins-bot:
[mediawiki/core@master] Set cache types to APC/APCu/WinCache in DevelopmentSettings.php

https://gerrit.wikimedia.org/r/516475

I am not quite sure how MediaWiki selects the caches it is going to use. A few notes:

  • for the PHPUnit test suites, we might be fine just using a per-process hash.
  • for browser tests, they do their queries in parallel, and when the cache is backed by SQLite (at least), there are lock contention issues. I previously had this issue with the localization cache, which I fixed by having Quibble build the cache before proceeding with tests ( T196347 ).

It would be nice to check which caches are now being detected for each of the PHPUnit test suites and the browser tests. The elected backends should be findable in the MediaWiki debug log files attached to each build.

On speeding up browser tests: Kosta found that the PHP built-in server (php -S) is dramatically slower than Apache, I guess because it is single-threaded. See T225218: Consider httpd for quibble instead of php built-in server

This seems to have had a ~20% speed improvement: picking two patches merged into MW-core either side of this change (but without any change in the number or nature of tests), the before durations are 162 and 187 seconds, and the after durations are 135 and 156. Of course, the durations bump around based on CI server load, but 30 seconds saved is 30 seconds saved.
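
As a rough check of the figures quoted above, the saving can be worked out per pair, taking the slower pair (162 s and 187 s) as the before runs and the faster pair (135 s and 156 s) as the after runs. The helper name `percent_saved` is hypothetical; this is only a sketch of the arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: percentage of time saved when a run goes from
# $1 seconds (before) to $2 seconds (after), rounded to one decimal place.
percent_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Example (not run here):
#   percent_saved 162 135   # first pair, 27 seconds saved
#   percent_saved 187 156   # second pair, 31 seconds saved
```

Both pairs come out at roughly 17%, which is in the same ballpark as the ~20% estimate and matches the "about 30 seconds" observation.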

Change 516728 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] DevelopmentSettings: Preserve non-default MainCacheType, simplify override

https://gerrit.wikimedia.org/r/516728

This seems to have had a ~20% speed improvement: picking two patches merged into MW-core either side of this change (but without any change in the number or nature of tests), the before durations are 162 and 187 seconds, and the after durations are 135 and 156. Of course, the durations bump around based on CI server load, but 30 seconds saved is 30 seconds saved.

30 seconds is for core. We save more time in extensions that have browser tests (Wikibase, VE, etc.).

Change 516784 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] installer: Detect APC for MainCacheType in CLI installer

https://gerrit.wikimedia.org/r/516784

Change 516784 merged by jenkins-bot:
[mediawiki/core@master] installer: Detect APC for MainCacheType in CLI installer

https://gerrit.wikimedia.org/r/516784

Change 516728 merged by jenkins-bot:
[mediawiki/core@master] DevelopmentSettings: Remove redundant CacheType overrides

https://gerrit.wikimedia.org/r/516728

How much further improvement are we expecting? Without before/after benchmarks it's hard to know when this is done…

I am not quite sure how MediaWiki selects the caches it is going to use. A few notes:

  • for the PHPUnit test suites, we might be fine just using a per-process hash.

MediaWiki does this for PHPUnit tests already.

  • for browser tests, they do their queries in parallel, and when the cache is backed by SQLite, there are lock contention issues. [..]

What would be nice is to check which caches are being detected now [..]

For lightweight values that are not worth a Memc or DB round trip to fetch, MediaWiki always uses APC. This is not configurable and there is no opt-in or opt-out. This is referred to as "LocalServerCache" and is separate from the (configurable) WANObjectCache (which wraps "MainCacheType").

The MainCacheType/WANObjectCache is "none" by default, not "db". So we were not using SqlBagOStuff in CI for general caching. That would likely have been slower than no caching at all, and it is good that we didn't do that. So for the most part, lock contention shouldn't have been an issue during browser tests.

For other caches (like MessageCache, ParserCache and Session) the default is indeed "db", but these should see relatively little contention. Aaron has done a lot of work over the past year to improve SQLite performance, e.g. db784ada9f9d1 and 1081356412d, and these cache types have less contention in general. But, with or without contention, using the DB is still slow, and APC would be much faster.

MediaWiki doesn't pick caches at run-time. This is mainly to avoid corruption or unexpected changes in the different cache tiers and hash rings, and to avoid missed purges if the environment shifts back and forth or differs between app servers for some reason. If the environment has changed, we currently require sysadmins to update LocalSettings to reflect these changes (and to ensure hard failure if such things are missing).

But during installation we auto-detect APC and use it as the default MainCacheType, with the recommendation to install Memc or Redis and configure that for even better performance. We added this in 2017 (35c725e157e53c, T160519), but it only applied to the Web installer, not the CLI installer. This is now fixed with:

Change 516784

[mediawiki/core@master] installer: Detect APC for MainCacheType in CLI installer
https://gerrit.wikimedia.org/r/516784

How much further improvement are we expecting? Without before/after benchmarks it's hard to know when this is done…

The first commit from Ladsgroup changed ParserCache, MessageCache, and Session from DB to APC.
The second commit (from me) enabled WANCache/MainCacheType by setting it to APC as well. (It was previously None: the first commit did set the variable, but it was overridden back to None by the installer's generated LocalSettings, which my second commit fixes.)
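
For illustration only, the combined effect of the two commits resembles the overrides below. This is a sketch, not the actual patches: the helper name `append_cache_overrides` and the idea of appending from a shell step are assumptions, while the `$wg*` settings and `CACHE_ACCEL` are standard MediaWiki configuration. Appending at the end of the file matters because, as noted above, a later assignment in the generated LocalSettings wins over an earlier one.

```shell
#!/bin/sh
# Hypothetical CI helper: append APC-backed cache overrides to a settings file.
# The function name is an assumption; it is not part of MediaWiki or Quibble.
append_cache_overrides() {
    settings_file=$1
    # The quoted 'PHP' delimiter stops the shell from expanding the $wg* names.
    cat >> "$settings_file" <<'PHP'

// Use the local-server (APC/APCu) cache instead of the database or no cache.
$wgMainCacheType = CACHE_ACCEL;
$wgParserCacheType = CACHE_ACCEL;
$wgMessageCacheType = CACHE_ACCEL;
$wgSessionCacheType = CACHE_ACCEL;
PHP
}

# Example (not run here): append_cache_overrides LocalSettings.php
```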

Looking at the graphs, I don't see a conclusive drop or change in either direction, but I think the anecdotal numbers in this task suffice to close it. Even if it had brought no improvement, there aren't really any other major cache controls to enable, short of introducing HTMLFileCache/Varnish, and that's not likely to have much impact given we're not doing many repeat/anon page views, if any.

Change 517396 had a related patch set uploaded (by Hashar; owner: Krinkle):
[mediawiki/core@REL1_31] installer: Detect APC for MainCacheType in CLI installer

https://gerrit.wikimedia.org/r/517396

Change 517397 had a related patch set uploaded (by Hashar; owner: Krinkle):
[mediawiki/core@fundraising/REL1_31] installer: Detect APC for MainCacheType in CLI installer

https://gerrit.wikimedia.org/r/517397

Change 517400 had a related patch set uploaded (by Hashar; owner: Krinkle):
[mediawiki/core@REL1_32] installer: Detect APC for MainCacheType in CLI installer

https://gerrit.wikimedia.org/r/517400

Change 517402 had a related patch set uploaded (by Hashar; owner: Krinkle):
[mediawiki/core@REL1_33] installer: Detect APC for MainCacheType in CLI installer

https://gerrit.wikimedia.org/r/517402

Change 517396 merged by jenkins-bot:
[mediawiki/core@REL1_31] installer: Detect APC for MainCacheType in CLI installer

https://gerrit.wikimedia.org/r/517396

Change 517402 merged by jenkins-bot:
[mediawiki/core@REL1_33] installer: Detect APC for MainCacheType in CLI installer

https://gerrit.wikimedia.org/r/517402

Change 517400 merged by jenkins-bot:
[mediawiki/core@REL1_32] installer: Detect APC for MainCacheType in CLI installer

https://gerrit.wikimedia.org/r/517400

Change 517397 merged by jenkins-bot:
[mediawiki/core@fundraising/REL1_31] installer: Detect APC for MainCacheType in CLI installer

https://gerrit.wikimedia.org/r/517397