Noticeable increase in db load after wmf.12 roll out
Closed, Resolved, Public

Assigned To: Ladsgroup
Authored By: Ladsgroup, Dec 14 2021, 4:27 AM
Event Timeline

Change 746920 had a related patch set uploaded (by Ladsgroup; author: Esanders):

[mediawiki/extensions/DiscussionTools@wmf/1.38.0-wmf.12] Cache page properties in memory to avoid extra queries

https://gerrit.wikimedia.org/r/746920

I'm fast-tracking several changes that reduce the load because, due to T297667, any increase in db load also increases memory usage.

Change 746920 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@wmf/1.38.0-wmf.12] Cache page properties in memory to avoid extra queries

https://gerrit.wikimedia.org/r/746920
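
For readers who have not opened the change, the idea in its title (keep page properties in memory so repeated lookups do not hit the database again) can be sketched roughly as below. This is a minimal illustration in Python, not the DiscussionTools code: `PagePropsLookup`, `fake_db` and the `example_prop` key are all hypothetical names made up for the sketch.

```python
class PagePropsLookup:
    """Hypothetical per-request lookup with an in-process cache."""

    def __init__(self, backend):
        self._backend = backend   # callable: page_id -> dict of properties
        self._cache = {}          # lives only as long as this object
        self.backend_calls = 0    # how often we really asked the backend

    def get(self, page_id):
        # Only go to the backend the first time a page is requested.
        if page_id not in self._cache:
            self.backend_calls += 1
            self._cache[page_id] = self._backend(page_id)
        return self._cache[page_id]


def fake_db(page_id):
    # Stand-in for a database query; the returned data is made up.
    return {"page_id": page_id, "example_prop": page_id % 2 == 0}


lookup = PagePropsLookup(fake_db)
for _ in range(5):
    props = lookup.get(42)

print(lookup.backend_calls)
```

Five lookups for the same page cost one backend call here; without the dictionary each of them would be a separate query.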

Ladsgroup triaged this task as Unbreak Now! priority. Dec 14 2021, 5:07 AM

We can't leave it like this during Christmas. I'm mostly mitigating it; I haven't looked into what has actually increased.

Mentioned in SAL (#wikimedia-operations) [2021-12-14T05:09:05Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.12/extensions/DiscussionTools/includes/Hooks/HookUtils.php: Backport: [[gerrit:746920|Cache page properties in memory to avoid extra queries (T297132 T297669)]] (duration: 00m 57s)

This probably has something to do with parsing because basically every section except wikidata has seen a massive increase:
https://grafana.wikimedia.org/d/000000278/mysql-aggregated?viewPanel=1&orgId=1&from=1639385882027&to=1639450654650

image.png (968×1 px, 163 KB)

Wikidata, being massive, is actually hiding the problem: it dilutes the impact of this bug in the total numbers.
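
To make the dilution effect concrete, here is a small arithmetic sketch. The query rates are hypothetical numbers picked only to show the shape of the effect; they are not taken from the Grafana dashboard.

```python
# Hypothetical queries-per-second figures, chosen only for illustration.
before = {"wikidata": 60_000, "other_sections": 20_000}
after = {"wikidata": 60_000, "other_sections": 40_000}


def pct_increase(old, new):
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


per_section = pct_increase(before["other_sections"], after["other_sections"])
total = pct_increase(sum(before.values()), sum(after.values()))

print(f"other sections: +{per_section:.0f}%, aggregate: +{total:.0f}%")
```

With these made-up inputs the affected sections double while the aggregate only grows by a quarter, which is why the per-section panels are the ones to look at.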

It is possible that T296063: 4x increase in database queries after deploy of 1.38.0-wmf.9 to all wikis is not fixed properly yet (in wmf.9 we reverted the patch, in wmf.12 the refactor with its fixes went live). cc @daniel

We can't leave it like this during Christmas. I'm mostly mitigating it; I haven't looked into what has actually increased.

Agree - if we cannot find the issue before the code freeze, this would need to be reverted

I'm looking at the queries in wmf.12 and comparing them with an earlier version. It looks like WikiPage::getPageData has completely lost whatever in-process cache it had:

image.png (316×1 px, 129 KB)

image.png (316×1 px, 129 KB)

Shouldn't we avoid using this function completely? I'll dig.

And a lot of this:

image.png (555×1 px, 236 KB)

But in previous versions, it was batched:

image.png (117×1 px, 74 KB)

The funny thing is that the new one also runs the batched query:

image.png (140×1 px, 80 KB)

(and then again individually)

Aha, those were different queries. The ones that are not batched now are actually categories, which come from OutputPage::addCategoryLinksToLBAndGetResult. I think I'm close to fixing this.
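
The difference between the batched and the individual category lookups can be sketched like this. It is a self-contained Python and SQLite illustration, not MediaWiki code: the table is a cut-down imitation of the page table, the category names are made up, and `run`, `lookup_individually` and `lookup_batched` are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    "page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT)"
)
titles = ["Physics", "Chemistry", "Biology", "Geology"]
# Namespace 14 is the Category namespace in MediaWiki.
conn.executemany(
    "INSERT INTO page (page_namespace, page_title) VALUES (14, ?)",
    [(t,) for t in titles],
)

counter = {"queries": 0}


def run(sql, params):
    # Every call stands for one round trip to the database.
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def lookup_individually(names):
    rows = []
    for name in names:
        rows += run(
            "SELECT page_id, page_title FROM page "
            "WHERE page_namespace = 14 AND page_title = ?",
            (name,),
        )
    return rows


def lookup_batched(names):
    placeholders = ",".join("?" for _ in names)
    return run(
        "SELECT page_id, page_title FROM page "
        f"WHERE page_namespace = 14 AND page_title IN ({placeholders})",
        tuple(names),
    )


counter["queries"] = 0
individual = lookup_individually(titles)
individual_queries = counter["queries"]

counter["queries"] = 0
batched = lookup_batched(titles)
batched_queries = counter["queries"]

print(individual_queries, batched_queries)
```

Both functions return the same rows; the individual variant just needs one query per category, so a page in many categories multiplies the load.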

Change 747043 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Reuse the query result in addCategoryLinks instead of relying on cache

https://gerrit.wikimedia.org/r/747043

Change 747045 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] cache: Add two fields to LinkCache::getSelectFields

https://gerrit.wikimedia.org/r/747045

This patch fixes the biggest cause of the load. I tested it locally and confirmed that (1) nothing broke on my localhost and (2) it fixed the issue.

I found the underlying problem as well ^^. The patch afterwards fixes that.

Change 747043 merged by jenkins-bot:

[mediawiki/core@master] Reuse the query result in addCategoryLinks instead of relying on cache

https://gerrit.wikimedia.org/r/747043
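
Going only by the change title (reuse the query result instead of relying on the cache), the pattern can be sketched as follows. This is a hedged Python illustration, not the OutputPage code: `CountingDB`, `LossyCache` and both `categories_via_*` functions are hypothetical, and the cache is deliberately made to drop everything so the worst case is visible.

```python
class CountingDB:
    """Hypothetical database stub that counts the queries it receives."""

    def __init__(self, rows):
        self.rows = rows              # title -> row dict
        self.queries = 0

    def select_many(self, titles):
        self.queries += 1             # one batched query
        return [self.rows[t] for t in titles if t in self.rows]

    def select_one(self, title):
        self.queries += 1             # one query per title
        return self.rows.get(title)


class LossyCache:
    """Hypothetical cache that never keeps an entry (worst case)."""

    def __init__(self, db):
        self.db = db

    def put(self, row):
        pass                          # the entry is silently dropped

    def get(self, title):
        return self.db.select_one(title)   # so every read is a miss


def categories_via_cache(db, cache, titles):
    # Old pattern: fill the cache, then read each title back through it.
    for row in db.select_many(titles):
        cache.put(row)
    return [cache.get(t) for t in titles]


def categories_via_result(db, cache, titles):
    # New pattern: still fill the cache, but return the rows we already hold.
    rows = db.select_many(titles)
    for row in rows:
        cache.put(row)
    return rows


rows = {t: {"title": t, "hidden": False} for t in ["A", "B", "C"]}

db_old = CountingDB(rows)
old = categories_via_cache(db_old, LossyCache(db_old), list(rows))

db_new = CountingDB(rows)
new = categories_via_result(db_new, LossyCache(db_new), list(rows))

print(db_old.queries, db_new.queries)
```

The second variant costs one query regardless of how the cache behaves, which is the property that matters when the cache cannot be trusted to serve what was just written to it.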

Change 747068 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.12] Reuse the query result in addCategoryLinks instead of relying on cache

https://gerrit.wikimedia.org/r/747068

Change 747069 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.13] Reuse the query result in addCategoryLinks instead of relying on cache

https://gerrit.wikimedia.org/r/747069

Change 747068 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.12] Reuse the query result in addCategoryLinks instead of relying on cache

https://gerrit.wikimedia.org/r/747068

Change 747069 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.13] Reuse the query result in addCategoryLinks instead of relying on cache

https://gerrit.wikimedia.org/r/747069

Mentioned in SAL (#wikimedia-operations) [2021-12-14T14:16:31Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.12/includes/OutputPage.php: Backport: [[gerrit:747068|Reuse the query result in addCategoryLinks instead of relying on cache (T297669)]] (duration: 00m 57s)

So this reduced the number of queries, but it's still elevated. The fix for the underlying issue will possibly improve things much more.

Change 747072 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.13] cache: Add four fields to LinkCache::getSelectFields

https://gerrit.wikimedia.org/r/747072

Change 747073 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.12] cache: Add four fields to LinkCache::getSelectFields

https://gerrit.wikimedia.org/r/747073

Change 747045 merged by jenkins-bot:

[mediawiki/core@master] cache: Add four fields to LinkCache::getSelectFields

https://gerrit.wikimedia.org/r/747045
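
The task does not spell out why adding fields to LinkCache::getSelectFields helps, so the following is only one plausible reading, sketched in Python: a consumer can use a cached row only if that row carries every field the consumer needs, and otherwise falls back to a database query. The field names are real columns of the page table, but which of them the patch added is not stated here; the split below, `FULL_ROW` and `load_page` are assumptions made for illustration.

```python
# A complete row as the database would return it (values are made up).
FULL_ROW = {
    "page_id": 7,
    "page_namespace": 0,
    "page_title": "Example",
    "page_len": 1234,
    "page_latest": 99,
}


def cached_row(select_fields):
    """What a cache would hold if it only selected these fields."""
    return {field: FULL_ROW[field] for field in select_fields}


def load_page(row_from_cache, required_fields, stats):
    # The cached row is usable only if nothing the consumer needs is missing.
    if all(field in row_from_cache for field in required_fields):
        stats["cache_hits"] += 1
        return {field: row_from_cache[field] for field in required_fields}
    stats["db_queries"] += 1          # fall back to a fresh query
    return {field: FULL_ROW[field] for field in required_fields}


required = ["page_id", "page_namespace", "page_title", "page_latest"]

narrow_stats = {"cache_hits": 0, "db_queries": 0}
narrow = cached_row(["page_id", "page_len", "page_latest"])
load_page(narrow, required, narrow_stats)

wide_stats = {"cache_hits": 0, "db_queries": 0}
wide = cached_row(["page_id", "page_namespace", "page_title",
                   "page_len", "page_latest"])
load_page(wide, required, wide_stats)

print(narrow_stats, wide_stats)
```

Under this assumption, widening the selected fields turns every such fallback query into a cache hit without changing the consumer.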

Change 747072 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.13] cache: Add four fields to LinkCache::getSelectFields

https://gerrit.wikimedia.org/r/747072

Change 747073 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.12] cache: Add four fields to LinkCache::getSelectFields

https://gerrit.wikimedia.org/r/747073

Mentioned in SAL (#wikimedia-operations) [2021-12-14T15:54:00Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.12/includes/cache/LinkCache.php: Backport: [[gerrit:747073|cache: Add four fields to LinkCache::getSelectFields (T297669)]] (duration: 00m 57s)

So this reduced the number of queries, but it's still elevated. The fix for the underlying issue will possibly improve things much more.

The underlying fix reduced the load, but not as much as I had hoped. That is possibly because all of the WAN caches are cold (and warm cache entries may contain corrupted data, so I'm not sure whether the new values replace them or we have to wait for them to expire), but it looks like the load is slowly going down. I expect it to be in a much better state 24 hours from now. If not, I suggest reverting the PageStore patch (though that very likely won't be needed).
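
The "wait for them to expire" case can be sketched with a toy TTL cache. This is not the WANObjectCache API: `TTLCache`, `get_or_compute`, the one-hour TTL and the manual clock are hypothetical, and the sketch only covers the scenario where an existing entry is served until it expires rather than being overwritten.

```python
class TTLCache:
    """Hypothetical cache with a fixed time-to-live and a manual clock."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.now = 0                  # advanced by hand in this sketch
        self.store = {}               # key -> (value, expires_at)
        self.regenerations = 0

    def get_or_compute(self, key, compute):
        entry = self.store.get(key)
        if entry is not None and entry[1] > self.now:
            return entry[0]           # still fresh: serve whatever is stored
        self.regenerations += 1
        value = compute()
        self.store[key] = (value, self.now + self.ttl)
        return value


def old_code():
    return {"page_id": 7}                             # incomplete value


def new_code():
    return {"page_id": 7, "page_title": "Example"}    # complete value


cache = TTLCache(ttl_seconds=3600)

cache.get_or_compute("page:7", old_code)              # t=0, before the fix

cache.now = 600                                       # fix is deployed
during = cache.get_or_compute("page:7", new_code)

cache.now = 3600                                      # entry has expired
after_expiry = cache.get_or_compute("page:7", new_code)

print(during, after_expiry, cache.regenerations)
```

In this model the fixed code keeps receiving the old value until the TTL runs out, so the improvement arrives gradually as entries expire, which would match a load curve that goes down slowly rather than in one step.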

Ladsgroup claimed this task.
Ladsgroup moved this task from In progress to Done on the DBA board.

It is back to original levels or at least it's not noticeable anymore.

Change 811282 had a related patch set uploaded (by Zabe; author: Amir Sarabadani):

[mediawiki/core@REL1_37] cache: Add four fields to LinkCache::getSelectFields

https://gerrit.wikimedia.org/r/811282

Change 811282 merged by jenkins-bot:

[mediawiki/core@REL1_37] cache: Add four fields to LinkCache::getSelectFields

https://gerrit.wikimedia.org/r/811282