Currently, MediaWiki\Extension\DiscussionTools\Hooks\HookUtils::hasPagePropCached is responsible for 5.5% of all requests to the production databases because it queries page props non-stop. Can you cache this in memcached for an hour or so?
For comparison, among all consumers of the databases, this is by far the biggest. The second biggest one consumes only 1.6% of resources.
I see why it's such a big consumer: it's making the same query more than 30 times in the same request: https://logstash.wikimedia.org/goto/b44eab1488241ab5b1f124c72e66bbf5
Edit: 34 times to be exact
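For illustration only, here is a minimal sketch of the per-request memoization that a helper like this is meant to provide, so that repeated lookups for the same page and property hit the database once. It is written in Python rather than PHP, every name in it (`PagePropLookup`, `fake_db_query`, the `newsectionlink` example value) is hypothetical, and it is not the actual DiscussionTools code or patch.

```python
from typing import Callable, Dict, Tuple


class PagePropLookup:
    """Hypothetical stand-in for a cached page-prop check (not the real code)."""

    def __init__(self, query: Callable[[int, str], bool]) -> None:
        self._query = query  # the expensive lookup, e.g. a database query
        self._cache: Dict[Tuple[int, str], bool] = {}
        self.query_count = 0  # counts how often the expensive lookup runs

    def has_page_prop(self, page_id: int, prop: str) -> bool:
        key = (page_id, prop)
        # Test for key presence rather than truthiness, so that a cached
        # False result is also reused instead of being queried again.
        if key not in self._cache:
            self.query_count += 1
            self._cache[key] = self._query(page_id, prop)
        return self._cache[key]


def fake_db_query(page_id: int, prop: str) -> bool:
    # Assumption for the demo: only page 1 has the "newsectionlink" prop.
    return (page_id, prop) == (1, "newsectionlink")


lookup = PagePropLookup(fake_db_query)

# 34 identical lookups in one "request" should cost a single query.
results = [lookup.has_page_prop(1, "newsectionlink") for _ in range(34)]
print(lookup.query_count)
```

A cache like this only helps if every call site reads from and writes to the same cache under the same key; if the cache is bypassed or keyed inconsistently, each call falls through to the database again.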
Change #1030603 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/DiscussionTools@master] Fix static cache access
FWIW, that's roughly 24% of all the load on the databases coming from the main appservers. Something like 20% of the load on all the dbs, basically.
Wow... Nice finding. Let's deploy the patch sooner rather than later. 20% is quite a lot.
Nice one Amir!
Change #1030866 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/DiscussionTools@wmf/1.43.0-wmf.4] Fix static cache access
Change #1030867 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/extensions/DiscussionTools@REL1_42] Fix static cache access
Change #1030866 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@wmf/1.43.0-wmf.4] Fix static cache access
Change #1030867 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@REL1_42] Fix static cache access
Mentioned in SAL (#wikimedia-operations) [2024-05-13T07:41:30Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:1030866|Fix static cache access (T364693)]]
Mentioned in SAL (#wikimedia-operations) [2024-05-13T07:44:01Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:1030866|Fix static cache access (T364693)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
Mentioned in SAL (#wikimedia-operations) [2024-05-13T07:58:25Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:1030866|Fix static cache access (T364693)]] (duration: 16m 54s)
Here is the load on the databases going down:
(deploy finished at 7:56)
Mean latency of mw requests (k8s, currently 80% of traffic):
Mean latency of mw requests (bare metal)
Change #1030881 had a related patch set uploaded (by Reedy; author: Amir Sarabadani):
[mediawiki/extensions/DiscussionTools@REL1_41] Fix static cache access
Change #1030881 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@REL1_41] Fix static cache access
Change #1030603 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@master] Fix static cache access
The bug was introduced in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/960158, which was attempting to fix the very same problem :( (T347123: Reduce database queries on parsercached logged-out page views (Sep 2023))
Curiously, this is the second time that we've had the same performance issue in DiscussionTools; the previous one was T297132: DiscussionTools is making duplicate DB requests back to back. I think at least some of the blame lies with the PageProps class being difficult to use for simple cases.
It happens. I wish we had better monitoring to see jumps like this. I actually have a plan for an SLO that would have failed with such regressions, forcing us to look at it. Hopefully soon.