Page MenuHomePhabricator

Obsessive serverIsReadOnly() checking in MySQL
Open, HighPublic

Description

I was looking into optimizing database queries for entity RDF dumping, and I have noticed a real lot of serverIsReadOnly() calls:

[DBQuery] Wikimedia\Rdbms\DatabaseMysqlBase::serverIsReadOnly [0.001s]: SHOW GLOBAL VARIABLES LIKE 'read_only'

On the course of test dumpRdf.php run for 10 entities, this function was called 130 times. That is for a workload that doesn't even need a single write, so it doesn't matter even a little bit whether DB is read only. But it seems to check it obsessively:

DBQuery] Wikimedia\Rdbms\DatabaseMysqlBase::serverIsReadOnly [0.001s]: SHOW GLOBAL VARIABLES LIKE 'read_only'
[DBQuery] LinkCache::fetchPageRow [0s]: SELECT  page_id,page_len,page_is_redirect,page_latest,page_restrictions,page_content_model,page_touched  FROM `page`    WHERE page_namespace = '6' AND page_title = 'LighthouseinDublin2.jpg'  LIMIT 1  
[DBQuery] Wikimedia\Rdbms\DatabaseMysqlBase::serverIsReadOnly [0.001s]: SHOW GLOBAL VARIABLES LIKE 'read_only'
[DBQuery] LinkCache::fetchPageRow [0s]: SELECT  page_id,page_len,page_is_redirect,page_latest,page_restrictions,page_content_model,page_touched  FROM `page`    WHERE page_namespace = '6' AND page_title = 'Unnamed.png'  LIMIT 1

etc. I think this needs to be fixed - which that DB call is probably very fast, doing it hundreds over hundreds of times - and I assume this is real DB call that goes over the wire? - will add up.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 12 2019, 2:03 AM
Smalyshev updated the task description. (Show Details)Jul 12 2019, 2:03 AM

May I ask where you tested this, and if it was on your own installation, more data about it (version, topology, configuration, etc.?), and in any case, how you did profile the queries executed (just setup debug for all queries?)? Also please point us to the code entry for that function (is it using a master or a replica to perform the reads?).

Smalyshev added a comment.EditedJul 12 2019, 7:53 AM

May I ask where you tested this

On a VM (specifically federated-commons.eqiad.wmflabs)

and if it was on your own installation, more data about it (version, topology, configuration, etc.?)

Nothing too special, regular mediawiki install AFAIK, with Wikibase, Commons & WikibaseMediaInfo, running against local DB.

and in any case, how you did profile the queries executed

Regular debug logging.

Also please point us to the code entry for that function

Specific script that I've been using was:

php extensions/Wikibase/repo/maintenance/dumpRdf.php --entity-type item --limit 10 --output test

but I think any code calling LinkCache::fetchPageRow would encounter the same. The way the code gets to this specific function is kinda complex, but I could post a backtrace if it is helpful.

Based on your feedback, my guess is that because you are reading from the "master" (there is no other host, really), this effect happens. Probably this doesn't happen on WMF production where reads are from replicas and they are already read only, and lag is (I believe) cached. I am guessing the lack of caching infrastructure + single master topology is causing this. That doesn't mean it is not an issue, there is probably a way to optimize this, but I will let others comment if/how, as I am more familiar with WMF use case than mediawiki in general.

Smalyshev added a comment.EditedJul 12 2019, 10:17 PM

Probably this doesn't happen on WMF production

That's good to know, thanks for checking it. But I think still looks weird and may be some logical flaw here - and may be performance issue for non-WMF people running Mediawiki on single DB, happens quite a lot.

Is $wgMainCacheType set to CACHE_NONE ?

Is $wgMainCacheType set to CACHE_NONE ?

Yes, looks like it. Should it be changed? What would be the proper setting for standalone VM install?

jcrespo removed a project: DBA.Jul 15 2019, 7:57 AM

I will remove DBA but remain subscribed, as all points this should be a decision of mediawiki optimization for 3rd party usage, not WMF.

aaron added a comment.EditedJul 15 2019, 5:47 PM

The relevant getWithSetCallback() call uses pcTTL, so there still shouldn't be many of these queries. Unless a large number of distinct connections were acquired. Not just that, but connections to different load balancer clusters.

kchapman assigned this task to aaron.Jul 15 2019, 8:04 PM
kchapman moved this task from Inbox to Doing on the Performance-Team board.

Change 523660 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] rdbms: refactor caching in LoadBalancer::getReadOnlyReason()

https://gerrit.wikimedia.org/r/523660

@aaron I'm going to untag CPT, let us know if we'll be needed for review at a later point.

Krinkle triaged this task as High priority.Tue, Jul 23, 7:09 PM
Restricted Application added a project: Core Platform Team. · View Herald TranscriptTue, Jul 23, 7:09 PM