Page MenuHomePhabricator

MultiWriteBagOStuff caches are missing DI defaults
Closed, ResolvedPublic

Description

The MultiWriteBagOStuff class was changed at some point to not call static ObjectCache::newFromParams from its constructor, as this bypassed dependency injection principles.

However, we never compensated for this by injecting things like $logger from a higher level into the configuration entries in $wgObjectCaches that use MultiWriteBagOStuff.

MultiWriteBag is used at WMF for its ParserCache as mysql-multiwrite: 0=MemcachedBag, 1=SqlBag. The outer MultiWrite bag naturally gets dependencies injected when it is obtained via ObjectCache::getInstance from service wiring. However, the inner bags have lost the state that would normally be injected.

From wmf-config/CommonSettings.php:

$wgObjectCaches['mysql-multiwrite'] = [
	'class' => 'MultiWriteBagOStuff',
	'caches' => [
		0 => [
			'factory' => [ 'ObjectCache', 'getInstance' ],
			'args' => [ 'mcrouter-with-onhost-tier' ]
		],
		1 => [
			'class' => 'SqlBagOStuff',
			'servers' => $pcServers,
			'purgePeriod' => 0,
			'tableName' => 'pc',
			'shards' => 256,
		],
	],
];

The result is, for example, when using "Verbose logs" via WikimediaDebug and performing an edit or purge, that there are no log entries about the writes to the ParserCache database.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Krinkle renamed this task from [Regression] MultiWriteBagOStuff caches are missing DI defaults to MultiWriteBagOStuff caches are missing DI defaults.Sep 21 2022, 8:01 PM
Krinkle added a project: Regression.

The side-effects of this are that besides lacking logger, it also lacks keyspace, asyncHandler and stats injected to the inner SqlBag used by ParserCache.

For keyspace this (fortunately) not causing corruption today (i.e. all wikis sharing a conflicting keyspace) because ::makeKey() calls to MultiWrite are routed to the first entry (mcrouter) and that one is enjoying dependency injection by proxy of re-using the 'mcrouter-with-onhost-tier' instance by reference, which has its own entry in $wgObjectCaches and gets constructed properly.

For asyncHandler this appears to luckily not cause an issue because the main use case of it in this context is to defer the SQL writes, which the outer MultiWrite bag does based on its own injected asyncHandler value. The inner SqlBag does sometimes make use of asyncHandler as well, but only for garbage collection which is disabled for this particular SqlBag.

For stats this means we don't record any get/set metrics of the inner Sql bag. Afaik we don't currently plot or monitor any of the BagOStuff metrics anyway, but in this case we'd already get them from the mcrouter one for sets. Having both is probalby not super useful and would lead to somewhat confusing entries in Graphite I imagine, so also kind of working well as-is, albeit by accident.

Change 834383 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] objectcache: Fix lack of DI for MultiWriteBagOStuff sub caches

https://gerrit.wikimedia.org/r/834383

Change 834383 merged by jenkins-bot:

[mediawiki/core@master] objectcache: Fix lack of DI for MultiWriteBagOStuff sub caches

https://gerrit.wikimedia.org/r/834383

Most of it is coming through now, except the one log message I was originally interested in that is arguably the most important in Rdbms - Logging of the actual query being performed.

I've confirmed via eval.php in prod, using ObjectCache::getInstance and TestingAccessWrapper, that the logger is correctly injected all the way to the SqlBag, and from there also to the IDatabase object in its conns after a dummy $sqlBag->get('x').

The reason it is logging everything except the queries, is because that logic is behind the DBO_DEBUG flag in Database.php. And when SqlBagOStuff creates its own connection objects, this uses DatabaseFactory which has no knowledge of $wgDebugDumpSql and/or the flags setting that MWLBFactory would have computed for general LB-based connections.

Solutions that come to mind:

  • Move the DI logic from MWLBFactory to DatabaseFactory, and move the latter to ServiceWiring with wgDebugDumpSql as the first option it handles, as default for `flags.
  • Duplicate the logic in ObjectCache::newFromParams under the SqlBag condition, which is what we currently do for other DB-related settings that apply to both LBF and SqlBag.

You asked me to take a look and I just got to look at this, yes. I agree the DI and parameterization of config for DB object creation is literally all over the place[1]. gerrit:886011 would help but we more to do, once the refactors are done, we can take a look at fixing this once it's more central. Sorry it's a vague response, I have nothing better to contribute :(

[1] What I recently learned that for some parameters that gets passed to lb config, they will override the parameters set to init the db object and if they are not set, the code's default will override the parameters for db config (which is the exact opposite of definition of default). I think 'flags' is one of them.

Change 951168 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] rdbms: Make DatabaseFactory know aobut DebugDumpSql

https://gerrit.wikimedia.org/r/951168

Change 951168 abandoned by D3r1ck01:

[mediawiki/core@master] rdbms: Make DatabaseFactory know about DebugDumpSql

Reason:

scrap this, will retry in another patch.

https://gerrit.wikimedia.org/r/951168

Change 951168 restored by D3r1ck01:

[mediawiki/core@master] rdbms: Make DatabaseFactory know about DebugDumpSql

https://gerrit.wikimedia.org/r/951168

Change 952896 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] DatabaseFactory: Consolidate MWLBFactory into DatabaseFactory

https://gerrit.wikimedia.org/r/952896

Change 952938 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] objectCache: Make DBFactory aware of DebugDumpSql setting

https://gerrit.wikimedia.org/r/952938

Change 952938 merged by jenkins-bot:

[mediawiki/core@master] objectcache: Make SqlBagOStuff aware of wgDebugDumpSql setting

https://gerrit.wikimedia.org/r/952938

Change 954963 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] objectcache: Juggle `MainConfigNames::DebugDumpSql` type to int

https://gerrit.wikimedia.org/r/954963

Change 954963 merged by jenkins-bot:

[mediawiki/core@master] objectcache: Use DBO_DEBUG when DebugDumpSql setting is set to `true`

https://gerrit.wikimedia.org/r/954963

@DAlangi_WMF Test plan:

To test this in production, we'll need to create an interaction with MultiWriteBagOStuff. In production, we use this class for the ParserCache (Wikitech docs). During cache reads ("get") it will try MemcBag first and then fallback to SqlBag (ParserCache DB). During cache writes, it writes to both.

The way I'd do this is through WikimediaDebug with Verbose logging enabled, and then view a random page on test.wikipedia.org. The reason for testwiki and random is that it will increase odds of it not being warmed up in the first layer (Memc) and thus will likely result in a MultiBag->SqlBag interaction.

You can then confirm in Logstash by reqId what happened, e.g. was it a cache hit in Memc? Ideally not. Is there than a SELECT query logged to a parser cache server? To confirm that the query did happen, we can see if Rdbms logged a connection to one of the pc**** hosts.

If we find it here, then this is done. If we don't find it, we can use mwmaint and eval.php to test it under more controlled circumstances. This way we don't have to hope for a cache miss.

For example:

krinkle@mwdebug1001$
sudo -u "$MEDIAWIKI_WEB_USER" MW_DEBUG_LOCAL=1 php /srv/mediawiki/multiversion/MWScript.php eval.php enwiki


$mb = ObjectCache::getInstance('mysql-multiwrite');
$sb = Wikimedia\TestingAccessWrapper::newFromObject($mb)->caches[1];
$sb->get('krinkle:test:1');
$sb->get('krinkle:test:2');
krinkle@mwdebug1001$ rm /tmp/wiki.log
krinkle@mwdebug1001$ tail -f /tmp/wiki.log

From there you can narrow it down further. If you get stuck, let me know and we can pair up on some of this.

Change 955336 had a related patch set uploaded (by Krinkle; author: Derick Alangi):

[mediawiki/core@master] rdbms: Move DebugDumpSql knowledge from SqlBagOStuff to DatabaseFactory

https://gerrit.wikimedia.org/r/955336

Change 955336 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Move DebugDumpSql knowledge from SqlBagOStuff to DatabaseFactory

https://gerrit.wikimedia.org/r/955336

Change 956917 had a related patch set uploaded (by Krinkle; author: Derick Alangi):

[mediawiki/core@master] rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset

https://gerrit.wikimedia.org/r/956917

Change 956917 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset

https://gerrit.wikimedia.org/r/956917

Change 956817 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@wmf/1.40.0-wmf.26] rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset

https://gerrit.wikimedia.org/r/956817

Change 956817 abandoned by D3r1ck01:

[mediawiki/core@wmf/1.40.0-wmf.26] rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset

Reason:

https://gerrit.wikimedia.org/r/956817

Change 956818 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@wmf/1.41.0-wmf.26] rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset

https://gerrit.wikimedia.org/r/956818

Change 956818 merged by jenkins-bot:

[mediawiki/core@wmf/1.41.0-wmf.26] rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset

https://gerrit.wikimedia.org/r/956818

Mentioned in SAL (#wikimedia-operations) [2023-09-13T13:18:31Z] <lucaswerkmeister-wmde@deploy1002> Started scap: Backport for [[gerrit:956818|rdbms: Use debugSql instead of debugDumpSql which is unuset (T318272)]]

Mentioned in SAL (#wikimedia-operations) [2023-09-13T13:20:13Z] <lucaswerkmeister-wmde@deploy1002> lucaswerkmeister-wmde and d3r1ck01: Backport for [[gerrit:956818|rdbms: Use debugSql instead of debugDumpSql which is unuset (T318272)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-09-13T13:34:13Z] <lucaswerkmeister-wmde@deploy1002> Finished scap: Backport for [[gerrit:956818|rdbms: Use debugSql instead of debugDumpSql which is unuset (T318272)]] (duration: 15m 42s)

There was some back and forth on this but the issue is fixed, and got backported and deployed today and it's finally logging the queries now. All looks good 🎉

Thank you @Krinkle for the collaboration on this one 🤝

Change 952896 abandoned by D3r1ck01:

[mediawiki/core@master] [WIP] DatabaseFactory: Consolidate MWLBFactory into DatabaseFactory

Reason:

https://gerrit.wikimedia.org/r/952896