Page MenuHomePhabricator

Migrate TwoColConflict usage of objectcache away from the table
Closed, ResolvedPublic

Description

Currently there are two usages of objectcache table in production. One is from TCC. Nothing in production should use objectcache table and it's currently blocking schema changes on it: T272512: Apply outstanding schema changes for "objectcache" tables in production (exptime, flags, modtoken)

If the plan is just to be a cache. Simply use MainWANObjectCache. If you need something that's multi-dc aware otherwise conflicts corrupt data (in case of dc switchover or active/active). Then you need to use MainStash. Having it on this table (specially given that it has huge blobs) would make db replication of all of the wiki slower.

Event Timeline

Change 748792 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/TwoColConflict@master] [DNM] Replace \"db-replicated\" cache with WANObjectCache

https://gerrit.wikimedia.org/r/748792

Change 748793 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/FileImporter@master] [DNM] Replace \"db-replicated\" cache with WANObjectCache

https://gerrit.wikimedia.org/r/748793

@Ladsgroup, I'm a bit lost. What is a (the?) "object cache table"?

I found MediaWikiServices::getInstance()->getMainWANObjectCache() as a possible replacement for the currently used ObjectCache::getInstance( 'db-replicated' ). But I don't know how it behaves, and how it compares to the old method.

The behavior we want in Two-Column-Edit-Conflict-Merge and FileImporter is mostly the same (see T254440 and T226075 for when it was introduced): We want a larger value (something we can't put in a URL) to survive when the user opens a link in a new tab. The value should ideally live for something like 60 minutes. What's the best caching layer for this?

@Ladsgroup, I'm a bit lost. What is a (the?) "object cache table"?

There is a core table called "objectcache" That is the table.

I found MediaWikiServices::getInstance()->getMainWANObjectCache() as a possible replacement for the currently used ObjectCache::getInstance( 'db-replicated' ). But I don't know how it behaves, and how it compares to the old method.

The behavior we want in Two-Column-Edit-Conflict-Merge and FileImporter is mostly the same (see T254440 and T226075 for when it was introduced): We want a larger value (something we can't put in a URL) to survive when the user opens a link in a new tab. The value should ideally live for something like 60 minutes. What's the best caching layer for this?

It depends if the value can be regenerated or not. WAN cache values sometimes won't be shown: 1- When there is a dc switchover (before the switch, the user saves it to memcached of this dc, after the switch they try to get it from the memcached of the other dc where the value doesn't exist) 2- When memcached has to evict for whatever reason.

If what you're storing is canonical data and there is no way to regenerate it, then you need to use MainStash (MainStash itself is being moved to something else IIRC but when the time comes, we know what's using it and move all together), if it can be generated then better to use WAN (it's okay if re-generation is slow/expensive, the ratio is small).

HTH

P.S. If it's canonical and can't be regenerated but small loss is acceptable, still better to go with WAN.

@Ladsgroup, I tried a few things but had to give up for now.

  • I tried to use said WANObjectCache, but it doesn't work when I test it locally. It's like the cache behaves like dev/null.
  • Everything in WANObjectCache is final. This makes it impossible to write tests.
  • WANObjectCache doesn't really have any interfaces either. I don't like having to hard-code dependencies on such a super specific 3100+ lines implementation.
  • BagOStuff and WANObjectCache don't share an interface for the relevant get and set methods, even if the methods are more or less the same.

There must be another way. Doesn't one of the BagOStuff subclasses and instances do what we want? Loosing the cache in case of e.g. a datacenter switchover is fine.

@Ladsgroup, I tried a few things but had to give up for now.

  • I tried to use said WANObjectCache, but it doesn't work when I test it locally. It's like the cache behaves like dev/null.

Yes. These caches are disabled by default. You need to explicitly enable them in your LocalSettings.php

This wouldn't work in tests, all tests just by default change all cache objects to be null so even if you set the cache type, it just fails.

  • Everything in WANObjectCache is final. This makes it impossible to write tests.

You can mock the object.

  • WANObjectCache doesn't really have any interfaces either. I don't like having to hard-code dependencies on such a super specific 3100+ lines implementation.

I do understand the code is suboptimal and it's in dire need of improvements. Improving these systems is in my roadmap and I have already started with IDatabase (T296960: Remove unused or barely used functions of IDatabase) but it takes time. In the meantime, you can write an adapter or facade to encapsulate the class. That's what Wikibase has done.

  • BagOStuff and WANObjectCache don't share an interface for the relevant get and set methods, even if the methods are more or less the same.

I suggest using the method getWithSetCallback only. That's the only thing you should need and it is shared between BagOStuff and WAN. And it's heavily, heavily used basically everywhere so it's quite mature and stable.

Change 749158 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/FileImporter@master] [DNM] Rewrite to use session instead of \"db-replicated\"

https://gerrit.wikimedia.org/r/749158

These caches are disabled by default.

I don't think we can expect 3rd party installations to have a specific configuration just to not have a feature of the product being broken.

You can mock the object.

Maybe I miss something, but PHPUnit's mocks are subclasses. Subclasses can't overload methods that are marked as final.

[…] write an adapter or facade to encapsulate the class.

Makes sense, thanks.

I suggest using the method getWithSetCallback only.

I'm afraid that's not how the Two-Column-Edit-Conflict-Merge and FileImporter code we are talking about is arranged. We need to forward additional information to a spot in the UI that doesn't have access to where the data originally came from. This is a Status object in FileImporter, and an unsaved version of a wikitext page (up to 2 MiB) in TwoColConflict. The current implementation makes it possible to come back to the same URL any time later (within the TTL of 1 day), and the same user will see the same information (because the user is part of the cache key), but other users won't.

Maybe I'm overthinking this way to much. Session? I tried, but couldn't make it work either.

Server-side sessions is stored in generalized system that is MainStash (a different instance for security reasons but the technology is the same). Why not using MainStash instead If WAN is not working for you?

The service itself is called MainObjectStash. You can access it with MediaWikiServices::getInstance()->getMainObjectStash() which uses the $wgMainStash config and returns a BagOStuff. The default is db-replicated (which is the same you were using). But in production it goes to redis (via kask).

HTH

Change 749169 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/FileImporter@master] Replace hard-coded \"db-replicated\" with MainObjectStash service

https://gerrit.wikimedia.org/r/749169

Change 749169 merged by jenkins-bot:

[mediawiki/extensions/FileImporter@master] Replace hard-coded \"db-replicated\" with MainObjectStash service

https://gerrit.wikimedia.org/r/749169

Change 749196 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/TwoColConflict@master] Replace hard-coded \"db-replicated\" with MainObjectStash service

https://gerrit.wikimedia.org/r/749196

Change 748792 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/extensions/TwoColConflict@master] [DNM] Replace \"db-replicated\" cache with WANObjectCache

Reason:

See If769c1a.

https://gerrit.wikimedia.org/r/748792

Change 748793 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/extensions/FileImporter@master] [DNM] Replace \"db-replicated\" cache with WANObjectCache

Reason:

See If769c1a.

https://gerrit.wikimedia.org/r/748793

Change 749158 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/extensions/FileImporter@master] [DNM] Rewrite to use session instead of \"db-replicated\"

Reason:

See If769c1a.

https://gerrit.wikimedia.org/r/749158

Change 749196 merged by jenkins-bot:

[mediawiki/extensions/TwoColConflict@master] Replace hard-coded \"db-replicated\" with MainObjectStash service

https://gerrit.wikimedia.org/r/749196

thiemowmde claimed this task.

Sorry, but I don't know what this means.

This may be worth reading as a general refresher:
https://www.mediawiki.org/wiki/Manual:Object_cache