Page MenuHomePhabricator

getLocalServerObjectCache() should default to HashBagOStuff in CLI mode.
Closed, DeclinedPublic

Description

MediaWikiServices::getLocalServerObjectCache() will currently return an EmptyBagOStuff in maintenance scripts, leading to performance degradation for batch jobs (e.g. T254430). Getting a non-functioning cache seems surprising. Perhaps we should default to a HashBagOStuff instead, so the cache will work at least transiently.

The root cause of this is the logic in ObjectCache::makeLocalServerCache() which is used by the wiring for 'LocalServerObjectCache'. Perhaps we want the wiring to call ObjectCache::getLocalServerInstance( 'hash' ) instead. However, ObjectCache::getLocalServerInstance() currently uses MediaWikiServices::getLocalServerObjectCache() , so this relationship would have to be inverted.

Usages of MediaWikiServices::getLocalServerObjectCache():
https://codesearch.wmcloud.org/search/?q=getLocalServerObjectCache&i=nope&files=&repos=

Event Timeline

Change 627466 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Fix caching in CLI mode.

https://gerrit.wikimedia.org/r/627466

daniel renamed this task from ObjectCache should default to HashBagOStuff in CLI mode. to getLocalServerObjectCache() should default to HashBagOStuff in CLI mode. .Sep 15 2020, 3:41 PM
daniel updated the task description. (Show Details)

Caller survey:

Core usages:

  • WikiMap::getCanonicalServerInfoForAllWikis would benefit from in-process caching when called multiple times.
  • TransformationImageHandler::getMagickVersion would benefit from in-process caching when called multiple times.
  • EntryPoint::createRouter() is only used for web requests, so it's irrelevent.

Service wiring:

  • SiteStore service would benefit from in-process caching when called multiple times.
  • BadFileLookup service would benefit from in-process caching when called multiple times.
  • CryptHKDF wiring explicitly checks for EmptyBagOStuff and falls back to the local cluster instance in that case. This would probably have to be changed to also fall back in case of HashBagOStuff.
  • DBLoadBalancerFactory service benefits from using a hash, performance degrades with an EmptyBagOStuff, see T254430
  • FileBackendGroup wiring explicitly checks for EmptyBagOStuff and falls back to the local cluster instance in that case. This would probably have to be changed to also fall back in case of HashBagOStuff.
  • GlobalIdGenerator wiring explicitly uses an EmptyBagOStuff in CLI mode. The logic seems a bit backwards, but it won't break if we change getLocalServerObjectCache().
  • MessageCache wiring will use an EmptyBagOStuff if the UseLocalMessageCache option is not set. If it is set, we'd presumably want some caching to be applied in CLI mode, or at least it wouldn't hurt.
  • SiteStore wiring falls back to the local cluster cache if the local server cahes is an EmptyBagOStuff. This would probably have to be changed to also fall back in case of HashBagOStuff.

(Extensions TBD)

It seems that some code checks for EmptyBagOStuff, and falls back to a slower but wider cache type in that case. This code could be made to check for HashBagOStuff as well.

It would perhaps be cleaner to introduce an ObjectCacheManager service that has getters for different needs, like "local or hash" or "local or cluser". That would also cover T243233.

Change 627466 abandoned by Daniel Kinzler:
[mediawiki/core@master] Fix caching in CLI mode.

Reason:
Per Krinkle'S comment

https://gerrit.wikimedia.org/r/627466

@Krinkle wrote on https://gerrit.wikimedia.org/r/c/mediawiki/core/+/627466

@daniel the risk is that CLI contexts are generally sensitive in writes and don't expect caching, much how like in POST requests we generally don't consult caches to inform writes, either. The difference being that POST as web requests has always seen APCu so we've spent 15 years making sure we get that right. CLI on the other hand has effectively never seen caching from the "Local server" service. It's implied and generally understood in PHP development that apcu is always off on the CLI.

Enabling that service to have caching can have pretty widespread implications which I don't think are fully understood right now. at least not by me.

As an example, UID Generator uses APCU to optionally coordinate coordinate between proceses on the same server. Making the local server cache return something that is in fact not shared by other proceses on the same server, would violate that expectation. [1] I haven't reviewed it fully, but it seems likely that changing that would introduce suble problems.

@Krinkle thank you for explaining this. I understand this better now. It seems to me though that this needs to be made more explicit in the code base, and we should think through the implications.

For instance, Wikimedia runs the job queue via web requests, but most other people run them via the command line script. Depending on how jobs are executed, code deep down the stack in different parts of the code base may suddenly have very different performance characteristics. That seems problematic to me.

I'd much prefer a model where maintenance scripts would declare the caching behavior they want.

Closing in favor of T243233 and the introduction of a proper ObjectCacheManager service.