Page MenuHomePhabricator

Investigate coupling in cache layer (shared cache entries in WMF environment)
Closed, ResolvedPublic

Description

  • Repo<->client coupling
  • One version to next week's version

Event Timeline

Some general notes:

  • No cache that I found is explicitly split between Repo and Client functionality. Wikibase has shared cache key configuration settings, with a sharedCacheKeyGroup that’s more general and a sharedCacheKeyPrefix that’s more specific; in production, the sharedCacheKeyGroup is wikidatawiki or testwikidatawiki, and the sharedCacheKeyPrefix is the group plus the current train version. Some caches explicitly use one of those settings as part of their key to share the cache between Repo and Client; one cache uses the same string, containing “repo”, in both Repo and Client; others don’t put anything special in the cache key.
  • Whether the cache is shared between different wikis depends on how the key is created. makeKey() creates a key local to the current wiki, whereas makeGlobalKey() creates one that is shared between all wikis. Wikibase also has a SimpleCacheWithBagOStuff class, which always uses makeKey(), though I’m not sure whether some of those caches wouldn’t benefit from a global key instead. Some caches don’t use anything special to create the key, which is probably bad.

Breakdown of the caches:

ComponentDefined/used inClient/repo sharingCache scopeSplit by wiki?Split by train?Key factoryTTL
AbstractTermPropertyLabelResolver and subclassesDefined in Lib, only used in Client(?)shared cache keyglobalnoyesnoneone hour by default, 24 hours in production
EntityRevisionCacheLibshared cache keyglobalnoyesnoneone hour by default, 24 hours in production
CachingFallbackLabelDescriptionLookupLibclient uses repo keyglobalyesyesSimpleCacheWithBagOStuffone hour by default, 24 hours in production
CachingPropertyInfoLookup + CacheAwarePropertyInfoStoreLibshared cache keycluster + servernonomakeGlobalKeyone hour by default, 24 hours in production
CachingOtherProjectsSitesProviderClientimplicitclusteryesnomakeKeyone hour
ClientHooks::getSiteConfiguration()ClientimplicitserveryesnomakeKeyone day
SitesModule::getScript()LibimplicitserveryesnomakeKeyone hour
CachingPropertyOrderProviderLibimplicitserveryesnomakeKeyone hour
CachingPrefetchingTermLookup + UncachedTermsPrefetcherLibimplicitserveryesnoSimpleCacheWithBagOStuffone minute
SqlEntityInfoBuilderLibimplicitserveryesnoSimpleCacheWithBagOStuffone minute
DatabaseEntityInfoBuilderLibimplicitserveryesnoSimpleCacheWithBagOStuffone minute
CachingSiteLinkLookupDefined in Lib, only used in Clientimplicitrequestnoneone hour
CachingCommonsMediaFileNameLookupRepoimplicitrequestnoneten minutes

Misc notes:

  • ClientHooks::getSiteConfiguration() passes WanObjectCache options into a BagOStuff, which is probably a bug, but hopefully a harmless one.
  • CachingPropertyInfoLookup uses two cache layers, one for the whole cluster and one for the current server.
  • CachingPropertyInfoLookup works together with CacheAwarePropertyInfoStore, which takes care to update the cache when the property info changes. That said, I’m not sure if CacheAwarePropertyInfoStore correctly updates both cache layers that CachingPropertyInfoLookup uses? To me it looks like it might only updates the cluster layer, not the server layer.
  • The property info data cached by those two classes ultimately comes from JSON blobs controlled by Repo (wb_property_info table, pi_info column), so Client train versions must be able to deal with data of the Repo train version regardless of how stale or fresh the cached data is.
  • SqlEntityInfoBuilder is slated for removal (T254283), as is DatabaseEntityInfoBuilder, whose caching bits look suspiciously similar anyways.

IMO adding almost all of the info in the above comment to a topic under https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_index.html would be super useful!

Thank you for doing this investigation, it's pretty well thought and extensive. One thing that would be useful to have is to know the transitive usage of the caches. For example it says it's used in lib but that code might have only been used in client and not repo.

A few other thoughts from my side (mostly tickets we should create and add to the backlog)

  • AbstractTermPropertyLabelResolver should start passing the key into makeKeyGlobal (TODO ticket)
  • EntityRevisionCache with its current usage should probably be going away once we move to use more modern RevisionStore methods. T255305: [Investigation] - Consider if the Wikibase cache for CachingEntityRevisionLookup is needed any more
    • SqlBlobStore which RevisionStore uses already caches content in Memcached as needed
    • Having said that, it is currently also currently using in client with a HashBagOStuff
  • Nonetheless, EntityRevisionCache should also use makeKeyGlobal (TODO ticket)
  • SimpleCacheWithBagOStuff should allow for instance creations with differently scoped keys (TODO ticket)
    • Either always global, and let users of the cache determine if the key needs to be namespaces in some way
    • Or allow construction with some value to indicate global or local
  • Looking at a very surface level to the uses of SimpleCacheWithBagOStuff currently, it looks as if all uses of the cache were intended to use a shared global cache.
    • We added the makeKey call around 3 months ago https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/583199/ which altered this (before they were global)
    • super interesting that the cache hit rate is still 95-98% for most of the caches there even when split per wiki.
    • Of course, using a global cache introduces the need to keep compatibility between versions, however, if the data being stored changed between versions the cache key should be updated to avoid issues.
  • CachingPropertyOrderProvider is configurable per wikibase site, so that cache probably should be split per wiki, even if for wikidata.org using a single key might make sense.

One thing that would be useful to have is to know the transitive usage of the caches. For example it says it's used in lib but that code might have only been used in client and not repo.

Two components are marked as being defined in Lib but only used in Client; as far as I can tell, all the other Lib components are used in both Repo and Client.

CachingPropertyOrderProvider is configurable per wikibase site, so that cache probably should be split per wiki, even if for wikidata.org using a single key might make sense.

I was surprised to learn that this functionality is relevant to the client at all, but it’s exposed through Lua (and mwgrep reports a fair amount of usages of orderProperties, mostly in copies of Module:Databox and Module:Universal infocard). I’m also surprised that it apparently checks for the MediaWiki:Wikibase-SortedProperties message on the client wiki first before falling back to the repo wiki, so I checked if that message actually exists on clients – according to P11588, Wikidata and Commons are the only wikis that have that page.

IMO adding almost all of the info in the above comment to a topic under https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_index.html would be super useful!

I’m worried that parts of it, specifically the contents of the table, will get outdated fairly quickly… I hope that there are some code changes we can make that would make it easier to re-collect the information in that table, and then the documentation could be limited to more general comments (e. g. difference between makeKey and makeGlobalKey), maybe?

As the perfect example:

  • DatabaseEntityInfoBuilder
  • SqlEntityInfoBuilder

can already disappear

I don't have anything in particular to add, apart from a thank you to @Lucas_Werkmeister_WMDE for athering this info in one table (the overview of cache components is really helpful for a noob like me to learn more).

SummaryA lot of caches are shared between Client and Repo. It’s a bit messy, we found some issues/bugs to be fixed.
Risks, threats, challenges identifiedWe might want to remove the WMF version from certain caches to prevent them from being split by train. It is currently not easy to figure out where caches are used and shared in Wikibase.
Opportunities noticedSharing caches becomes easier if it’s possible to split by wiki (or client vs repo) and maintain the current cache hitrate Total opposite: we could share all the caches globally (some performance improvement?)
Other remarksTickets were created in: T255982, T255983, T255984. It could be beneficial to have one interface/class to access caches, and define where to use which caches and why.