Page MenuHomePhabricator

[Investigation] - Consider if the Wikibase cache for CachingEntityRevisionLookup is needed any more
Closed, ResolvedPublic

Description

CachingEntityRevisionLookup has existed in Wikibase for a long time.
Its main function is to interface with a shared cache for entity revisions, which is shared between ALL sites of a cluster.

With some of the code written for MCR and RevisionStore refactorings, MediaWiki core now has a BlobStore cache that is shared across the whole cluster.
Since then Wikibase is storing all retrieved entities twice in the cache.

We can now probably remove the layer of caching in Wikibase, but should first evaluate exactly what that will mean in terms of changed db calls etc.
Perhaps CachingEntityRevisionLookup will remain, as the entity revisions are still cached, but the cache is somewhere else.

This would:

  • Reduce Memcached usage in production
  • Reduce cache usage for 3rd party wikibase users
  • Reduce code complexity in Wikibase

Event Timeline

  • CachingEntityRevisionLookup::getEntityRevision has the entity id and rev id to lookup and tried to get it from the cache
  • Failing, it calls fetchEntityRevision
  • This ultimately calls WikiPageEntityRevisionLookup::getEntityRevision
  • This then calls a method on entityMetaDataAccessor, either loadRevisionInformationByRevisionId or loadRevisionInformation

Looking into the meta data accessor I see a note:

* @todo This whole interface may no longer be needed with the introduction of RevisionRecord which can
* represent a revision on any wiki.
* MetaData accessing can probably be killed and instead RevisionRecord just be returned.
  • loadEntity is the called
  • revisionStore->getRevisionById is called, returning out RevisionRecord
  • We then also call loadEntityDataFromWikiPageRevision ? which comes with the note:
		// NOTE: Support for cross-wiki content access in RevisionStore is incomplete when,
		// reading from the pre-MCR database schema, see T201194.
		// For that reason, we have to load and decode the content blob directly,
		// instead of using RevisionRecord::getContent() or SlotRecord::getContent().
		// TODO Once we can rely on the new MCR enabled DB schema, use getContent() directly!

This method then calls blobStore->getBlob which is where the new layer of shared caching lives.

Change 605333 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] WIP DNM, update EntityRevisionLookup

https://gerrit.wikimedia.org/r/605333

Addshore renamed this task from Consider if CachingEntityRevisionLookup is needed any more to [Investigation] - Consider if CachingEntityRevisionLookup is needed any more.Aug 4 2020, 12:11 PM
Addshore renamed this task from [Investigation] - Consider if CachingEntityRevisionLookup is needed any more to [Investigation] - Consider if the Wikibase cache for CachingEntityRevisionLookup is needed any more.Aug 4 2020, 1:08 PM
Addshore updated the task description. (Show Details)

The investigation should shed light on the following questions:

  • How complicate would removing the Cache for CachingEntityRevisionLookup be?
    • Would this be the wok of a few people over a couple of days or should we have a hike/trailblaze for it?

You can store caches as database entries (in objectcache table) by setting $wgMainCacheType to CACHE_DB. I did it and this is I've got so far:

  • We actually ourselves store them twice, once under wikibase_shared/repo:WikiPageEntityRevisionLookup:Q228 and once under repo:WikiPageEntityRevisionLookup:Q228. The content is not exactly the same, I double check what's the difference.
  • The TTL we set for this is 3600 while the TTL for SqlBlobStore is seven days.
  • SqlBlobStore compresses the values while we don't,
  • I can't actually find any cache that stores SqlBlobStore. I might be missing something obvious.

Stay tuned.

I confirm that setting the entity revision lookup cache to none just makes it to fallback to the mediawiki's SqlBlobStore cache (checking from debug logs) and SqlBlobStore cache indeed compresses it (619 is revid of the most recent revision of Q228):

+--------------------------------------------------------+--------------------+
| keyname                                                | char_length(value) |
+--------------------------------------------------------+--------------------+
| WANCache:v:global:revision-row-1.29:repo:684:915       |                392 |
| repo:WBL:WikiPageEntityRevisionLookup:Q228             |               1680 |
| wikibase_shared/repo:WikiPageEntityRevisionLookup:Q228 |               1680 |
+--------------------------------------------------------+--------------------+

The next step would be to make this cache configurable, then set it to cache none in beta, test and production and if no regression shows up (specially in matter of performance regression, mcrouter traffic, nutcracker going nuts, etc.), then we can simply remove it. The only thing I need to make sure right now is that how the fallback from Q228 to rev id 915 works, is it cached too?

The page id to revision id (Q228 -> 915) happens through a WikiPageEntityMetaDataAccessor which has an implementation that has basic caching: PrefetchingWikiPageEntityMetaDataAccessor so it's not as severe than I thought.

Change 619547 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Set caching of CachingEntityRevisionLookup to CACHE_NONE

https://gerrit.wikimedia.org/r/619547

So my idea for this is to get this cherry-picked (but not merged on master) and deploy it on canaries (10% of traffic) for an hour tomorrow and monitor everything, from logstash to mcrouter, memcached, s8 load, es load and if we don't see a regression, we deploy it to everywhere and leave it until next train and in the next train, we remove everything. How does that sound?

Change 619597 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Set caching of CachingEntityRevisionLookup to CACHE_NONE in repo

https://gerrit.wikimedia.org/r/619597

Change 619615 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@wmf/1.36.0-wmf.4] Set caching of CachingEntityRevisionLookup to CACHE_NONE in repo

https://gerrit.wikimedia.org/r/619615

Change 619615 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.36.0-wmf.4] Set caching of CachingEntityRevisionLookup to CACHE_NONE in repo

https://gerrit.wikimedia.org/r/619615

Mentioned in SAL (#wikimedia-operations) [2020-08-12T18:04:18Z] <ladsgroup@deploy1001> Synchronized php-1.36.0-wmf.4/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: [[phab:T255305|Set caching of CachingEntityRevisionLookup to CACHE_NONE in repo]] (duration: 01m 06s)

Change 619616 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@wmf/1.36.0-wmf.4] Set caching of CachingEntityRevisionLookup to CACHE_NONE in client

https://gerrit.wikimedia.org/r/619616

Change 619617 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@wmf/1.36.0-wmf.3] Set caching of CachingEntityRevisionLookup to CACHE_NONE in client

https://gerrit.wikimedia.org/r/619617

One thing that was mentioned is that we should use getKnownCurrentRevision in revision lookup that has internal caching.

Change 619616 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.36.0-wmf.4] Set caching of CachingEntityRevisionLookup to CACHE_NONE in client

https://gerrit.wikimedia.org/r/619616

Mentioned in SAL (#wikimedia-operations) [2020-08-12T18:50:11Z] <ladsgroup@deploy1001> Synchronized php-1.36.0-wmf.4/extensions/Wikibase/client/includes/Store/Sql/DirectSqlStore.php: [[phab:T255305|Set caching of CachingEntityRevisionLookup to CACHE_NONE in client]] (duration: 02m 13s)

Change 619617 abandoned by Ladsgroup:
[mediawiki/extensions/Wikibase@wmf/1.36.0-wmf.3] Set caching of CachingEntityRevisionLookup to CACHE_NONE in client

Reason:
wmf.4 is being deployed everywhere instead

https://gerrit.wikimedia.org/r/619617

Now it's deployed across the fleet. Here's my analysis:

  • I was worried that constant decompressing of values coming from memcached would elevate the CPU usage but it didn't
  • I was worried that since wikibase has the caches keyed by entity id instead of revision id (which sqlblob store key them based on revision id) would increase the read on s8, this actually happened:

image.png (257×1 px, 53 KB)

(You can guess when it got deployed everywhere)
but it's not big enough to cause any major disruption (note that the graph doesn't start at zero).

  • Looking at mc* hosts and the overall network graphs of mcrouter/memcached, I see a rough 10% decrease in network but the noise makes it hard to show you:

image.png (273×918 px, 45 KB)

  • When I deployed it, the hits of SqlBlobStore didn't change much (it's pretty high anyway, 1M/minute) but misses doubled which is expected, the cache is warm. It's still putting the long tail into memcached:

image.png (547×1 px, 66 KB)

  • Naturally External Storage had a little pressure but recovered.

image.png (770×1 px, 145 KB)

  • After TTL of Wikibase memcached entries expiring, evictions are going up:

image.png (275×925 px, 35 KB)

No noticeable increase in user-facing error or fatals or timeouts has been observed.

I leave Xghui of two client cases:

Both are the second time requests to warm up their caches. The increase in DB is obvious but the easiest solution is to use RevisionLookup::getKnownCurrentRevision() instead of PrefetchingWikiPageEntityMetaDataAccessor which has a much better cache.

Change 619597 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Set caching of CachingEntityRevisionLookup to CACHE_NONE in repo

https://gerrit.wikimedia.org/r/619597

Change 619547 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Set caching of CachingEntityRevisionLookup to CACHE_NONE in client

https://gerrit.wikimedia.org/r/619547

Change 605333 abandoned by Addshore:
[mediawiki/extensions/Wikibase@master] WIP DNM, update EntityRevisionLookup

Reason:
The cache was removed in https://phabricator.wikimedia.org/T255305, this draft might still be useful for actually using RevisionStore directly, but that will likely now be done as a separate ticket.

https://gerrit.wikimedia.org/r/605333

Michael removed a project: Patch-For-Review.

As far as I can tell, all associated patches have been either merged or abandoned.

WMDE-leszek subscribed.

I note the conclusion of the investigation has been the cache is not needed. A follow up task T260600 has been filed.