
CacheAwarePropertyInfoStore performs 4000 Memc ops/s (APC not working?)
Closed, Resolved, Public

Description

Background:

At T244340, @Joe wrote:

Some keys are super hot - take for instance WANCache:v:global:CacheAwarePropertyInfoStore:wikidatawiki:P244 which gets read about 4k times per second.

This doesn't make sense. This particular usage pattern from Wikibase has caused outages in the past and was given an APCu layer on top. […] Given we have less than 4000 servers, it sounds like this has stopped working?
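The reasoning above can be made concrete with a back-of-the-envelope sketch. It assumes one local (APCu) copy of the key per server and at most one refetch from Memcached per local TTL. The function name is hypothetical, and the 4000 servers and 20-second TTL are illustrative inputs, not measured values.

```python
def max_backend_reads_per_second(servers: int, local_ttl_seconds: float) -> float:
    """Upper bound on Memcached reads/s for a single key, assuming every
    server holds one local copy and refetches it at most once per TTL."""
    if servers < 0 or local_ttl_seconds <= 0:
        raise ValueError("servers must be >= 0 and local_ttl_seconds > 0")
    return servers / local_ttl_seconds


# Illustrative inputs only: 4000 servers, 20-second local TTL.
print(max_backend_reads_per_second(4000, 20))  # 200.0
```

Under these assumptions a working local layer would cap reads for one key at a few hundred per second, so about 4k reads per second suggests that requests are reaching Memcached directly.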

Event Timeline

This really shouldn't happen. I'll investigate.

I think I found out what's wrong. The APCu cache seems to be per-wiki, but it doesn't need to be. Let me fix it.

I was wrong: we somehow removed the APCu cache bit altogether. I should find out what happened.

The APC cache was removed in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/498348/ (T218197).

I don't remember whether this was intentional, and unfortunately the commit message doesn't help me remember.

Change 602666 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Wrap WAN-cached PropertyInfoLookup with an APCu cache

https://gerrit.wikimedia.org/r/602666

Change 602666 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Wrap WAN-cached PropertyInfoLookup with an APCu cache

https://gerrit.wikimedia.org/r/602666

Change 603482 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.35] Wrap WAN-cached PropertyInfoLookup with an APCu cache

https://gerrit.wikimedia.org/r/603482

Change 603482 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.35] Wrap WAN-cached PropertyInfoLookup with an APCu cache

https://gerrit.wikimedia.org/r/603482

Mentioned in SAL (#wikimedia-operations) [2020-06-08T15:09:31Z] <ladsgroup@deploy1001> Synchronized php-1.35.0-wmf.35/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: Wrap WAN-cached PropertyInfoLookup with an APCu cache, Part I out of III (T254536) (duration: 00m 59s)

Mentioned in SAL (#wikimedia-operations) [2020-06-08T15:10:54Z] <ladsgroup@deploy1001> Synchronized php-1.35.0-wmf.35/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: Wrap WAN-cached PropertyInfoLookup with an APCu cache, Part II out of III (T254536) (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2020-06-08T15:12:35Z] <ladsgroup@deploy1001> Synchronized php-1.35.0-wmf.35/extensions/Wikibase/client/includes/Store/Sql/DirectSqlStore.php: Wrap WAN-cached PropertyInfoLookup with an APCu cache, Part III out of III (T254536) (duration: 00m 57s)

This should be fixed, but I have no idea how to verify it – I couldn’t find any information on how to see if that cache key is still hot.

Thanks, this looks quite conclusive:

Screenshot_2020-06-08 WANObjectCache Key group - Grafana.png (258×1 px, 28 KB)

Good enough to close this task?

This changed the cache from a TTL of 20 seconds to cacheDuration, which is currently 24 hours.
The TTL was deliberately set low before, although 24 hours seems to be doing fine?

image.png (323×1 px, 49 KB)

This does mean that there is a higher chance that classes fetching data will get old or outdated property info.

We have some code that updates the cache when the property info store changes (CacheAwarePropertyInfoStore), but that only affects the WAN cache, not the APC cache. Maybe we should lower the TTL of the APC cache only, while keeping the WAN cache TTL higher?
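The idea of a short-lived local tier in front of a longer-lived shared tier can be sketched as below. This is a minimal Python illustration, not Wikibase's PHP implementation: `TtlCache`, `TwoTierLookup`, and `update_shared` are hypothetical names, and the in-memory dictionaries merely stand in for APCu and the WAN cache.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Tiny in-memory cache with one fixed TTL (stand-in for one cache tier)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Tuple[bool, Any]:
        """Return (hit, value); expired entries count as misses."""
        entry = self._entries.get(key)
        if entry is None:
            return False, None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return False, None
        return True, value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


class TwoTierLookup:
    """Read through a short-TTL local tier, then a long-TTL shared tier."""

    def __init__(self, local: TtlCache, shared: TtlCache, loader: Callable[[str], Any]):
        self._local = local
        self._shared = shared
        self._loader = loader

    def get(self, key: str) -> Any:
        hit, value = self._local.get(key)
        if hit:
            return value
        hit, value = self._shared.get(key)
        if not hit:
            value = self._loader(key)  # fall back to the source of truth
            self._shared.set(key, value)
        self._local.set(key, value)
        return value

    def update_shared(self, key: str, value: Any) -> None:
        """Writers refresh only the shared tier; local copies expire on their own."""
        self._shared.set(key, value)


# Usage with a fake clock: the local tier lives 20 s, the shared tier 24 h.
now = [0.0]
lookup = TwoTierLookup(
    local=TtlCache(20, clock=lambda: now[0]),
    shared=TtlCache(24 * 3600, clock=lambda: now[0]),
    loader=lambda key: "old",
)
print(lookup.get("P244"))  # old
lookup.update_shared("P244", "new")
print(lookup.get("P244"))  # old (local copy is still fresh)
now[0] = 21.0
print(lookup.get("P244"))  # new (local copy expired, shared tier consulted)
```

In this sketch the local TTL bounds how long a server can keep serving a stale value after the shared tier is updated, which is the trade-off raised above.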

Looking at usages of that config variable now:

We would probably benefit from:

  • Continuing to remove this old entity cache that we now pass CACHE_NONE into
  • Evaluating whether we need PropertyLabelResolver in its current form, or whether it should change somewhat now
  • Splitting the caches so that each has its own configurable TTL
  • Evaluating the final CachingEntityRevisionLookup to see whether it even needs the cache any more