Page MenuHomePhabricator

APCu caches are set to expire in 2073 instead of an hour if exptime is a unix timestamp
Closed, ResolvedPublic

Description

I triple checked this in mwdebug2001.

In BagOStuff::set(), the documentation clearly says @param int $exptime Either an interval in seconds or a unix timestamp for expiry. As result, most of Wikibase normalizes TTL to unix timestamp for the expiry (See SimpleCacheWithBagOStuff::normalizeTtl). Its implementations however doesn't seem to care about unix timestamp.

For example. MediumSpecificBagOStuff::set() just passes that to doSet() and for example APCUBagOStuff::doSet() directly feeds that to apcu_store and documentation for apcu_store clearly state it's TTL in seconds, meaning it sets it for that amount of unix time in future meaning the cache value will expire roughly around 2073.

Memcached and most other implementation seems to parse unix timestamps properly but wincache doesn't seem so.

Maybe we can check how many are set like this in APCu if possible. We need to fix this and also clean up APCus across the cluster. (That's why I tagged SRE)

Event Timeline

Clearing acpu is as easy as doing a rolling restart of the cluster. But I think we should first fix the BagOfStuff implementation and/or all the calls from wikibase.

Change 703483 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/core@master] objectcache: Normalize exptime to ttl in APCu and WinCache

https://gerrit.wikimedia.org/r/703483

I'd say the exptime parameter as timestamp is obscure and something I've not seen even once being used in the past ten years in any core code or extension, including third-party code. I didn't know Wikibase passes it this way, and would recommend that Wikibase be updated to not do this anymore, and thus match how the remaining of virtually all deployed code uses BagOStuff.

Having said that, given it is documented behaviour and presumably works for non-APCu backends, we should also fix this bug in ApcuBag.

While this is clearly the way to do but I'm slightly worried by deploying this we will cause issues since caches that used to be never expiring now start to expire and this clearly will put more work on mw or other caches like memcached.

Right, but changing ApcuBag first will do the same thing as well.

Perhaps a safer first step would be to change Wikibase to use ttl values, but set them to 1 week (TTL_WEEK) instead of 1 hour, which is about the most a value stays cached for in production (rarely longer in my experience, given LRU churn and restarts). Then meanwhile analyse callers and see if a week is actually sensible and compatible with assumptions elsewhere, and if not, build it down further (e.g. 1 day, and then 1 hour over two trains) and check impact on appservers and wikibase-related perf metrics.

I strongly doubt a TTL of a WEEK would change anything compared to the current situation - appservers get restarted more often than once a week.

I would suggest we just go with 1 hour, and rollback fast if something seems off, but I doubt it.

APCU entries have a high cache hit-ratio, which means on average getting one more hit per hour per appserver per wikibase key shouldn't pose an immediate threat. Else we have much larger problems here.

Change 703892 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/core@wmf/1.37.0-wmf.12] objectcache: Normalize exptime to ttl in APCu and WinCache

https://gerrit.wikimedia.org/r/703892

Change 704320 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/extensions/Wikibase@master] Send TTL instead of expiry in unix timestamp in calling BagOStuff

https://gerrit.wikimedia.org/r/704320

Change 704176 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/extensions/Wikibase@wmf/1.37.0-wmf.12] Send TTL instead of expiry in unix timestamp in calling BagOStuff

https://gerrit.wikimedia.org/r/704176

Change 704177 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/extensions/Wikibase@wmf/1.37.0-wmf.13] Send TTL instead of expiry in unix timestamp in calling BagOStuff

https://gerrit.wikimedia.org/r/704177

Change 704177 abandoned by Ladsgroup:

[mediawiki/extensions/Wikibase@wmf/1.37.0-wmf.13] Send TTL instead of expiry in unix timestamp in calling BagOStuff

Reason:

won't get deployed.

https://gerrit.wikimedia.org/r/704177

Change 704178 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/extensions/Wikibase@wmf/1.37.0-wmf.14] Send TTL instead of expiry in unix timestamp in calling BagOStuff

https://gerrit.wikimedia.org/r/704178

Change 704331 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/extensions/Wikibase@master] Increase ttl of CachingPrefetchingTermLookup to an hour from one minute

https://gerrit.wikimedia.org/r/704331

Change 704320 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Send TTL instead of expiry in unix timestamp in calling BagOStuff

https://gerrit.wikimedia.org/r/704320

Change 704176 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@wmf/1.37.0-wmf.12] Send TTL instead of expiry in unix timestamp in calling BagOStuff

https://gerrit.wikimedia.org/r/704176

Mentioned in SAL (#wikimedia-operations) [2021-07-13T13:33:47Z] <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.12/extensions/Wikibase/lib/includes/SimpleCacheWithBagOStuff.php: Backport: [[gerrit:704176|Send TTL instead of expiry in unix timestamp in calling BagOStuff (T286260)]] (duration: 00m 58s)

Mentioned in SAL (#wikimedia-operations) [2021-07-13T13:37:23Z] <effie> rolling restart php-fpm across clusters - T286260

Change 704178 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@wmf/1.37.0-wmf.14] Send TTL instead of expiry in unix timestamp in calling BagOStuff

https://gerrit.wikimedia.org/r/704178

so this is fixed from wikibase point of view but we need to come to a decision to either merged the fix patch or fully deprecate using unix for expiry.

Change 703892 abandoned by Ladsgroup:

[mediawiki/core@wmf/1.37.0-wmf.12] objectcache: Normalize exptime to ttl in APCu and WinCache

Reason:

https://gerrit.wikimedia.org/r/703892

Change 703483 merged by jenkins-bot:

[mediawiki/core@master] objectcache: normalize $exptime to a TTL in APCUBagOStuff/WinCacheBagOStuff

https://gerrit.wikimedia.org/r/703483

Ladsgroup claimed this task.

This is done, I'll file a ticket about fully deprecating unix timestamp as expiry

Change 704331 abandoned by Ladsgroup:

[mediawiki/extensions/Wikibase@master] Increase ttl of CachingPrefetchingTermLookup to an hour from one minute

Reason:

Not needed.

https://gerrit.wikimedia.org/r/704331

Change 721306 had a related patch set uploaded (by Addshore; author: Ladsgroup):

[mediawiki/extensions/Wikibase@REL1_36] Send TTL instead of expiry in unix timestamp in calling BagOStuff

https://gerrit.wikimedia.org/r/721306

Change 721306 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@REL1_36] Send TTL instead of expiry in unix timestamp in calling BagOStuff

https://gerrit.wikimedia.org/r/721306

Change 790428 had a related patch set uploaded (by Legoktm; author: Ladsgroup):

[mediawiki/core@REL1_36] objectcache: normalize $exptime to a TTL in APCUBagOStuff/WinCacheBagOStuff

https://gerrit.wikimedia.org/r/790428

Change 790429 had a related patch set uploaded (by Legoktm; author: Ladsgroup):

[mediawiki/core@REL1_35] objectcache: normalize $exptime to a TTL in APCUBagOStuff/WinCacheBagOStuff

https://gerrit.wikimedia.org/r/790429

Change 790429 merged by jenkins-bot:

[mediawiki/core@REL1_35] objectcache: normalize $exptime to a TTL in APCUBagOStuff/WinCacheBagOStuff

https://gerrit.wikimedia.org/r/790429

Change 790428 merged by jenkins-bot:

[mediawiki/core@REL1_36] objectcache: normalize $exptime to a TTL in APCUBagOStuff/WinCacheBagOStuff

https://gerrit.wikimedia.org/r/790428