Page MenuHomePhabricator

Reduce complexity and time spent in WANObjectCache
Closed, ResolvedPublic

Description

Excimer wall-time from a 24-hour period, for the most sampled code involving WANCache with a high cache-hit rate that
I could find in the flame graphs, namely api.php use of PageImages/LocalFile (link):

  • PageImages/LocalFile/WANObjectCache::getWithSetCallback (8,533 samples)
  • PageImages/LocalFile/WANObjectCache/…/MemcachedPeclBagOStuff::doGetMulti (7,485 samples)

Screenshot 2021-08-12 at 02.00.50.png (732×2 px, 198 KB)

That suggests about 12% of our "get" time is actually spent in PHP, figuring out what command to send to Memcached (1 - 7485/8533).

An anecdotal Tideways profile seems to hold up this estimate (link):

  • WANObjectCache::fetchOrRegenerate (20 calls): 48.5ms
  • WANObjectCache::fetchKeys (21 calls): 41.3ms
  • MemcachedPeclBagOStuff::doGetMulti (22 calls): 42.5ms

The number of calls differ in Tideways because it only measures time in pairs of parent and child, so there was one fetchKeys call not from getWithSet, and one Memcached call not via WANObjectCache. But (1 - 41.3/48.5) arrives at 14% which we'd expect to be an under estimate since it includes one more call, so the actual overhead was higher in this particular profile.

Control flow outline: https://etherpad.wikimedia.org/p/3qfx5Dvvqg2Yafmxd5-q

Event Timeline

Krinkle updated the task description. (Show Details)
Krinkle updated the task description. (Show Details)
Krinkle updated the task description. (Show Details)

I wonder how much of this comes from SPI logger debug() calls?

Change 711855 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] objectcache: remove WANObjectCache::getProcessCacheKey()

https://gerrit.wikimedia.org/r/711855

Change 712548 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[operations/mediawiki-config@master] Avoid udp2log for \"objectcache\" channel

https://gerrit.wikimedia.org/r/712548

I wonder how much of this comes from SPI logger debug() calls?

Hm.. I think if much time was spent somewhere within Monolog, then at least some of its methods would appear somewhere on the stack samples, but even with the 1000s of samples here, none of them were ever anywhere in Monolog.

I believe that for non-debug requests, the logger attached here no-ops debug() very early on so which backends it would be logged to if it wasn't skipped shouldn't matter (re: avoiding udp2log). Or is that for a different reason? I think developers generally expect channels to go to both unless there's a privacy or performance reason for not doing so.

Krinkle triaged this task as High priority.

I don't think it goes to the server. I just didn't want the overhead of the SPI code having to iterate through another handler and check if it is enabled or not.

By "backend" I meant the various Monolog handlers and formatters we have within PHP. I agree it doesn't go to the server, indeed.

As none of the 1000/8000 samples in this flame graph matched logging-related methods, it would seem like the time may is self-time of fetchOrRegenerate (pure logic in that function, and any built-in calls).

But..., there's also the matter of flame graph minwidth=1, and while the screenshot above is fairly wide, it's a portion of a much larger graph, so it's not unlikely the calls in question may be cropped off due to being <1px in the zoomed-out view.

And indeed, looking at the raw log files, we get a bit more detail. I've taken 2021-08-19.excimer-wall.index, reduced it down to LocalFile::loadFromCache.*WANObjectCache::fetchOrRegenerate, then trimmed it to 2 calls deep within that, and then grouped by frequency.

Source code: fetchOrRegenerate

$ ack 'ocalFile::loadFromCache.*WANObjectCache::fetchOrRegenerate' 2021-08-19.excimer-wall.index.log | fgrep -a -v 'fetchOrRegenerate;{closure' | sed 's/.*\(WANObjectCache::fetchOrRegenerate\)/\1/' | cut -d';' -f1-2 | cut -d' ' -f1 | sort | uniq -c

  13337 WANObjectCache::fetchOrRegenerate;WANObjectCache::fetchKeys
    235 WANObjectCache::fetchOrRegenerate;Liuggio\…\StatsdDataFactory::timing
    219 WANObjectCache::fetchOrRegenerate;WANObjectCache::worthRefreshPopular
    202 WANObjectCache::fetchOrRegenerate;WANObjectCache::set
     76 WANObjectCache::fetchOrRegenerate;WANObjectCache::getCurrentTime
     70 WANObjectCache::fetchOrRegenerate;WANObjectCache::resolveCTL

     39 WANObjectCache::fetchOrRegenerate

     37 WANObjectCache::fetchOrRegenerate;WANObjectCache::determineKeyClassForStats
     36 WANObjectCache::fetchOrRegenerate;WANObjectCache::isValid
     17 WANObjectCache::fetchOrRegenerate;WANObjectCache::worthRefreshExpiring
     12 WANObjectCache::fetchOrRegenerate;WANObjectCache::checkAndSetCooloff
      9 WANObjectCache::fetchOrRegenerate;WANObjectCache::getInterimValue
      7 WANObjectCache::fetchOrRegenerate;WANObjectCache::setInterimValue
      7 WANObjectCache::fetchOrRegenerate;WANObjectCache::isAliveOrInGracePeriod
      3 WANObjectCache::fetchOrRegenerate;WANObjectCache::scheduleAsyncRefresh
      3 WANObjectCache::fetchOrRegenerate;Monolog\Logger::debug
      2 WANObjectCache::fetchOrRegenerate;WANObjectCache::claimStampedeLock
      1 WANObjectCache::fetchOrRegenerate;WANObjectCache::yieldStampedeLock

And three levels deep, in alphabetical order:

$ ack 'ocalFile::loadFromCache.*WANObjectCache::fetchOrRegenerate' 2021-08-19.excimer-wall.index.log | fgrep -a -v 'fetchOrRegenerate;{closure' | sed 's/.*\(WANObjectCache::fetchOrRegenerate\)/\1/' | cut -d';' -f1-3 | cut -d' ' -f1 | sort | uniq -c
     39 WANObjectCache::fetchOrRegenerate
      4 WANObjectCache::fetchOrRegenerate;Liuggio\…\StatsdDataFactory::timing
    231 WANObjectCache::fetchOrRegenerate;Liuggio\…\StatsdDataFactory::timing;BufferingStatsdDataFactory::produceStatsdData
      3 WANObjectCache::fetchOrRegenerate;Monolog\Logger::debug;Monolog\Logger::addRecord
      1 WANObjectCache::fetchOrRegenerate;WANObjectCache::checkAndSetCooloff
      2 WANObjectCache::fetchOrRegenerate;WANObjectCache::checkAndSetCooloff;Liuggio\…\StatsdDataFactory::timing
      1 WANObjectCache::fetchOrRegenerate;WANObjectCache::checkAndSetCooloff;Liuggio\…\StatsdDataFactory::updateCount
      8 WANObjectCache::fetchOrRegenerate;WANObjectCache::checkAndSetCooloff;MemcachedPeclBagOStuff::setNewPreparedValues
      2 WANObjectCache::fetchOrRegenerate;WANObjectCache::claimStampedeLock;MediumSpecificBagOStuff::add
     37 WANObjectCache::fetchOrRegenerate;WANObjectCache::determineKeyClassForStats
    135 WANObjectCache::fetchOrRegenerate;WANObjectCache::fetchKeys
  13005 WANObjectCache::fetchOrRegenerate;WANObjectCache::fetchKeys;MediumSpecificBagOStuff::getMulti
     54 WANObjectCache::fetchOrRegenerate;WANObjectCache::fetchKeys;WANObjectCache::getCurrentTime
     36 WANObjectCache::fetchOrRegenerate;WANObjectCache::fetchKeys;WANObjectCache::makeSisterKey
     14 WANObjectCache::fetchOrRegenerate;WANObjectCache::fetchKeys;WANObjectCache::processCheckKeys
     25 WANObjectCache::fetchOrRegenerate;WANObjectCache::fetchKeys;WANObjectCache::processFluxKeys
     68 WANObjectCache::fetchOrRegenerate;WANObjectCache::fetchKeys;WANObjectCache::unwrap
     76 WANObjectCache::fetchOrRegenerate;WANObjectCache::getCurrentTime
      9 WANObjectCache::fetchOrRegenerate;WANObjectCache::getInterimValue;MediumSpecificBagOStuff::get
      7 WANObjectCache::fetchOrRegenerate;WANObjectCache::isAliveOrInGracePeriod
     36 WANObjectCache::fetchOrRegenerate;WANObjectCache::isValid
     70 WANObjectCache::fetchOrRegenerate;WANObjectCache::resolveCTL
      3 WANObjectCache::fetchOrRegenerate;WANObjectCache::scheduleAsyncRefresh;DeferredUpdates::addCallableUpdate
      1 WANObjectCache::fetchOrRegenerate;WANObjectCache::set
      7 WANObjectCache::fetchOrRegenerate;WANObjectCache::setInterimValue;MediumSpecificBagOStuff::merge
    199 WANObjectCache::fetchOrRegenerate;WANObjectCache::set;MediumSpecificBagOStuff::add
      2 WANObjectCache::fetchOrRegenerate;WANObjectCache::set;WANObjectCache::wrap
     17 WANObjectCache::fetchOrRegenerate;WANObjectCache::worthRefreshExpiring
    156 WANObjectCache::fetchOrRegenerate;WANObjectCache::worthRefreshPopular
     63 WANObjectCache::fetchOrRegenerate;WANObjectCache::worthRefreshPopular;Monolog\Logger::debug
      1 WANObjectCache::fetchOrRegenerate;WANObjectCache::yieldStampedeLock;MediumSpecificBagOStuff::changeTTL

Uncropped samples of all StatsD from anywhere during index.php yesterday:

$ ack 'StatsdDataFactory::timing' 2021-08-27.excimer-wall.index.log | sed 's/.*\(;Liuggio\\StatsdClient\\Factory\\StatsdDataFactory::timing\)/\1/' | sort | uniq -c | sort -rn

  92 ;Liuggio\StatsdDataFactory::timing

   1 ;Liuggio\StatsdDataFactory::timing;AutoLoader::autoload
   3 ;Liuggio\StatsdDataFactory::timing;AutoLoader::autoload;AutoLoader::find
   1 ;Liuggio\StatsdDataFactory::timing;Composer\Autoload\ClassLoader::loadClass

 448 ;Liuggio\StatsdDataFactory::timing;BufferingLiuggio\StatsdDataFactory::produceStatsdData
1376 ;Liuggio\StatsdDataFactory::timing;BufferingLiuggio\StatsdDataFactory::produceStatsdData;BufferingLiuggio\StatsdDataFactory::normalizeMetricKey

 101 ;Liuggio\StatsdDataFactory::timing;BufferingLiuggio\StatsdDataFactory::produceStatsdData;Liuggio\StatsdData::setValue
  62 ;Liuggio\StatsdDataFactory::timing;BufferingLiuggio\StatsdDataFactory::produceStatsdData;Liuggio\StatsdData::setMetric
  58 ;Liuggio\StatsdDataFactory::timing;BufferingLiuggio\StatsdDataFactory::produceStatsdData;Liuggio\StatsdData::setKey
  42 ;Liuggio\StatsdDataFactory::timing;BufferingLiuggio\StatsdDataFactory::produceStatsdData;Liuggio\StatsdDataFactory::produceStatsdDataEntity
  29 ;Liuggio\StatsdDataFactory::timing;BufferingLiuggio\StatsdDataFactory::produceStatsdData;Liuggio\StatsdDataFactory::produceStatsdDataEntity;Liuggio\StatsdDataFactory::getEntityClass
   2 ;Liuggio\StatsdDataFactory::timing;BufferingLiuggio\StatsdDataFactory::produceStatsdData;Liuggio\StatsdDataFactory::produceStatsdDataEntity;Composer\Autoload\ClassLoader::loadClass
   2 ;Liuggio\StatsdDataFactory::timing;BufferingLiuggio\StatsdDataFactory::produceStatsdData;Liuggio\StatsdDataFactory::produceStatsdDataEntity;AutoLoader::autoload;AutoLoader::find

(Shortened Liuggio\StatsdClient\Factory\ to Liuggio\ for brevity.)

Change 719146 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] stats: Defer BufferingStatsdDataFactory::normalizeMetricKey to send phase

https://gerrit.wikimedia.org/r/719146

Change 719148 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] objectcache: Misc micro-optimisations in WANObjectCache

https://gerrit.wikimedia.org/r/719148

Change 719667 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] objectcache: fix duplication of purge value parsing code in WANObjectCache

https://gerrit.wikimedia.org/r/719667

Change 719667 merged by jenkins-bot:

[mediawiki/core@master] objectcache: fix duplication of purge value parsing code in WANObjectCache

https://gerrit.wikimedia.org/r/719667

Change 711855 merged by jenkins-bot:

[mediawiki/core@master] objectcache: remove WANObjectCache::getProcessCacheKey()

https://gerrit.wikimedia.org/r/711855

Change 719148 merged by jenkins-bot:

[mediawiki/core@master] objectcache: Misc micro-optimisations in WANObjectCache

https://gerrit.wikimedia.org/r/719148

Change 719146 merged by jenkins-bot:

[mediawiki/core@master] stats: Defer BufferingStatsdDataFactory::normalizeMetricKey to send phase

https://gerrit.wikimedia.org/r/719146

From Task description:

In 2021-08-11.excimer-wall.api flame graph:

  • PageImages/LocalFile/WANObjectCache::getWithSetCallback (8,533 samples)
  • PageImages/LocalFile/WANObjectCache/…/MemcachedPeclBagOStuff::doGetMulti (7,485 samples)

That suggests about 12% of our "get" time is actually spent in PHP […]

@gerritbot wrote:

Change 719146 merged by jenkins-bot:

[mediawiki/core@master] stats: Defer BufferingStatsdDataFactory::normalizeMetricKey to send phase

https://gerrit.wikimedia.org/r/719146

This is been deployed meanwhile. In 2021-09-27.excimer-wall.api:

  • ApiQueryPageImages/LocalFile/getWithSetCallback (7,086 samples)
  • ApiQueryPageImages/LocalFile/getWithSetCallback/…/MemcachedPeclBagOStuff::doGetMulti (6,325 samples)

Currently down to ~10% overhead.

We also have new entries that consolidate all time spent in BufferingStatsd, which we believe was previously too thinly spread out to actually notice in the flame graphs (due to SVG minwidth):

Change 713571 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] objectcache: cleanup cache value attribute helper methods in WANObjectCache

https://gerrit.wikimedia.org/r/713571

Change 713571 merged by jenkins-bot:

[mediawiki/core@master] objectcache: cleanup cache value attribute methods in WANObjectCache

https://gerrit.wikimedia.org/r/713571

Change 731793 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] objectcache: remove \"onHostRoutingPrefix\" feature from WANObjectCache

https://gerrit.wikimedia.org/r/731793

Change 731793 merged by jenkins-bot:

[mediawiki/core@master] objectcache: remove \"onHostRoutingPrefix\" feature from WANObjectCache

https://gerrit.wikimedia.org/r/731793