Page MenuHomePhabricator

Reduce request latency impact from APCUBagOStuff write lock contention
Closed, ResolvedPublic

Description

Per T293630 , high write rates to apcu cause contention for both readers and writers. We should minimize writes here, at least checking for lock key existence first. apcu_add and apcu_entry both trigger apc_cache_wlock() no matter what.

Event Timeline

Krinkle renamed this task from Optimize locking methods in APCUBagOStuff to Reduce request latency impact from APCUBagOStuff write lock contention.Aug 17 2022, 7:04 PM

Before we try to solve it, we may want to quantify the impact on latencies today in some way. Mainly two reasons:

  1. To know how severe it is and/or how close we are to it becoming a noticable problem. This would let us know if e.g. there are much bigger issues we could focus on first.
  2. To be able verify and take credit for the improvement afterward.

One way might be the T293630#7879457 analysis, and running it with concurrency levels set to e.g. the daily peak of a regular appserver and perhaps another run with concurrency levels set to where we typically peaked during recent load-related incidents. Again the idea being that higher concurrencies may end up showing a problem that, while real, is perhaps not possible to observe through MW requests that do other things in-between.

Another way might be to use operations/software/benchmw on a prod server with realistic workloads. E.g. depool and restart fpm to clear apcu, then run a normal bench, then restart/clear again and first patch APCUBagOStuff to hardcode a TTL of 1 hour, do a warmup, and then patch it to no-op all writes (e.g. pretend writes succeed, with locks denied as if claimed elsewhere) and do the comparison bench.

The latter might be more realistic and gain stronger confidence. Although it does have the risk of being unrepresentative if there are other callers to APCU besides RDBMS that e.g. put things in APCU but shorten their TTL artificially (e.g. LoadMonitor checking asOf separately from logical TTL).

Local testing shows that high rate stores of 1 byte lock keys does not hurt read speed of large values...

...nor do high rate writes of large values effects effect high rate stores of 1 byte lock keys.

However, acpu_fetch is faster than apcu_store/apcu_fetch (180ms vs 300ms for 1000000x). Using the former would be most useful for lock() calls with non-zero timeouts.

Change 824795 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] objectcache: optimize APCUBagOStuff::add() for lock keys and large values

https://gerrit.wikimedia.org/r/824795

Krinkle triaged this task as Medium priority.

Change 824795 merged by jenkins-bot:

[mediawiki/core@master] objectcache: optimize APCUBagOStuff::add() for lock keys and large values

https://gerrit.wikimedia.org/r/824795

Krinkle reassigned this task from Krinkle to aaron.