Log `NOT_STORED` responses from memcached `add` to monitor concurrent-population races
Open, Medium, Public

Description

Originally filed as "Improve the memcached concurrency treatment" and narrowed during investigation. The known cause of theMemcached.add returning NOT_STORED on back-to-back calls with the same key (extra setKeyValuePairSync invocations in the fetch path) was eliminated by T412326 (closed as Resolved).

Remaining scope: add logging when theMemcached.add returns NOT_STORED, so we can monitor whether residual back-to-back same-key races still occur in production.

Acceptance criteria:

  • Log entries are emitted when theMemcached.add returns NOT_STORED (at a level appropriate for periodic review, not per-request noise).
  • Logs include enough context (cache key, call site) to identify which code paths are still racing.
  • After ~2–4 weeks of logging, post a brief comment summarising what production data shows so we can decide whether any further concurrency work is justified.
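As a rough illustration of the first two criteria, here is a minimal sketch of a wrapper that logs only when `add` reports NOT_STORED. Everything in it is an assumption made for illustration: `CacheClient`, `FakeCache`, `addWithLogging`, the `LogEntry` shape, the `info` level and the synchronous signature are hypothetical and do not come from the real codebase or from any memcached client library.

```typescript
// Hypothetical types for illustration only; not the real client API.
type AddResult = "STORED" | "NOT_STORED";

interface CacheClient {
  add(key: string, value: string): AddResult;
}

interface LogEntry {
  level: "info"; // assumed level; pick whatever suits periodic review
  message: string;
  key: string;
  callSite: string;
}

type Logger = (entry: LogEntry) => void;

// In-memory stand-in that mimics memcached `add` semantics:
// the value is stored only if the key is not already present.
class FakeCache implements CacheClient {
  private store = new Map<string, string>();

  add(key: string, value: string): AddResult {
    if (this.store.has(key)) {
      return "NOT_STORED";
    }
    this.store.set(key, value);
    return "STORED";
  }
}

// Calls `add` and emits one log entry, with key and call site,
// only when the result is NOT_STORED.
function addWithLogging(
  client: CacheClient,
  key: string,
  value: string,
  callSite: string,
  log: Logger
): AddResult {
  const result = client.add(key, value);
  if (result === "NOT_STORED") {
    log({
      level: "info",
      message: "memcached add returned NOT_STORED",
      key,
      callSite,
    });
  }
  return result;
}

// Usage: two back-to-back adds with the same key.
const entries: LogEntry[] = [];
const collect: Logger = (entry) => {
  entries.push(entry);
};
const cache = new FakeCache();
const first = addWithLogging(cache, "example-key", "v1", "fetchPath", collect);
const second = addWithLogging(cache, "example-key", "v2", "fetchPath", collect);
```

Here the first call stores the value without logging, and the second produces a single entry carrying the key and the call site label.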

See DMartin-WMF's 2025-12-09 comment for the diagnostic detail.

Event Timeline

In local development/testing I've observed the following which is probably relevant to this ticket:

The unit test "retrieval from wikilambda_fetch populates memcached" results in two calls to setKeyValuePairSync, and, during the 2nd time through that function, the call to theMemcached.add returns an error (which unfortunately is not very informative; it just says NOT_STORED).

This also appears to be the case with these tests:

  • retrieving multiple items from memcached
  • ReferenceResolver re-retrieves missing values in memcached from wikilambda_fetch and populates

EDIT: the extra calls to setKeyValuePairSync that we know about will be eliminated by T412326. What remains here is to arrange for logging of these NOT_STORED returns, so we can monitor how often this situation (2 back-to-back calls to setKeyValuePairSync with the same key and the 2nd one calls gets before the first one has stored its value) continues to occur.
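The interleaving described above can be reproduced in a small sketch, assuming each population does a lookup and then an `add` only on a miss. The internals of setKeyValuePairSync are not shown in this ticket, so that ordering, the `RaceDemoCache` class and the caller labels A and B are all hypothetical.

```typescript
// Hypothetical in-memory cache with memcached-like `add` semantics.
type PopulateResult = "STORED" | "NOT_STORED" | "SKIPPED";

class RaceDemoCache {
  private store = new Map<string, string>();

  get(key: string): string | undefined {
    return this.store.get(key);
  }

  // Stores only if the key is absent, otherwise reports NOT_STORED.
  add(key: string, value: string): "STORED" | "NOT_STORED" {
    if (this.store.has(key)) {
      return "NOT_STORED";
    }
    this.store.set(key, value);
    return "STORED";
  }
}

const raceCache = new RaceDemoCache();
const raceKey = "example-key";

// Step 1: caller A looks up the key and sees a miss.
const missA = raceCache.get(raceKey) === undefined;

// Step 2: caller B looks up the same key before A has stored anything,
// so it also sees a miss.
const missB = raceCache.get(raceKey) === undefined;

// Step 3: A stores its value; the key was absent, so this succeeds.
const resultA: PopulateResult = missA
  ? raceCache.add(raceKey, "value-from-A")
  : "SKIPPED";

// Step 4: B tries to store as well; the key now exists, so `add` refuses.
const resultB: PopulateResult = missB
  ? raceCache.add(raceKey, "value-from-B")
  : "SKIPPED";

const finalValue = raceCache.get(raceKey);
```

In this ordering the cache keeps A's value, and B's NOT_STORED result is the event the requested logging would record.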

Jdforrester-WMF renamed this task from Improve the memcached concurrency treatment to Log `NOT_STORED` responses from memcached `add` to monitor concurrent-population races. May 13 2026, 8:29 PM
Jdforrester-WMF lowered the priority of this task from High to Medium.
Jdforrester-WMF updated the task description.
Jdforrester-WMF subscribed.

Re-prioritising to Medium and re-scoping during an Engineering Backlog triage. The original framing has narrowed significantly since filing: DMartin's investigation comment (2025-12-09) found that the back-to-back setKeyValuePairSync calls producing NOT_STORED from theMemcached.add were caused by extra fetch-path calls, and T412326 (now closed Resolved) eliminated those. What remains is observability — logging the NOT_STORED returns so we can monitor whether residual races still occur in production. Rewriting the title and description to make that narrow deliverable visible: the previous framing was intimidating relative to the actual remaining scope, and a clearer card is easier for someone to pick up.