Page MenuHomePhabricator

ResourceLoader DepStore lock acquired twice?
Closed, ResolvedPublic

Description

In Logstash (Perf Team dashboard) most the messages are from failed attempts to acquire a Memcached lock (presumably due to TKOs and link saturation etc.)

It's not very frequent, but whenever it happens, it logs the following twice for the same reqId/url:

reqId: XsA-pQpAEJkAAn5JFC8AAABM
2020-05-16T19:31:49 ERROR memcached mw1395
Memcached error for key "rl-deps:skins.minerva.base.styles|minerva|ne:lock" on server "127.0.0.1:11213": A TIMEOUT OCCURRED

2020-05-16T19:31:50 ERROR memcached mw1395
Memcached error for key "rl-deps:skins.minerva.base.styles|minerva|ne:lock" on server "127.0.0.1:11213": A TIMEOUT OCCURRED

This suggests it's attempting the lock twice, which seems contrary to what the code says. A bug perhaps?

Related Objects

StatusSubtypeAssignedTask
Resolvedaaron
ResolvedKrinkle
Resolvedaaron
Resolvedtstarling
Declinedaaron
Resolvedaaron
ResolvedEevans
Resolvedaaron
ResolvedKrinkle
ResolvedPapaul
Resolved Marostegui
Resolvedaaron
ResolvedKrinkle
Resolvedtstarling
Resolvedtstarling
ResolvedPRODUCTION ERRORjcrespo
Resolvedtstarling
Resolved Marostegui
Resolvedtstarling

Event Timeline

Krinkle triaged this task as Medium priority.Jun 15 2020, 11:27 PM
Krinkle moved this task from Inbox to Confirmed Problem on the MediaWiki-ResourceLoader board.

Searching for << +channel:memcached +message:/.*deps.*/ >> I still see this sometimes.

I see plenty of timeouts where the error is not logged twice. Those that happen twice seem to be about 200 ms apart.

Do we know from the code why the same load.php web request process would be entering that path twice?

Krinkle changed the task status from Open to Stalled.Sep 24 2021, 9:44 PM

To be re-evaluated after T113916. Either way, it's a fairly mild inefficienty at worst, so not a high priority either way.

Krinkle removed a project: MediaWiki-ResourceLoader.
Krinkle added a subscriber: aaron.

Change 724528 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] objectcache: improve error status codes in MemcachedPeclBagOStuff

https://gerrit.wikimedia.org/r/724528

Change 724541 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] objectcache: set error status codes in MemcachedPhpBagOStuff

https://gerrit.wikimedia.org/r/724541

Logstash query:

channel:memcached AND message:*lock* AND url:*load.php*

https://logstash.wikimedia.org/goto/3c49860f3ebaead845660bf7593ff432

I do see a handful of entries with "Memcached error for key "...:rl-deps:...:lock" on server 127.0.0.1:11213: A TIMEOUT OCCURRED". But, I do not see any entries where the same lock is tried (and failed) twice in a row, or otherwise more than one entry from the same reqId.

Change 724528 merged by jenkins-bot:

[mediawiki/core@master] objectcache: improve error status codes in MemcachedPeclBagOStuff

https://gerrit.wikimedia.org/r/724528

Change 724541 merged by jenkins-bot:

[mediawiki/core@master] objectcache: set error status codes in MemcachedPhpBagOStuff

https://gerrit.wikimedia.org/r/724541