Page MenuHomePhabricator

Logstash poolcounter warnings "Usage error: You may only aquire a single non-nowait lock" on wikis with post-processing cache enabled
Closed, ResolvedPublicPRODUCTION ERROR

Description

While deploying Activate post-processing cache on some wikis today, we noticed the following logstash message on mwdebug:

Info
  • mwversion: 1.46.0-wmf.5
  • timestamp: 2025-12-17T14:50:46.561Z
  • phpversion: 8.3.26
  • reqId: ab251f54-2764-4aed-9464-5dc03d8ca706
  • Find reqId in Logstash
normalized_message
Pool key 'simplewiki:parsoid-pcache:129793:|#|:idhash:useParsoid=1:revid:10503646' (ArticleView): Usage error: You may only aquire a single non-nowait lock.
exception.trace

Subsequent investigation showed that this seems to be quite common outside of mwdebug, with a sharp uptick on 15 December, when Activate post-processing cache on idwiki was deployed:
https://logstash.wikimedia.org/goto/0c76f5bb63561e91410b391a570ebbe1

image.png (300×914 px, 35 KB)

There are some other occurrences of this usage error (including T412960: Pool key 'dewiki:SpecialContributions:a:127.0.0.1' (SpecialContributions): Usage error: You may only aquire a single non-nowait lock.), but they’re a lot less common.

Impact

Only logspam, as far as we’re aware. But quite a bit of it.

Event Timeline

Presumably this is being triggered when we do a PoolCounter parse for a postprocessed ParserOutput, which then recursively triggers another PoolCounter parse of the canonical ParserOutput. I believe the entire thing is protected by the outer lock and so there is no practical impact other than logspam -- I triggered this during testing on simplewiki by doing a purge, and saw no reader-visible effects. But we should fix the logspam and the underlying usage before rolling out the postprocessing cache further.

Change #1219197 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] ParserOutputAccess: don't use PoolCounter recursively

https://gerrit.wikimedia.org/r/1219197

Change #1219194 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] PoolCounter: fix typo in error message (aquire -> acquire)

https://gerrit.wikimedia.org/r/1219194

Change #1219197 merged by jenkins-bot:

[mediawiki/core@master] ParserOutputAccess: don't use PoolCounter recursively

https://gerrit.wikimedia.org/r/1219197

Change #1219194 merged by jenkins-bot:

[mediawiki/core@master] PoolCounter: fix typo in error message (aquire -> acquire)

https://gerrit.wikimedia.org/r/1219194

Change #1219216 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@wmf/1.46.0-wmf.5] ParserOutputAccess: don't use PoolCounter recursively

https://gerrit.wikimedia.org/r/1219216

Change #1219217 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@wmf/1.46.0-wmf.7] ParserOutputAccess: don't use PoolCounter recursively

https://gerrit.wikimedia.org/r/1219217

Change #1219216 merged by jenkins-bot:

[mediawiki/core@wmf/1.46.0-wmf.5] ParserOutputAccess: don't use PoolCounter recursively

https://gerrit.wikimedia.org/r/1219216

Change #1219217 merged by jenkins-bot:

[mediawiki/core@wmf/1.46.0-wmf.7] ParserOutputAccess: don't use PoolCounter recursively

https://gerrit.wikimedia.org/r/1219217

Mentioned in SAL (#wikimedia-operations) [2025-12-17T21:09:38Z] <cscott@deploy2002> Started scap sync-world: Backport for [[gerrit:1219216|ParserOutputAccess: don't use PoolCounter recursively (T412959)]], [[gerrit:1219217|ParserOutputAccess: don't use PoolCounter recursively (T412959)]]

Mentioned in SAL (#wikimedia-operations) [2025-12-17T21:11:50Z] <cscott@deploy2002> cscott: Backport for [[gerrit:1219216|ParserOutputAccess: don't use PoolCounter recursively (T412959)]], [[gerrit:1219217|ParserOutputAccess: don't use PoolCounter recursively (T412959)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-12-17T21:18:28Z] <cscott@deploy2002> Finished scap sync-world: Backport for [[gerrit:1219216|ParserOutputAccess: don't use PoolCounter recursively (T412959)]], [[gerrit:1219217|ParserOutputAccess: don't use PoolCounter recursively (T412959)]] (duration: 08m 50s)

cscott claimed this task.

I believe this is fixed in production now.

Alternative screenshot to make sure that the drop isn’t just due to the changed log message:

image.png (394×678 px, 43 KB)

(I acknowledge that you didn’t backport that change, but IMHO it still would’ve been a better idea to not do it at the same time :P)