Page MenuHomePhabricator

Introduce InterruptMutexManager
Closed, DeclinedPublic

Description

Problem statement

Currently, if high-performance blocking locks are desired for tasks (such as cache regeneration de-duplication), the only option is PoolCounter.

In the sub-usecase of mutexes (vs semaphores), a broader set of backends could be used than those that could implement the *full* set of PoolCounter features.

For example, MySQL named locks could handle the exclusive regeneration (PC's acquireForMe case) case of PoolCounter, but not the case of multiple threads blocking on the same key (e.g. X threads can be "doing Y" at the same time).

The case of de-duplicating parsing work on edit fits into the former case (e.g. waiting on any prior edit stash request to finish for that edit) but not the later.

Proposed solution

For simplicity and flexibility, it would be nice if a new interface could be introduced for interacting with these locks. This would allow for the application to select any appropriate backend.

This would let the codebase fall back to things likes mysql named locks (when possible) but use faster, multi-DC appropriate things PoolCounter, if installed/available.

Currently, such logic would require a config variable and some if/else logic for the named lock vs PoolCounter case. A common interface for all native blocking locks would simplify this kind of scenario. Demanding PoolCounter is unnecessarily constraining and falling back to polling loops is suboptimal.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

We've talked about this briefly today in TechCom. We agreed we don't need an RFC for this, however the current commit is lacking a bit of design documentation. It would be good to lay out there (or on Phabricator first) a problem statement and evaluation of existing interfaces we have for locking in filebackend, bagostuff, poolcounter etc., their issues and what this would address, the guarantees it is meant to provide etc.

In addition to the interface itself, it would also be good to have one or two existing callers changed in a draft commit to better show how this would improve certain aspects.

Per last weeks' ArchCom meeting, we'll be tracking this as an RFC after all. My apologies for the confusion.

We'll try to keep the process lightweight. Mainly this means the problem statement and proposed approach should (also) be described in this task's description instead of only in Gerrit.

aaron triaged this task as Lowest priority.May 26 2017, 5:07 PM

This RFC has been scheduled for public discussion on IRC next Wednesday, September 20.

As always, the discussion will take place in the IRC channel #wikimedia-office on Wednesday 21:00 UTC (2pm PDT, 23:00 CEST).

I asked @aaron about an example use for this interface, in preparation for the IRC meeting tonight..

<Krinkle> what would be a good example in core or an extension that would make use of this interface (e.g. something that currently does something less nice)
<AaronSchulz> I think the edit stash locks that use always GET_LOCK atm. This is less flexible and is slower for postgres and sqlite since that just does a looping check.
<AaronSchulz> the getScopedLock calls in AuthManager/MessageCache too
<AaronSchulz> ChronologyProtector too

Existing abstract lock managers in MediaWiki core:

InterfaceImplementationsConfigurationObtainBatchingShared Write (Semaphore)Exclusive WriteShared ReadComment
PoolCounterStub, Redis, PoolCounter_Client$wgPoolCounterConfPoolCounter::factory?Obtained and configured by purpose (e.g. "ArticleView"), not by backend, and lacks default or fallback.
LockManagerDB, FileSystem, Memcached, Postgres, Redis, Scoped$wgLockManagersLockManagerGroup
IDatabase::lockMySQL (GET_LOCK), MSSQL (memory), Postgres (ADVISORY_LOCK), Oracle, SQLite (FSLockManager)(N/A)MediaWikiServices?Not for generic use.
BagOStuff::lock(WaitConditionLoop)(N/A)MediaWikiServices?Not for generic use.

I'm still very confused about the name. "InterruptMutexManager" sounds like a manager of mutexes of interrupts. If I understand correctly, "Interrupt" in the name refers to the guarantee that the lock will be released automatically when the current request terminates, no matter how it terminates. So perhaps "AutoReleaseMutexManager" would be more descriptive. But then, will we have MutexManagers that do not guarantee this? Why not just call the interface "MutexManager"?

This RFC was discussed on IRC on September 20th. Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-09-20-21.01.log.html

During that meeting, it was agreed to let this RFC enter the Last Call period. Should no pertinent concerns remain unaddressed by October 5, this RFC will be approved for implementation.

ksmith added a subscriber: ksmith.

This was approved in the 2017-10-04 TechCom private meeting.

Change 332950 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] [WIP] Add InterruptMutexManager interface

https://gerrit.wikimedia.org/r/332950

Code is written, needs code review -- the gerrit patches are in the task description. @tstarling Is someone from your team able to take that on?

The use case for this is to allow Edit Stashing to use PoolCounter, instead of database locks (which means that we could do stashing in the secondary data center). We're open to solving this in another way instead.

Change 332951 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/PoolCounter@master] [WIP] Implement InterruptMutexManager methods in PoolCounter_Client

https://gerrit.wikimedia.org/r/332951

aaron removed aaron as the assignee of this task.Jul 3 2018, 3:44 PM

Change 532869 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@master] lockmanager: rewrite MySqlLockManager to use GET_LOCK() and friends

https://gerrit.wikimedia.org/r/532869

Just flying by here to say that having this abstracted and standardised interface for a dc-local non-blocking mutex lock manager, would also help with reducing use of the internal getLocalClusterInstance object. Apart from one or two core use cases, use of that are mostly for wanting a cheap lock that works both by default in stock MW and at WMF. By having this service class, we'd be able to transition more use away from the internal getLocalClusterInstance and to something with a proper service interface. (The alternative to using getLocalClusterInstance today, are basically to either hardcode against DB master locks for stock MW and for WMF; or to use WMF-only poolcounter and otherwise be unprotected by default; hence getLocalClusterInstance is currently a common thing to reach for).

See also T243233.

Random observation because this poped up on my radar again: I'd like to reiterate that the word "Interrupt" in the name sends me down the wrong path of thought every time. Why not just "MutexManager" or "MutexLockManager"?

Closing due to lack of interest.

Change 332950 abandoned by Aaron Schulz:

[mediawiki/core@master] [WIP] Add InterruptMutexManager interface

Reason:

https://gerrit.wikimedia.org/r/332950

Change 332951 abandoned by Aaron Schulz:

[mediawiki/extensions/PoolCounter@master] [WIP] Implement InterruptMutexManager methods in PoolCounter_Client

Reason:

https://gerrit.wikimedia.org/r/332951

Change 532869 abandoned by Aaron Schulz:

[mediawiki/core@master] lockmanager: rewrite MySqlLockManager to use GET_LOCK() and friends

Reason:

https://gerrit.wikimedia.org/r/532869

Change 532869 restored by Aaron Schulz:

[mediawiki/core@master] lockmanager: rewrite MySqlLockManager to use GET_LOCK() and friends

https://gerrit.wikimedia.org/r/532869

Change 532869 abandoned by Aaron Schulz:

[mediawiki/core@master] lockmanager: rewrite MySqlLockManager to use GET_LOCK() and friends

Reason:

PoC; going with removal

https://gerrit.wikimedia.org/r/532869