Page MenuHomePhabricator

Introduce InterruptMutexManager
Open, LowestPublic

Description

Problem statement

Currently, if high-performance blocking locks are desired for tasks (such as cache regeneration de-duplication), the only option is PoolCounter.

In the sub-usecase of mutexes (vs semaphores), a broader set of backends could be used than those that could implement the *full* set of PoolCounter features.

For example, MySQL named locks could handle the exclusive regeneration (PC's acquireForMe case) case of PoolCounter, but not the case of multiple threads blocking on the same key (e.g. X threads can be "doing Y" at the same time).

The case of de-duplicating parsing work on edit fits into the former case (e.g. waiting on any prior edit stash request to finish for that edit) but not the later.

Proposed solution

For simplicity and flexibility, it would be nice if a new interface could be introduced for interacting with these locks. This would allow for the application to select any appropriate backend.

This would let the codebase fall back to things likes mysql named locks (when possible) but use faster, multi-DC appropriate things PoolCounter, if installed/available.

Currently, such logic would require a config variable and some if/else logic for the named lock vs PoolCounter case. A common interface for all native blocking locks would simplify this kind of scenario. Demanding PoolCounter is unnecessarily constraining and falling back to polling loops is suboptimal.

Details

Related Gerrit Patches:

Event Timeline

Krinkle created this task.Mar 29 2017, 8:39 PM
Restricted Application added projects: Multimedia, Commons. · View Herald TranscriptMar 29 2017, 8:39 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

We've talked about this briefly today in TechCom. We agreed we don't need an RFC for this, however the current commit is lacking a bit of design documentation. It would be good to lay out there (or on Phabricator first) a problem statement and evaluation of existing interfaces we have for locking in filebackend, bagostuff, poolcounter etc., their issues and what this would address, the guarantees it is meant to provide etc.

In addition to the interface itself, it would also be good to have one or two existing callers changed in a draft commit to better show how this would improve certain aspects.

Restricted Application added a project: Multimedia. · View Herald TranscriptMar 29 2017, 8:46 PM
Krinkle moved this task from External to Backlog on the Architecture board.Mar 29 2017, 8:48 PM
Krinkle edited projects, added TechCom-RFC; removed Architecture.Apr 12 2017, 8:03 PM

Per last weeks' ArchCom meeting, we'll be tracking this as an RFC after all. My apologies for the confusion.

We'll try to keep the process lightweight. Mainly this means the problem statement and proposed approach should (also) be described in this task's description instead of only in Gerrit.

Krinkle moved this task from Inbox to Under discussion on the TechCom-RFC board.Apr 12 2017, 8:05 PM
Krinkle moved this task from Under discussion to Backlog on the TechCom-RFC board.Apr 12 2017, 8:11 PM
aaron updated the task description. (Show Details)May 5 2017, 1:21 AM
aaron triaged this task as Lowest priority.May 26 2017, 5:07 PM
Joe added a subscriber: Joe.Sep 13 2017, 8:47 PM
Krinkle moved this task from Backlog to Under discussion on the TechCom-RFC board.Sep 13 2017, 8:53 PM
Krinkle updated the task description. (Show Details)Sep 13 2017, 8:57 PM

This RFC has been scheduled for public discussion on IRC next Wednesday, September 20.

As always, the discussion will take place in the IRC channel #wikimedia-office on Wednesday 21:00 UTC (2pm PDT, 23:00 CEST).

I asked @aaron about an example use for this interface, in preparation for the IRC meeting tonight..

<Krinkle> what would be a good example in core or an extension that would make use of this interface (e.g. something that currently does something less nice)
<AaronSchulz> I think the edit stash locks that use always GET_LOCK atm. This is less flexible and is slower for postgres and sqlite since that just does a looping check.
<AaronSchulz> the getScopedLock calls in AuthManager/MessageCache too
<AaronSchulz> ChronologyProtector too

Krinkle added a comment.EditedSep 20 2017, 10:00 PM

Existing abstract lock managers in MediaWiki core:

InterfaceImplementationsConfigurationObtainBatchingShared Write (Semaphore)Exclusive WriteShared ReadComment
PoolCounterStub, Redis, PoolCounter_Client$wgPoolCounterConfPoolCounter::factory?Obtained and configured by purpose (e.g. "ArticleView"), not by backend, and lacks default or fallback.
LockManagerDB, FileSystem, Memcached, Postgres, Redis, Scoped$wgLockManagersLockManagerGroup
IDatabase::lockMySQL (GET_LOCK), MSSQL (memory), Postgres (ADVISORY_LOCK), Oracle, SQLite (FSLockManager)(N/A)MediaWikiServices?Not for generic use.
BagOStuff::lock(WaitConditionLoop)(N/A)MediaWikiServices?Not for generic use.

@Krinkle thank you for the overview!

I'm still very confused about the name. "InterruptMutexManager" sounds like a manager of mutexes of interrupts. If I understand correctly, "Interrupt" in the name refers to the guarantee that the lock will be released automatically when the current request terminates, no matter how it terminates. So perhaps "AutoReleaseMutexManager" would be more descriptive. But then, will we have MutexManagers that do not guarantee this? Why not just call the interface "MutexManager"?

This RFC was discussed on IRC on September 20th. Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-09-20-21.01.log.html

During that meeting, it was agreed to let this RFC enter the Last Call period. Should no pertinent concerns remain unaddressed by October 5, this RFC will be approved for implementation.

ksmith edited projects, added TechCom-RFC (TechCom-Approved); removed TechCom-RFC.
ksmith added a subscriber: ksmith.

This was approved in the 2017-10-04 TechCom private meeting.

Change 332950 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] [WIP] Add InterruptMutexManager interface

https://gerrit.wikimedia.org/r/332950

Code is written, needs code review -- the gerrit patches are in the task description. @tstarling Is someone from your team able to take that on?

The use case for this is to allow Edit Stashing to use PoolCounter, instead of database locks (which means that we could do stashing in the secondary data center). We're open to solving this in another way instead.

Change 332951 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/extensions/PoolCounter@master] [WIP] Implement InterruptMutexManager methods in PoolCounter_Client

https://gerrit.wikimedia.org/r/332951

aaron removed aaron as the assignee of this task.Jul 3 2018, 3:44 PM