Maniphest T151993

Implement ChangeDispatchCoordinator based on RedisLockManager
Closed, Resolved, Public

Assigned To

Authored By

	daniel
	Nov 30 2016, 11:25 AM

Description

Our current implementation of ChangeDispatchCoordinator relies on global MySQL locks, which hog connections to the master DB and may be lost when those connections are reset by a watchdog.

@aaron suggested using Redis-based locks instead.

When implementing ChangeDispatchCoordinator based on RedisLockManager, please take care to closely replicate the semantics of the existing SqlChangeDispatchCoordinator.

Details

	Subject	Repo	Branch	Lines +/-
	Add LockManagerSqlChangeDispatchCoordinator	mediawiki/extensions/Wikibase	master	+185 -6
	Document dispatchingLockManager option in $wgWBRepoSettings	mediawiki/extensions/Wikibase	master	+1 -0


Related Objects

Status	Assigned	Task
Invalid	None	T108944 [Epic] Improve change dispatching
Resolved	Ladsgroup	T151681 DispatchChanges: Avoid long-lasting connections to the master DB
Resolved	Ladsgroup	T151993 Implement ChangeDispatchCoordinator based on RedisLockManager
Resolved	Andrew	T155042 Increase quota for wikidata-dev project
Resolved	Ladsgroup	T155190 Build an environment to test change dispatching using Redis-based locking
Declined	None	T155196 Vagrant 1.9.1 provision failure on Trusty using role::labs:mediawiki_vagrant
Resolved	Ladsgroup	T157308 Create test script for ChangeDispatchCoordinator

Event Timeline

daniel created this task. Nov 30 2016, 11:25 AM

Restricted Application added a project: User-Ladsgroup. Nov 30 2016, 11:25 AM

daniel mentioned this in T151681: DispatchChanges: Avoid long-lasting connections to the master DB. Nov 30 2016, 8:17 PM

Ladsgroup moved this task from Proposed to Doing on the Wikidata-Former-Sprint-Board board. Dec 2 2016, 4:45 PM

Ladsgroup moved this task from Incoming to In progress on the User-Ladsgroup board. Dec 5 2016, 8:39 PM

daniel added a project: User-Daniel. Dec 6 2016, 4:41 PM

Change 325640 had a related patch set uploaded (by Ladsgroup):
[very WIP] [Draft] [DNM] [I can't emphasize enough] Add RedisLockSqlChangeDispatchCoordinator

https://gerrit.wikimedia.org/r/325640

gerritbot added a project: Patch-For-Review. Dec 6 2016, 9:33 PM

@aaron: Hey, I'm working on this, but it seems that RedisLockManager (and LockManagers in general) doesn't have a lock lookup, or I couldn't find it. Can you clarify whether lock managers have something equivalent to Database::lockIsFree?

There is no analogous method. Maybe a non-blocking engageClientLock() call can replace the isClientLockUsed() call? It seems chd_lock is used to decide whether to do a non-blocking check first before a blocking acquisition (which could race anyway, which I guess just affects runtime?).

Without looking at the code, it seems we should be able to do without Database::lockIsFree. It's nice to do a check first to avoid a more expensive attempt to acquire a lock if we already know that that is likely to fail, but in the end, all we need is an atomic way to try to grab a lock, and then know if we got it.
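To illustrate the point above, here is a minimal sketch of selecting a client wiki with a single atomic "try to grab the lock" call instead of a separate "is it free?" check followed by an acquisition. It is not the Wikibase or MediaWiki LockManager API: `InMemoryLockManager`, `try_lock`, `unlock`, `select_client`, and the `dispatch:` key prefix are all hypothetical names chosen for the example.

```python
import threading


class InMemoryLockManager:
    """Hypothetical stand-in for a lock manager (not the MediaWiki API)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owners = {}  # lock name -> owner id

    def try_lock(self, name, owner):
        """Atomically take the lock if it is free; return True on success."""
        with self._guard:
            if name in self._owners:
                return False
            self._owners[name] = owner
            return True

    def unlock(self, name, owner):
        """Release the lock only if this owner currently holds it."""
        with self._guard:
            if self._owners.get(name) == owner:
                del self._owners[name]
                return True
            return False


def select_client(manager, candidates, owner):
    """Return the first client wiki whose lock could be grabbed, else None.

    There is no separate "is the lock free?" lookup: the non-blocking
    try_lock() call is both the check and the acquisition.
    """
    for wiki in candidates:
        if manager.try_lock("dispatch:" + wiki, owner):
            return wiki
    return None
```

With this shape, two dispatchers racing for the same wiki cannot both win, because the check and the acquisition happen in one step; the loser simply moves on to the next candidate.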

One thing that is important in this context is how stale locks are handled. Does Redis know if the owner of a lock has died? Or does an orphaned lock stay around forever? Do locks time out?

If the owner fatals, the lock will have to expire (the TTL depends on the LockManager instance config and/or whether the context is CLI or web).
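The expiry behaviour described above can be sketched as a lock that carries a deadline: if the owner dies without releasing it, the lock becomes available again once the TTL has passed. This is an illustrative, single-threaded model only; `ExpiringLockManager`, its methods, and the injectable `clock` parameter are assumptions made for the example, and the thread does not describe how RedisLockManager implements expiry internally.

```python
import time


class ExpiringLockManager:
    """Hypothetical TTL-based lock table (single-threaded sketch)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised in tests
        self._locks = {}  # lock name -> (owner id, expiry time)

    def try_lock(self, name, owner):
        """Take the lock if it is free or its previous holder's TTL ran out."""
        now = self._clock()
        held = self._locks.get(name)
        if held is not None and held[1] > now:
            return False  # still held and not yet expired
        self._locks[name] = (owner, now + self._ttl)
        return True

    def unlock(self, name, owner):
        """Release the lock only if this owner is the recorded holder."""
        held = self._locks.get(name)
        if held is not None and held[0] == owner:
            del self._locks[name]
            return True
        return False
```

One consequence worth noting: a slow but still living owner can lose its lock to another process once the TTL elapses, so the TTL has to be longer than the longest expected dispatch pass.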

Ladsgroup moved this task from Doing to Review on the Wikidata-Former-Sprint-Board board. Dec 8 2016, 10:40 PM

Ladsgroup moved this task from In progress to Blocked on others on the User-Ladsgroup board. Dec 9 2016, 1:01 AM

daniel moved this task from Inbox to Push on the User-Daniel board. Jan 5 2017, 4:12 PM

Ladsgroup created subtask T155042: Increase quota for wikidata-dev project. Jan 10 2017, 8:02 PM

Ladsgroup created subtask T155190: Build an environment to test change dispatching using Redis-based locking. Jan 12 2017, 9:18 PM

thiemowmde moved this task from Review to Doing on the Wikidata-Former-Sprint-Board board. Jan 16 2017, 11:10 AM

Andrew closed subtask T155042: Increase quota for wikidata-dev project as Resolved. Jan 24 2017, 2:46 PM

Ladsgroup moved this task from Doing to Review on the Wikidata-Former-Sprint-Board board. Feb 13 2017, 8:10 PM

Change 325640 merged by jenkins-bot:
[mediawiki/extensions/Wikibase] Add LockManagerSqlChangeDispatchCoordinator

https://gerrit.wikimedia.org/r/325640

Ladsgroup moved this task from Blocked on others to Done on the User-Ladsgroup board. Mar 7 2017, 12:58 PM

Ladsgroup moved this task from Review to Done on the Wikidata-Former-Sprint-Board board.

For implementation details, let's keep talking in T159828: Use redis-based lock manager for dispatchChanges on test.wikidata.org and T159826: Use redis-based lock manager in dispatch changes in production.

Ladsgroup closed this task as Resolved. Mar 7 2017, 1:52 PM

Ladsgroup closed subtask T155190: Build an environment to test change dispatching using Redis-based locking as Resolved. Mar 7 2017, 3:23 PM

Reopening until this is tested and confirmed.

daniel mentioned this in T157308: Create test script for ChangeDispatchCoordinator. Mar 24 2017, 10:23 AM

Change 344750 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Document dispatchingLockManager option in $wgWBRepoSettings

https://gerrit.wikimedia.org/r/344750

It just occurred to me that there is an extra reason to avoid using a DB master: master failover is a relatively frequent operation. It happens every time the master MySQL server is upgraded and whenever there is a datacenter failover (two of those will happen in April/May). There was probably no safeguard to avoid issues when that happened.

Closing this particular task (which only holds some of the things we need to do in order to actually improve the situation considerably).
