
Convert various core/extension cache users to ReplicatedBagOStuff
Closed, Resolved · Public

Description

Using Redis or MySQL, some code stashes objects or data in places where both DCs should see them.
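
For concreteness, a minimal sketch of how such a replicated stash might be configured; the cache name, server names, and the exact ReplicatedBagOStuff/RedisBagOStuff parameters are illustrative assumptions, not the production setup:

```
// LocalSettings.php (illustrative only)
$wgObjectCaches['redis-replicated'] = [
	'class' => 'ReplicatedBagOStuff',
	// Writes go to the master Redis instance in the primary DC...
	'writeFactory' => [
		'class' => 'RedisBagOStuff',
		'args'  => [ [ 'servers' => [ 'redis-master.example.net:6379' ] ] ],
	],
	// ...while reads may be served by a (possibly lagged) local replica.
	'readFactory' => [
		'class' => 'RedisBagOStuff',
		'args'  => [ [ 'servers' => [ 'redis-replica.example.net:6379' ] ] ],
	],
];
```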

Examples include:

Related Objects

Event Timeline

aaron claimed this task.
aaron raised the priority of this task from to Medium.
aaron updated the task description. (Show Details)
aaron removed a project: Patch-For-Review.
aaron set Security to None.
aaron added subscribers: Gilles, GWicke, mark and 7 others.

Change 207718 had a related patch set uploaded (by Aaron Schulz):
Added ObjectStash factory class and $wgMainStash/$wgObjectStashes

https://gerrit.wikimedia.org/r/207718

aaron removed a project: Epic.

Change 207718 merged by jenkins-bot:
Added ObjectCache::getMainStashInstance() and $wgMainStash

https://gerrit.wikimedia.org/r/207718
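
As a rough sketch of the conversion pattern for call sites (only getMainStashInstance() and $wgMainStash come from the change above; the stash name, key, and TTL are illustrative):

```
// LocalSettings.php: point the main stash at one of the $wgObjectCaches
// entries, e.g. the replicated one sketched in the description.
$wgMainStash = 'redis-replicated';

// Call sites swap their DC-local cache for the shared stash:
$stash = ObjectCache::getMainStashInstance();
$key = wfMemcKey( 'example-feature', 'some-token' ); // illustrative key
$stash->set( $key, $data, 3600 ); // visible in both DCs once replicated
$data = $stash->get( $key );      // may briefly return stale or missing data
```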

Change 212715 had a related patch set uploaded (by Aaron Schulz):
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/212715

Change 212717 had a related patch set uploaded (by Aaron Schulz):
[WIP] Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/212717

Change 213762 had a related patch set uploaded (by Aaron Schulz):
[WIP] Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/213762

Just to clarify, this would be for things where it is important for all instances to see the data, i.e., sort of like a short-lived key-value store? (Just curious, because OATHAuth might need to use this for storing token expiration data to prevent replay attacks.)

Yes, although note that get() can have lag. I was actually thinking about having a flag to avoid lag for the replicated cache class.
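
A sketch of what that flag might look like; the constant name, and passing it to get(), is purely hypothetical at this point, it is just the idea floated above:

```
// Hypothetical read flag asking ReplicatedBagOStuff to read from the
// master rather than a replica, trading latency for freshness.
$stash = ObjectCache::getMainStashInstance();
$value = $stash->get( $key, BagOStuff::READ_LATEST ); // flag name is an assumption
if ( $value === false ) {
	// Key genuinely absent (not merely unreplicated yet).
}
```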

Nonce tokens are interesting (I was also thinking about that a lot today). If the GET/POST distinction is complete and only the latter really mutates anything (like edits/comments), then the nonce cache could be DC-local for performance: POSTs would all go to one DC and have full deduplication, while GETs would only allow one extra replay in the other DC (if fast enough and if HTTP was being used) and would not change anything anyway. Of course, if one is paranoid, they can use add() on the stash BagOStuff :)
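
The add() trick mentioned above would look roughly like this (the key scheme and TTL are illustrative):

```
// BagOStuff::add() only succeeds when the key does not exist yet, so on
// the replicated stash it doubles as a "has this nonce been seen?" check.
$stash = ObjectCache::getMainStashInstance();
$key = wfMemcKey( 'nonce-token', sha1( $token ) ); // illustrative key scheme
if ( !$stash->add( $key, 1, 300 ) ) {
	// Token already used somewhere: treat as a replay and reject.
	// Still subject to replication lag unless all writes hit one DC.
}
```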

Change 223790 had a related patch set uploaded (by Aaron Schulz):
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/223790

Change 212715 merged by jenkins-bot:
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/212715

Change 212717 merged by jenkins-bot:
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/212717

Change 213762 merged by jenkins-bot:
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/213762

Change 221994 had a related patch set uploaded (by Aaron Schulz):
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/221994

@aaron do we really want to rely on Redis replication cross-datacenter? Or did I get that wrong? Redis replication is known not to be extraordinarily reliable even in a small-lag, same-DC setup; did you do some experiments to see how reliable it would be?

If not, I'd love to help.

In T97620#1492791, @Joe wrote:

@aaron do we really want to rely on Redis replication cross-datacenter? Or did I get that wrong? Redis replication is known not to be extraordinarily reliable even in a small-lag, same-DC setup; did you do some experiments to see how reliable it would be?

If not, I'd love to help.

We used replication from tampa => eqiad during the switchover (though I don't think the consistent hashing was done correctly on the MW side). I was assuming we likewise have replication from eqiad => codfw.

@aaron at the moment we don't, as replicating Redis would, for instance, result in the codfw job queues processing the same jobs as the eqiad ones all over again.

Also, while I can think of that as a solution for a "definitive" switchover, I don't think it's a good idea long-term. But I'll look into options for making it as reliable as possible.

I'm just talking about the mc* redis instances (this bug is just about BagOStuff). We've done replication for that before.

I think the queues could be replicated in any case (the only duplicate jobs would be those whose ACK was not replicated before the switchover, which is tolerable), but that discussion can go in another task.

Change 221994 merged by jenkins-bot:
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/221994

Change 234190 had a related patch set uploaded (by Aaron Schulz):
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/234190

Change 234191 had a related patch set uploaded (by Aaron Schulz):
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/234191

Change 223790 merged by jenkins-bot:
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/223790

Change 234190 merged by jenkins-bot:
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/234190

Change 234191 merged by jenkins-bot:
Conversion to using getMainStashInstance()

https://gerrit.wikimedia.org/r/234191

CA tokens may end up being handled in T108253, if needed.