
[Epic] SPOF: Replace ActiveMQ donation queues with a more robust software stack
Closed, Resolved · Public

Description

There are a dozen good reasons to get off of ActiveMQ, but the primary one is that it's a huge, flaming bag of SPOF (single point of failure). This task is complete once ActiveMQ is uninstalled everywhere and the replacement system is highly available, or at least redundant.

We must complete this work by roughly the end of August, to leave time to stabilize ahead of Big English.

We've done a lot of the groundwork already by wrapping php-queue under the DonationQueue module in DonationInterface, so the work here is mostly in generalizing that wrapper, applying it to the remaining components, and then deploying the thing.

Deployment outline:

  • Producers will begin mirroring messages to both ActiveMQ and the new queue. Messages will be set to expire in a month or less, so intermittent failures while we start the new consumers are fine.
  • Incrementally switch consumers to consume from the new queue, beginning with the non-critical queues and tools. Keep an eye on queue usage and set the old message archiver to dump legacy data if storage issues arise.
  • Deactivate each mirror to ActiveMQ and finally retire the box only once we're 100% certain about the new stuff.
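
The mirroring step in the outline above can be sketched roughly as follows. This is a minimal illustration in Python, not the DonationQueue or php-queue code: `InMemoryQueue`, `MirroringProducer`, and `MAX_TTL_SECONDS` are hypothetical names, and the in-memory backend only stands in for ActiveMQ and whatever replaces it. It assumes each backend accepts an absolute expiry time per message.

```python
import time
from collections import deque

# Assumption: "a month or less" is approximated here as 30 days.
MAX_TTL_SECONDS = 30 * 24 * 3600


class InMemoryQueue:
    """Hypothetical stand-in for a queue backend (ActiveMQ or its replacement)."""

    def __init__(self, name):
        self.name = name
        self._messages = deque()

    def push(self, body, expires_at):
        self._messages.append((expires_at, body))

    def pop(self, now=None):
        """Return the oldest unexpired message body, or None if nothing is left."""
        now = time.time() if now is None else now
        while self._messages:
            expires_at, body = self._messages.popleft()
            if expires_at > now:
                return body
            # Expired messages are silently dropped.
        return None


class MirroringProducer:
    """Writes every message to all configured backends with a bounded TTL."""

    def __init__(self, backends, ttl_seconds=MAX_TTL_SECONDS):
        self.backends = list(backends)
        # Never allow a TTL longer than the agreed maximum.
        self.ttl_seconds = min(ttl_seconds, MAX_TTL_SECONDS)

    def send(self, body, now=None):
        """Push to every backend; return a list of (backend name, error) pairs."""
        now = time.time() if now is None else now
        expires_at = now + self.ttl_seconds
        failures = []
        for backend in self.backends:
            try:
                backend.push(body, expires_at)
            except Exception as exc:
                # One backend failing must not block the other mirror.
                failures.append((backend.name, exc))
        return failures


legacy = InMemoryQueue("activemq")
replacement = InMemoryQueue("new-queue")
producer = MirroringProducer([legacy, replacement])
producer.send({"order_id": "example-1"})
```

Returning the failures instead of raising lets the caller decide how loudly to alert when one side of the mirror is down, which is presumably what you want while both queues are live.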

Related Objects

Status     Assigned    Task
Resolved   None
Resolved   None
Resolved   awight
Resolved   awight
Invalid    Jgreen
Resolved   Jgreen
Resolved   None
Resolved   None
Resolved   Ejegg
Resolved   Ejegg
Resolved   None
Declined   awight
Invalid    awight
Open       None
Open       None
Invalid    cwdent
Declined   None
Resolved   None
Duplicate  None
Open       None
Resolved   XenoRyet
Resolved   Ejegg
Resolved   Ejegg
Resolved   None
Resolved   awight
Resolved   None
Resolved   Ejegg
Invalid    None
Resolved   XenoRyet
Duplicate  None
Resolved   None
Resolved   awight
Resolved   None
Resolved   awight
Resolved   awight
Duplicate  None
Resolved   None
Open       None
Resolved   awight
Resolved   awight
Open       None
Resolved   awight
Open       Ejegg
Resolved   None
Resolved   Ejegg
Resolved   Jgreen
Resolved   Ejegg
Resolved   Ejegg
Invalid    None
Resolved   Jgreen
Resolved   Cmjohnson
Resolved   Jgreen
Resolved   Jgreen
Resolved   Jgreen
Resolved   Jgreen

Event Timeline

awight created this task. Aug 6 2015, 8:04 PM
awight updated the task description.
awight raised the priority of this task to Needs Triage.
awight added a subscriber: awight.
Restricted Application added a subscriber: Aklapper. Aug 6 2015, 8:04 PM
Jgreen triaged this task as Normal priority. Sep 10 2015, 7:33 PM
Jgreen set Security to None.
awight edited a custom field. Nov 3 2015, 6:17 PM
awight renamed this task from "Replace ActiveMQ donation queues with Redis" to "[Epic] SPOF: Replace ActiveMQ donation queues with Redis". Nov 24 2015, 9:02 PM
awight updated the task description.
awight raised the priority of this task from Normal to High.
atgo added a subscriber: atgo. Nov 25 2015, 6:59 PM

Hey @awight, can you work on breaking out the coding tasks for this at some point? No huge rush, but if we want to aim for just that piece in Q3, we should know what it is :)

@atgo I outlined the coding in the task description. Would it be more helpful if I created subtasks or something?

atgo added a comment. Nov 25 2015, 8:27 PM

Thanks @awight. That's probably fine for now, but as we get closer we should definitely make the subtasks. Given we're still weeks out from even starting Q3, it's no big deal.

awight added a parent task: Restricted Task. Dec 4 2015, 9:23 PM
awight updated the task description. Dec 5 2015, 12:46 AM
awight updated the task description.
Jgreen added a subscriber: Jgreen. Dec 7 2015, 5:18 PM

(12:14:26 PM) _joe_: I'd suggest kafka warmly if you can't lose messages

awight added a subscriber: Joe. Dec 7 2015, 10:44 PM

Kafka is an interesting suggestion; it looks like its high availability is more mature.

I'm curious to know what @Joe meant by "if you can't lose messages"... Our throughput is pretty low (the maximum is about 10 messages/second), and I'm prejudiced towards Redis because of its perceived ease of administration and the variety of data structures it supports, but I'd love to hear more about the alternatives.

DStrine updated the task description. Feb 16 2016, 6:00 PM
DStrine edited a custom field.

I should mention that I used a similar migration strategy for the orphan processing queue (T92915), and it failed spectacularly. I eventually had to accept losing the orphan messages for weeks, which is not an option in this case.

There were several problems around queue mirroring: we failed to mirror in some places and deleted over-aggressively in others.
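
One way to catch both kinds of mirroring problem early might be a periodic comparison of what each queue currently holds. The sketch below is only an illustration under assumptions: it supposes every message carries a unique ID that can be listed from both queues, and `audit_mirror` is a hypothetical helper, not something that exists in our tooling.

```python
def audit_mirror(legacy_ids, replacement_ids):
    """Compare the message IDs currently held by the two mirrored queues.

    An ID present on only one side means the message was either never
    mirrored there or was deleted from that side too early. The audit
    alone cannot tell those two cases apart.
    """
    legacy = set(legacy_ids)
    replacement = set(replacement_ids)
    return {
        "only_in_legacy": sorted(legacy - replacement),
        "only_in_replacement": sorted(replacement - legacy),
    }


report = audit_mirror(["a", "b", "c"], ["b", "c", "d"])
for side, ids in report.items():
    print(side, ids)
```

While consumers are being switched over, some drift between the two sides is expected, so a report like this is more useful as a trend to watch than as a hard failure.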

awight renamed this task from "[Epic] SPOF: Replace ActiveMQ donation queues with Redis" to "[Epic] SPOF: Replace ActiveMQ donation queues with a better software stack". Mar 24 2016, 9:21 PM
awight renamed this task from "[Epic] SPOF: Replace ActiveMQ donation queues with a better software stack" to "[Epic] SPOF: Replace ActiveMQ donation queues with a more robust software stack".
atgo removed a subscriber: atgo. Mar 30 2016, 7:11 PM
Restricted Application added a subscriber: TerraCodes. Apr 19 2016, 6:39 PM
awight removed a parent task: Restricted Task. Jul 15 2016, 1:12 AM
awight updated the task description. Jul 15 2016, 1:49 AM
awight updated the task description. Jul 15 2016, 2:00 AM
Jgreen closed this task as Resolved. Nov 29 2016, 6:21 PM
mmodell removed a subscriber: awight. Jun 22 2017, 9:34 PM