[Epic] SPOF: Replace ActiveMQ donation queues with a more robust software stack
Closed, Resolved (Public)

Description

There are a dozen good reasons to get off of ActiveMQ, but the primary one is that it's a huge, flaming bag of SPOF. This task is complete once ActiveMQ is uninstalled everywhere and the replacement system is high-availability, or at least redundant.

This work must be complete by roughly the end of August, to leave time to stabilize ahead of Big English.

We've done a lot of the groundwork already by wrapping php-queue under the DonationQueue module in DonationInterface, so the work here is mostly in generalizing, applying to the remaining components, and then deploying the thing.

Deployment outline:

  • Producers will begin mirroring messages to both ActiveMQ and the new queue. Messages will be set to expire in a month or less, so intermittent failures while we start the new consumers are fine.
  • Incrementally switch consumers to consume from the new queue, beginning with the non-critical queues and tools. Keep an eye on queue usage and set the old message archiver to dump legacy data if storage issues arise.
  • Deactivate each mirror to ActiveMQ and finally retire the box only once we're 100% certain about the new stuff.
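
The outline above can be sketched roughly as follows. This is a minimal illustration in Python, not the DonationInterface or php-queue code: `InMemoryQueue`, `MirroringProducer`, `mirror_to_legacy`, and `mirror_failures` are hypothetical names, and the in-memory backend stands in for both ActiveMQ and whatever replaces it. It also assumes that, while mirroring is on, a failed write to the new queue is counted rather than raised.

```python
import time
from collections import deque

THIRTY_DAYS = 30 * 24 * 60 * 60  # "expire in a month or less"


class InMemoryQueue:
    """Hypothetical stand-in for a queue backend (legacy or replacement)."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self._items = deque()

    def push(self, message):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self._items.append(message)

    def pop(self, now=None):
        """Return the next unexpired message, or None if nothing is left."""
        now = time.time() if now is None else now
        while self._items:
            message = self._items.popleft()
            if message["expires_at"] > now:
                return message
            # Expired messages are dropped silently.
        return None

    def __len__(self):
        return len(self._items)


class MirroringProducer:
    """Writes each message to both queues until the legacy mirror is turned off."""

    def __init__(self, legacy, replacement, ttl_seconds=THIRTY_DAYS):
        self.legacy = legacy
        self.replacement = replacement
        self.ttl_seconds = ttl_seconds
        self.mirror_to_legacy = True  # flip to False in the final phase
        self.mirror_failures = 0      # assumption: counted, not raised

    def send(self, payload, now=None):
        now = time.time() if now is None else now
        message = {"payload": payload, "expires_at": now + self.ttl_seconds}
        if not self.mirror_to_legacy:
            # Final phase: the replacement is the only destination,
            # so errors propagate to the caller.
            self.replacement.push(message)
            return
        # Mirroring phase: the legacy queue must succeed; a failure on the
        # new queue is tolerated and recorded for later inspection.
        self.legacy.push(message)
        try:
            self.replacement.push(dict(message))
        except ConnectionError:
            self.mirror_failures += 1
```

In this sketch each consumer would be pointed at the replacement queue one at a time, and `mirror_to_legacy` would be set to `False` for a given queue only after its consumers have moved and `mirror_failures` has stayed at zero for a while.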

Event Timeline

awight raised the priority of this task to Needs Triage.
awight updated the task description.
awight added a subscriber: awight.
Jgreen triaged this task as Medium priority. (Sep 10 2015, 7:33 PM)
Jgreen set Security to None.
awight renamed this task from "Replace ActiveMQ donation queues with Redis" to "[Epic] SPOF: Replace ActiveMQ donation queues with Redis". (Nov 24 2015, 9:02 PM)
awight raised the priority of this task from Medium to High.
awight updated the task description.

Hey @awight can you work on breaking out the coding tasks for this at some point? No huge rush, but if we want to aim for just that piece for Q3, we should know what it is :)

@atgo I outlined the coding in the task description, would it be more helpful if I created subtasks or something?

Thanks @awight. That's probably fine for now, but as we get closer we should definitely make the subtasks. Given we're still weeks out from even starting Q3, it's no big deal.

awight added a parent task: Restricted Task. (Dec 4 2015, 9:23 PM)
awight updated the task description.

(12:14:26 PM) _joe_: I'd suggest kafka warmly if you can't lose messages

Kafka is an interesting suggestion; it looks like its high availability is more mature.

I'm curious to know what @Joe meant by "if you can't lose messages". Our throughput is pretty low (the maximum is about 10 messages/second), and I'm prejudiced towards Redis due to its perceived ease of administration and the variety of data structures it supports, but I'd love to hear more about the alternatives.

I should mention that I used a similar migration strategy for the orphan processing queue (T92915), and it failed spectacularly. I eventually had to accept losing the orphan messages for weeks, which is not an option in this case.

There were several problems with queue mirroring: we failed to mirror in some places and deleted over-aggressively in others.

awight renamed this task from "[Epic] SPOF: Replace ActiveMQ donation queues with Redis" to "[Epic] SPOF: Replace ActiveMQ donation queues with a better software stack". (Mar 24 2016, 9:21 PM)
awight renamed this task from "[Epic] SPOF: Replace ActiveMQ donation queues with a better software stack" to "[Epic] SPOF: Replace ActiveMQ donation queues with a more robust software stack".
awight removed a parent task: Restricted Task. (Jul 15 2016, 1:12 AM)