Modify DonationInterface limbo code for high availability deployment
Closed, Resolved · Public · 2 Story Points

Description

Implement plan set out in T103206. Make changes to our code and configuration so that we are minimally affected by the following types of failure:

  • One payments box becomes unreachable for a few minutes.
  • A payments box dies and can be rebuilt with intact data.
  • A payments box dies and is rebuilt with no data.

See updated documentation at https://wikitech.wikimedia.org/wiki/Fundraising#Message_queues

Rework code to handle limbo queues as an object rather than a global, and add logic to choose which backends we connect to in each case.

  • Frontend code writes to the queue on localhost, i.e. on payments100[1-3]. No code change, just configuration.
  • Orphan slayer connects to these three queues in round-robin order. This needs configuration.
  • If the orphan slayer fails to connect to a server, eliminate it from this batch run and hobble on through.
  • Configuration for a single queue backend should behave as it did before.
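
The behavior listed above can be sketched roughly as follows. This is only an illustration in Python, not the DonationInterface code: `QueuePool`, `InMemoryQueue`, `pop_all` and `skipped` are hypothetical names, the host names are just the expansion of payments100[1-3], and a backend is assumed to raise `ConnectionError` when it is unreachable and to return `None` from `pop()` when it is empty.

```python
from collections import deque
from typing import Callable, Dict, Iterator, List, Optional


class InMemoryQueue:
    """Stand-in for one queue backend (hypothetical, for illustration only)."""

    def __init__(self, messages: List[str]) -> None:
        self._messages = deque(messages)

    def pop(self) -> Optional[str]:
        # Return the oldest message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class QueuePool:
    """Reads from several backends in round-robin order.

    A backend that fails to connect is dropped for the rest of the current
    batch run; the remaining backends are still drained.
    """

    def __init__(self, connectors: Dict[str, Callable[[], InMemoryQueue]]) -> None:
        # Maps a backend name to a zero-argument callable that returns a
        # connected queue or raises ConnectionError.
        self._connectors = dict(connectors)
        self.skipped: List[str] = []

    def pop_all(self) -> Iterator[str]:
        self.skipped = []
        rotation = deque(self._connectors)  # backend names, in configured order
        connections: Dict[str, InMemoryQueue] = {}
        while rotation:
            name = rotation.popleft()
            try:
                if name not in connections:
                    connections[name] = self._connectors[name]()
                message = connections[name].pop()
            except ConnectionError:
                # Unreachable: eliminate this backend from the batch run.
                self.skipped.append(name)
                continue
            if message is None:
                continue  # empty: leave it out of the rotation
            rotation.append(name)  # still has work: back of the line
            yield message


def unreachable() -> InMemoryQueue:
    raise ConnectionError("payments1002 is unreachable")


pool = QueuePool({
    "payments1001": lambda: InMemoryQueue(["a1", "a2"]),
    "payments1002": unreachable,
    "payments1003": lambda: InMemoryQueue(["c1"]),
})
drained = list(pool.pop_all())
print(drained, pool.skipped)
```

With a single entry in the connector map the pool degenerates to reading one queue in order, which corresponds to the requirement that a single-backend configuration behave as before.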

Event Timeline

awight created this task. Jul 1 2015, 9:21 PM
awight claimed this task.
awight raised the priority of this task to High.
awight updated the task description.
awight added subscribers: atgo, awight.
Restricted Application added a subscriber: Aklapper. Jul 1 2015, 9:21 PM
atgo added a comment. Jul 15 2015, 4:54 PM

@awight is this blocked on something? where should this fit in our workflow?

awight set Security to None. Jul 23 2015, 9:47 PM
awight edited a custom field.
awight added a project: Unplanned-Sprint-Work.
awight updated the task description. Jul 25 2015, 1:55 AM

Change 226948 had a related patch set uploaded (by Awight):
WIP Implement high-availability queue pool

https://gerrit.wikimedia.org/r/226948

Change 226948 abandoned by Awight:
WIP Implement high-availability queue pool

Reason:
squashed.

https://gerrit.wikimedia.org/r/226948

awight closed this task as Resolved. Aug 4 2015, 12:13 AM
mmodell removed a subscriber: awight. Jun 22 2017, 9:38 PM