Page MenuHomePhabricator

Fix cheesiness: be more transactional when popping messages from the limbo queue
Closed, ResolvedPublic

Description

If the orphan slayer crashes mid-process, it can cause the message being processed to be lost forever. This is a new weakness I accidentally introduced with the Redis changes.

Don't pop the top message on the queue... "peek" instead. Delete it only once the output has been committed.

Details

Related Gerrit Patches:
mediawiki/extensions/DonationInterface : masterRectify orphans inside the fetch loop
mediawiki/extensions/DonationInterface : masterDon't delete the message until we're done with it
mediawiki/extensions/DonationInterface : masterDon't delete limbo messages until we've finished processing

Related Objects

Event Timeline

awight created this task.Aug 4 2015, 6:40 AM
awight claimed this task.
awight raised the priority of this task from to Normal.
awight updated the task description. (Show Details)
awight added a subscriber: awight.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 4 2015, 6:40 AM

Change 229987 had a related patch set uploaded (by Awight):
Provide a method to peek at the top element of a queue

https://gerrit.wikimedia.org/r/229987

Change 230175 had a related patch set uploaded (by Awight):
WIP Don't delete limbo messages until we've finished processing

https://gerrit.wikimedia.org/r/230175

awight moved this task from Doing to Review on the Fundraising Sprint Queen board.Aug 7 2015, 10:50 PM

Change 229987 merged by Ejegg:
Provide a method to peek at the top element of a queue

https://gerrit.wikimedia.org/r/229987

Change 230175 merged by jenkins-bot:
Don't delete limbo messages until we've finished processing

https://gerrit.wikimedia.org/r/230175

Change 230948 had a related patch set uploaded (by Awight):
Don't fetch data if we're just testing for existence

https://gerrit.wikimedia.org/r/230948

Change 230948 merged by Ejegg:
Don't fetch data if we're just testing for existence

https://gerrit.wikimedia.org/r/230948

Ejegg added a subscriber: Ejegg.Aug 13 2015, 4:43 PM

Sorry @awight, I didn't review this as thoroughly as I should have. We're still deleting the messages inside the loop that fetches the orphans, before doing anything to rectify them, leaving us open to data loss if the process loop fails or times out. Is there any benefit to fetching in a batch before rectifying, or can we do a peek-rectify-delete loop?

Change 231441 had a related patch set uploaded (by Awight):
Don't delete the message until we're done with it

https://gerrit.wikimedia.org/r/231441

Change 231441 merged by jenkins-bot:
Don't delete the message until we're done with it

https://gerrit.wikimedia.org/r/231441

Change 231452 had a related patch set uploaded (by Awight):
Rectify orphans inside the fetch loop

https://gerrit.wikimedia.org/r/231452

Change 231452 merged by jenkins-bot:
Rectify orphans inside the fetch loop

https://gerrit.wikimedia.org/r/231452

awight closed this task as Resolved.Aug 18 2015, 9:27 PM
awight moved this task from Pending Deployment to Done on the Fundraising Sprint Queen board.
awight set Security to None.
mmodell removed a subscriber: awight.Jun 22 2017, 9:38 PM