Page MenuHomePhabricator

Handle race-conditions between storage and cleanup logic
Closed, ResolvedPublic

Description

When storing new terms, while cleaning up others in parallel, it can happen that some of those terms may overlap and this can result into an inconsistent state of records in new schema tables, esp. that we do not use foreign keys constraints in this schema.

BDD
Given some terms exist in db
And two requests in-parallel
And one of the requests is storing new terms
And the other request is cleaning up other terms
And the new terms and other terms intersect over some terms that exist in db

When the two in-parallel requests are done
And all post-request updates for them have finished
Then I should see in db all new terms, from the first request
And I should not see in db terms that are part of other terms but are not part of new terms

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 24 2019, 9:22 AM
alaa_wmde renamed this task from Handle race-conditions with clean up logic in DatabseTermIdsAcquirer to Handle race-conditions between storage and cleanup logic.May 24 2019, 9:51 AM
alaa_wmde updated the task description. (Show Details)
alaa_wmde updated the task description. (Show Details)May 24 2019, 1:21 PM
alaa_wmde updated the task description. (Show Details)

Change 512873 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] WIP - restore acquired term ids when cleaned due to race-condition with cleaner

https://gerrit.wikimedia.org/r/512873

alaa_wmde updated the task description. (Show Details)May 28 2019, 12:25 PM

Current proposed solution in (WIP) https://gerrit.wikimedia.org/r/512873 is to:

  • Acquirer would return acuired ids through a callback, instead of as a return value
  • After the callback is done (consumers link to those acuired Ids), Acquirer would then recheck if they all still exist
  • If any of those ids got removed (by a cleanup process that was running in-parallel), then the acquirer re-acquire those lost terms (term_in_lang records) by reusing the same ids that got deleted.
hoo added a subscriber: hoo.May 31 2019, 11:10 AM

As also briefly noted in Gerrit, relying on application logic only to ensure that rows are (still) in place seems to be a fairly weak consistency pledge. Exceptions, DB failures, sudden HW failures, …, can prevent these (re-)checks from happening, leaving the tables in an inconsistent state… unlike in transactions, this wont result in a rollback of all changes, but leave us in that inconsistent state.
That shouldn't block us from doing this, but it's something we need to be very well aware of… doing this means we can never be sure all tables are consistent (although I guess we can't ever be really sure about that, thus shouldn't make that assumption).

alaa_wmde added a subscriber: Marajozkee.EditedMay 31 2019, 11:30 AM

Agree to what you've just said @hoo. And certainly we have to wrap our storage logic within transactions. The higher up we put this transaction control, for instance, as you've pointed, inside DatabasePropertyTermStore and DatabaseItemTermStore the easier to change later.

As for race-conditions, it seems fairly complicated, if not impossible, to avoid them using transactions alone, thus we thought of this solution on top for cases where transactions weren't applicable directly.


Generally, the way race-conditions are surfacing up here between acquirer and cleaner suggests that there's probably a design issue that we do not yet see clearly.

It might turn out that maybe we shouldn't allow in-parallel updates and clean up the way we do now. How to achieve that isn't yet clear to me.

Or it might turn out to be that some concept is needed to be set explicitly to handle those situations, say UpdateCleaUpOrchestrator or so that actually collects requests for updates and clean ups and concerns itself only with orchestrating execution of updates and clean-ups, re-ordering them or even filtering them as necessary. That's my wild imagination at work here though, but there are enough hints so far that we will proabably need to implement some thing a little more sophisticated to handle this problem in a way that is simpler to understand and separates this concern from the Acquirer and Cleaner themselves somehow.

alaa_wmde added a comment.EditedMay 31 2019, 11:35 AM

Anyway I went too far. For now, I'd include all changes to different classes that are for the sake of avoiding race-conditions to this task.

@hoo for wrapping DatabasePropertyTermStore/DatabaseItemTermStore logic (storeTerms() and deleteTerms() methods) inside transactions, I'd probably create a separate task for that (and merge the changes that introduce the classes for now). Make sense to you?

hoo added a comment.Jun 4 2019, 10:30 AM

@hoo for wrapping DatabasePropertyTermStore/DatabaseItemTermStore logic (storeTerms() and deleteTerms() methods) inside transactions, I'd probably create a separate task for that (and merge the changes that introduce the classes for now). Make sense to you?

Sounds good to me.

Change 512873 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Restore acquired term ids when cleaned up by cleaner running in-parallel

https://gerrit.wikimedia.org/r/512873