Page MenuHomePhabricator

Add label and description collision detectors for new terms store
Closed, ResolvedPublic

Description

Context
TermSqlIndex is still the only implementation of LabelConflictFinder that is used to find conflicts in labels and descriptions when creating properties and items, that is used in few places such as LabelUniquenessValidator.

Problem
We will not be able to stop writing to both term stores before we provide an implementation that looks up possible conflicts in the new store.

Solution
Provide a new means (interfaces + implementations) that allows writing uniqueness validators using the new store

Check if there are indexes on new tables to support this.

Todo

  • allow validating on ChangeOpResult for optimized validation on only changing parts patch for review
  • add ChangeOpFingerprint and ChangeOpFingerprintResult to allow identifying and attaching validators to only fingerprint changes patch for review
  • use ChangeOpFingerprint everywhere to wrap change ops to terms patch for review
  • wire up PrefetchingItemTermLookup
  • add implementation for detecting label and label+descripton collisions patch for review
  • add validator implementations and wire them up when writing to new store (migration values anything other than MIGRATE_OLD) patch for review
  • fix SingleEntitySourceServices::getPrefetchingTermLookup to respect item terms migration stages WIP patch
  • investigate the usage of SingleEntitySourceServices::getTermSearchInteractorFactory and, if necessary, update it to respect property and item term migration stages
  • change SpecialNewEntity so that it uses ChangeOp instead of creating an entity and saving it directly using EditEntity.
  • call ChangeOpResult::validate in EditEntity right after obtaining it patch for review

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
alaa_wmde updated the task description. (Show Details)

first patch for collision detector interfance+implementations is up for review

first patch ready for review again .. left some comments where I'd love to have a second (or more) thought

@Ladsgroup @Addshore @Lucas_Werkmeister_WMDE @hoo

For the new store, we are saving terms in a deferred update. This makes uniqueness checks:
a) if done during request, they might not catch possible collisions with values that are in deferred updates at the moment but haven't been written to db yet.
b) if done as part of the deferred update, they don't have the chance to report back collisions to the user.

Our options are:

  1. live with (a) and believe that those race conditions are rare. Pretty bad as it introduces uniqueness-breaches defying the whole purpose of those checks.
  2. live with (b) and believe that those race conditions are rare. This is pretty bad too, as it won't report collisions with already persisted terms.
  3. do both (a) and (b) .. extra load, can solve both issues with (a) and (b) alone, except for when a collision is going to happen between current edit and some deferred update. In that case, we only avoid breaking uniqueness but the user won't know.
  4. move saving terms to the request, instead of doing it in a deferred update. not sure how bad that is yet. can we tell already how longs those queries are taking already (from some db stats)? or will we need to setup a separate monitoring to know that?
  5. During request A, we store hashed fingerprint in suitable cache, and then deferred update A would remove it when finished. That way, request B can check during request both db and cache (with its hashed fingerprint). Probably the best option of both worlds, but need your opinions.
  6. Store hashed fingerprints in a separate table that is used only for uniqueness checks. extra storage, but is it going to be too much extra? if hash size is 512 bits then we would need for the 70m items we currently have: ( 512 bits for hash + 64 bits for entity id) * 70m ~= 5GB which sounds a lot.
  7. ... any other options you see?

For the new store, we are saving terms in a deferred update.

This is also true for the old term store, I believe. I’m inlined to go with option 1 (which as I understand it is the status quo).

  1. live with (a) and believe that those race conditions are rare. Pretty bad as it introduces uniqueness-breaches defying the whole purpose of those checks.

Well, what is the purpose of those checks? Is it really wholly defeated by even rare race conditions? I view it more as a warning mechanism to the user than an absolute invariant of the data whose integrity must be preserved under all circumstances, so I think those rare race conditions would be acceptable. (That might be a question for @Lydia_Pintscher.)

As I mentioned in person, given that DefferedUpdates happen practically at the same time of the edit, I'm fine with 1. If it was done in a job, I would be against it.

True. Option 1 is the status quo, and since those race-conditions are quite rare to begin with, and the point made by @Lucas_Werkmeister_WMDE about this validation not being a strict integrity check (@Lydia_Pintscher please correct us if we are making wrong assumptions here) sounds reasonable, I will continue with option 1 for now.

Thanks for the input!

Yeah it's not a super strict integrity check but if we can avoid Items and Properties with the same label/description combination we really should try.

Change 550703 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add ChangeOpResult::validate

https://gerrit.wikimedia.org/r/550703

Change 550730 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add ChangeOpFingerprint and ChangeOpFingerprintResult classes

https://gerrit.wikimedia.org/r/550730

Change 552905 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] [WIP] Use ChangeOpFingerprint to wrap fingerprint change operations wherever created.

https://gerrit.wikimedia.org/r/552905

Change 553213 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add Fingerprint uniqueness validation based on new term store

https://gerrit.wikimedia.org/r/553213

Change 550703 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add ChangeOpResult::validate

https://gerrit.wikimedia.org/r/550703

Change 553524 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Validate ChangeOpResult in EditEntity

https://gerrit.wikimedia.org/r/553524

Change 550730 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add ChangeOpFingerprint and ChangeOpFingerprintResult classes

https://gerrit.wikimedia.org/r/550730

Change 552905 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Use ChangeOpFingerprint to wrap fingerprint (terms) change operations

https://gerrit.wikimedia.org/r/552905

Change 546898 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add TermsCollisionDetector and db implementations for properties and items

https://gerrit.wikimedia.org/r/546898

Change 556063 had a related patch set uploaded (by Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add Fingerprint uniqueness validation based on new term store

https://gerrit.wikimedia.org/r/556063

Change 556063 abandoned by Sarhan:
Add Fingerprint uniqueness validation based on new term store

Reason:
duplicate

https://gerrit.wikimedia.org/r/556063

Change 553213 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add Fingerprint uniqueness validation based on new term store

https://gerrit.wikimedia.org/r/553213

Last patch turning on validation of uniqueness in new store is ready for review now
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/553524
It has 3 little fixes below it too

Not sure @Ladsgroup your comments in the review do make sense.. can you please double check them and read through the code again? I may of course have misunderstood them, maybe little more elaboration will help

@Addshore @Ladsgroup
Are we waiting on this until we start reading a range (first 1000 items iirc) from new store? Is there more code review to be done?

In case there are any questions or concerns that are easier to be done directly in a call/meeting just let me know and I can be available.

If reviewing the patch in the current state is too much effort or difficult (it can be very much the case), also let me know I can try to split it up further into couple of smaller patches

I'm on vacation until next Monday.

While reviewing https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/553524/ I noticed that both at least 2 more things need fixing:

  • SingleEntitySourceServices::getTermSearchInteractorFactory
  • SingleEntitySourceServices::getPrefetchingTermLookup

Until that happens its hard to really verify the final patch etc.

While reviewing https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/553524/ I noticed that both at least 2 more things need fixing:

  • SingleEntitySourceServices::getTermSearchInteractorFactory
  • SingleEntitySourceServices::getPrefetchingTermLookup

Until that happens its hard to really verify the final patch etc.

I will look into this tomorrow

I just filed a chain of tickets under T241971 to finally finish switching over to the new service containers too.

I added two test cases to the integration test editentity fingerprint uniqueness. They pass for API calls, which I could also confirm sending API requests manually.

But, I can also reproduce it through special pages: I can create conflicting properties (didn't test items). I will next investigate the mentioned SingleEntitySourceServices::getTermSearchInteractorFactory and SingleEntitySourceServices::getPrefetchingTermLookup.

Same goes for items. API calls fail manually as expected, but I can create conflicting items through special pages. I also confirm that SingleEntitySourceServices::getTermSearchInteractorFactory and SingleEntitySourceServices::getPrefetchingTermLookup need fixing. Will be working on that this evening.

note: Seems like that API calls somehow are configure to respect federation, using PerReopsitoryServiceWiring that was updated to respect item migration phase too in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/554562, but Special Page would use SingleEntitySourceServices. Though conflicting property creation should have failed as SingleEntitySourceServices was updated to respect property migration stage. So there could be more buggy code down the call stack.

Change 562320 had a related patch set uploaded (by Sarhan; owner: Sarhan):
[mediawiki/extensions/Wikibase@master] Update SingleEntitySourceServices to respect item terms migration stages

https://gerrit.wikimedia.org/r/562320

Oh, apparently SpecialNewEntity creates entities without even creating change ops. That would explain it, as then it will by-pass the whole validation of fignreprint uniqueness that is only executed inside ChangeOpFingerprintResult::validate(). Guess that needs to change too.

Addshore claimed this task.
Addshore added a subscriber: sarhan.alaa.

Added the 2 things we found as sub tasks as they need to be done to be able to test this code in WRITE_NEW only

Change 563204 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] Update SingleEntitySourceServices to respect item terms migration stages

https://gerrit.wikimedia.org/r/563204

Change 563204 abandoned by Addshore:
Update SingleEntitySourceServices to respect item terms migration stages

https://gerrit.wikimedia.org/r/563204

Change 562320 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] wbterms: Update SingleEntitySourceServices PrefetchingTermLookup

https://gerrit.wikimedia.org/r/562320

Change 563958 had a related patch set uploaded (by Ladsgroup; owner: Addshore):
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.14] wbterms: Update SingleEntitySourceServices PrefetchingTermLookup

https://gerrit.wikimedia.org/r/563958

Change 553524 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Validate ChangeOpResult in ModifyEntity

https://gerrit.wikimedia.org/r/553524

Change 563958 abandoned by Addshore:
wbterms: Update SingleEntitySourceServices PrefetchingTermLookup

Reason:
Abandoning as I figure we are not backporting this!

https://gerrit.wikimedia.org/r/563958