Page MenuHomePhabricator

Add label and description collision detectors for new terms store
Open, HighPublic

Description

Context
TermSqlIndex is still the only implementation of LabelConflictFinder that is used to find conflicts in labels and descriptions when creating properties and items, that is used in few places such as LabelUniquenessValidator.

Problem
We will not be able to stop writing to both term stores before we provide an implementation that looks up possible conflicts in the new store.

Solution
Provide a new means (interfaces + implementations) that allows writing uniqueness validators using the new store

Check if there are indexes on new tables to support this.

Todo

  • allow validating on ChangeOpResult for optimized validation on only changing parts patch for review
  • add ChangeOpFingerprint and ChangeOpFingerprintResult to allow identifying and attaching validators to only fingerprint changes patch for review
  • use ChangeOpFingerprint everywhere to wrap change ops to terms patch for review
  • wire up PrefetchingItemTermLookup
  • add implementation for detecting label and label+descripton collisions patch for review
  • add validator implementations and wire them up when writing to new store (migration values anything other than MIGRATE_OLD) patch for review
  • call ChangeOpResult::validate in EditEntity right after obtaining it patch for review

Details

Related Gerrit Patches:

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 4 2019, 8:23 PM
alaa_wmde renamed this task from Add LabelConflictFinder implementation for new Store to Add LabelConflictFinder implementation for new terms store.Sep 4 2019, 8:48 PM
alaa_wmde triaged this task as High priority.
alaa_wmde updated the task description. (Show Details)Sep 4 2019, 8:50 PM

Change 546898 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add TermsCollisionDetector and db implementations for properties and items

https://gerrit.wikimedia.org/r/546898

alaa_wmde renamed this task from Add LabelConflictFinder implementation for new terms store to Add label and description collision detectors for new terms store.Oct 30 2019, 12:53 PM
alaa_wmde removed alaa_wmde as the assignee of this task.
alaa_wmde updated the task description. (Show Details)
alaa_wmde moved this task from Incoming to Ready to pick up on the Wikidata-Campsite board.
alaa_wmde updated the task description. (Show Details)

first patch for collision detector interfance+implementations is up for review

First patch reworked and ready for peer review again

I'm doing some cleanup work on first patch now

first patch ready for review again .. left some comments where I'd love to have a second (or more) thought

@Ladsgroup @Addshore @Lucas_Werkmeister_WMDE @hoo

For the new store, we are saving terms in a deferred update. This makes uniqueness checks:
a) if done during request, they might not catch possible collisions with values that are in deferred updates at the moment but haven't been written to db yet.
b) if done as part of the deferred update, they don't have the chance to report back collisions to the user.

Our options are:

  1. live with (a) and believe that those race conditions are rare. Pretty bad as it introduces uniqueness-breaches defying the whole purpose of those checks.
  2. live with (b) and believe that those race conditions are rare. This is pretty bad too, as it won't report collisions with already persisted terms.
  3. do both (a) and (b) .. extra load, can solve both issues with (a) and (b) alone, except for when a collision is going to happen between current edit and some deferred update. In that case, we only avoid breaking uniqueness but the user won't know.
  4. move saving terms to the request, instead of doing it in a deferred update. not sure how bad that is yet. can we tell already how longs those queries are taking already (from some db stats)? or will we need to setup a separate monitoring to know that?
  5. During request A, we store hashed fingerprint in suitable cache, and then deferred update A would remove it when finished. That way, request B can check during request both db and cache (with its hashed fingerprint). Probably the best option of both worlds, but need your opinions.
  6. Store hashed fingerprints in a separate table that is used only for uniqueness checks. extra storage, but is it going to be too much extra? if hash size is 512 bits then we would need for the 70m items we currently have: ( 512 bits for hash + 64 bits for entity id) * 70m ~= 5GB which sounds a lot.
  7. ... any other options you see?

For the new store, we are saving terms in a deferred update.

This is also true for the old term store, I believe. I’m inlined to go with option 1 (which as I understand it is the status quo).

  1. live with (a) and believe that those race conditions are rare. Pretty bad as it introduces uniqueness-breaches defying the whole purpose of those checks.

Well, what is the purpose of those checks? Is it really wholly defeated by even rare race conditions? I view it more as a warning mechanism to the user than an absolute invariant of the data whose integrity must be preserved under all circumstances, so I think those rare race conditions would be acceptable. (That might be a question for @Lydia_Pintscher.)

As I mentioned in person, given that DefferedUpdates happen practically at the same time of the edit, I'm fine with 1. If it was done in a job, I would be against it.

True. Option 1 is the status quo, and since those race-conditions are quite rare to begin with, and the point made by @Lucas_Werkmeister_WMDE about this validation not being a strict integrity check (@Lydia_Pintscher please correct us if we are making wrong assumptions here) sounds reasonable, I will continue with option 1 for now.

Thanks for the input!

Yeah it's not a super strict integrity check but if we can avoid Items and Properties with the same label/description combination we really should try.

Change 550703 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add ChangeOpResult::validate

https://gerrit.wikimedia.org/r/550703

Change 550730 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add ChangeOpFingerprint and ChangeOpFingerprintResult classes

https://gerrit.wikimedia.org/r/550730

First patch is ready for review again

Change 552905 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] [WIP] Use ChangeOpFingerprint to wrap fingerprint change operations wherever created.

https://gerrit.wikimedia.org/r/552905

Change 553213 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add Fingerprint uniqueness validation based on new term store

https://gerrit.wikimedia.org/r/553213

Change 550703 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add ChangeOpResult::validate

https://gerrit.wikimedia.org/r/550703

Change 553524 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Validate ChangeOpResult in EditEntity

https://gerrit.wikimedia.org/r/553524

alaa_wmde updated the task description. (Show Details)

Change 550730 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add ChangeOpFingerprint and ChangeOpFingerprintResult classes

https://gerrit.wikimedia.org/r/550730

Change 552905 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Use ChangeOpFingerprint to wrap fingerprint (terms) change operations

https://gerrit.wikimedia.org/r/552905

alaa_wmde updated the task description. (Show Details)Wed, Dec 4, 4:36 PM
alaa_wmde updated the task description. (Show Details)Wed, Dec 4, 6:10 PM

Change 546898 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add TermsCollisionDetector and db implementations for properties and items

https://gerrit.wikimedia.org/r/546898

alaa_wmde updated the task description. (Show Details)Mon, Dec 9, 1:10 PM

Change 556063 had a related patch set uploaded (by Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add Fingerprint uniqueness validation based on new term store

https://gerrit.wikimedia.org/r/556063

Change 556063 abandoned by Sarhan:
Add Fingerprint uniqueness validation based on new term store

Reason:
duplicate

https://gerrit.wikimedia.org/r/556063