Page MenuHomePhabricator

[Bug] Term constraints not working
Closed, ResolvedPublic

Description

It is possible to add items having the same description and label on wikidata.org as you can see here:

https://www.wikidata.org/wiki/Q4115189

https://www.wikidata.org/wiki/Q3737270

Also it is possible to add pipes '|' to aliases.

Event Timeline

Jonas raised the priority of this task from to Needs Triage.
Jonas updated the task description. (Show Details)
Jonas added subscribers: Jonas, daniel, aude, hoo.
Jonas set Security to None.
Lydia_Pintscher moved this task from incoming to consider for next sprint on the Wikidata board.
Lydia_Pintscher subscribed.

Both items have their labels tracked correctly in the wb_terms table.

select * from wb_terms where term_entity_id in (3737270, 4115189) and term_language = 'en'

*sigh* not, again...

I strongly recommend we implement T74430: Re-implement uniqueness constraint in a consistent and efficient way instead of trying to fix this based on the terms table.

i suspect this is (unfortunately) a case where we run in to limitations of TermSqlIndex and maxConflicts it finds:

* @note: This implementation does not guarantee that all matches are returned.
* The maximum number of conflicts returned is controlled by $this->maxConflicts.

https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/lib/includes/store/sql/TermSqlIndex.php#L830-L831

i would like to add test cases for this and poke some more, though not sure how much we can 'fix' the current wb_terms and TermSqlIndex for this vs. what daniel suggests.

i can reproduce the issue and looking into this

problem is that (in some cases) we are validating an old version of an entity, with a specific change (label or description) patched into the entity.

the api call keeps passing the same baserevid even after subsequent edits in the term box. ModifyEntity uses baserevid when looking up the entity.

e.g. I change the en label to "FOO", click save (with e.g. baserevid = 1), it validates "FOO" (label) + "old description"

then change en description to "Wikipedia disambiguation page", click save (still baserevid = 1), it validates "old label" + "Wikipedia disambiguation page".

then have an edit conflict, but it is able to be resolved. no further validation is done at this point, at least for label + description uniqueness.

We should:

  1. fix the UI to pass the correct baserevid
  2. fix the backend so that validation is done when an entity gets patched due to edit conflict. (otherwise, there still is a way around the constraint if someone is directly using the api)

to fix in the backend, one possible solution might be to generate changeops for the edit conflict patch, and apply validation at this point.

https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/repo/includes/EditEntity.php#L440

(probably some of this code could also be factored out and be made nicer, but that's aside from the bug fix)

Change 260585 had a related patch set uploaded (by Aude):
Use common baserevid for label and description changes [WIP]

https://gerrit.wikimedia.org/r/260585

We discussed this during todays story time. Suspicions:

  1. With one API request (or two with no explicit base revision), both ChangeOps are first validated independently, and then applied. The second is validated without the first being applied.
  2. With two API requests, both ChangeOps are validated against the same base revision. The second edit does not "see" the first edit because it uses the same base revision as the first edit for comparison, but merges the edit into the latest revision.

Expected behavior when the UI does two API requests: Q1 does have label/description "A/A". Q2 does have "B/A". You want to change Q2 to "A/B". This means: First edit changes Q2 to "A/A". BOOM! Second edit changes Q2 to "A/B", which is fine.

Suspicion (1) seems to be unfounded:

  • api/EditEntity calls ModifyEntity::applyChangeOp on a ChangeOps object.
  • ModifyEntity::applyChangeOp calls ChangeOp(s)::validate
  • For each ChangeOp a ChangeOps instances contains, ChangeOps::validate will first call validate on that change op, then apply it.
  • So if there is a LabelChangeOp and a DescriptionChangeOp for the same language, in any order, the second op would check against the entity with the first op applied, detecting any conflict.
  • Note that validation of the first op could fail due to a false positive, because of a conflict that would be resolved by applying the second op.

This should work. A better way to do this would perhaps be to use a combined LabelAndDescriptionChangeOp. That would also remove the overhead of running the same constraint check twice.

Suspicion (2) seems to be the problem, especially since our own UI always hits this edge case when editing the label and description for a given language at the same time:

  • User 1 is viewing revision B of Q5 in their browser
  • User 1 enters a German label and description for Q5. These conflict with the German label and description of Q7, but we don't know that yet.
  • The UI sends two API requests, once for changing the label (E1), and one for changing the description (E2). Both edits give B as the base revision.
  • When E1 is processed, validation is performed against B, no conflict is found because Q5 doesn't have a German description in revision B. E1 is applied to B, resulting in B'.
  • When E2 is processed, validation is performed against B, no conflict is found because Q5 doesn't have a German label in revision B. E2 is applied to B and then patched into B'' via the conflict resolution mechanism, resulting in B''.
  • B'' now contains the combination of German label and description that conflicts with Q7.

Jan suggested that the simplest way to avoid this is to always use the current revision, not the edit's base revision, when validating. The above would then change as follows:

  • When E2 is processed, validation is performed against B', and a conflict is found with Q7, because B'+E2 contains the conflicting combination of label and description.

@daniel using the latest revision for validating sounds ok.

also, per T121395#1898302 (and really wonder how I was unclear there?), if there is an edit conflict, then the patch / result of resolving the conflict should be validated (perhaps by creating change op for it)

and it would be nice if saving multiple labels + descriptions + aliases could be one api request, with one base revision. Right now, saving multiple terms *always* produces an edit conflict and a warning saying such in the browser console.

@aude: if we check the changeop against the latest revision for validation, there should be no need to validate the result of the merge. Because of the way ChangeOps (the plural version) does validation, we then already do validate against what we would get after patching, assuming the patch applies cleanly. But really, we should get rid of the patching logic altogether, see T126231: [RFC] Detect edit conflicts in ChangeOp instead of using diff-and-patch in EditEntity.

@daniel hmm... Using the latest base revision would definitely help, though wonder if relying on that, we still might have a race condition.

@daniel hmm... Using the latest base revision would definitely help, though wonder if relying on that, we still might have a race condition.

For two concurrent API requests, there is a race condition even if you do the check after the patch/merge. The only way to avoid this is to supply the expected current revision (the one we used for validation) when doing the actual saving. WikiPage has a mechanism for erroring out if the current revision was changed by a concurrent request. I'm afraid getting this right will require us to rewrite some of the logic in EditPage.

Change 269461 had a related patch set uploaded (by Daniel Kinzler):
Make ChangeOp::validate run against the current revision.

https://gerrit.wikimedia.org/r/269461

Change 269461 merged by jenkins-bot:
Make ChangeOp::validate run against the current revision.

https://gerrit.wikimedia.org/r/269461

Change 269718 had a related patch set uploaded (by Daniel Kinzler):
Make SpecialSetLabelDescriptionAliases use ChangeOps.

https://gerrit.wikimedia.org/r/269718

daniel moved this task from Doing to Review on the Wikidata-Sprint-2016-02-02 board.

Turns out SpecialSetLabelDescriptionAliases also needed fixing

Change 260585 abandoned by Aude:
Use common baserevid for label and description changes [WIP]

Reason:
we probably want to ultimately solve this by having one api request (wbeditentity), but then the edit summaries might not be as fine-grained as they are now.

https://gerrit.wikimedia.org/r/260585

Change 269718 merged by jenkins-bot:
Make SpecialSetLabelDescriptionAliases use ChangeOps.

https://gerrit.wikimedia.org/r/269718

daniel claimed this task.
daniel moved this task from Review to Done on the Wikidata-Sprint-2016-02-02 board.

both patches are merged now.

Change 270777 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Simplify SpecialSetLabelDescriptionAliases after ChangeOps fix

https://gerrit.wikimedia.org/r/270777

Change 270777 merged by jenkins-bot:
Simplify SpecialSetLabelDescriptionAliases after ChangeOps fix

https://gerrit.wikimedia.org/r/270777