Page MenuHomePhabricator

[M] Don't create a duplicate depicts statement upon label approval
Closed, ResolvedPublic

Description

Wikibase does not appear to prevent the addition of multiple identical statements to an entity. It will happily allow, e.g., the statement "depicts (P180) Douglas Adams (Q42)" to be applied to an image twice.

We should prevent the CAT tool from adding duplicate depicts statements upon label approval by checking existing statements and only adding a new statement if the one to be added doesn't already exist.

Implementation details: add new boolean param to wbsetclaim in wikibase that defaults to false, that will prevent a duplicate mainsnak being created if set to true

Acceptance criteria

  • If an image already has "depicts:X" and a user adds "depicts:X" via SuggestedTags then the second "depicts:X" will not be added
  • the user should still be able to add multiple "depicts:X" tags via the File page
  • nothing else breaks

COVID-19 Deployment Criteria

  • Can you roll back this change without lasting impact?
    1. A recovery plan is required as this will help identify our capacity for recovering from the failure
    2. THIS IS A KEY QUESTION, if you can’t answer it, you shouldn’t deploy
  • Is specialized knowledge required to support this change in production? If so, are there multiple people with this knowledge?
  • Is there a way to increase confidence about the correctness of this change?
    1. Reviews (Design, Code, etc)
    2. Testing coverage (unit tests, integration tests)
    3. Manual testing (e.g. Beta, vagrant, docker)

Event Timeline

Mholloway created this task.Oct 2 2019, 5:22 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 2 2019, 5:22 PM
Mholloway triaged this task as Medium priority.Oct 2 2019, 5:22 PM
MSantos claimed this task.Oct 17 2019, 9:04 PM
MSantos moved this task from To Do to Doing on the Product-Infrastructure-Team-Backlog (Kanban) board.

Change 544027 had a related patch set uploaded (by MSantos; owner: MSantos):
[mediawiki/extensions/MachineVision@master] Skip duplicate depicts

https://gerrit.wikimedia.org/r/544027

Change 544027 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Skip duplicate depicts

https://gerrit.wikimedia.org/r/544027

MSantos moved this task from In development to Done on the MachineVision board.Oct 22 2019, 6:40 PM
Mholloway closed this task as Resolved.Nov 8 2019, 2:05 PM
Ramsey-WMF reopened this task as Open.Mar 24 2020, 6:30 PM
Ramsey-WMF added a subscriber: Ramsey-WMF.

@Mholloway reopening this one because it seems there might have been a regression and CAT can again add statements that already exist. 😢

See the March 23rd edit here as an example: https://commons.wikimedia.org/w/index.php?title=File:Feu_d%27artifice_-_316.jpg&action=history

Restricted Application added a project: Structured-Data-Backlog. · View Herald TranscriptMar 24 2020, 6:30 PM

Yes, this can once again happen as a result of updating depicts setting to use the wbsetclaim API to resolve T241242.

Mholloway removed MSantos as the assignee of this task.Mar 24 2020, 6:35 PM
Mholloway added a subscriber: MSantos.

Is there a potential workaround to prevent this specific problem when using 'wbsetclaim'?

Yes, this can once again happen as a result of updating depicts setting to use the wbsetclaim API to resolve T241242.

Mholloway added a comment.EditedMar 24 2020, 6:59 PM

As things stand today, I think the frontend will have to request and examine the existing claims before conditionally sending the wbsetclaim requests.

A better solution would be for wbsetclaim to provide an affordance to set a claim conditionally on it not already existing. But that would require support from the Wikibase devs, for code review at the very least.

Ramsey-WMF added a subscriber: Cparle.

Pinging @Cparle so he can have some context in prep for tomorrow's estimation meeting ☑

CBogen renamed this task from Don't create a duplicate depicts statement upon label approval to [M] Don't create a duplicate depicts statement upon label approval.Mar 25 2020, 4:35 PM

A better solution would be for wbsetclaim to provide an affordance to set a claim conditionally on it not already existing. But that would require support from the Wikibase devs, for code review at the very least.

@Addshore thoughts on how difficult this would be to implement on the Wikibase side? Thanks! 🙂

CBogen updated the task description. (Show Details)Mar 25 2020, 6:09 PM

A better solution would be for wbsetclaim to provide an affordance to set a claim conditionally on it not already existing. But that would require support from the Wikibase devs, for code review at the very least.

@Addshore thoughts on how difficult this would be to implement on the Wikibase side? Thanks! 🙂

Should be fairly easy, and also I think this will make sense for usecases outside of mediainfo / sdoc

In general this makes sense to me but the semantics need to be very clear. I fear the following scenario might cause different people to expect different things to happen: if a statement with value a, qualifier b, and reference c already exists and then someone tries to store value a again, does this count as a new statement or not? Strictly speaking in the WB sense no it does not. But I can see arguments coming up why it should.

@Addshore we talked about a potential for this solution at last week's WMDE-WMF meeting. Any further thoughts from that discussion?

@Addshore 's proposed solution was a new param in wbsetclaim that says "don't create a duplicate mainsnak".

Cparle claimed this task.Apr 27 2020, 4:56 PM
Cparle updated the task description. (Show Details)
Cparle updated the task description. (Show Details)

@Addshore 's proposed solution was a new param in wbsetclaim that says "don't create a duplicate mainsnak".

So I think if this is only currently needed for wbsetclaim then a new parameter could make sense.
As we discussed in the call there might be the need for different flavours of uniqueness, so we should consider that when creating the new parameter ie not call it unique=1 but rather uniquecheck=mainsnak or something like this.

The logic would then be, if the API is called with this level of uniqueness check, that the revision is checked for a statement for the property that is being used for creation, and when looking at all statements if the mainsnak is the same then bail out in some defined way (some error & failure saying that the uniqueness check failed)
We might be able to get some inspiration for the wording of this error message from other wikibase uniqueness checks such as for sitelinks and labels.

The "easy" way to implement this would be to write the code and check into SetClaims itself.
The "right" way to implement this would be to write this code anc check into the ChangeOp (so it could then be reused in other code paths if needed)

Within the SetClaim API module this is where most of the magic happens. Getting a change op and then applying it, (which also calls the validate step)

		$changeop = $this->statementChangeOpFactory->newSetStatementOp( $statement, $index );
		$this->modificationHelper->applyChangeOp( $changeop, $entity, $summary );

The ChangeOp in question is a ChangeOpStatement and this would need to take the uniqueness requirement as a constructor parameter.
So the first line of the code above would pass that in in some form.
The second line of the code above calls the validate method, which we will have to add the extra validation to (this happens before the change is applied).
And this extra validation would make sure that the "mainsnak" of whatever condition that the changeop has been constructed with is okay and the changeop can continue.

Sorry it apparently took me weeks to write this comment since it came up in the call!
Any questions you know where to poke me @Cparle !

Abbe98 added a subscriber: Abbe98.Apr 27 2020, 7:02 PM

Change 595508 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/Wikibase@master] Introduce ignoreDuplicateMainSnak param to wbsetclaim

https://gerrit.wikimedia.org/r/595508

Change 595510 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/MachineVision@master] Prevent duplicate statements

https://gerrit.wikimedia.org/r/595510

Restricted Application added a project: Wikidata. · View Herald TranscriptMay 12 2020, 5:47 PM

Change 595508 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Introduce ignoreDuplicateMainSnak param to wbsetclaim

https://gerrit.wikimedia.org/r/595508

Cparle updated the task description. (Show Details)May 20 2020, 9:14 AM

Change 595510 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Prevent duplicate statements

https://gerrit.wikimedia.org/r/595510

I'll check this on production when I get the chance.

Ramsey-WMF closed this task as Resolved.Jun 3 2020, 8:52 PM

Works on production.