Page MenuHomePhabricator

[M] Don't create a duplicate depicts statement upon label approval
Closed, ResolvedPublic

Description

Wikibase does not appear to prevent the addition of multiple identical statements to an entity. It will happily allow, e.g., the statement "depicts (P180) Douglas Adams (Q42)" to be applied to an image twice.

We should prevent the CAT tool from adding duplicate depicts statements upon label approval by checking existing statements and only adding a new statement if the one to be added doesn't already exist.

Implementation details: add new boolean param to wbsetclaim in wikibase that defaults to false, that will prevent a duplicate mainsnak being created if set to true

Acceptance criteria

  • If an image already has "depicts:X" and a user adds "depicts:X" via SuggestedTags then the second "depicts:X" will not be added
  • the user should still be able to add multiple "depicts:X" tags via the File page
  • nothing else breaks

COVID-19 Deployment Criteria

  • Can you roll back this change without lasting impact?
    1. A recovery plan is required as this will help identify our capacity for recovering from the failure
    2. THIS IS A KEY QUESTION, if you can’t answer it, you shouldn’t deploy
  • Is specialized knowledge required to support this change in production? If so, are there multiple people with this knowledge?
  • Is there a way to increase confidence about the correctness of this change?
    1. Reviews (Design, Code, etc)
    2. Testing coverage (unit tests, integration tests)
    3. Manual testing (e.g. Beta, vagrant, docker)

Event Timeline

Mholloway triaged this task as Medium priority.Oct 2 2019, 5:22 PM

Change 544027 had a related patch set uploaded (by MSantos; owner: MSantos):
[mediawiki/extensions/MachineVision@master] Skip duplicate depicts

https://gerrit.wikimedia.org/r/544027

Change 544027 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Skip duplicate depicts

https://gerrit.wikimedia.org/r/544027

Ramsey-WMF added a subscriber: Ramsey-WMF.

@Mholloway reopening this one because it seems there might have been a regression and CAT can again add statements that already exist. 😢

See the March 23rd edit here as an example: https://commons.wikimedia.org/w/index.php?title=File:Feu_d%27artifice_-_316.jpg&action=history

Yes, this can once again happen as a result of updating depicts setting to use the wbsetclaim API to resolve T241242.

Mholloway added a subscriber: MSantos.

Is there a potential workaround to prevent this specific problem when using 'wbsetclaim'?

Yes, this can once again happen as a result of updating depicts setting to use the wbsetclaim API to resolve T241242.

As things stand today, I think the frontend will have to request and examine the existing claims before conditionally sending the wbsetclaim requests.

A better solution would be for wbsetclaim to provide an affordance to set a claim conditionally on it not already existing. But that would require support from the Wikibase devs, for code review at the very least.

Ramsey-WMF added a subscriber: Cparle.

Pinging @Cparle so he can have some context in prep for tomorrow's estimation meeting ☑

CBogen renamed this task from Don't create a duplicate depicts statement upon label approval to [M] Don't create a duplicate depicts statement upon label approval.Mar 25 2020, 4:35 PM

A better solution would be for wbsetclaim to provide an affordance to set a claim conditionally on it not already existing. But that would require support from the Wikibase devs, for code review at the very least.

@Addshore thoughts on how difficult this would be to implement on the Wikibase side? Thanks! 🙂

A better solution would be for wbsetclaim to provide an affordance to set a claim conditionally on it not already existing. But that would require support from the Wikibase devs, for code review at the very least.

@Addshore thoughts on how difficult this would be to implement on the Wikibase side? Thanks! 🙂

Should be fairly easy, and also I think this will make sense for usecases outside of mediainfo / sdoc

In general this makes sense to me but the semantics need to be very clear. I fear the following scenario might cause different people to expect different things to happen: if a statement with value a, qualifier b, and reference c already exists and then someone tries to store value a again, does this count as a new statement or not? Strictly speaking in the WB sense no it does not. But I can see arguments coming up why it should.

@Addshore we talked about a potential for this solution at last week's WMDE-WMF meeting. Any further thoughts from that discussion?

@Addshore 's proposed solution was a new param in wbsetclaim that says "don't create a duplicate mainsnak".

Cparle updated the task description. (Show Details)
Cparle updated the task description. (Show Details)

@Addshore 's proposed solution was a new param in wbsetclaim that says "don't create a duplicate mainsnak".

So I think if this is only currently needed for wbsetclaim then a new parameter could make sense.
As we discussed in the call there might be the need for different flavours of uniqueness, so we should consider that when creating the new parameter ie not call it unique=1 but rather uniquecheck=mainsnak or something like this.

The logic would then be, if the API is called with this level of uniqueness check, that the revision is checked for a statement for the property that is being used for creation, and when looking at all statements if the mainsnak is the same then bail out in some defined way (some error & failure saying that the uniqueness check failed)
We might be able to get some inspiration for the wording of this error message from other wikibase uniqueness checks such as for sitelinks and labels.

The "easy" way to implement this would be to write the code and check into SetClaims itself.
The "right" way to implement this would be to write this code anc check into the ChangeOp (so it could then be reused in other code paths if needed)

Within the SetClaim API module this is where most of the magic happens. Getting a change op and then applying it, (which also calls the validate step)

		$changeop = $this->statementChangeOpFactory->newSetStatementOp( $statement, $index );
		$this->modificationHelper->applyChangeOp( $changeop, $entity, $summary );

The ChangeOp in question is a ChangeOpStatement and this would need to take the uniqueness requirement as a constructor parameter.
So the first line of the code above would pass that in in some form.
The second line of the code above calls the validate method, which we will have to add the extra validation to (this happens before the change is applied).
And this extra validation would make sure that the "mainsnak" of whatever condition that the changeop has been constructed with is okay and the changeop can continue.

Sorry it apparently took me weeks to write this comment since it came up in the call!
Any questions you know where to poke me @Cparle !

Change 595508 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/Wikibase@master] Introduce ignoreDuplicateMainSnak param to wbsetclaim

https://gerrit.wikimedia.org/r/595508

Change 595510 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/MachineVision@master] Prevent duplicate statements

https://gerrit.wikimedia.org/r/595510

Change 595508 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Introduce ignoreDuplicateMainSnak param to wbsetclaim

https://gerrit.wikimedia.org/r/595508

Change 595510 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Prevent duplicate statements

https://gerrit.wikimedia.org/r/595510

I'll check this on production when I get the chance.

Works on production.