
[Bug] False positive error in detecting same label-description in Wikidata
Closed, Resolved · Public

Description

When I try to replace the description of item Q47494730 with just "film", I get this error:

Could not save due to an error.
Item Q489675 already has label "Charlie Says" associated with language code en, using the same description text.

This is the usual error that prevents two items from having exactly the same label-description pair, *except* that here the descriptions are different, so the error shouldn't occur. Maybe the system has been changed to detect more than exact matches (in which case the error message should say "similar" rather than "same"), or maybe there is another bug. Could someone please look into it?

Event Timeline

VIGNERON created this task. May 15 2018, 3:15 PM
Restricted Application added a subscriber: Aklapper. May 15 2018, 3:15 PM
Lea_Lacroix_WMDE renamed this task from False positive error in dtecting same label-description in Wikidata to False positive error in detecting same label-description in Wikidata. May 15 2018, 3:24 PM
Lea_Lacroix_WMDE updated the task description.
Lea_Lacroix_WMDE renamed this task from False positive error in detecting same label-description in Wikidata to [Bug} False positive error in detecting same label-description in Wikidata.
Lea_Lacroix_WMDE added a subscriber: Lydia_Pintscher.

@Ladsgroup could you have a look? The descriptions are very different, so this shouldn't be detected as a duplicate. Maybe a check has been broken by the recent work in that area?

We actually made it stricter, so this should not happen. I will check this ASAP.

It looks like this happens when the target item has the same description in a different language (Q489675 is described as just “film” in Albanian) – I was able to reproduce this on testwikidata with Q160805 and Q160806. Is this actually a change in behavior? I don’t know what the previous code did.

Nikki added a subscriber: Nikki. Jun 29 2018, 1:34 PM

I think this is the same as T171708. If so, it doesn't seem to have broken due to any recent changes.

Vvjjkkii renamed this task from [Bug} False positive error in detecting same label-description in Wikidata to 1wcaaaaaaa. Jul 1 2018, 1:09 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
matej_suchanek renamed this task from 1wcaaaaaaa to [Bug] False positive error in detecting same label-description in Wikidata. Jul 2 2018, 7:40 AM
matej_suchanek raised the priority of this task from High to Needs Triage.
matej_suchanek updated the task description.
CommunityTechBot renamed this task from [Bug] False positive error in detecting same label-description in Wikidata to [Bug} False positive error in detecting same label-description in Wikidata. Jul 5 2018, 6:42 PM
Lucas_Werkmeister_WMDE renamed this task from [Bug} False positive error in detecting same label-description in Wikidata to [Bug] False positive error in detecting same label-description in Wikidata. Jul 9 2018, 9:55 AM
matej_suchanek added subscribers: daniel, matej_suchanek.

> It looks like this happens when the target item has the same description in a different language (Q489675 is described as just “film” in Albanian) – I was able to reproduce this on testwikidata with Q160805 and Q160806. Is this actually a change in behavior? I don’t know what the previous code did.

This is certainly the cause. @daniel, is it possible that it was dropped intentionally in rEWBAa872ae062070c0d415cafd62dc08269319498fae?

Change 447384 had a related patch set uploaded (by Matěj Suchánek; owner: Matěj Suchánek):
[mediawiki/extensions/Wikibase@master] Select description in appropriate language during label-description conflict check

https://gerrit.wikimedia.org/r/447384

daniel added a comment. Edited · Jul 23 2018, 9:24 AM

> This is certainly the cause. @daniel, is it possible that it was dropped intentionally in rEWBAa872ae062070c0d415cafd62dc08269319498fae?

Not intentionally, no. Looks like a bug. As far as I can tell from briefly looking at the code, the problem is in TermSqlIndex::getLabelWithDescriptionConflicts:

$matchConditions = [
    'L.term_language' => $lang,
];

should instead be

$matchConditions = [
    'L.term_language' => $lang,
    'D.term_language' => $lang,
];

Note that this will very likely change the query plan and may change query performance significantly. In theory, it should make performance better, but this is definitely something to keep an eye on.
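For anyone who wants to see the effect of the missing condition in isolation, here is a minimal sketch using Python's sqlite3 module. It is not the Wikibase code: the `terms` table, its column names other than `term_language`, and the helper names `build_demo_db` and `find_conflicts` are all made up for illustration, and the English description "2006 film" for Q489675 is a placeholder. Only the English label "Charlie Says" and the Albanian description "film" come from this task.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the terms table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE terms ("
        "entity_id TEXT, term_type TEXT, term_language TEXT, term_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO terms VALUES (?, ?, ?, ?)",
        [
            ("Q489675", "label", "en", "Charlie Says"),
            # Placeholder English description, assumed for this demo only.
            ("Q489675", "description", "en", "2006 film"),
            # Albanian description, as reported in this task.
            ("Q489675", "description", "sq", "film"),
        ],
    )
    return conn


def find_conflicts(conn, lang, label, description, match_description_language):
    """Self-join labels (L) with descriptions (D) of the same entity.

    With match_description_language=False the description may be in any
    language, which mirrors the reported bug. With True, the description
    must be in the same language as the label, which mirrors the proposed fix.
    """
    sql = (
        "SELECT DISTINCT L.entity_id "
        "FROM terms AS L "
        "JOIN terms AS D ON D.entity_id = L.entity_id "
        "WHERE L.term_type = 'label' AND D.term_type = 'description' "
        "AND L.term_language = ? AND L.term_text = ? AND D.term_text = ?"
    )
    params = [lang, label, description]
    if match_description_language:
        sql += " AND D.term_language = ?"
        params.append(lang)
    sql += " ORDER BY L.entity_id"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = build_demo_db()
    # Label language only: the Albanian "film" matches (false positive).
    print(find_conflicts(conn, "en", "Charlie Says", "film", False))
    # Both languages constrained: no conflict is reported.
    print(find_conflicts(conn, "en", "Charlie Says", "film", True))
```

In this sketch, the first call reports Q489675 as a conflict because the join picks up the Albanian description, while the second call reports nothing. A genuine duplicate (same label and same description in the same language) is still detected with the extra condition in place.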

Side note: Unit tests for this method are skipped for MySQL, because MySQL doesn't support self-joins on temporary tables. That may be the reason this bug was not spotted, and should be taken into account when doing regression testing. I hear the self-join problem is fixed in MariaDB, so perhaps you can find a way to run the test in CI.

Change 447384 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Select description in appropriate language during label-description conflict check

https://gerrit.wikimedia.org/r/447384

Addshore triaged this task as Normal priority.
Addshore moved this task from incoming to in progress on the Wikidata board.