record reviewer decision and associated metadata for a specific mismatch
As a mismatch store admin
I want to be able to receive and record review decisions and their associated metadata
in order to not serve reviewed mismatches again and to enable analysis of the review work being done

We need to record the decisions reviewers make for the individual mismatches. For a mismatch we want to record each decision as well as some metadata.

What we want to record:

  • which decision was taken, one of:
    • Wikidata was wrong
    • the other side was wrong
    • the mismatch is intentionally kept
    • neither was wrong/there was some error in the mismatch
  • who took the decision
  • when the decision was taken
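The record described above could be modelled roughly like this. It is a minimal sketch: the names `ReviewStatus`, `ReviewDecision`, `record_decision` and `unreviewed`, the in-memory dict used as a store, and the choice to keep every past decision per mismatch are all illustrative assumptions, not the store's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class ReviewStatus(Enum):
    # One value per decision listed above (hypothetical names).
    WIKIDATA_WRONG = "wikidata"
    EXTERNAL_WRONG = "external"
    INTENTIONALLY_KEPT = "intentional"
    MISMATCH_ERROR = "none"  # neither was wrong / error in the mismatch


@dataclass(frozen=True)
class ReviewDecision:
    mismatch_id: int
    status: ReviewStatus   # which decision was taken
    reviewer: str          # who took the decision
    decided_at: datetime   # when the decision was taken (UTC)


def record_decision(store: dict[int, list[ReviewDecision]],
                    mismatch_id: int, status: ReviewStatus,
                    reviewer: str) -> ReviewDecision:
    """Append a decision; earlier decisions are kept for later analysis."""
    decision = ReviewDecision(mismatch_id, status, reviewer,
                              datetime.now(timezone.utc))
    store.setdefault(mismatch_id, []).append(decision)
    return decision


def unreviewed(mismatch_ids: list[int],
               store: dict[int, list[ReviewDecision]]) -> list[int]:
    """Return only mismatches with no recorded decision, so reviewed ones are not served again."""
    return [m for m in mismatch_ids if not store.get(m)]
```

For example, after `record_decision(store, 1, ReviewStatus.WIKIDATA_WRONG, "ExampleUser")` on an empty dict, `unreviewed([1, 2, 3], store)` would return `[2, 3]`.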


For consistency with Help:Ranking, in addition to the outcome "wrong", shouldn't there be "preferred" as well?

I'm not sure how ranks would fit into this, to be honest.

In addition to:

  • "Wikidata was wrong"
  • "the other side was wrong"

it could be:

  • Wikidata's statement is preferred; the statement in the mismatch file should have normal rank
  • Wikidata's statement should have normal rank; the other side's is preferred

Maybe this is covered by (or this means both should have normal rank):

  • "the mismatch is intentionally kept"

Also, maybe there should be an explicit way to record errors in keys; the following isn't very clear:

  • "neither was wrong/there was some error in the mismatch"

Maybe the following 9 choices can summarize what is usually found on Wikidata:

Both               | Mismatch file                        | Wikidata                             | Sample: dob              | Sample: pob
-------------------|--------------------------------------|--------------------------------------|--------------------------|----------------------
                   | [ ] correct - other wrong/deprecated | [ ] correct - other wrong/deprecated | 2012, 2009               | Eimsbüttel, Berlin
                   | [ ] preferred - other normal         | [ ] preferred - other normal         | 2012-10-31, 2012         | Eimsbüttel, Hamburg
[ ] both normal rank |                                    |                                      | 2012-10-31, 2012-10-30   | Germany, West-Germany
[ ] key mismatch   | [ ] key not applicable               | [ ] key not applicable               | 2012, 1920               | Eimsbüttel, Stockholm
                   | [ ] conflation                       | [ ] conflation                       |                          |
[ ] other problem  |                                      |                                      |                          |

The dob (date of birth) and pob (place of birth) samples are partially based on Q2013#P571 and Q567#P19.

"Key mismatch" means the key used to join Wikidata and the other file is on the wrong item. This error could be either in Wikidata or in the mismatch file.

Ahhh ok. Thanks! That makes sense.
So I think initially we'd want mismatch providers to consider a statement as matching, regardless of its rank, if it's the same in both places. That would have the benefit of letting us concentrate on the more severe mismatches. Later, once we have a better understanding of how people use ranks, we could concentrate on the cases where ranks play a role.
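A rank-agnostic check along those lines might look like the sketch below. It assumes, for simplicity, that statement values are already normalized to comparable strings (real Wikidata values are typed and would need proper comparison), and the `Statement` type and `is_mismatch` function are hypothetical. Because rank is ignored entirely, a value that only appears as a deprecated statement also counts as a match here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    value: str             # simplified: a normalized string value
    rank: str = "normal"   # "preferred", "normal" or "deprecated"


def is_mismatch(wikidata: list[Statement], external_value: str) -> bool:
    """Rank-agnostic check: the external value matches if any Wikidata
    statement has the same value, whatever that statement's rank is."""
    return all(s.value != external_value for s in wikidata)
```

With Wikidata statements `2012-10-30` (preferred) and `2012` (normal), an external value of `2012` would not be reported as a mismatch, while `2009` would.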

I'm not sure I could assign the samples I gave above to options #3 (the mismatch is intentionally kept) and #4 (neither was wrong/there was some error in the mismatch) currently in the task description with any certainty. Were these options discussed or described somewhere?

Maybe we should just record a QID that describes the decision. This would keep the system flexible.

In separate steps, one could determine:

  • how these QIDs should be made available in the selection interface (e.g. clicking somewhere or selecting from a list gives this option)
  • what further actions should be taken (close the mismatch, add several statements, change ranks on statements, etc.)
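Separating the stored QID from the interface and follow-up steps could look roughly like this. It is only a sketch: `QidDecision`, `UI_OPTIONS`, `FOLLOW_UP` and `actions_for` are hypothetical names, and the keys such as `"Q_WIKIDATA_WRONG"` are placeholders standing in for whatever real typology items are eventually chosen.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QidDecision:
    mismatch_id: int
    decision_qid: str      # Wikidata item describing the decision (placeholder values below)
    reviewer: str
    decided_at: datetime


# Separate, editable configuration; the keys are placeholders, not real QIDs.
# Step 1: which decisions the selection interface offers, with their labels.
UI_OPTIONS: dict[str, str] = {
    "Q_WIKIDATA_WRONG": "Wikidata was wrong",
    "Q_OTHER_SIDE_WRONG": "the other side was wrong",
}
# Step 2: which further actions each decision implies.
FOLLOW_UP: dict[str, list[str]] = {
    "Q_WIKIDATA_WRONG": ["close mismatch", "change ranks on statements"],
    "Q_OTHER_SIDE_WRONG": ["close mismatch"],
}


def actions_for(decision: QidDecision) -> list[str]:
    """Look up follow-up actions; an unknown QID is still stored but triggers nothing."""
    return FOLLOW_UP.get(decision.decision_qid, [])
```

The design point is that adding a new kind of decision only means adding a QID to the configuration; the stored records themselves do not change shape.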

I asked for input at Project_chat#Fun_with_Mismatches:_typology.

Items can now be found by query. The list on project chat tries to order them by importance (frequency?).

We could also add support in the API for external reviewers' decisions, perhaps with a flag marking a clue as having higher trust. The lesson learned in WikiTree land is that many people care about WikiTree data but are less interested in correcting data on other platforms...

WikiTree scans 22,000,000 profiles weekly, creates reports, and adds suggestions to every profile. For example, Q42 (WikiTree Adams-32825) has 23 suggestions on 282 related profiles.

  • Wikidata accounts for about 20 error types in those reports (suggestion codes 541 to 567)

Example of how WikiTree presents feedback for a suggested father: "Suggestion 541 Wikidata - Clue for Father"

Another aspect is that we should also get the sources used by the external source. Ideally, these would come with a machine-readable "quality ranking"; see T222142: WikidataCon 2019: We need a better model communicating quality/relevance of sources in Wikidata / Provenance