allow reporting of mismatches with empty Wikidata value
Open, High, Public, 8 Estimated Story Points

Description

As a mismatch provider I want to be able to report missing data in order to improve the completeness of Wikidata.

Problem:
We currently don't allow reporting of mismatches where the value on Wikidata is empty. We should. We assume this means we will accept empty Wikidata values in the upload CSV. This also means we will not get a statement guid in the upload for the non-existent statement. We will likely have to adapt the upload CSV format to require the Item ID as well to be able to tell which Item the missing statement belongs to.

Example:

item_id,statement_guid,property_id,wikidata_value,external_value,external_url
Q42,,P3373,,Shoshanna Adams,example.com
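To illustrate how a row like the one above might be read, here is a minimal Python sketch. It is not the Mismatch Finder's actual import code; the helper name `is_missing_value_row` is hypothetical, and treating "empty statement_guid and empty wikidata_value" as the marker for missing data is an assumption drawn from this task's description.

```python
import csv
import io

# The example CSV from this task, using the proposed column layout.
SAMPLE = """item_id,statement_guid,property_id,wikidata_value,external_value,external_url
Q42,,P3373,,Shoshanna Adams,example.com
"""


def is_missing_value_row(row):
    """Hypothetical helper: True when the row reports data absent from Wikidata.

    Assumption: such rows have both an empty statement_guid and an empty
    wikidata_value, as in the example above.
    """
    return row["statement_guid"] == "" and row["wikidata_value"] == ""


# csv.DictReader yields empty strings for the empty fields between commas.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
missing = [r for r in rows if is_missing_value_row(r)]

for r in missing:
    # Without a statement guid, item_id is the only way to locate the Item.
    print(r["item_id"], r["property_id"], r["external_value"])
```

Because the row carries no statement guid, the item_id column is what identifies the Item the missing statement would belong to.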

Acceptance criteria:

  • mismatches with empty Wikidata values and empty statement guid are accepted from the upload CSV
  • mismatches with empty Wikidata values are shown in the Mismatch Finder website as "none" for the Wikidata value
  • the upload CSV should now accept an Item ID for all mismatches
  • documentation for mismatch providers is updated
  • mismatch reviewers can report that data is missing in Wikidata by selecting the option "Missing data on Wikidata" in the Status column
  • the upload (decision) statistics have been adjusted to take the new "Missing data on Wikidata" option into account

Notes:

  • Mismatch providers will have to adapt their CSV once this is merged. Old formats will be rejected as they are no longer considered valid.
  • The item_id must be provided for all mismatches, including ones that have a statement_guid.
  • If the statement_guid is nonempty, the item ID in it must match the item_id.
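The rules in these notes could be checked per row roughly as follows. This is a sketch under stated assumptions, not the validator used by the Mismatch Finder: the function name `validate_row` and the error messages are hypothetical, and it assumes a statement guid has the form `<item ID>$<UUID>`, with the item ID compared case-insensitively.

```python
import re

# Assumption: Item IDs look like "Q" followed by a positive integer.
ITEM_ID_RE = re.compile(r"Q[1-9]\d*")


def validate_row(row):
    """Return a list of error strings for one CSV row (empty list means valid)."""
    errors = []
    item_id = row.get("item_id", "")
    guid = row.get("statement_guid", "")

    # item_id must be provided for all mismatches, with or without a guid.
    if not ITEM_ID_RE.fullmatch(item_id):
        errors.append("item_id is required and must look like Q42")

    # statement_guid may be empty; if present, its item ID must match item_id.
    if guid:
        prefix, separator, _ = guid.partition("$")
        if not separator or prefix.upper() != item_id.upper():
            errors.append("statement_guid does not belong to item_id")

    return errors


# A row for a missing statement: no guid, item_id only.
print(validate_row({"item_id": "Q42", "statement_guid": ""}))
# A row whose guid points at a different Item than item_id.
print(validate_row({
    "item_id": "Q43",
    "statement_guid": "Q42$D8404CDA-25E4-4334-AF13-A3290BCD9C0F",
}))
```

Returning a list of messages rather than raising on the first problem lets an upload report every issue in a row at once; that is a design choice of this sketch, not something the task specifies.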

Event Timeline

Lydia_Pintscher moved this task from Backlog to Needs work on the Mismatch Finder board.

@Sarai-WMDE will look into improving the wording in the status column to make sure it is not confusing for an empty value on the Wikidata side:

image.png (425×336 px, 24 KB)

During yesterday's daily, we discussed that the fastest way to adjust the product to this new use case could be to simply update the copy of the second option in the Status select component: from "Wrong data on Wikidata" to "Wrong or missing data on Wikidata". Nevertheless, doing that would combine two separate statements with different implications, which would make the review too ambiguous. For that reason, I'd encourage us to add a separate option to the dropdown menu: "Missing data on Wikidata". The option should be placed right after "Wrong data on Wikidata".

If this makes sense, we should then add an acceptance criterion to this ticket, saying something along the lines of: Users can report that data is missing in Wikidata by selecting the option "Missing data on Wikidata" in the Status column.

Task breakdown notes:

  • subtask 1 - add a new item_id column, whose value has to match the item ID in the statement guid. Both are required for now (once we've accomplished this, this can be merged together with T321165). T323203
  • subtask 2 - add a new review status, "Missing data on Wikidata", to the Status column so that reviewers can report that data is missing in Wikidata. T323204
  • subtask 3 - allow statement guid to be empty (optional) and show it as "None" in the UI T323206
  • subtask 4 - update documentation to include new column and info for mismatch providers T323354
  • subtask 5 - Adjust the upload statistics to take the new "Missing data on Wikidata" option into account (TBC by @Michael )

Open Question:
@Lydia_Pintscher - which statistics are to be adjusted? in what way should they reflect missing data?

The statistics in question are the ones in the CSV available at https://mismatch-finder.toolforge.org/store/imports ("Download statistics" button at the top right).