
Concept for feeding back data quality issues to data providers and getting feedback from data consumers
Open, Medium, Public

Description

Wikidata has a large amount of data, and its quality is important. To keep that quality high, we need to build two feedback loops:

  • Data from Wikidata is used elsewhere, and users there find issues. They should be able to report these errors to us easily.
  • Data from elsewhere is imported into Wikidata or compared with data in Wikidata, and we find issues in the source data. We need ways to feed these errors back to their source.

Research questions:

  • What does a good workflow look like?
  • What do Wikidata editors and data consumers want?
  • How does this all play together with existing tools like the Wikidata Quality extension's check against 3rd party databases and the Primary Sources Tool?

Event Timeline

Lydia_Pintscher raised the priority of this task to Medium.
Lydia_Pintscher updated the task description.
Lydia_Pintscher added a project: Wikidata.
Lydia_Pintscher added a subscriber: Lydia_Pintscher.
Restricted Application added a subscriber: Aklapper. Aug 7 2015, 7:58 AM
Lydia_Pintscher moved this task from incoming to ready to go on the Wikidata board. Aug 7 2015, 4:27 PM
Jonas renamed this task from create concept for feeding back data quality issues to data providers and getting feedback from data consumers to [Task] create concept for feeding back data quality issues to data providers and getting feedback from data consumers. Aug 13 2015, 4:06 PM
Jonas set Security to None.
TomT0m added a subscriber: TomT0m. Jun 11 2016, 1:50 PM
Rical added a subscriber: Rical. Aug 12 2016, 1:27 PM
Sumit added a subscriber: Sumit. Aug 18 2016, 9:11 PM

This seems closely related or complementary to T127470.

I'm starting the research and concept work for this.

  • You could collect several types of diffs, and count them to produce summary estimates:
      • Diffs of items: pages that correspond across language wikis of the same project (via interlanguage links) but do not correspond in Wikibase.
      • Diffs of properties used in pages that correspond across language wikis but not in Wikibase.
      • Diffs of properties used in pages that correspond in Wikibase but not across interlanguage links.
      • Diffs of property values in pages that correspond both in Wikibase and across interlanguage links.
  • Use the existence of diffs to display a short alert to any user who changes any query of Wikibase data.
  • For users who want to try correcting them, display a detailed summary of the diffs and their types.
  • If the number of diffs is small (n < 4 for simple diffs? n < 2 for complex diffs?), display the individual diffs themselves.
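The diff-based workflow in the comment above can be sketched in Python. This is a minimal illustration under stated assumptions, not an existing tool. The interlanguage links and Wikibase sitelinks are assumed to have been fetched already and are passed in as sets of (wiki, page title) pairs. Property usage is simplified to one page compared with its item. The names `Diff`, `sitelink_diffs`, `value_diffs`, `alert` and `details_for_correctors` are hypothetical. The comment does not define which diffs are "complex", so treating value mismatches as complex, and requiring both thresholds to hold before listing individual diffs, are assumptions.

```python
from collections import Counter
from dataclasses import dataclass

# Thresholds from the open question in the comment ("n<4 for simple diffs?
# or n<2 for complex diffs?"); they are tentative there as well.
SIMPLE_THRESHOLD = 4
COMPLEX_THRESHOLD = 2


@dataclass(frozen=True)
class Diff:
    kind: str                 # hypothetical categories: "item", "property", "value"
    description: str
    is_complex: bool = False  # assumption: only value mismatches are complex


def sitelink_diffs(interlanguage: set[tuple[str, str]],
                   sitelinks: set[tuple[str, str]]) -> list[Diff]:
    """Pages linked by interlanguage links but not in Wikibase, and vice versa.

    Both arguments are sets of (wiki, page title) pairs, assumed to be
    fetched beforehand from the wikis and from the Wikibase item.
    """
    diffs = [Diff("item", f"{wiki}:{title} linked by interlanguage links only")
             for wiki, title in sorted(interlanguage - sitelinks)]
    diffs += [Diff("item", f"{wiki}:{title} linked in Wikibase only")
              for wiki, title in sorted(sitelinks - interlanguage)]
    return diffs


def value_diffs(page_values: dict[str, str],
                item_values: dict[str, str]) -> list[Diff]:
    """Compare property values used in a page with those on its Wikibase item.

    Both arguments map a property identifier to a value rendered as a string.
    """
    diffs = []
    for prop in sorted(page_values.keys() | item_values.keys()):
        if prop not in item_values:
            diffs.append(Diff("property", f"{prop} used in page only"))
        elif prop not in page_values:
            diffs.append(Diff("property", f"{prop} in Wikibase only"))
        elif page_values[prop] != item_values[prop]:
            diffs.append(Diff("value",
                              f"{prop}: {page_values[prop]!r} != {item_values[prop]!r}",
                              is_complex=True))
    return diffs


def alert(diffs: list[Diff]) -> str:
    """Short alert shown to any user; empty when there is nothing to report."""
    return f"{len(diffs)} difference(s) with Wikibase" if diffs else ""


def details_for_correctors(diffs: list[Diff]) -> list[str]:
    """Individual diffs when there are few of them, otherwise counts per kind."""
    n_complex = sum(d.is_complex for d in diffs)
    n_simple = len(diffs) - n_complex
    # Assumption: both thresholds must hold to list diffs individually.
    if n_simple < SIMPLE_THRESHOLD and n_complex < COMPLEX_THRESHOLD:
        return [d.description for d in diffs]
    counts = Counter(d.kind for d in diffs)
    return [f"{kind}: {count}" for kind, count in sorted(counts.items())]


# Usage with placeholder wikis, titles and property identifiers.
diffs = sitelink_diffs({("enwiki", "Example"), ("dewiki", "Beispiel")},
                       {("enwiki", "Example")})
diffs += value_diffs({"prop_a": "x", "prop_b": "y"}, {"prop_a": "z"})
print(alert(diffs))                   # 3 difference(s) with Wikibase
print(details_for_correctors(diffs))  # individual diffs, since the counts are small
```

Keeping the alert separate from the corrector view mirrors the two audiences in the comment: every user sees only a count, while users who want to fix the data get either the individual diffs or a per-type summary.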

Use the existence of diffs to provide Scribunto modules with:

  • Detailed diffs and summaries at all levels, for users.
  • i18n keys in translation tables, for any language.

I have finished the concept work and handed it over to @Lydia_Pintscher.

Rical added a comment. Nov 22 2016, 8:07 AM

Where can I find documents and articles related to this task?

Glorian_Yapinus added a comment (edited). Dec 6 2016, 9:25 AM

@Rical, I have created a presentation explaining the concept work. You can find it in the attachment. Do let me know if you have any questions!

Rical removed a subscriber: Rical. Mar 20 2018, 11:00 AM
Lydia_Pintscher removed Glorian_Yapinus as the assignee of this task. Apr 7 2018, 11:20 AM

This post is out of scope, but I am looking for the right place to pursue this goal:
I'm assigned to task T141177: "Wikipedia main content losts sources because too reverts, try to preserve them".
So I need to know whether ORES already takes into account an estimation of the sources in Wikipedia main content. Thanks in advance for your attention.

Rical added a subscriber: Rical. Apr 29 2018, 12:44 PM
Lydia_Pintscher renamed this task from [Task] create concept for feeding back data quality issues to data providers and getting feedback from data consumers to Concept for feeding back data quality issues to data providers and getting feedback from data consumers. Jul 10 2019, 6:11 PM