
Finding biases and content gaps in Wikidata
Open, High, Public


Wikidata is not (and never will be) complete. However, we should be aware of gaps and biases in our data so we can make informed decisions about where to put more effort.

The task is to find ways to identify and surface biases and gaps in Wikidata's data.

Some interesting questions:

  • What are underrepresented topics? Languages?
  • Are there biases in the kinds of statements we make about topics?
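The language question above could be made measurable. A minimal sketch, assuming hypothetical per-language label counts (in practice these would come from a Wikidata dump or a SPARQL query; the numbers and threshold here are made up for illustration):

```python
# Hypothetical label counts per language for a set of 100,000 items.
label_counts = {"en": 95000, "de": 60000, "fr": 40000, "sw": 1200, "hi": 3000}
total_items = 100000  # assumed size of the item set

def label_coverage(counts, total):
    """Return per-language label coverage as a fraction of all items."""
    return {lang: n / total for lang, n in counts.items()}

def underrepresented(counts, total, threshold=0.1):
    """Languages whose label coverage falls below the threshold."""
    cov = label_coverage(counts, total)
    return sorted(lang for lang, c in cov.items() if c < threshold)

print(underrepresented(label_counts, total_items))  # -> ['hi', 'sw']
```

The threshold is an arbitrary cut-off; a real report would probably rank all languages by coverage rather than apply a single cut.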

Event Timeline

Lydia_Pintscher added a project: Wikidata.
Lydia_Pintscher removed a subscriber: hoo.
Lydia_Pintscher moved this task from incoming to ready to go on the Wikidata board.

Shouldn't this be split into biases and content gaps? The first is hard to measure (it involves the community of editors personally, e.g. who they are as a group: gender, nationality, age, education, etc.) and the second is a bit easier (coverage of geo-locations, topics per expert ontology, neutral POV for breaking news, etc.).

@Jane023: Sorry, maybe the wording is a bit confusing. It is about biases in the content, which lead to gaps. Maybe it is not the best choice of words.

Yes, I would drop the word "bias" then. It is about identifying and measuring gaps, no? There should be a link somewhere to the bias side of things (not sure where that is; diversity?).

Lydia_Pintscher renamed this task from Finding biases and content gaps to Finding biases and content gaps in Wikidata.Apr 7 2018, 11:30 AM

Are we talking about the completeness of the entities within an arbitrary group or type of items, or potentially missing entities altogether?

Note that biases, or gaps, can only be quantified by comparison against some reference model that we consider perfect. As a first step, we should define what 'perfect' means to us and make sure that condition is achievable. Any such definition will be subjective and possibly controversial, so I think it needs broad discussion.
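To make the comparison concrete, here is a minimal sketch of scoring gaps against a reference model. Both the expected topic shares and the observed counts are hypothetical placeholders; the real reference distribution is exactly the contested definition discussed above:

```python
# Hypothetical reference model: expected share of items per topic if
# coverage were "perfect" (whatever definition the community settles on).
expected_share = {"science": 0.25, "sports": 0.25, "arts": 0.25, "geography": 0.25}

# Hypothetical observed item counts per topic.
observed_counts = {"science": 5000, "sports": 9000, "arts": 2000, "geography": 4000}

def gap_scores(observed, expected):
    """Gap per topic: positive means underrepresented relative to the model."""
    total = sum(observed.values())
    return {t: expected[t] - observed.get(t, 0) / total for t in expected}

scores = gap_scores(observed_counts, expected_share)
# arts has the largest gap here: observed 10% vs expected 25%
print(max(scores, key=scores.get))  # -> arts
```

A uniform expected distribution is the simplest possible model; anything more realistic (population-weighted, ontology-weighted, etc.) is precisely where the subjective judgement comes in.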