[AKG] A taxonomy of content gaps in Wikipedia and their causes
Closed, Resolved (Public)

Assigned To

Authored By

	• Capt_Swing
	Oct 15 2019, 5:43 PM

Description

Research page: https://meta.wikimedia.org/wiki/Research:Content_gaps_on_Wikipedia

Problem statement
Wikipedia is incomplete by design. The opportunity to share new information with the world is a major motivating factor among both new and established Wikipedia contributors. However, when important information about a topic is absent, incomplete, biased, or otherwise inaccessible to readers, these content gaps can undermine Wikipedia’s ability to serve the needs of its global audience. Although a great deal of research has been done to identify different types of gaps, and the characteristics of those gaps, there has not yet been an attempt to synthesize this body of work into actionable guidance for identifying, prioritizing, and measuring content gaps.

This literature review will follow the three-part classification of knowledge gap type outlined in the associated Wikimedia Research 2030 white paper: selection, extent, and framing gaps. The literature review will also:

Identify the methods and metrics that previous research has used to detect these kinds of gaps, and compare the benefits and limitations of those methods
Identify potential causes of content gaps, and evaluate the evidence provided for these proposed causes in previous research

By organizing previous research according to thematic categories related to gap type, methods/metrics, and proposed causes, we will be able to provide the first draft of a taxonomy of content gaps on Wikipedia.

This literature review will focus on content gaps in information represented in text (e.g. Wikipedia articles, or other textual entities) or hypertext (e.g. links between articles, categories, and other text-based metadata). Multimedia gaps and gaps specific to Wikidata and other Wikimedia projects (e.g. Wiktionary) are beyond the scope of this literature review and may be addressed in a separate study.

Research goals

Summarize findings from a body of relevant academic and industry research focused on content gaps related to the selection, extent, and framing of hypertextual Wikipedia content (e.g. text, links and citations, structured metadata, but not multimedia)
Identify the empirical methods used in these various studies, and their advantages and limitations with respect to their general applicability for large-scale analysis of content gaps across different languages of Wikipedia and for different forms of hypertext-based information
Identify the potential causes of content gaps described in these various studies, and the supporting evidence for each
Develop a taxonomy of content gaps
Provide recommendations for topic-, language-, and format-agnostic metrics and measurement techniques that can support the evaluation of both technological and programmatic interventions to close content gaps.

Hypotheses | Questions

What are the selection, extent, and framing gaps that have been identified in previous literature?
Which of the proposed causes for these gaps are best supported by currently available evidence?
What are the characteristics of previous programmatic and technological interventions that have shown some success at addressing these content gaps?
What metrics have been used to quantify the extent of content gaps or their change over time, and which of these metrics show the most promise for general applicability beyond a specific topic, language, or type of content?

Approach
Phase 1: Gathering a set of literature for review

Perform a Google Scholar search for terms related to content gaps
Search the Research and Grants namespaces on Meta-Wiki for terms related to content gaps
Ask subject matter experts for recommendations of previous research related to content gaps
Scan all articles and remove those that aren’t directly relevant

Phase 2: Review literature

Analyze and summarize goals and methods of all research papers
Analyze and summarize findings and recommendations of all research papers
Supplement the review bibliography with earlier research cited in these papers

Phase 3: Develop taxonomies and recommendations

Organize goals and methods of papers into a taxonomy that includes themes (e.g. selection, extent, and framing) and sub-themes (e.g. topic-specific gaps, content gaps associated with contributor gaps)
Identify and describe a set of likely causes of content gaps
Develop a set of recommendations for general-purpose methodologies and metrics that may be effective at identifying new content gaps and/or tracking the impact of interventions aimed at addressing a range of different content gaps
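
One possible way to record the Phase 3 organization is sketched below, assuming each reviewed paper has been coded with its gap types, sub-themes, methods, and proposed causes. The class names, field names, and sample entries are hypothetical illustrations and are not part of the project; the entries are placeholders, not real papers.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class GapType(Enum):
    """The three gap types named in the task description."""
    SELECTION = "selection"
    EXTENT = "extent"
    FRAMING = "framing"


@dataclass
class ReviewedPaper:
    """One reviewed paper as coded during Phase 2 (hypothetical schema)."""
    citation: str
    gap_types: Set[GapType]
    sub_themes: Set[str] = field(default_factory=set)
    methods: Set[str] = field(default_factory=set)
    proposed_causes: Set[str] = field(default_factory=set)


def build_taxonomy(papers):
    """Group citations by theme (gap type), then by sub-theme."""
    taxonomy = defaultdict(lambda: defaultdict(list))
    for paper in papers:
        for gap in paper.gap_types:
            # Papers without a sub-theme land in a catch-all bucket.
            for sub in paper.sub_themes or {"(uncategorized)"}:
                taxonomy[gap][sub].append(paper.citation)
    return taxonomy


# Placeholder entries for illustration only; not real papers.
papers = [
    ReviewedPaper("Placeholder A", {GapType.SELECTION}, {"topic-specific"}),
    ReviewedPaper(
        "Placeholder B",
        {GapType.SELECTION, GapType.FRAMING},
        {"contributor-related"},
    ),
]
taxonomy = build_taxonomy(papers)
```

A paper tagged with several gap types appears under each of them, so a single study can support more than one branch of the taxonomy.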

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		leila	T242172 Taxonomy of Knowledge Gaps
		Resolved		• Capt_Swing	T235544 [AKG] A taxonomy of content gaps in Wikipedia and their causes

Event Timeline

• Capt_Swing created this task. Oct 15 2019, 5:43 PM

• Capt_Swing renamed this task from [AKG] Literature review of identified content gaps in Wikipedia to [AKG] A taxonomy of content gaps in Wikipedia and their causes. Oct 15 2019, 5:49 PM

• Capt_Swing triaged this task as High priority.

leila added a project: DONOTUSE-address-knowledge-gaps. Oct 15 2019, 10:24 PM

Aklapper removed a project: Research. Nov 7 2019, 3:44 PM

leila edited projects, added Research; removed DONOTUSE-address-knowledge-gaps. Nov 7 2019, 10:12 PM

• Capt_Swing moved this task from Backlog to In Progress on the Research board. Nov 25 2019, 11:56 PM

• Capt_Swing updated the task description. Nov 27 2019, 9:32 PM

Isaac added a parent task: T242172: Taxonomy of Knowledge Gaps. Jan 7 2020, 10:48 PM

MGerlach mentioned this in T242595: Develop metrics for quantifying knowledge gaps. Jan 13 2020, 12:25 PM

This project is complete, or at least this phase of it. Any additional work will be linked from the project page: https://meta.wikimedia.org/wiki/Research:Content_gaps_on_Wikipedia

• Capt_Swing closed this task as Resolved. Jan 13 2020, 4:48 PM
