Wikipedia is incomplete by design. The opportunity to share new information with the world is a major motivating factor among both new and established Wikipedia contributors. However, when important information about a topic is absent, incomplete, biased, or otherwise inaccessible to readers, these content gaps can undermine Wikipedia’s ability to serve the needs of its global audience. Although a great deal of research has been done to identify different types of gaps, and the characteristics of those gaps, there has not yet been an attempt to synthesize this body of work into actionable guidance for identifying, prioritizing, and measuring content gaps.
This literature review will follow the three-part classification of knowledge gap types outlined in the associated Wikimedia Research 2030 white paper: selection, extent, and framing gaps. The literature review will also:
- Identify the methods and metrics used to detect these kinds of gaps in previous research, and compare the benefits and limitations of these methods
- Identify potential causes of content gaps, and evaluate the evidence provided for these proposed causes in previous research
By organizing previous research according to thematic categories related to gap type, methods/metrics, and proposed causes, we will be able to provide the first draft of a taxonomy of content gaps on Wikipedia.
This literature review will focus on content gaps in information represented in text (e.g. Wikipedia articles, or other textual entities) or hypertext (e.g. links between articles, categories, and other text-based metadata). Multimedia gaps, and gaps specific to Wikidata and other Wikimedia projects (e.g. Wiktionary), are beyond the scope of this literature review and may be addressed in a separate study.
- Summarize findings from a body of relevant academic and industry research focused on content gaps related to the selection, extent, and framing of hypertextual Wikipedia content (e.g. text, links and citations, and structured metadata, but not multimedia)
- Identify the empirical methods used in these various studies, and their advantages and limitations with respect to their general applicability for large-scale analysis of content gaps across different languages of Wikipedia and for different forms of hypertext-based information
- Identify the potential causes of content gaps described in these various studies, and the supporting evidence for each
- Develop a taxonomy of content gaps
- Provide recommendations for topic-, language-, and format-agnostic metrics and measurement techniques that can support the evaluation of both technological and programmatic interventions to close content gaps.
Hypotheses | Questions
- What are the selection, extent, and framing gaps that have been identified in previous literature?
- Which of the proposed causes for these gaps are best supported by currently available evidence?
- What are the characteristics of previous programmatic and technological interventions that have shown some success at addressing these content gaps?
- What metrics have been used to quantify the extent of content gaps, or their change over time, and which of these metrics show the most promise for general applicability beyond a specific topic, language, or type of content?
Phase 1: Gathering a set of literature for review
- Perform a Google Scholar search for terms related to content gaps
- Perform a search of the Research and Grants namespaces on Meta-Wiki for terms related to content gaps
- Ask subject matter experts for recommendations of previous research related to content gaps
- Scan all articles and remove those that aren’t directly relevant
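The Meta-Wiki search step above could be scripted rather than run by hand. The following is a minimal sketch, assuming the standard MediaWiki Action API search module (`list=search`). The function name `build_search_url`, the example search term, and the namespace IDs 200 (Grants) and 202 (Research) are illustrative assumptions, and the IDs should be checked against the wiki's own namespace listing before use. The sketch only constructs the request URL; it does not contact the server.

```python
from urllib.parse import urlencode

# Public Action API endpoint for Meta-Wiki.
META_API = "https://meta.wikimedia.org/w/api.php"


def build_search_url(term, namespaces, limit=50):
    """Build a full-text search URL restricted to the given namespace IDs.

    `term` is the search phrase; `namespaces` is an iterable of integer
    namespace IDs (assumed values, verify them for the target wiki).
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        # The API expects multiple namespace IDs separated by "|".
        "srnamespace": "|".join(str(n) for n in namespaces),
        "srlimit": limit,
        "format": "json",
    }
    return f"{META_API}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical search term; 200 and 202 are assumed namespace IDs.
    print(build_search_url("content gap", [200, 202]))
```

Running one such query per search term and recording the URLs alongside the results would make the Phase 1 search reproducible when the bibliography is later extended.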
Phase 2: Review literature
- Analyze and summarize goals and methods of all research papers
- Analyze and summarize findings and recommendations of all research papers
- Supplement the existing bibliography with relevant previous research cited in these papers
Phase 3: Develop taxonomies and recommendations
- Organize goals and methods of papers into a taxonomy that includes themes (e.g. selection, extent, and framing) and sub-themes (e.g. topic-specific gaps, content gaps associated with contributor gaps)
- Identify and describe a set of likely causes of content gaps
- Develop a set of recommendations for general-purpose methodologies and metrics that may be effective at identifying new content gaps and/or tracking the impact of interventions aimed at addressing a range of different content gaps