Problem Statement
In contrast to Wikipedia’s editor population, little is known about its readers, in large part because of the challenges and restrictions involved in handling privacy-sensitive data. Only recently have we started to characterize Wikipedia’s readership. For example, recent studies (Singer, Lemmerich, et al. 2017; Lemmerich et al. 2019) asked why we read Wikipedia in order to identify the motivation, information need, and prior knowledge of different users. Here, we investigate whether and to what degree these factors are reflected in how we use Wikipedia. That is, instead of looking at page views as isolated events, we consider users’ full reading sessions in order to characterize patterns of navigation within and across Wikimedia projects and, as a result, better understand the context of usage.
Goals
Empirical characterization of navigation paths of users on Wikipedia.
- Quantify differences across Wikipedia editions, geographical location, mobile/user access, topical content, etc.
- Identification of navigation patterns related to the motivation (work, learning, etc.), information need (overview, fact, etc.), and the prior experience (familiar, unfamiliar) of the user.
Approach
- Collect sample data for navigation paths/trees from webrequest logs for different Wikimedia projects; define a consistent methodology for pre-processing and filtering.
- Exploratory analysis of navigation paths:
  - Empirical characterization of paths and quantification of their differences across projects, geography, access method, etc.
  - Supervised/unsupervised clustering to identify different types of navigation and quantify their prevalence.
- Compile a list of use cases for applicability across departments (e.g. product) and coordinate possible efforts.
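
As a rough illustration of the pre-processing step in the approach above, the following minimal sketch groups page-view records into reading sessions and computes a few summary statistics. It is not the project's actual pipeline: the record layout `(reader_id, timestamp, page)`, the function names `build_sessions` and `summarize`, and the one-hour inactivity cutoff are all assumptions made for this example, and the real webrequest logs have a different schema and require privacy-preserving handling.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: one hour of inactivity ends a reading session.
SESSION_GAP = timedelta(hours=1)


def build_sessions(pageviews, gap=SESSION_GAP):
    """Group (reader_id, timestamp, page) records into per-reader sessions.

    The record layout is hypothetical and chosen only for illustration.
    Returns a list of (reader_id, [page, page, ...]) tuples.
    """
    by_reader = defaultdict(list)
    for reader_id, ts, page in pageviews:
        by_reader[reader_id].append((ts, page))

    sessions = []
    for reader_id, views in by_reader.items():
        views.sort(key=lambda v: v[0])  # order each reader's views by time
        current = [views[0][1]]
        for (prev_ts, _), (ts, page) in zip(views, views[1:]):
            if ts - prev_ts > gap:
                # Long pause: close the current session, start a new one.
                sessions.append((reader_id, current))
                current = []
            current.append(page)
        sessions.append((reader_id, current))
    return sessions


def summarize(sessions):
    """Compute simple descriptive statistics over session lengths."""
    lengths = [len(pages) for _, pages in sessions]
    return {
        "n_sessions": len(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "single_page_share": sum(1 for n in lengths if n == 1) / len(lengths),
    }


# Toy records, invented for this example.
records = [
    ("r1", datetime(2020, 1, 1, 10, 0), "Cat"),
    ("r1", datetime(2020, 1, 1, 10, 5), "Felidae"),
    ("r1", datetime(2020, 1, 1, 13, 0), "Dog"),
    ("r2", datetime(2020, 1, 1, 9, 0), "Paris"),
]
sessions = build_sessions(records)
summary = summarize(sessions)
```

Statistics of this kind could then be compared across projects, geography, or access method, or used as input features for clustering. The choice of session cutoff is itself a methodological decision that would need to be validated against the data.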
Based on the research brief
A summary of the results from the first phase of exploratory analysis can be found on Meta.