
[SPIKE] Does site visit frequency serve as a meaningful distinction between various groups of readers?
Closed, ResolvedPublic

Assigned To
Authored By
ppelberg
Dec 7 2022, 7:49 PM
Referenced Files
F36782505: Regularity-vs-Familiarity-Intensity.png
Feb 7 2023, 10:56 AM
F36782503: intensity-vs-familiarity.png
Feb 7 2023, 10:56 AM
F36782486: familiarity.png
Feb 7 2023, 10:56 AM
F36760464: regularity-vs-intensity.png
Feb 6 2023, 4:07 PM
F36525561: familiarity.png
Jan 27 2023, 4:45 PM
F36525558: regularity.png
Jan 27 2023, 4:45 PM
F36525556: intensity.png
Jan 27 2023, 4:45 PM

Description

This task covers an initial analysis to learn whether the frequency with which people visit Wikipedia serves as a meaningful distinction between various groups of readers.

Research Questions

  • 1. Of the people who visit a single Wikipedia page within a given day, what percentage of them:
    • Did not visit Wikipedia before
    • Visited Wikipedia within the last month
    • Visited Wikipedia within the last week
    • Visited Wikipedia yesterday
  • 2. Of the people who visited multiple Wikipedia pages within a given day, what percentage of them:
    • Did not visit Wikipedia before
    • Visited Wikipedia within the last month
    • Visited Wikipedia within the last week
    • Visited Wikipedia yesterday
  • 3. How, if at all, do the distributions produced by questions "1." and "2." change when we limit the analysis to visits that include clicks to any non-main namespace page/content?

Decisions To Be Made

Knowing the answer to the Research Questions above will help the Core Experiences Product Group form a more accurate "mental model" for the quantitative characteristics that define and distinguish the various groups of Wikipedia readers we hypothesize there to be.

Assumption(s) to Be Investigated

The "Research Questions" described above are an effort to understand the extent to which the following assumption holds true: The frequency with which people visit Wikipedia is an effective way/dimension to identify various groups of readers.

Background

This investigation emerged from within the Journey Transitions Design Research Project which seeks to:

  1. Elucidate the key experiences that can cause people to deepen and expand their use of, and contributions to, Wikipedia
  2. Identify the phases people travel through along their journeys to deepening and expanding their use of, and contributions to, Wikipedia

Done

  • Answers to all "Research Question(s)" are documented

Event Timeline

ppelberg renamed this task from "[SPIKE] Do we" to "[SPIKE] Does site visit frequency serve as a meaningful distinction between various groups of readers?". Dec 7 2022, 7:49 PM
ppelberg created this task.

META
I've updated the groupings in the task description's Research Questions section based on the feedback @MGerlach shared offline.

Weekly update:

  • generated first dataset of readers of English Wikipedia on a single day capturing for each actor: i) the number of pageloads to articles in the main namespace; ii) the number of days since the last access; iii) how many other namespaces (beyond main namespace) were visited.
  • will start exploratory analysis of the data in the next week(s)
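A minimal sketch of how the three per-actor quantities listed above could be derived from raw pageview rows. The row layout (actor_id, namespace_id, days_since_last_access), the names `ActorSummary` and `summarize`, and the toy rows are all hypothetical and chosen only for illustration; the actual dataset and pipeline used for this task are not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical input row: (actor_id, namespace_id, days_since_last_access).
# days_since_last_access is None when no earlier visit is known.
PageView = Tuple[str, int, Optional[int]]


@dataclass
class ActorSummary:
    main_pageloads: int = 0                       # i) pageloads in the main namespace
    days_since_last_access: Optional[int] = None  # ii) days since the last access
    other_namespaces: int = 0                     # iii) distinct non-main namespaces visited


def summarize(pageviews: Iterable[PageView]) -> Dict[str, ActorSummary]:
    """Aggregate one day's pageview rows into one summary per actor."""
    summaries: Dict[str, ActorSummary] = defaultdict(ActorSummary)
    seen: Dict[str, Set[int]] = defaultdict(set)
    for actor, namespace, days in pageviews:
        summary = summaries[actor]
        if namespace == 0:
            summary.main_pageloads += 1
        else:
            seen[actor].add(namespace)
        if days is not None:
            # Assumption: keep the smallest value if rows disagree.
            previous = summary.days_since_last_access
            summary.days_since_last_access = days if previous is None else min(previous, days)
    for actor, namespaces in seen.items():
        summaries[actor].other_namespaces = len(namespaces)
    return dict(summaries)


# Toy rows, not data from this task.
toy_pageviews = [("a", 0, 1), ("a", 0, 1), ("a", 4, 1), ("b", 0, None)]
toy_summaries = summarize(toy_pageviews)
```

Here actor "a" ends up with two main-namespace pageloads, one other namespace, and a last access one day ago, while actor "b" has a single pageload and no known earlier visit.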

weekly update:

  • started first exploratory analysis of a 1% sample of reading sessions from a single day (~438k sessions)
  • percentage of reading sessions with single or multiple pageviews:

intensity.png (348×454 px, 17 KB)

  • percentage of reading sessions with last-access: 1 day, 1 week (but more than 1 day), 1 month (but more than 1 week), non-returning:

regularity.png (341×475 px, 26 KB)

  • percentage of reading sessions only accessing main namespace or also accessing pages in any other namespace:

familiarity.png (340×458 px, 21 KB)

  • next step: look at combination of different facets

answering questions 1./2.:

  • we separate all readers into two subgroups i) people who visit a single Wikipedia page and ii) people who visit multiple Wikipedia pages on a single day.
  • for each group separately, we calculate what percentage:
    • Did not visit Wikipedia before (non-recurring)
    • Visited Wikipedia within the last month (7<d<=31)
    • Visited Wikipedia within the last week (1<d<=7)
    • Visited Wikipedia yesterday (d=1)
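The grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the analysis: each reader is a hypothetical (main_pageloads, days_since_last_access) pair, `None` stands for "did not visit before", and any value above 31 days is also treated as non-recurring. The toy readers are invented and do not reproduce the task's figures.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

BUCKETS = ("d=1", "1<d<=7", "7<d<=31", "non-recurring")


def regularity_bucket(days: Optional[int]) -> str:
    """Map days since last access (d) to the buckets used in the task."""
    if days is None or days > 31:  # assumption: d > 31 counts as non-recurring
        return "non-recurring"
    if days < 1:
        raise ValueError("d is assumed to be at least 1")
    if days == 1:
        return "d=1"
    if days <= 7:
        return "1<d<=7"
    return "7<d<=31"


def intensity_group(main_pageloads: int) -> str:
    """Single vs. multiple main-namespace pageloads on the day."""
    if main_pageloads < 1:
        raise ValueError("a reader is assumed to have at least one pageload")
    return "single" if main_pageloads == 1 else "multi"


def regularity_shares(
    readers: Iterable[Tuple[int, Optional[int]]]
) -> Dict[str, Dict[str, float]]:
    """Percentage of each regularity bucket, computed separately per intensity group."""
    counts = {"single": Counter(), "multi": Counter()}
    for pageloads, days in readers:
        counts[intensity_group(pageloads)][regularity_bucket(days)] += 1
    shares: Dict[str, Dict[str, float]] = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = (
            {bucket: 100 * counter[bucket] / total for bucket in BUCKETS} if total else {}
        )
    return shares


# Toy readers, not data from this task.
toy_readers = [(1, None), (1, 1), (1, 5), (1, 20), (3, 1), (3, 1), (2, 6), (5, None)]
toy_shares = regularity_shares(toy_readers)
```

Percentages are normalized within each group, so the four buckets sum to 100% for single-page readers and, separately, for multi-page readers.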

regularity-vs-intensity.png (352×483 px, 30 KB)

Summary: Among readers who visit multiple Wikipedia pages on a given day, a substantially larger fraction also visited the previous day: 40%, versus 25% among readers who visit only a single Wikipedia article on a given day.

answering question 3. (how do the above distributions change when considering whether the reader visited pages beyond the main namespace):

  • First, readers visiting pages beyond the main namespace ("ns>0") make up a much smaller group than those visiting only articles in the main namespace ("ns=0"): roughly 1% of readers.

familiarity.png (338×461 px, 19 KB)

  • Second, the small group of readers who visit pages beyond the main namespace ("ns>0") is much more likely to have visited multiple articles in the main namespace as well. In fact, the ratio between single and multiple pageviews is inverted: i) for the majority of readers (those who visit only the main namespace, "ns=0"), around 2/3 visit a single article and 1/3 visit multiple articles; ii) in contrast, for the non-main namespace readers, around 1/3 visit a single article and 2/3 visit multiple articles (in the main namespace).

intensity-vs-familiarity.png (364×482 px, 20 KB)

  • Third, we find that the regularity of the visit changes only slightly between readers who visit only the main namespace ("ns=0") and readers who also visit other namespaces ("ns>0"). Two main observations: i) readers who visit other namespaces and visit multiple pages ("ns>0 / multi") have the highest fraction of readers whose previous visit was on the previous day ("d=1"), around 45%. ii) However, this is only slightly higher than for readers who visit only the main namespace but visited multiple articles ("ns=0 / multi"). Thus, the main distinguishing feature is between readers who visit only a single article ("ns=0 / single" and "ns>0 / single") and those who visit multiple articles in the main namespace ("ns=0 / multi" and "ns>0 / multi").
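The four segments compared above ("ns=0 / single", "ns=0 / multi", "ns>0 / single", "ns>0 / multi") can be sketched as a simple cross-tabulation. This is a hedged illustration: the tuple layout (main_pageloads, other_namespaces, days_since_last_access) and the function names are hypothetical, and the toy readers are invented, so the resulting percentages are not the task's results.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical reader record: (main_pageloads, other_namespaces, days_since_last_access).
Reader = Tuple[int, int, Optional[int]]


def segment_label(main_pageloads: int, other_namespaces: int) -> str:
    """Combine the familiarity facet (ns) with the intensity facet (single/multi)."""
    familiarity = "ns>0" if other_namespaces > 0 else "ns=0"
    intensity = "single" if main_pageloads == 1 else "multi"
    return f"{familiarity} / {intensity}"


def share_previous_day(readers: Iterable[Reader]) -> Dict[str, float]:
    """Percentage of readers per segment whose previous visit was one day ago (d=1)."""
    totals: Counter = Counter()
    previous_day: Counter = Counter()
    for pageloads, other_ns, days in readers:
        label = segment_label(pageloads, other_ns)
        totals[label] += 1
        if days == 1:
            previous_day[label] += 1
    return {label: 100 * previous_day[label] / totals[label] for label in totals}


# Toy readers, not data from this task.
toy_readers = [
    (1, 0, 1), (1, 0, None), (1, 0, 9), (1, 0, 3),  # ns=0 / single
    (4, 0, 1), (4, 0, 5),                           # ns=0 / multi
    (1, 2, None),                                   # ns>0 / single
    (3, 1, 1), (2, 1, 1), (6, 3, 20),               # ns>0 / multi
]
toy_d1_shares = share_previous_day(toy_readers)
```

Comparing the values across the four labels shows whether the difference runs mainly along the single/multi facet or along the ns=0/ns>0 facet.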

Regularity-vs-Familiarity-Intensity.png (360×485 px, 30 KB)

Per what @MGerlach and I talked about offline, we consider work on this task to be done.

We'll explore any follow-up questions in yet-to-be filed task(s).