
Quantitative Survey Analysis for Reader Foundational Research
Closed, Resolved, Public

Description

This task outlines the analysis plan for the Stage II, Stage III, and Stage IV quantitative surveys as described in T398379.

The current use case framework (developed from the Stage I open-ended surveys) can be found here.

Stage II (fielding dates: Oct 2 - Oct 6)

Analysis tasks (as part of T393802):

  • Use case prevalence (descriptive statistics)
  • Use case x demographics (descriptive statistics; regression analysis where appropriate, to communicate non-causal associations)
  • Use case x content interaction (descriptive statistics)
    • Use case x session length
    • Use case x topics read
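
The descriptive statistics above can be sketched as weighted proportions. This is a minimal sketch that assumes each respondent has a single use-case label, a session length in pages, and a survey weight. The field names (`use_case`, `pages`, `weight`), the use-case labels, and the session-length buckets are hypothetical and do not come from the framework.

```python
from collections import defaultdict


def weighted_prevalence(records, key=lambda r: None):
    """Return {group: {use_case: weighted share}}; key(r) picks the group (None = overall)."""
    totals = defaultdict(float)
    by_case = defaultdict(lambda: defaultdict(float))
    for r in records:
        g = key(r)
        totals[g] += r["weight"]
        by_case[g][r["use_case"]] += r["weight"]
    return {g: {uc: w / totals[g] for uc, w in cases.items()} for g, cases in by_case.items()}


def session_bucket(r):
    # Hypothetical buckets for the use case x session length cross-tab
    if r["pages"] == 1:
        return "1 page"
    if r["pages"] <= 5:
        return "2-5 pages"
    return "6+ pages"


# Illustrative records only; labels and weights are made up
records = [
    {"use_case": "look up a fact", "pages": 1, "weight": 2.0},
    {"use_case": "learn in depth", "pages": 8, "weight": 1.0},
    {"use_case": "look up a fact", "pages": 3, "weight": 1.0},
]
overall = weighted_prevalence(records)[None]
by_length = weighted_prevalence(records, key=session_bucket)
```

Passing a different `key` to the same function gives the use case x topics or use case x demographics cross-tabs.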

Analysis tasks (optional and *not included* in T393802):

  • Additional content interaction metrics
    • Reading level of pages read
    • Network analysis of pages read

Stage III / Stage IV (fielding dates: TBD; must field after temporary accounts are introduced on enwiki)

Analysis tasks (as part of T393802):

  • Comparative use case prevalence for Wikipedia among non-readers, less-frequent readers, and frequent readers, and between self-reported and verified readers
  • Comparative demographic analysis (as above)
  • Comparative analysis of (comparisons as above):
    • Trust of Wikipedia vs. other platforms
    • Use case scenarios for Wikipedia vs. other platforms

Details

Due Date
Oct 31 2025, 4:00 AM
Other Assignee
CMyrick-WMF

Event Timeline

YLiou_WMF renamed this task from Quantitative Survey Analysis Plan to Quantitative Survey Analysis (Oct 1 2025, 8:51 PM).
YLiou_WMF set Due Date to Oct 31 2025, 4:00 AM.

Thanks @YLiou_WMF for creating this task. I co-assigned @CMyrick-WMF, as she'll do a large portion of the analysis.

Miriam renamed this task from Quantitative Survey Analysis to Quantitative Survey Analysis for Reader Foundational (Oct 2 2025, 10:45 AM).
Miriam renamed this task from Quantitative Survey Analysis for Reader Foundational to Quantitative Survey Analysis for Reader Foundational Research.
YLiou_WMF updated the task description.

Weekly update:

  • Queried the QS data
  • Wrangled response data for analysis next week: cleaned it, handled duplicates and click-backs, and reshaped it
  • Next week: join with session data; begin Stage II analysis
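
The update does not define "dupes" or "click-backs". This minimal sketch rests on several assumptions. Responses arrive as long-format rows of (respondent_id, question_id, answer, timestamp). A click-back appears as a second answer to the same question, and the later answer should win. The response data is then reshaped to one record per respondent. The row schema is hypothetical.

```python
def clean_and_reshape(rows):
    """rows: iterable of (respondent_id, question_id, answer, ts) tuples (hypothetical schema).

    Returns {respondent_id: {question_id: answer}} keeping the latest answer per question.
    """
    latest = {}
    for rid, qid, answer, ts in rows:
        k = (rid, qid)
        # Exact duplicates (same ts) keep the first copy; a click-back with a later ts replaces it
        if k not in latest or ts > latest[k][1]:
            latest[k] = (answer, ts)
    # Reshape long -> wide: one dict of answers per respondent
    wide = {}
    for (rid, qid), (answer, _) in latest.items():
        wide.setdefault(rid, {})[qid] = answer
    return wide
```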

Weekly update:

  • Preliminary survey response results shared with T393802 research team.
  • Queried and joined session data with survey responses
  • Queried and joined topics data
  • Began Stage II analyses
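
Joining survey responses to session data is essentially a left join. This sketch assumes that both datasets share a hypothetical `session_id` key and that session IDs are unique. Every response is kept, and responses with no matching session are counted so the match rate can be checked.

```python
def left_join(responses, sessions, key="session_id"):
    """Attach each response's session record (or None) and count unmatched responses.

    Assumes `key` is a hypothetical identifier present in both datasets and unique in `sessions`.
    """
    index = {s[key]: s for s in sessions}
    joined, unmatched = [], 0
    for r in responses:
        s = index.get(r[key])
        if s is None:
            unmatched += 1
        merged = dict(r)  # copy so the input responses are not mutated
        merged["session"] = s
        joined.append(merged)
    return joined, unmatched
```

The same pattern would apply to joining topics data keyed by page.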

Weekly update:

  • Survey data has been weighted by OS family, browser family, referrer type, geography (by country), browser language preference, and whether the user was reading an article in the top 0.05% of enwiki articles by traffic (to adjust for "spiky" traffic)
  • Began analysis of session length and article topic data
  • Stage II survey analysis incorporated into the draft Readers foundational research report deck
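
The task does not say which weighting method was used. A common way to weight a sample to several marginal targets at once is raking (iterative proportional fitting). The sketch below assumes that method, categorical weighting variables, and hypothetical field names.

```python
def rake(records, targets, weight_key="weight", max_iter=100, tol=1e-8):
    """Raking (iterative proportional fitting) of survey weights.

    records: list of dicts holding each respondent's weighting variables.
    targets: {variable: {category: target share}}; shares for a variable sum to 1
             and must cover every category observed in records.
    Weights start at 1.0 if absent and are rescaled one variable at a time
    until every weighted marginal share matches its target.
    """
    for r in records:
        r.setdefault(weight_key, 1.0)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(r[weight_key] for r in records)
            current = {}
            for r in records:
                current[r[var]] = current.get(r[var], 0.0) + r[weight_key]
            # Factor that moves each category's weighted total to its target share
            factors = {cat: shares[cat] * total / w for cat, w in current.items()}
            for r in records:
                r[weight_key] *= factors[r[var]]
            for f in factors.values():
                max_change = max(max_change, abs(f - 1.0))
        if max_change < tol:  # a full pass changed nothing meaningful
            break
    return records
```

Raking needs only each variable's marginal shares, not their joint distribution. That makes it practical when the targets come from a separate control sample.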

Weekly update:

  • Debugging
  • Determined thresholds and other rules for sessions
    • max 20 pages per session (the 99th percentile of session length in a random sample is 20 pages)
    • max 28800 seconds (8 hours) per session, since average daily online time is 7 hours worldwide and 7+ hours in the U.S.
    • exclude a pageview that immediately repeats the previous page_id with the same referrer class
    • if multiple access methods, browsers, or devices are logged, keep the one with the earliest (MIN) timestamp
  • Queried daily pageview data to identify the top 0.05% most-viewed pages per day; used this to flag each pageview as a view of a top page or not
  • Finished coding rules related to language preference
  • Began compiling control data based on random sample of pageview traffic
  • Began updating the weighting code
  • QCing in progress
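
The session rules above can be sketched as a single cleaning function. This sketch makes several assumptions:
  • Pageviews are dicts with hypothetical keys `ts` (epoch seconds), `page_id`, `referrer_class`, and `agent` (the access method/browser/device combination).
  • "Take MIN timestamp-wise" means keeping only the agent seen earliest.
  • The 20-page limit truncates the session rather than dropping it.
  • The rules run in the order shown.

```python
MAX_PAGES = 20  # 99th percentile of session length in the random sample
MAX_SECONDS = 28_800  # 8 hours


def clean_session(pageviews):
    """Apply the session rules to one session's pageviews (hypothetical dict schema)."""
    if not pageviews:
        return []
    # Several agents logged: keep only the agent whose first pageview is earliest (assumed reading)
    first_seen = {}
    for pv in pageviews:
        a = pv["agent"]
        first_seen[a] = min(first_seen.get(a, pv["ts"]), pv["ts"])
    agent = min(first_seen, key=first_seen.get)
    views = sorted((pv for pv in pageviews if pv["agent"] == agent), key=lambda pv: pv["ts"])
    # Drop a view that immediately repeats the previous page_id with the same referrer class
    deduped = []
    for pv in views:
        prev = deduped[-1] if deduped else None
        if prev and prev["page_id"] == pv["page_id"] and prev["referrer_class"] == pv["referrer_class"]:
            continue
        deduped.append(pv)
    # Keep views within 8 hours of the first one, then truncate to the page cap
    start = deduped[0]["ts"]
    within = [pv for pv in deduped if pv["ts"] - start <= MAX_SECONDS]
    return within[:MAX_PAGES]
```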

Weekly update:

  • Finished QCing
  • Finished compiling control data based on a random sample of pageview traffic
  • Calculated and applied weights to the respondent data, using the control data for the targets
  • Finished documenting all thresholds, rules, logic, etc., applied during data cleaning
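
The control data described above can be turned into weighting targets by tabulating category shares. In this case, the data is a random sample of pageviews, flagged against each day's top 0.05% pages. This sketch assumes daily per-page view counts and ranks pages by views. It also rounds the top-0.05% page count up, keeping at least one page per day. The field names (`day`, `page_id`, and the weighting variables) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def top_pages_by_day(daily_views, share=0.0005):
    """daily_views: {day: {page_id: views}} -> {day: set of top page_ids}.

    share=0.0005 is the top 0.05%; the count is rounded up, with at least one page per day.
    """
    top = {}
    for day, views in daily_views.items():
        n = max(1, math.ceil(len(views) * share))
        ranked = sorted(views, key=views.get, reverse=True)
        top[day] = set(ranked[:n])
    return top


def control_targets(control_sample, variables, top):
    """control_sample: random-sample pageviews as dicts with 'day', 'page_id' and the
    weighting variables. Returns {variable: {category: share}}, including 'is_top_page'."""
    counts = defaultdict(Counter)
    for pv in control_sample:
        # Flag each sampled pageview as a view of that day's top pages or not
        pv = dict(pv, is_top_page=pv["page_id"] in top.get(pv["day"], set()))
        for var in (*variables, "is_top_page"):
            counts[var][pv[var]] += 1
    n = len(control_sample)
    return {var: {cat: c / n for cat, c in cnt.items()} for var, cnt in counts.items()}
```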

Weekly update:

  • Survey data analyses continue to be added to the stakeholder deck
  • Preliminary presentations of survey data were given to stakeholders this week
  • Final deck expected to be shared next week, at which point the T393802 hypothesis will be closed

Weekly update:

  • Queried enwiki QS data
  • Wrangled response data for analysis next week
  • Next week: join with session data; begin Stage III/IV analyses

Weekly update:

  • For Global Readers Survey English
    • Joined with session data
    • Completed Stage III/IV analyses
    • Completed calculation and application of weights
    • Currently investigating possible data loss (between QS and LS)
  • For Global Readers Survey non-English (T410918)
    • Queried QS data for subset
    • Wrangled response data for subset
    • Pulled control data (for weighting) for subset
    • Next week: complete queries and wrangling for all surveyed wikis; join with session data; begin Stage III/IV analyses
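
Checking for possible data loss between QS and LS amounts to comparing which records appear in each source. This sketch assumes both sources can be keyed by a common response identifier, which is hypothetical; the task does not say how QS and LS records are linked.

```python
def reconcile(qs_ids, ls_ids):
    """Compare response identifiers present in QS vs. LS (hypothetical shared ID)."""
    qs, ls = set(qs_ids), set(ls_ids)
    return {
        "in_both": len(qs & ls),
        "only_in_qs": sorted(qs - ls),  # candidates for data loss
        "only_in_ls": sorted(ls - qs),
        # Share of QS records that also appear in LS
        "qs_retention": len(qs & ls) / len(qs) if qs else float("nan"),
    }
```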