
Vet and explore new readership engagement metric
Closed, Resolved (Public)

Description

(Task intended to form part of a possible Outreachy internship in data analysis with the WMF reading team)

Vet and explore a new privacy-friendly desktop readership engagement metric (planned to be based on T155639), and build a reporting mechanism for it.
Deliverables:

  1. (~ 2 weeks) A report examining various possible specifications of this metric (e.g. choice of percentile), their possible data quality issues with suggestions for how to fix or mitigate them, and an assessment of their sensitivity and robustness
  2. (~ 1 week) An exploratory analysis showing how the chosen metric differs across various dimensions, e.g. project language or geographical region
  3. (~ 1 week) A workflow or an automated tool to regularly inform the Reading team and the Wikimedia movement on how this metric is developing
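
As a rough illustration of what deliverable 1 could involve (this is not the queries or schema actually used for this task), the following sketch compares the mean with several percentiles on a small, made-up sample of reading times, with and without a single extreme value. The function name `summarize`, the millisecond unit, and all data values are assumptions introduced for the example.

```python
import statistics


def summarize(times_ms, percentiles=(25, 50, 75, 95)):
    """Return the mean and the selected percentiles of reading times."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99;
    # "inclusive" treats the sample min and max as the 0th and 100th percentile.
    cuts = statistics.quantiles(times_ms, n=100, method="inclusive")
    summary = {"mean": statistics.fmean(times_ms)}
    for p in percentiles:
        summary[f"p{p}"] = cuts[p - 1]
    return summary


# Hypothetical per-pageview reading times in milliseconds (made-up data).
sample = [4000, 9000, 12000, 15000, 21000, 30000, 45000, 60000]
# The same sample plus one extreme value, e.g. a tab left open for two hours.
with_outlier = sample + [7_200_000]

clean = summarize(sample)
noisy = summarize(with_outlier)

# Print each statistic side by side: without and with the outlier.
for key in clean:
    print(f"{key:>5}  {clean[key]:>12.0f}  {noisy[key]:>12.0f}")
```

On data like this, the mean shifts by more than an order of magnitude when the outlier is added, while the median moves only to the next observation. This is the kind of sensitivity comparison that the choice of percentile would need to account for.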

Event Timeline

Change 338966 had a related patch set uploaded (by Phuedx):
Enable ReadingDepth instrumentation

https://gerrit.wikimedia.org/r/338966

Change 338966 merged by jenkins-bot:
Enable ReadingDepth logging on Wikipedias

https://gerrit.wikimedia.org/r/338966

Mentioned in SAL (#wikimedia-operations) [2017-02-21T14:08:21Z] <hashar@tin> Synchronized wmf-config/InitialiseSettings.php: Enable ReadingDepth logging on Wikipedias - T148262 T155639 (duration: 00m 45s)

This project is currently in progress. I've completed initial data quality checks on the table and developed the queries for average and percentiles.

JKatzWMF added a subscriber: Zareenf.

The next step here is for @Tbayer to break up the remaining items on this task as necessary so that someone can work on them.

Resetting task assignee as the user is not active here anymore.

kzimmerman claimed this task.
kzimmerman subscribed.

Draft of the report from this work is here: https://meta.wikimedia.org/wiki/Research:Reading_time/Draft_Report

Given the current limitations on mobile tracking and the high load on event logging, we propose removing event logging for the reading depth schema (T229042) and replacing it with a session length schema (https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/SessionLength).