
Schema QA : mediawiki_reading_depth
Closed, Resolved, Public

Description

As one of the instrumentations for talk pages, the Web team has deployed instrumentation to track read depth on talk pages (T294777).

Instrumentation note

The related events will be stored in a new schema mediawiki_reading_depth. Sample rate is 0.1% on English Wikipedia.

QA summary

| What has been checked | Status | Note |
| --- | --- | --- |
| Daily events | ✅ pass | Events are available since 12-20-2021 |
| Events by wiki projects | ✅ pass | Enabled only on English Wikipedia |
| Events by namespace | ✅ pass | Namespace ID is captured |
| Events by user type | ✅ pass | Both anonymous and logged-in users are captured |
| Events by action | ✅ pass | Two types of actions are recorded: pageLoaded, pageUnloaded |
| Events by page_length | ✅ pass | Page length is recorded, rounded down to the 1st digit |
| Events by access method | ✅ pass | Two access methods are recorded: desktop, mobile web |
| Events by agent type | ✅ pass | Spiders can be identified and excluded in analysis |
| Read length | ✅ pass | Reading time is captured. For all events in 2021, the average time length is 1178408, range [0, 2603264579] |

Please see QA notebook for details.
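
The reported average sits far below the reported maximum, which suggests a long-tailed distribution in which a few extreme values pull the mean upward. As a minimal sketch only (the sample values are made up apart from the reported maximum, the unit of the recorded time is not stated here, and `summarize_read_times` is a hypothetical helper, not code from the QA notebook), the median can be reported next to the mean and range:

```python
from statistics import mean, median


def summarize_read_times(values):
    """Return min, max, mean and median for a list of read-time values."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


# Hypothetical sample: a few small values plus the maximum reported in the QA table.
read_times = [0, 1200, 3500, 8000, 2_603_264_579]

summary = summarize_read_times(read_times)
# A single extreme value dominates the mean, while the median stays near typical values.
print(summary)
```

If the real data behaves like this sample, the median or a capped mean may describe typical reading time better than the plain average.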

Bugs/Potential Issues
| Issues | Status | Note |
| --- | --- | --- |
| dt field doesn't match the partition year, month, day | ✅ Not a bug | T292586#7581979 |

Event Timeline

@ovasileva, similar to T292586#7576004, the data in the dt field doesn't match the partition year, month, and day. The stored dates go back as early as 1986 and also include future dates, such as 2022.

A few examples:

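| date_time | year | month | day | events | sessions |
| --- | --- | --- | --- | --- | --- |
| 1986-08-22 | 2021 | 11 | 23 | 2 | 1 |
| 2021-12-14 | 2021 | 12 | 12 | 1 | 1 |
| 2021-12-14 | 2021 | 12 | 16 | 1 | 1 |
| 2021-12-15 | 2021 | 12 | 15 | 300833 | 71457 |
| 2021-12-16 | 2021 | 12 | 14 | 15 | 2 |
| 2021-12-18 | 2021 | 12 | 12 | 3 | 2 |
| 2021-12-18 | 2021 | 12 | 16 | 3 | 2 |
| 2021-12-19 | 2021 | 12 | 15 | 2 | 1 |
| 2021-12-24 | 2021 | 11 | 25 | 2 | 1 |
| 2022-01-02 | 2021 | 12 | 5 | 2 | 1 |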

Query code

SELECT
    TO_DATE(dt) AS date_time,
    year,
    month,
    day,
    COUNT(1) AS events,
    COUNT(DISTINCT session_token) AS sessions
FROM event.mediawiki_reading_depth
WHERE year = 2021
GROUP BY TO_DATE(dt), year, month, day
@ovasileva I completed post-deployment QA of the instrumentation added to track read depth. All events appear to be recording as expected.

| What has been checked | Status | Note |
| --- | --- | --- |
| Daily events | ✅ pass | Events are available since 12-20-2021 |
| Events by wiki projects | ✅ pass | Enabled only on English Wikipedia |
| Events by namespace | ✅ pass | Namespace ID is captured |
| Events by user type | ✅ pass | Both anonymous and logged-in users are captured |
| Events by action | ✅ pass | Two types of actions are recorded: pageLoaded, pageUnloaded |
| Events by page_length | ✅ pass | Page length is recorded, rounded down to the 1st digit |
| Events by access method | ✅ pass | Two access methods are recorded: desktop, mobile web |
| Events by agent type | ✅ pass | Spiders can be identified and excluded in analysis |
| Read length | ✅ pass | Reading time is captured. For all events in 2021, the average time length is 1178408, range [0, 2603264579] |

Please see QA notebook for details.

Perfect, thank you @jwang! Resolving this one