
Restore reading depth schema
Closed, Resolved · Public · 5 Estimated Story Points

Description

Background

One of the main questions from T294503: [EPIC] Measuring the impact of exposing talk pages to mobile web anons is whether readers understand the purpose of a talk page on mobile once exposed to it. We are gathering information on this through several factors. This task focuses on one of them: the amount of time an anonymous user spends on a talk page.

Instrumentation spec: https://docs.google.com/spreadsheets/d/11o7ZBtFff2Bi2L0kD-qC1c4XS82QqVs8fjcms824aO0/edit#gid=0
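The time-on-page measurement this task relies on can be sketched as follows. This is a minimal illustration, not the WikimediaEvents ReadingDepth code: it assumes the instrument knows the page-load time, a chronological list of visibility changes, and the unload time (all in milliseconds). The helper `reading_time` and the `total_length`/`visible_length` keys are hypothetical names chosen for the example.

```python
from typing import Dict, List, Tuple


def reading_time(
    start_ms: int, visibility_changes: List[Tuple[int, bool]], end_ms: int
) -> Dict[str, int]:
    """Return total and visible time for one page view (hypothetical helper).

    visibility_changes holds (timestamp_ms, is_visible) pairs in chronological
    order; the page is assumed to be visible when it loads.
    """
    visible = True
    last_ms = start_ms
    visible_ms = 0
    for timestamp_ms, is_visible in visibility_changes:
        if visible:
            # Credit the interval that just ended if the page was on screen.
            visible_ms += timestamp_ms - last_ms
        visible = is_visible
        last_ms = timestamp_ms
    if visible:
        # Credit the final interval up to unload.
        visible_ms += end_ms - last_ms
    return {"total_length": end_ms - start_ms, "visible_length": visible_ms}


# Loaded at 0 ms, tab hidden at 1000 ms, visible again at 4000 ms, unloaded at 6000 ms.
print(reading_time(0, [(1000, False), (4000, True)], 6000))
# {'total_length': 6000, 'visible_length': 3000}
```

Tracking visible time separately from total time keeps a backgrounded tab from inflating how long a reader appears to spend on a talk page.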

Acceptance criteria

Event Timeline


I looked into this on Friday. Because this schema was never migrated, restoring it is not a simple revert: we'd need to create the schema as if it were a new one. The instrument code looks more or less ready to go, though, if we can do that. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaEvents/+/735690

Are we wanting to collect data from all skins or only particular ones?


Currently, just Minerva and Vector (no need to distinguish skin version)

Note: for namespaces, we will be looking at all talk namespaces, not only the main Talk namespace: https://en.wikipedia.org/wiki/Wikipedia:Namespace#Talk_namespaces
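The skin and namespace scope described in the two comments above could be expressed as a filter like the one below. This is a hypothetical sketch: the skin keys `minerva` and `vector` are assumed, and the namespace test relies on MediaWiki's convention that every talk namespace has an odd, non-negative ID (1 is Talk, 3 is User talk, and so on).

```python
# Assumed skin keys; skin version is not distinguished, per the comment above.
TARGET_SKINS = frozenset({"minerva", "vector"})


def is_talk_namespace(ns: int) -> bool:
    # Talk namespaces have odd IDs (1 = Talk, 3 = User talk, 5 = Project talk, ...).
    # The ns >= 0 check matters: in Python -1 % 2 == 1, but Special (-1) has no talk pages.
    return ns >= 0 and ns % 2 == 1


def should_log(skin: str, ns: int) -> bool:
    """Hypothetical filter: only talk pages viewed in the two skins of interest."""
    return skin in TARGET_SKINS and is_talk_namespace(ns)


print(should_log("minerva", 1), should_log("vector", 0), should_log("timeless", 1))
# True False False
```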

Note that access_method is being upstreamed per T294246

Change 737527 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[schemas/event/secondary@master] Restore ReadingDepth schema

https://gerrit.wikimedia.org/r/737527

Change 735690 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/extensions/WikimediaEvents@master] Restore ReadingDepth instrument

https://gerrit.wikimedia.org/r/735690

Change 737530 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/extensions/WikimediaEvents@master] Make ReadingDepth run on Minerva

https://gerrit.wikimedia.org/r/737530

LGoto updated Other Assignee, removed: cjming.

Hi, based on the conversation in the old ticket T229042, it seems that the instrumentation will generate a high volume of data. Will we apply a sampling rate in the event logging?

Yes. For now, sampling rate is 0. When we are ready to enable it, we'll need to define a sampling rate. @jwang do you know what sampling rate we used previously? If not, I can dig that out.

Change 737796 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/extensions/WikimediaEvents@master] Modernize performance metric collection

https://gerrit.wikimedia.org/r/737796

Change 737527 merged by jenkins-bot:

[schemas/event/secondary@master] Restore ReadingDepth schema

https://gerrit.wikimedia.org/r/737527


From the log on the schema talk page (https://meta.wikimedia.org/w/index.php?title=Schema_talk%3AReadingDepth&type=revision&diff=19319623&oldid=19319613), the sampling rate was 0.1% at one point and was later changed to 0. Previously, the schema was enabled on all wikis. If we only enable it on enwiki this time, we can probably use a sampling rate higher than 0.1%.
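To make the sampling discussion concrete, here is a minimal sketch of session-level sampling, assuming each session has a stable identifier. It is not EventLogging's actual sampling code; `in_sample` and `expected_events` are hypothetical helpers. Hashing the identifier gives every session a fixed bucket in [0, 1), so a session is either always or never logged, and the expected event volume is proportional to the rate.

```python
import hashlib


def in_sample(session_id: str, rate: float) -> bool:
    """Hypothetical per-session sampling: a session always gets the same decision."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < rate


def expected_events(eligible_events: int, rate: float) -> float:
    """Expected number of logged events when a fraction `rate` of sessions is sampled."""
    return eligible_events * rate


print(expected_events(1_000_000, 0.001))  # 1000.0
```

Because volume scales linearly with the rate, restricting the schema to a single wiki shrinks the eligible traffic and leaves room for a higher rate at a comparable event volume.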

Change 737796 abandoned by Nray:

[mediawiki/extensions/WikimediaEvents@master] Modernize performance metric collection

Reason:

Squashed into Ib94c6019f004fe3fcd877f62ff5208ec5e07d2a1

https://gerrit.wikimedia.org/r/737796

Change 739016 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[schemas/event/secondary@master] Update web_ui_reading_depth schema

https://gerrit.wikimedia.org/r/739016

Change 739016 merged by jenkins-bot:

[schemas/event/secondary@master] Update web_ui_reading_depth schema

https://gerrit.wikimedia.org/r/739016

hi @jwang -- for the page_length property, will it work for you if we calculate the rendered height of the page (its total scroll height)?

This may be problematic given the varying viewport widths of devices, so in addition to height we could also include screen width.

Can you run your queries accordingly, including or excluding pages based on height (or height plus viewport width)?

Hi @cjming, page_length doesn't mean the physical length; it means the content length. It is stored in the page table and documented as follows:

page_len
Uncompressed length in bytes of the page's current source text.

This, however, does not apply to images, which still have records in this table. Instead, the uncompressed length in bytes of the file's description is used, as the latter is in the text.old_text field.

The WikiPage class in includes/WikiPage.php has two methods, insertOn() and updateRevisionOn(), that are responsible for populating these details.
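Because `page_len` counts bytes rather than characters, the two differ for any text outside ASCII. A minimal sketch, assuming the source text is UTF-8 encoded as MediaWiki stores wikitext; `page_len_bytes` is a hypothetical helper name:

```python
def page_len_bytes(source_text: str) -> int:
    """Byte length of the UTF-8 encoded source text, which is what page_len counts."""
    return len(source_text.encode("utf-8"))


for text in ["Talk", "Düsseldorf", "日本語"]:
    # Character count and byte count diverge as soon as non-ASCII characters appear.
    print(text, len(text), page_len_bytes(text))
```

For example, "Düsseldorf" is 10 characters but 11 bytes. The value also does not depend on the device, unlike a rendered page height.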


Thanks for the clarification, @jwang. Duly noted; I'll work on getting this value populated for the page_length property.

Change 739661 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[schemas/event/secondary@master] Update web_ui_reading_depth schema

https://gerrit.wikimedia.org/r/739661

Change 739661 abandoned by Clare Ming:

[schemas/event/secondary@master] Update web_ui_reading_depth schema

Reason:

margins got effed - starting over

https://gerrit.wikimedia.org/r/739661

Change 739666 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[schemas/event/secondary@master] Update web_ui_reading_depth schema

https://gerrit.wikimedia.org/r/739666

Change 739666 merged by jenkins-bot:

[schemas/event/secondary@master] Update web_ui_reading_depth schema

https://gerrit.wikimedia.org/r/739666

Change 739678 had a related patch set uploaded (by Nray; author: Nray):

[schemas/event/secondary@master] Update web_ui_reading_depth schema

https://gerrit.wikimedia.org/r/739678

Change 739678 merged by jenkins-bot:

[schemas/event/secondary@master] Update web_ui_reading_depth schema

https://gerrit.wikimedia.org/r/739678

hi @jwang - one more question: during code review, a privacy concern (in case of a data leak) was raised around page_length. Can you confirm that rounding off to the first digit is OK for assessing page length (and possibly even makes your analysis easier)?

Hi @cjming, rounding off to the first digit is OK!
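A minimal sketch of the coarsening agreed above, assuming "rounding off to the first digit" means keeping one significant digit with halves rounded up. The merged instrument may round differently, and `round_to_first_digit` is a hypothetical name. Coarsening the byte count makes it much harder to match a logged event to one specific page by its exact length, while keeping the order of magnitude the analysis needs.

```python
def round_to_first_digit(n: int) -> int:
    """Round a non-negative integer to one significant digit, halves up (12345 -> 10000)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    magnitude = 10 ** (len(str(n)) - 1)  # place value of the leading digit
    # Integer arithmetic avoids float error and Python's round-half-to-even behaviour.
    return (n + magnitude // 2) // magnitude * magnitude


print([round_to_first_digit(n) for n in (7, 249, 250, 12345, 96000)])
# [7, 200, 300, 10000, 100000]
```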

Change 737530 abandoned by Nray:

[mediawiki/extensions/WikimediaEvents@master] Make ReadingDepth run on Vector

Reason:

Merged into Ib94c6019f004fe3fcd877f62ff5208ec5e07d2a1

https://gerrit.wikimedia.org/r/737530

Change 740287 had a related patch set uploaded (by Nray; author: Nray):

[schemas/event/secondary@master] Elaborate on reading depth schema fields

https://gerrit.wikimedia.org/r/740287

Change 740287 merged by jenkins-bot:

[schemas/event/secondary@master] Elaborate on reading depth schema fields

https://gerrit.wikimedia.org/r/740287

Change 740667 had a related patch set uploaded (by Nray; author: Nray):

[operations/mediawiki-config@master] Enable reading depth instrumentation at low sampling rate

https://gerrit.wikimedia.org/r/740667

Change 735690 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Restore ReadingDepth instrument

https://gerrit.wikimedia.org/r/735690

Change 740613 had a related patch set uploaded (by Nray; author: Jdlrobson):

[mediawiki/extensions/WikimediaEvents@wmf/1.38.0-wmf.9] Restore ReadingDepth instrument

https://gerrit.wikimedia.org/r/740613

Change 740686 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[mediawiki/extensions/WikimediaEvents@master] Update access_method value in reading depth instrument

https://gerrit.wikimedia.org/r/740686

Change 740686 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Update access_method value in reading depth instrument

https://gerrit.wikimedia.org/r/740686

Change 740690 had a related patch set uploaded (by Nray; author: Clare Ming):

[mediawiki/extensions/WikimediaEvents@wmf/1.38.0-wmf.9] Update access_method value in reading depth instrument

https://gerrit.wikimedia.org/r/740690

Change 740613 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@wmf/1.38.0-wmf.9] Restore ReadingDepth instrument

https://gerrit.wikimedia.org/r/740613

Change 740690 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@wmf/1.38.0-wmf.9] Update access_method value in reading depth instrument

https://gerrit.wikimedia.org/r/740690

Mentioned in SAL (#wikimedia-operations) [2021-11-23T00:30:25Z] <urbanecm@deploy1002> Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents: 3f860c7: fa9fbf1: WikimediaEvents backports (2/2; T294777) (duration: 00m 55s)

Change 740667 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable reading depth instrumentation at low sampling rate

https://gerrit.wikimedia.org/r/740667

Mentioned in SAL (#wikimedia-operations) [2021-11-23T00:41:38Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: b9209433dfc8b1f81a165ec75867337800db24b1: Enable reading depth instrumentation at low sampling rate (T294777) (duration: 00m 56s)

Change 740892 had a related patch set uploaded (by Nray; author: Nray):

[operations/mediawiki-config@master] Increase reading depth sampling rate to .1%

https://gerrit.wikimedia.org/r/740892

Change 740892 merged by jenkins-bot:

[operations/mediawiki-config@master] Increase reading depth sampling rate to .1%

https://gerrit.wikimedia.org/r/740892

Mentioned in SAL (#wikimedia-operations) [2021-11-23T19:08:49Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 3993aacbfdbbfb6cdcc198ce369bf08b32ace865: Increase reading depth sampling rate to .1% (T294777) (duration: 00m 57s)

Edtadros subscribed.

@cjming I'm not sure what the testable acceptance criteria are here.

thanks @Edtadros -- I think this can move to sign off since the stream is live and we're seeing reading depth events piping into Grafana - https://grafana-rw.wikimedia.org/d/000000566/overview?viewPanel=28&orgId=1

cjming removed cjming as the assignee of this task.Dec 1 2021, 9:10 PM
cjming updated the task description.

Verified that the reading depth instrument is logging events and that the data is being stored (visible in Hue):

[Screenshot: Screen Shot 2021-12-07 at 8.16.02 PM.png]