Page MenuHomePhabricator

Mobile page issues - instrument page issues
Closed, ResolvedPublic5 Estimated Story Points

Description

Update [08/08]
We're almost done with this task. It's proved a little trickier than the 5 points we originally hoped as it's required several changes to the A/B test infrastructure/ReadingDepth in addition to the implementation of the mobile issues instrumentation.


Draft schema: https://meta.wikimedia.org/wiki/Schema:PageIssues

Background

We have a variety of questions we would like to answer around the page issues feature. We plan on running qualitative testing as well as on instrumenting the feature and running an A/B test.

Research questions

for A/B test:
Does the new treatment for page issues increase the awareness among readers of page issues?

  • Is there an increase in clickthrough based on the new issue treatments (from the article page to the issues modal, from the issues modal to anywhere else - details about issues type, modal dismissed, etc, i.e. where do people go after the modal)?
  • Does clickthrough depend on the severity of each issue?
  • Do mobile edits increase with page issues as referrer?

Rough schema outline:

Actions:

  • Page loaded (for any view of an article with issues, as a baseline in order to be able to calculate clickthrough rates)
  • issue clicked - user clicks issue to open modal
  • edit button clicked - user starts editing the page
  • modal:
    • internal links (usually to the wikipedia or help namespace)
    • edit link ("improve this article")
    • X (close)

Fields to be logged with each action:

  • page title
  • issue type(s), e.g. "speedy", "style", "protection" (could be several for pageloaded and edit events, but only one for "issue clicked" and "modal")
  • Issue severity level(s), i.e. one of "severe", "high", "low", "notice" (could be several for pageloaded and edit events, but only one for "issue clicked" and "modal". May only be reliable on enwiki)
  • anon/logged-in
  • for logged-in users, their edit count (bucketed for privacy reasons)
  • For each issue on the page, the number of the section it is located in (T202098)

Technically, page issues are (for the purposes of this instrumentation) defined and detected via the CSS class table.ambox.

Future research questions

  • Does the new issue treatment changes affect issue removal rates? (would require tracking further down the edit funnel, possible creating an edit tag or logging user IDs)
  • How does the new treatment of page issues affect reading depth (time spent on page)?
  • How does placement of the issue notice on the page (top vs. bottom of lead paragraph) affect the click-through rates?

Acceptance Criteria

  • Create the page issues schema as described above.
  • Events will be sampled by user (session)
  • Only fire events for mainspace pages which have page issues on them (defined via ambox templates, see above). Do not fire events if they don't.
  • Events are defined inside the schema under 'action'
  • We are likely to need to use a higher sampling rate for the page issues AB test compared with the existing ReadingDepth instrumentation to ensure we get a large enough dataset. We also want to track the metrics from the ReadingDepth schema for users opted into this experiment. Given this would dilute the existing sample set of the ReadingDepth schema, we will need to distinguish between events captured via the existing ReadingDepth instrumentation and events that have been requested by the page issues AB test. To do this we will add one or more new fields to ReadingDepth which describe whether the event is from the default reading depth schema OR if something has explicitly opted into the schema via the hook we recently added. Option 1.: A string field sampleGroup set to "default", "page-issues-a" , "page-issues-b" - downside: can't handle the case of overlapping samples. Option 2: Add a new boolean field for every possible sample group (e.g. default_sample, page-issues-a_sample , page-issues-b_sample) - advantages: can handle overlapping samples, makes queries faster and arguably easier, downside: will need additional schema changes for each new sample/data source added in the future (although that could also have advantages regarding transparency).
  • Fill out the schema talk page with the SchemaDoc template (including schema maintainers and whitelist)
  • Submit whitelist patch --> no longer necessary for PageIssues per T203596: Flip blacklist for MySQL eventlogging consumer to be a whitelist of allowed schemas , done for ReadingDepth in T203596#4577552

Open questions

- Do we want to track how many different issues the page has?
For pageLoaded events, this information can be derived as the length of the array in the sectionNumbers (or isssuesSeverity) field.

Note

This builds on the A/B test infrastructure provided in T193584.

FOUC issues will be dealt with separately. They've been discussed and are probably not a problem from a A/B test POV but this should be more of a design concern and will be handled as part of design review of T191303

QA

Testing AB+ReadingDepth sample

Ask a developer to configure staging like so:

$wgWMEReadingDepthSamplingRate = 1;
$wgMinervaABSamplingRate = 1;

Visit any page with issues.
Check the ReadingDepth schema is working correctly:

  • Go to a page with issues (old treatment), issues-a_sample":true,"default_sample":true should be present in all ReadingDepth EventLogging requests
  • Go to a page with issues (new treatment), issues-b_sample":true,"default_sample":true should be present in all ReadingDepth

EventLogging requests

  • Go to a page without issues, "default_sample":true should be present in all ReadingDepth EventLogging requests

There are two kinds of ReadingDepth events: pageLoaded is supposed to be sent on the initial page load, and pageUnloaded when the pageview ends (when the user navigates to a different page in the same tab or - harder to test - closes the tab).

Check the PageIssues schema is working:

  • A page without issues should log no events to the PageIssues schema
  • A page with issues should log events to the PageIssues schema
  • Check there are unique events for
    • The page loading
    • Edit icon at top of page clicked
    • Edit icon in subsection clicked
    • A issues banner being clicked
    • Edit ink inside IssuesOverlay being clicked
    • Link inside IssuesOverlay being clicked
    • Overlay is closed

Testing AB without default ReadingDepth sampling

Ask a developer to configure staging like so:

$wgWMEReadingDepthSamplingRate = 0;
$wgMinervaABSamplingRate = 1;

Visit any page with issues.
Check the ReadingDepth schema is working correctly

  • Go to a page with issues (old treatment), issues-a_sample":true should be present in all EventLogging requests. default_sample should not be present.
  • Go to a page with issues (new treatment), issues-b_sample":true should be present in all EventLogging requests. default_sample should not be present.
  • Go to a page without issues, no ReadingDepth event should be fired.

Exploratory QA

  • It should never be possible to click a banner any where in the page and see no issues. If a situation is found where this is the case please flag it! (T203386)
  • In events sectionNumbers and issuesSeverity should always be the same length. Can you find a case where this is not the case? (T203050)

Latvian

Ask a developer to point staging at Latvian Wikipedia.
Run through the same testing steps above and check for consistency.

Design

Please flag any inconsistencies between the design on Latvian Wikipedia to English Wikipedia. You may also want to ping @alexhollender during this stage of QA for his input.

Sign off steps

Related Objects

StatusSubtypeAssignedTask
ResolvedJdlrobson
ResolvedJdlrobson
Resolvedovasileva
Resolved alexhollender_WMF
DuplicateNone
Resolved Niedzielski
Resolvedovasileva
Resolvedovasileva
Resolvedphuedx
Resolved Tbayer
Resolvedmforns
Resolvedovasileva
Resolved Niedzielski
Resolved Tbayer
Resolved Tbayer
Resolved Tbayer
Resolved Tbayer
Resolvedovasileva
Resolved Tbayer

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 443569 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] ReadingDepth test should accept a boolean field for sample groups.

https://gerrit.wikimedia.org/r/443569

This is a bit late in the game, but I want to flag that with this new schema we have an opportunity to consider whether we want to ask Analytics Engineering engineering to ingest its data into Druid, in order to potentially make it accessible as a view or dashboard in Superset. As a first step this would require checking whether the schema's format satisfies these (draft) guidelines.

Personally I think this is a nice-to-have rather than a priority (also considering that one won't be able to evaluate the results of the A/B based on Superset alone, and that e.g. ReadingDepth might be a more interesting candidate for Druid ingestion).

Update: Per discussion with the Analytics Engineering team on IRC, this should be feasible without changes to the schema (except possibly some renaming of fields). Some unsuitable fields like pageTitle will be left out. I have filed T202751: Ingest data from PageIssues EventLogging schema into Druid

Change 455220 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] Restore issuesSeverity in pageLoaded event

https://gerrit.wikimedia.org/r/455220

Change 455215 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] Page issues instrumentation: log sections as array of numbers

https://gerrit.wikimedia.org/r/455215

Change 455272 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] Make edit click handling consistent

https://gerrit.wikimedia.org/r/455272

Change 455220 abandoned by Jdlrobson:
Restore issuesSeverity in pageLoaded event

Reason:
Got it. Let me investigate what's going on here, as something else likely causing this bug.

you are right this shouldn't be necessary.

https://gerrit.wikimedia.org/r/455220

Change 455215 abandoned by Jdlrobson:
Page issues instrumentation: log sections as array of numbers

Reason:
Testing this on master it looks to be behaving correctly with one exception:
https://phabricator.wikimedia.org/T202940

https://gerrit.wikimedia.org/r/455215

Assigning to Tilman for time being. I'll run through this with him at 4pm today. We can then do QA with Rumana if needed! Watch this space!

We need to arrange a 3rd rubber ducky session with Tilman to confirm the instrumentation is working as planning before signing this off and starting the a/b test.

For the record, we found one new issue while testing on lvwiki today that we decided shouldn't become a blocker for the A/B test rollout, but nevertheless needs to be taken into account as it may affect the data considerably, and should ideally be fixed too:

If the page issue template contains a link to the article talk page (like https://en.wikipedia.org/wiki/Template:NPOV : "Relevant discussion may be found on the talk page."), and the talk page doesn't exist, a click on that link will be incorrectly logged as modalEditClicked (instead of modalInternalClicked as intended). Example: https://lv.m.wikipedia.org/wiki/Ivars_Zaikins

I.e. many events that we interpret as an intention to change the article will instead have come from a mere intention to learn more about the issue. Hopefully this will cancel out in the A/B test as it affects both the old and new design, but it will impact our estimate for the frequency of edit attempts in general.

Jdlrobson edited projects, added Product-QA; removed Patch-For-Review.

Over to you @Ryasmeen
I realise the QA steps are quite detailed, involves looking at EventLogging and this one might be tricky and time-consuming to test, but I want you to know I am available to support you!

For the record, we found one new issue while testing on lvwiki today that we decided shouldn't become a blocker for the A/B test rollout, but nevertheless needs to be taken into account as it may affect the data considerably, and should ideally be fixed too:

Filed as T204073: Click on modal link to nonexisting talk page should be logged as modalInternalClicked

Started Testing AB+ReadingDepth sample:
So far tested on Desktop Chrome, Desktop Firefox, Desktop Safari using these three articles:

https://reading-web-staging.wmflabs.org/wiki/Cell_Cycle
https://reading-web-staging.wmflabs.org/wiki/Misandry
http://reading-web-staging.wmflabs.org/w/index.php?title=Harry_Potter

For Chrome and Firefox, the ReadingDepth schema is working correctly

For the pages with issues (old treatment), issues-a_sample":true,"default_sample":true is present in all EventLogging requests
For the pages with issues (new treatment), issues-b_sample":true,"default_sample":true is present in all EventLogging requests
For the page without issues, "default_sample":true is present in all EventLogging requests

However, for Safari (Version 11.1 (11605.1.33.1.4)) I see no ReadingDepth schema related EventLogging requests. Already told @Jdlrobson about it.

Based on a very quick check, ReadingDepth seems to receive events from Safari users at a ballpark plausible rate:

browserevents (Sep 10)
Chrome207196
Chrome Mobile180710
Mobile Safari52491
Firefox42411
Samsung Internet21895
Edge16888
Safari11907
Chrome Mobile WebView7849
Opera6806
UC Browser4891
Yandex Browser3675
Firefox Mobile3237
Opera Mobile1538
Crosswalk989
Chrome Mobile iOS917
Amazon Silk783
Mobile Safari UI/WKWebView721
Coc Coc609
Edge Mobile490
YandexSearch486
Chromium443
Facebook356
Vivaldi348
Baiduspider-render292
Android129
Sogou Explorer108
QQ Browser104
......

Data via

SELECT useragent.browser_family AS browser, COUNT(*) AS events FROM event.readingdepth WHERE year = 2018 AND month = 9 and DAY = 10 GROUP BY useragent.browser_family ORDER BY events DESC LIMIT 50 ;

The issue impacts older versions of Safari. I can replicate it in v11.1.1 (which Rumana is also using). The point stands - it is possible in current implementation for ReadingDepth to be disabled when PageIssues is enabled (sendBeacon support is needed for ReadingDepth, but not for PageIssues)

The issue impacts older versions of Safari. I can replicate it in v11.1.1 (which Rumana is also using). The point stands - it is possible in current implementation for ReadingDepth to be disabled when PageIssues is enabled (sendBeacon support is needed for ReadingDepth, but not for PageIssues)

But per https://caniuse.com/#feat=beacon , that version of Safari should support sendBeacon .

Another issue found:

On iOS Safari,

For pages with issues (old treatment/new treatment), the only parameter that is present in all EventLogging requests is "default_sample":true.
The parameters issues-a_sample":true, and issues-b_sample":true are missing.

While checking the PageIssues schema found out another issue:

On Desktop Safari, there is no EventLogging request for clicking on a link inside the IssuesOverlay.

BTW, to clarify just in case (because it's not explicit in the QA steps listed in the task description):
For the ReadingDepth schema, the most important events to check are the pageUnloaded actions that are sent when a tab is closed or the user navigates away to a different page.

Ok never mind this one. Discussed this with @Jdlrobson , this is not an issue, just did not know that the events show up once I go back to the PageIssues overlay.

While checking the PageIssues schema found out another issue:

On Desktop Safari, there is no EventLogging request for clicking on a link inside the IssuesOverlay.

Checked the PageIssues schema today on Desktop Chrome, Desktop Firefox, Desktop Safari, iOS Safari and Android Chrome

All of the following steps are passing for all of them.

A page without issues should log no events to the PageIssues schema
A page with issues should log events to the PageIssues schema
Check there are unique events for
The page loading
Edit icon at top of page clicked
Edit icon in subsection clicked
A issues banner being clicked
Edit ink inside IssuesOverlay being clicked
Link inside IssuesOverlay being clicked
Overlay is closed

So, I am done with testing the first part of it. Going to test "AB without ReadingDepth" next, once @Jdlrobson configures staging for it.

You can move on to Testing AB without ReadingDepth, staging has been updated:

$wgWMEReadingDepthSamplingRate = 0;
$wgMinervaABSamplingRate = 1;

Done with testing testing AB without ReadingDepth. No new issues found other than the two known issues we already have: T204144 and T204143.

This concludes my testing on this issue. Over to you @ABorbaWMF for the remaining Exploratory testing per your email.

@ABorbaWMF let me know when you want to do exploratory testing on Latvian. Staging is ready for you for testing English in the mean time.

BTW, to clarify just in case (because it's not explicit in the QA steps listed in the task description):
For the ReadingDepth schema, the most important events to check are the pageUnloaded actions that are sent when a tab is closed or the user navigates away to a different page.

Added more detail to the task description in that regard (after discussion with @Ryasmeen ).

Looks good on Staging in English. Can someone please switch staging to Latvian?

image.png (984×540 px, 173 KB)

image.png (984×540 px, 173 KB)

Screen Shot 2018-09-14 at 11.24.25 AM.png (456×2 px, 79 KB)

Screen Shot 2018-09-14 at 11.16.56 AM.png (750×2 px, 158 KB)

Screen Shot 2018-09-14 at 11.24.54 AM.png (428×2 px, 82 KB)

Screen Shot 2018-09-14 at 11.26.03 AM.png (726×1 px, 243 KB)

@ABorbaWMF: I still need to check something that @Tbayer added yesterday. Moving it back to QA for now.

BTW, to clarify just in case (because it's not explicit in the QA steps listed in the task description):
For the ReadingDepth schema, the most important events to check are the pageUnloaded actions that are sent when a tab is closed or the user navigates away to a different page.

Added more detail to the task description in that regard (after discussion with @Ryasmeen ).

Checked that the 'action: pageLoaded' is sent on the initial page load, and 'action: pageUnloaded' is sent when I navigated away to a different page. However I couldn't verify the 'action:pageUnloaded' for iOS Safari and Chrome Android because the toast message disappears too quickly right before navigating away. But for desktop Chrome, Firefox it's working correctly. @Tbayer: Do you think that's an acceptable test coverage for this? :)

BTW, to clarify just in case (because it's not explicit in the QA steps listed in the task description):
For the ReadingDepth schema, the most important events to check are the pageUnloaded actions that are sent when a tab is closed or the user navigates away to a different page.

Added more detail to the task description in that regard (after discussion with @Ryasmeen ).

Checked that the 'action: pageLoaded' is sent on the initial page load, and 'action: pageUnloaded' is sent when I navigated away to a different page. However I couldn't verify the 'action:pageUnloaded' for iOS Safari and Chrome Android because the toast message disappears too quickly right before navigating away. But for desktop Chrome, Firefox it's working correctly. @Tbayer: Do you think that's an acceptable test coverage for this? :)

Cool, thanks! I think that's good enough for now. Once it's in production, we could check the 'pageUnloaded' event for these two browsers by instead looking at what is recorded in the EL table server-side. (This is usually a bit more effort than testing instrumentations directly in the browser like here, but in this case it might be preferable over trying to read that vanishing toast message.)

Is that all the QA done for this particular task?

Is that all the QA done for this particular task?

I think so! :) Thanks @Jdlrobson for helping me to test this task.
Moving it to Signoff.

ovasileva updated the task description. (Show Details)

All acceptance criteria is completed. We will be validating this on Latvian wikipedia T204609: Turn on page issues A/B test for Latvian Wikipedia, and conduct data checks. Resolving this.