We're almost done with this task. It has proved trickier than the 5 points we originally estimated, as it required several changes to the A/B test infrastructure and ReadingDepth in addition to implementing the mobile page issues instrumentation.
Draft schema: https://meta.wikimedia.org/wiki/Schema:PageIssues
We have a variety of questions we would like to answer about the page issues feature. We plan on running qualitative testing as well as instrumenting the feature and running an A/B test.
For the A/B test:
Does the new treatment for page issues increase the awareness among readers of page issues?
- Is there an increase in clickthrough based on the new issue treatments (from the article page to the issues modal, from the issues modal to anywhere else - details about issues type, modal dismissed, etc, i.e. where do people go after the modal)?
- Does clickthrough depend on the severity of each issue?
- Do mobile edits increase with page issues as referrer?
Rough schema outline:
- Page loaded (for any view of an article with issues, as a baseline in order to be able to calculate clickthrough rates)
- issue clicked - user clicks issue to open modal
- edit button clicked - user starts editing the page
- internal links (usually to the wikipedia or help namespace)
- edit link ("improve this article")
- X (close)
Fields to be logged with each action:
- page title
- Issue type(s), e.g. "speedy", "style", "protection" (could be several for pageloaded and edit events, but only one for "issue clicked" and "modal")
- Issue severity level(s), i.e. one of "severe", "high", "low", "notice" (could be several for pageloaded and edit events, but only one for "issue clicked" and "modal". May only be reliable on enwiki)
- for logged-in users, their edit count (bucketed for privacy reasons)
- For each issue on the page, the number of the section it is located in (T202098)
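The bucketed edit count field could be derived along these lines. This is a minimal sketch: the bucket boundaries and labels below are an assumption (modelled on the edit count buckets commonly used in other EventLogging schemas), and `edit_count_bucket` is a hypothetical helper name, not part of the schema.

```python
def edit_count_bucket(edit_count: int) -> str:
    """Map a raw edit count to a coarse bucket so the exact count is never logged.

    The boundaries and labels are assumptions for illustration only.
    """
    if edit_count < 0:
        raise ValueError("edit_count must be non-negative")
    if edit_count == 0:
        return "0 edits"
    if edit_count < 5:
        return "1-4 edits"
    if edit_count < 100:
        return "5-99 edits"
    if edit_count < 1000:
        return "100-999 edits"
    return "1000+ edits"
```

Anonymous users would simply omit the field, since the edit count is only logged for logged-in users.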
Technically, page issues are (for the purposes of this instrumentation) defined and detected via the CSS class table.ambox.
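For checking pages offline (for example when picking test pages for QA), the `table.ambox` definition can be approximated with a small script. This is only a sketch using Python's standard library HTML parser; `AmboxCounter` and `count_page_issues` are hypothetical names, and the actual client-side detection may differ.

```python
from html.parser import HTMLParser


class AmboxCounter(HTMLParser):
    """Counts <table> elements whose class list contains the token "ambox"."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        # The class attribute may be missing or valueless, hence the fallback.
        classes = (dict(attrs).get("class") or "").split()
        if "ambox" in classes:
            self.count += 1


def count_page_issues(html: str) -> int:
    """Return the number of elements matching the selector table.ambox."""
    parser = AmboxCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

A page for which `count_page_issues` returns 0 would be expected to log no PageIssues events at all.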
Future research questions
- Do the new issue treatment changes affect issue removal rates? (would require tracking further down the edit funnel, possibly creating an edit tag or logging user IDs)
- How does the new treatment of page issues affect reading depth (time spent on page)?
- How does placement of the issue notice on the page (top vs. bottom of lead paragraph) affect the click-through rates?
- Create the page issues schema as described above.
- Events will be sampled by user (session)
- Only fire events for mainspace pages which have page issues on them (defined via ambox templates, see above). Do not fire events if they don't.
- Events are defined inside the schema under 'action'
- We are likely to need a higher sampling rate for the page issues A/B test than for the existing ReadingDepth instrumentation, to ensure we get a large enough dataset. We also want to track the metrics from the ReadingDepth schema for users opted into this experiment. Since this would dilute the existing ReadingDepth sample set, we need to distinguish events captured via the existing ReadingDepth instrumentation from events requested by the page issues A/B test. To do this we will add one or more new fields to ReadingDepth which describe whether the event comes from the default ReadingDepth sample or whether something has explicitly opted into the schema via the hook we recently added.
  - Option 1: A string field sampleGroup set to "default", "page-issues-a" or "page-issues-b". Downside: can't handle the case of overlapping samples.
  - Option 2: A new boolean field for every possible sample group (e.g. default_sample, page-issues-a_sample, page-issues-b_sample). Advantages: can handle overlapping samples, makes queries faster and arguably easier. Downside: will need additional schema changes for each new sample/data source added in the future (although that could also have advantages regarding transparency).
- Fill out the schema talk page with the SchemaDoc template (including schema maintainers and whitelist)
- Submit whitelist patch --> no longer necessary for PageIssues per T203596: Flip blacklist for MySQL eventlogging consumer to be a whitelist of allowed schemas; done for ReadingDepth in T203596#4577552
- Do we want to track how many different issues the page has?
For pageLoaded events, this information can be derived as the length of the array in the sectionNumbers (or issuesSeverity) field.
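The difference between the two ReadingDepth sample-group options in the acceptance criteria can be illustrated with a short sketch. The function names are hypothetical, and the rule that the A/B bucket takes precedence in Option 1 is an assumption made purely for illustration; the field names follow the examples given for each option.

```python
from typing import Dict, Optional


def option1_sample_group(in_default: bool, ab_bucket: Optional[str]) -> Optional[str]:
    """Option 1: a single string field.

    Assumption: the A/B bucket wins when a user is in both samples, so
    membership of the default sample is lost in that case.
    """
    if ab_bucket is not None:
        return f"page-issues-{ab_bucket}"
    return "default" if in_default else None


def option2_sample_flags(in_default: bool, ab_bucket: Optional[str]) -> Dict[str, bool]:
    """Option 2: one boolean field per sample group, so overlaps are preserved."""
    flags: Dict[str, bool] = {}
    if in_default:
        flags["default_sample"] = True
    if ab_bucket is not None:
        flags[f"page-issues-{ab_bucket}_sample"] = True
    return flags
```

For a user who is in both the default sample and bucket "a", `option1_sample_group(True, "a")` yields only `"page-issues-a"`, whereas `option2_sample_flags(True, "a")` records both memberships. This is the overlapping-samples case that Option 1 cannot represent.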
This builds on the A/B test infrastructure provided in T193584.
FOUC issues will be dealt with separately. They've been discussed and are probably not a problem from an A/B test point of view; this is more of a design concern and will be handled as part of the design review of T191303.
Testing AB+ReadingDepth sample
Ask a developer to configure staging like so:
$wgWMEReadingDepthSamplingRate = 1; $wgMinervaABSamplingRate = 1;
Visit any page with issues.
Check the ReadingDepth schema is working correctly:
- Go to a page with issues (old treatment), issues-a_sample":true,"default_sample":true should be present in all ReadingDepth EventLogging requests
- Go to a page with issues (new treatment), issues-b_sample":true,"default_sample":true should be present in all ReadingDepth EventLogging requests
- Go to a page without issues, "default_sample":true should be present in all ReadingDepth EventLogging requests
There are two kinds of ReadingDepth events: pageLoaded is supposed to be sent on the initial page load, and pageUnloaded when the pageview ends (when the user navigates to a different page in the same tab or - harder to test - closes the tab).
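The ReadingDepth checks above can be written down as a small lookup that QA can compare captured events against. This is a sketch under stated assumptions: the full field names (`default_sample`, `page-issues-a_sample`, `page-issues-b_sample`) are taken from the Option 2 example and may not match the deployed schema exactly, the function names are hypothetical, and the logic only covers sampling rates of exactly 0 or 1 as used on staging.

```python
from typing import Dict, List, Set

# Assumed field names, see the note above.
SAMPLE_FLAGS = {"default_sample", "page-issues-a_sample", "page-issues-b_sample"}


def expected_flags(default_rate: int, ab_rate: int,
                   has_issues: bool, new_treatment: bool) -> Set[str]:
    """Flags expected on every ReadingDepth event; empty means no event at all."""
    flags: Set[str] = set()
    if default_rate == 1:
        flags.add("default_sample")
    if ab_rate == 1 and has_issues:
        # Old treatment is bucket A, new treatment is bucket B.
        flags.add("page-issues-b_sample" if new_treatment else "page-issues-a_sample")
    return flags


def check_event(event: Dict[str, object], expected: Set[str]) -> List[str]:
    """Return human-readable problems for one captured ReadingDepth event."""
    present = {name for name in SAMPLE_FLAGS if event.get(name) is True}
    problems = [f"missing flag: {name}" for name in sorted(expected - present)]
    problems += [f"unexpected flag: {name}" for name in sorted(present - expected)]
    return problems
```

The same helper applies to both pageLoaded and pageUnloaded events, since the sample flags should be identical for every event of a given pageview.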
Check the PageIssues schema is working:
- A page without issues should log no events to the PageIssues schema
- A page with issues should log events to the PageIssues schema
- Check there are unique events for
- The page loading
- Edit icon at top of page clicked
- Edit icon in subsection clicked
- An issues banner being clicked
- Edit link inside IssuesOverlay being clicked
- Link inside IssuesOverlay being clicked
- Overlay is closed
Testing AB without default ReadingDepth sampling
Ask a developer to configure staging like so:
$wgWMEReadingDepthSamplingRate = 0; $wgMinervaABSamplingRate = 1;
Visit any page with issues.
Check the ReadingDepth schema is working correctly:
- Go to a page with issues (old treatment), issues-a_sample":true should be present in all EventLogging requests. default_sample should not be present.
- Go to a page with issues (new treatment), issues-b_sample":true should be present in all EventLogging requests. default_sample should not be present.
- Go to a page without issues, no ReadingDepth event should be fired.
- It should never be possible to click a banner anywhere in the page and see no issues. If you find a situation where this happens, please flag it! (T203386)
- In events, sectionNumbers and issuesSeverity should always be the same length. Can you find a case where they are not? (T203050)
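The length check on sectionNumbers and issuesSeverity is easy to automate against captured event payloads. The sketch below assumes each event has been decoded into a Python dict with both fields as lists; the function names are hypothetical.

```python
from typing import Dict, List


def check_issue_arrays(event: Dict[str, object]) -> List[str]:
    """Return problems with the per-issue arrays of one PageIssues event."""
    sections = event.get("sectionNumbers")
    severities = event.get("issuesSeverity")
    if not isinstance(sections, list) or not isinstance(severities, list):
        return ["sectionNumbers and issuesSeverity must both be arrays"]
    if len(sections) != len(severities):
        return [
            f"length mismatch: {len(sections)} sectionNumbers "
            f"vs {len(severities)} issuesSeverity"
        ]
    return []


def issue_count(event: Dict[str, object]) -> int:
    """Number of issues on the page, derived from a pageLoaded event."""
    sections = event["sectionNumbers"]
    assert isinstance(sections, list)
    return len(sections)
```

Any event for which `check_issue_arrays` returns a non-empty list is worth attaching to T203050, together with the page title it was recorded on.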
Ask a developer to point staging at Latvian Wikipedia.
Run through the same testing steps above and check for consistency.
Please flag any inconsistencies between the design on Latvian Wikipedia and the design on English Wikipedia. You may also want to ping @alexhollender during this stage of QA for his input.
Sign off steps
- Go through and tick the acceptance criteria
- Work out if we need to turn this on for validation prior to A/B test (see https://phabricator.wikimedia.org/T200792#4492951)