Make EditAttemptStep sampling rate consistent with MobileWebUIActions and DesktopWebUI
Closed, Resolved, Public

Description

In T303654#7959081, we decided that it is feasible to make it so the VisualEditorFeatureUse, EditAttemptStep, MobileWebUIActions, DesktopWebUIActions, and Talk_Page_Edit schemas sample events at the same rates, using the same method.

This task represents the work involved with making it so the schemas mentioned above actually sample events at the same rates and using the same method.
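A minimal sketch of why "same rate, same method" matters for joining schemas. It assumes, purely for illustration, that the sampling decision is derived deterministically from a shared token; the thread does not describe the real method, and `in_sample` and the token format are hypothetical.

```python
import hashlib

def in_sample(token: str, rate: float) -> bool:
    """Hypothetical sampler: map the token to a number in [0, 1) and compare to the rate."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate

# Made-up tokens standing in for page or session tokens.
tokens = [f"token-{i}" for i in range(10_000)]

# Two schemas using the same method and the same rate pick exactly the same tokens,
# so every sampled event in one schema has a counterpart to join against in the other.
schema_a = {t for t in tokens if in_sample(t, 0.2)}
schema_b = {t for t in tokens if in_sample(t, 0.2)}
same_selection = schema_a == schema_b

# With the same method but a lower rate, the smaller sample is a subset of the larger one.
schema_c = {t for t in tokens if in_sample(t, 1 / 16)}
lower_rate_is_subset = schema_c <= schema_a
```

If each schema instead drew its own independent random number, two 20% samples would overlap on only about 4% of tokens, which is the situation the synchronization is meant to avoid.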

Requirements

  • Sampling is turned on in the MobileWebUIActions and DesktopWebUI schemas at the nine Wikipedias the Editing Team is partnering with for the improvements they are making to mobile talk pages (T294609): ja.wiki, ar.wiki, fr.wiki, ko.wiki, vi.wiki, he.wiki, bn.wiki, zh.wiki, and ht.wiki.
  • The EditAttemptStep schema's sampling rate is adjusted such that it matches the MobileWebUIActions and DesktopWebUI schemas' current sampling rates:
'wgWMEDesktopWebUIActionsTracking' => [
        'default' => 0,
        'desktop-improvements' => 0.2, // T258058
        'officewiki' => 0,
        'testwiki' => 1, // T256992
],

'wgWMEMobileWebUIActionsTracking' => [
        'default' => 0.1, // T220016
        'enwiki' => 0.01, // T295432
],
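As a rough illustration of how per-wiki settings like the two above resolve to a single rate, here is a sketch in Python. It assumes the most specific key wins (wiki name, then a group tag such as `desktop-improvements`, then `default`); `resolve_rate` and `examplewiki` are hypothetical and this is not the actual MediaWiki configuration code.

```python
from typing import Mapping, Set

def resolve_rate(setting: Mapping[str, float], wiki: str, tags: Set[str]) -> float:
    """Return the rate for a wiki: exact wiki key, then any matching tag, then 'default'."""
    if wiki in setting:
        return setting[wiki]
    for tag in sorted(tags):
        if tag in setting:
            return setting[tag]
    return setting["default"]

# Values copied from the task description.
DESKTOP = {"default": 0, "desktop-improvements": 0.2, "officewiki": 0, "testwiki": 1}
MOBILE = {"default": 0.1, "enwiki": 0.01}

print(resolve_rate(MOBILE, "enwiki", set()))    # 0.01
print(resolve_rate(MOBILE, "jawiki", set()))    # 0.1
# 'examplewiki' is a made-up wiki assumed to carry the desktop-improvements tag.
print(resolve_rate(DESKTOP, "examplewiki", {"desktop-improvements"}))  # 0.2
print(resolve_rate(DESKTOP, "jawiki", set()))   # 0
```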

Open Question(s)

  • 1. What – if any – concerns does the Web Team have with the Requirements listed above?
    • Per the guidance @Jdlrobson shared when we spoke on 26 May, the Web Team does not have any concerns with the Editing Team moving forward with implementing the requirements above.

Done

  • Answers to all Open Question(s) are documented
  • Requirements are implemented
  • Once all patches to resolve this task are merged, @MNeisler to verify sampling rates are synchronized and instrumentation is working as expected
  • Notify @jwang when the patch(es) to resolve this ticket land so that she can update relevant documentation

Event Timeline

FYI @jwang
The implications of this change are that more wikis will be logging events on desktop so when doing analysis against desktop improvements wikis we may need to filter these out.

@ppelberg

Re the following requirement:

Sampling is turned on in the MobileWebUIActions and DesktopWebUI schemas at the nine Wikipedias the Editing Team is partnering with for the improvements they are making to mobile talk pages (T294609): ja.wiki, ar.wiki, fr.wiki, ko.wiki, vi.wiki, he.wiki, bn.wiki, zh.wiki, and ht.wiki.

For the DiscussionTools on Mobile project, we will need sampling turned on at the wikis you identified in the requirement above specifically for the MobileWebUIActionsTracking schema.

For the planned Usability Improvements project, we will also need sampling turned on at any partner wikis identified for the T302359 and T302358 analyses for the DesktopWebUIActionsTracking schema (assuming they are different than the ones for the mobile project).

Is that clarification/addition correct?

@DLynch
It looks like the DesktopWebUIActions and MobileWebUIActions schemas have different sampling rates (see current rates in the task description). How do we want to handle those different rates within EditAttemptStep? If different sampling rates are applied to EditAttemptStep events, we'll need some way to easily identify and filter out events sampled at each rate, like the is_oversample field.

Otherwise, we could also see if we can make the DesktopWebUIActions and MobileWebUIActions sampling rates the same for all the partner wikis we've identified as wanting to be able to join with EditAttemptStep.

FYI MobileWebUIActionsTracking is already enabled on all wikis at 10% (except English which is 1%).

Great spot, @MNeisler. What you described is accurate.

We will "...also need sampling turned on at any partner wikis identified for the T302359 and T302358 analyses for the DesktopWebUIActionsTracking schema..." Though we have not yet finalized who those "partner wikis" are in the context of T302359 and T302358.

In line with the above, I've filed a ticket to serve as a reminder for doing what you described: T309406

Change 804022 had a related patch set uploaded (by DLynch; author: DLynch):

[operations/mediawiki-config@master] Sync sampling rates at 9 wikis DiscussionTools is testing

https://gerrit.wikimedia.org/r/804022

That patch will synchronize the sampling rate in the UIActions schemas, EditAttemptStep, and VisualEditorFeatureUse, all at 20% on the 9 partner wikis. I picked 20% because that's the rate the desktop-improvements wikis are set to, and there's currently no separate config to have a different rate between desktop and mobile for EditAttemptStep, so lining everything up with the highest one was easiest. It does mean that EditAttemptStep will be sampling at higher than its normal 1/16 rate, and the MobileUIActions will be sampling at double their normal 10% rate on those wikis.
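For a sense of scale, the rates mentioned in this comment can be compared with a few lines of arithmetic. This is only a worked example using the numbers stated above; the variable names are made up.

```python
from fractions import Fraction

NEW_RATE = Fraction(1, 5)        # 20% on the 9 partner wikis after the patch
EAS_NORMAL = Fraction(1, 16)     # EditAttemptStep's normal rate
MOBILE_NORMAL = Fraction(1, 10)  # MobileWebUIActions' normal 10% rate

def increase_factor(new: Fraction, old: Fraction) -> Fraction:
    """How many times more events are expected at the new rate than at the old one."""
    return new / old

print(float(EAS_NORMAL))                         # 0.0625
print(increase_factor(NEW_RATE, EAS_NORMAL))     # 16/5
print(increase_factor(NEW_RATE, MOBILE_NORMAL))  # 2
```

So on the partner wikis EditAttemptStep would log roughly 3.2 times as many events as before, and the mobile UI actions schema twice as many.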

Will review patch and provide any follow-up comments/questions soon

@DLynch - with the patch, we will be introducing a third sampling rate within EditAttemptStep. Events such as discussion tool-related events are oversampled at 100%, these 9 partner wikis will be sampled at 20%, and all other events will be sampled at the normal 1/16 rate. Can you confirm whether the following is the correct way to identify and distinguish these events within EditAttemptStep?

  • All events currently sampled at 100% within EditAttemptStep such as discussion tool-related events will continue to be sampled at 100% at these 9 partner wikis (and can be identified by the is_oversample field).
  • All events sampled at this higher (20%) rate within EditAttemptStep can be identified by filtering data to those 9 partner wikis (all events at these 9 wikis will be sampled at 20%, and those at other wikis at the normal 1/16 rate).
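The two rules above could be applied during analysis roughly as follows. This is a sketch under assumptions: the `wiki` field name and the database-style wiki names (`jawiki` and so on) are illustrative guesses, and only `is_oversample` comes from the discussion.

```python
from fractions import Fraction

# Assumed database names for the 9 partner wikis listed in the task description.
PARTNER_WIKIS = {
    "jawiki", "arwiki", "frwiki", "kowiki", "viwiki",
    "hewiki", "bnwiki", "zhwiki", "htwiki",
}

def effective_rate(event: dict) -> Fraction:
    """Rate an event was sampled at: oversampled, partner wiki, or normal."""
    if event.get("is_oversample"):
        return Fraction(1)        # oversampled events are logged at 100%
    if event["wiki"] in PARTNER_WIKIS:
        return Fraction(1, 5)     # 20% on the partner wikis
    return Fraction(1, 16)        # normal EditAttemptStep rate

def estimated_total(events: list) -> Fraction:
    """Weight each logged event by 1/rate to estimate the unsampled count."""
    return sum(1 / effective_rate(e) for e in events)

sample = [
    {"wiki": "jawiki", "is_oversample": True},
    {"wiki": "jawiki", "is_oversample": False},
    {"wiki": "enwiki", "is_oversample": False},
]
print(estimated_total(sample))  # 22
```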

Note: It would be worthwhile to add documentation somewhere (maybe in an associated ReadMe file) to document these sampling rates. I can create a separate task to do that if that makes sense.

It does mean that EditAttemptStep will be sampling at higher than its normal 1/16 rate, and the MobileUIActions will be sampling at double their normal 10% rate on those wikis.

@jwang FYI. Let us know if you have any concerns about the sampling rate of MobileWebUIActions at these 9 wikis (ja.wiki, ar.wiki, fr.wiki, ko.wiki, vi.wiki, he.wiki, bn.wiki, zh.wiki, and ht.wiki).

Can you confirm

Yes, that sounds correct: is_oversample will be present for DT events which aren't at the regular EAS rate, and otherwise you can look for those specific wikis.

Note: It would be worthwhile to add documentation somewhere (maybe in an associated ReadMe file) to document these sampling rates. I can create a separate task to do that if that makes sense.

I'm not really sure what a good place to document this is, given how scattered the definitions are. (There's the schema itself which is rate-agnostic, the config files which override them, and the extensions where default values are set...)

I wonder if a useful feature for these schemas going forward would be logging the sampling rate alongside the event? If that was included, we wouldn't have to worry about all this comparatively-complicated analysis. It'd make more sense as a platform-level feature than as something included in an individual schema, though.

I agree this definitely makes the most sense if feasible. There was a recent conversation in the Product-Analytics slack channel about this, which resulted in the creation of T310693. In the meantime, I'll need to think through the best way to document and communicate this change to others that use this schema so they are aware.
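To make the idea of logging the rate with each event concrete, here is a small sketch. The `sample_rate` field and both helper functions are hypothetical; nothing in this thread confirms how T310693 would implement it.

```python
def with_sample_rate(payload: dict, rate: float) -> dict:
    """Return a copy of the event payload with the rate it was sampled at attached."""
    event = dict(payload)
    event["sample_rate"] = rate  # hypothetical field name
    return event

def estimate_population(events: list) -> float:
    """Estimate unsampled volume without knowing which wiki or config produced each rate."""
    return sum(1.0 / e["sample_rate"] for e in events)

logged = [
    with_sample_rate({"action": "init"}, 1.0),     # oversampled event
    with_sample_rate({"action": "init"}, 0.2),     # partner wiki
    with_sample_rate({"action": "init"}, 1 / 16),  # normal rate
]
total = estimate_population(logged)
```

With the rate carried on the event, an analyst would not need to reconstruct it from the wiki name, the deployment date, or the is_oversample flag.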

Change 804022 merged by jenkins-bot:

[operations/mediawiki-config@master] Sync sampling rates at 9 wikis DiscussionTools is testing

https://gerrit.wikimedia.org/r/804022

Mentioned in SAL (#wikimedia-operations) [2022-06-27T20:09:33Z] <cjming@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:804022|Sync sampling rates at 9 wikis DiscussionTools is testing (T309260)]] (duration: 03m 36s)

MNeisler updated the task description.

Moving to blocked pending deployment of patch. Once patch is deployed, I will QA to verify sampling rates are synchronized and instrumentation is working as expected.

Sorry, I failed to convey this explicitly yesterday: that comment by Stashbot was the patch being deployed.

I reviewed the aggregate data and verified that sampling rates appear synchronized and instrumentation is working as expected. See a summary of checks below:

✅ All EditAttemptStep sessions on those 9 wikis have an associated page token in MobileWebUIActions and DesktopWebUIActions that can be used to join those events, with the exception of server-side events (Wikitext init events and save events), as expected.
✅ Events are logged for all 9 partner wikis in MobileWebUIActions and DesktopWebUIActions. The MobileWebUIActions sampling has increased on those wikis following the deployment of this patch, as expected.
✅ All expected associated event data is logged (pageToken, session id, actions, platform, editor_interface)
✅ There can be one or multiple edit sessions logged per page token as expected. The typical number of sessions per pageToken appeared as expected.
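Checks like the ones above could be expressed along these lines. This is only a sketch over in-memory records, not the queries actually run; the field names `page_token`, `session_id`, and `server_side` are assumptions made for the example.

```python
from collections import defaultdict

def sessions_without_ui_match(eas_events: list, ui_events: list) -> set:
    """Edit sessions whose page token never appears in the UI actions data.

    Server-side events are skipped, since they are not expected to join.
    """
    ui_tokens = {e["page_token"] for e in ui_events}
    missing = set()
    for e in eas_events:
        if e.get("server_side"):
            continue
        if e["page_token"] not in ui_tokens:
            missing.add(e["session_id"])
    return missing

def sessions_per_token(eas_events: list) -> dict:
    """Number of distinct edit sessions logged under each page token."""
    by_token = defaultdict(set)
    for e in eas_events:
        by_token[e["page_token"]].add(e["session_id"])
    return {token: len(sessions) for token, sessions in by_token.items()}

eas = [
    {"page_token": "p1", "session_id": "s1"},
    {"page_token": "p1", "session_id": "s2"},
    {"page_token": "p2", "session_id": "s3", "server_side": True},
    {"page_token": "p3", "session_id": "s4"},
]
ui = [{"page_token": "p1"}]

print(sessions_without_ui_match(eas, ui))  # {'s4'}
print(sessions_per_token(eas))             # {'p1': 2, 'p2': 1, 'p3': 1}
```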

cc @jwang so you are aware and can update any relevant documentation on the Web Team's side.

@ppelberg - Reassigning this task to you for sign-off.