Extend the MobileWebUIActions sampling rate to A/B test wiki
Closed, Resolved (Public)

Description

This task represents the work involved with making it so the VisualEditorFeatureUse, EditAttemptStep, MobileWebUIActions, DesktopWebUIActions, and Talk_Page_Edit schemas sample events at the same rates, using the same method.

This is needed in order to complete the analysis planned in T298062.

Related: T309260.
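To illustrate why "the same rates, using the same method" matters for joining events across schemas, here is a minimal sketch. It assumes sampling is decided by hashing a shared token into [0, 1) and comparing it to the rate; the task does not say this is how the instrumentation actually samples, and `in_sample` and the tokens are hypothetical.

```python
import hashlib


def in_sample(token: str, rate: float) -> bool:
    """Hypothetical helper: decide deterministically whether a token is sampled."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Map the first 6 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < rate


# Made-up tokens standing in for page or session tokens.
tokens = [f"token-{i}" for i in range(10_000)]

# Two schemas sampling at the same rate with the same method keep the same tokens.
schema_a = {t for t in tokens if in_sample(t, 0.2)}
schema_b = {t for t in tokens if in_sample(t, 0.2)}

# A schema sampling at a lower rate keeps only a subset of those tokens,
# so some events in the higher-rate schema have nothing to join to.
schema_low = {t for t in tokens if in_sample(t, 0.1)}

print(schema_a == schema_b, schema_low <= schema_a)
```

With matching rates every sampled token appears in both schemas; with mismatched rates the join silently loses the difference.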

Requirements

  • Sampling is turned on in the MobileWebUIActions and DesktopWebUIActions schemas at the Wikipedias that will participate in the upcoming mobile DiscussionTools A/B test (T314950; see T298062): az.wiki, de.wiki, es.wiki, fa.wiki, hi.wiki, id.wiki, it.wiki, nl.wiki, pl.wiki, pt.wiki, ro.wiki, ru.wiki, th.wiki, tr.wiki, and uk.wiki.
  • The EditAttemptStep schema's sampling rate is adjusted to match the MobileWebUIActions and DesktopWebUIActions schemas' current sampling rates:
'wgWMEDesktopWebUIActionsTracking' => [
        'default' => 0,
        'desktop-improvements' => 0.2, // T258058
        'officewiki' => 0,
        'testwiki' => 1, // T256992
],

'wgWMEMobileWebUIActionsTracking' => [
        'default' => 0.1, // T220016
        'enwiki' => 0.01, // T295432
],
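For readers unfamiliar with these per-wiki arrays, the following is a rough sketch of how a rate could be looked up for a given wiki. The values are copied from the config above; the lookup order (wiki-specific key, then group tag, then `default`) is an assumption, and `resolve_rate`, `examplewiki`, and its tag membership are hypothetical, not the actual MediaWiki configuration code.

```python
# Values copied from the task description (rates before this change).
DESKTOP_TRACKING = {
    "default": 0,
    "desktop-improvements": 0.2,  # T258058
    "officewiki": 0,
    "testwiki": 1,  # T256992
}
MOBILE_TRACKING = {
    "default": 0.1,  # T220016
    "enwiki": 0.01,  # T295432
}


def resolve_rate(setting, wiki, tags=()):
    """Hypothetical lookup: the most specific key wins (assumed order)."""
    if wiki in setting:
        return setting[wiki]
    for tag in tags:
        if tag in setting:
            return setting[tag]
    return setting["default"]


print(resolve_rate(MOBILE_TRACKING, "enwiki"))
print(resolve_rate(MOBILE_TRACKING, "dewiki"))
# "examplewiki" and its tag membership are made up for illustration.
print(resolve_rate(DESKTOP_TRACKING, "examplewiki", tags=["desktop-improvements"]))
```

Under this reading, the A/B test wikis currently fall through to the mobile `default` of 0.1, which is why the patch has to add explicit entries for them.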

Done

  • Requirements are implemented
  • Once all patches to resolve this task are merged, @MNeisler to verify sampling rates are synchronized and instrumentation is working as expected

Event Timeline

Change 851182 had a related patch set uploaded (by DLynch; author: DLynch):

[operations/mediawiki-config@master] Bump sampling rate to 0.2 for various editing schemas on a/b test wikis

https://gerrit.wikimedia.org/r/851182

@MNeisler could you confirm that 0.2 is an acceptable rate for the A/B test wikis? I chose that value to be consistent with T309260, in which we rolled the mobile features out to the initial trial wikis.

@DLynch @ppelberg
Quick update:
I checked with @jwang to determine if it would be possible to increase the MobileWebUIActions sampling rate for the A/B test wikis from 0.2 to 0.4 to ensure we can gather enough AB test data to potentially run the test for just a 2-week period vs 4 weeks.

Jennifer discussed this with the Web team, who indicated that they don't have concerns except that the 0.4 sampling rate may be problematic for some of the larger wikis in the AB test since the MobileWebUIActions click tracking fires at least one event for every page view. I'm going to follow up with Data Engineering today to confirm if there are concerns with the high volume of events for these wikis.

If Data Engineering confirms that the sampling rate increase might be problematic, then I'd recommend we proceed with the 0.2 rate. I can monitor how many events are logged after a 2-week period to determine if a sufficient number of events have been logged or if we need to run the test for a longer duration.

All that you described above sounds great, @MNeisler. Thank you for the update.

@Ottomata confirmed that the increase to 0.4 should be ok but recommended we do the following given the potential risks from the likely high volume of events:

  • Let Data Engineering know when this rolls out
  • Make a note in the deployment change/backport schedule so the deployers know to be careful
  • Keep an eye on eventgate-analytics-external in case capacity needs to be increased to support the high volume of events. Latency and saturation can be watched on this dashboard following the change.

See Slack thread.

@ppelberg
Given the potential risk of the 0.4 sampling rate, I'm leaning towards moving forward with the originally planned increase to a 0.2 sampling rate.
Thinking

  • 0.2 is consistent with the sampling rate used for the initial set of wikis, where we rolled out the mobile features (T309260). Based on the number of events logged on those wikis, I think this rate will likely give us sufficient events for the larger AB test wikis and let us complete the overall analysis within a 2-week test duration. Some of the smaller wikis will only have a limited number of talk page events, but these can be excluded from the per-wiki analysis as needed (similar to what was done in past AB tests).
  • I can check the aggregate data after two weeks to confirm if a sufficient number of events have been logged and we will still have the option to run the test for longer if needed.
  • A 0.2 sampling rate should keep us safely under the 1000 events per second threshold[i], which is typically deemed a safe volume of events.

What are your thoughts?

[i]This is estimated based on the maximum event submission rates over the last two weeks (400 events/second) with a default 0.1 sampling rate.
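The footnote's estimate can be reproduced with a small calculation. This sketch assumes event volume scales linearly with the sampling rate and that every wiki moves to the new rate; only the A/B test wikis actually change, so the result is an upper bound. The function name is hypothetical.

```python
OBSERVED_PEAK_EVENTS_PER_SEC = 400  # maximum over the last two weeks (footnote)
CURRENT_RATE = 0.1                  # default sampling rate during that period
SAFE_THRESHOLD = 1000               # events/second typically deemed safe


def projected_peak(new_rate, observed_peak=OBSERVED_PEAK_EVENTS_PER_SEC,
                   current_rate=CURRENT_RATE):
    """Scale the observed peak linearly with the sampling rate (assumption)."""
    return observed_peak * (new_rate / current_rate)


for rate in (0.2, 0.4):
    peak = projected_peak(rate)
    status = "under" if peak < SAFE_THRESHOLD else "over"
    print(f"rate {rate}: about {peak:.0f} events/s, {status} the threshold")
```

Under these assumptions, 0.2 projects to roughly 800 events/second and stays below the threshold, while 0.4 projects to roughly 1600 and exceeds it.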

MNeisler triaged this task as Medium priority.Nov 7 2022, 6:41 PM
MNeisler edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
MNeisler moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

Peter said in a call today that 0.2 should be fine.

@MNeisler: thank you for looking into this and sharing all that you have found and are thinking as a result.

Doing what you suggested, and what @DLynch mentioned above, works for me:

  1. Sample events at a 0.2 rate
  2. Check the aggregate data after two weeks to confirm if a sufficient number of events have been logged
    • I'll file a new task for doing the above.

Change 851182 merged by jenkins-bot:

[operations/mediawiki-config@master] Bump sampling rate to 0.2 for various editing schemas on a/b test wikis

https://gerrit.wikimedia.org/r/851182

Mentioned in SAL (#wikimedia-operations) [2022-11-08T21:26:22Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:851182|Bump sampling rate to 0.2 for various editing schemas on a/b test wikis (T321734)]], [[gerrit:854592|ThreadItemStore: Fix setting parent IDs when parent already existed (T322599)]]

Mentioned in SAL (#wikimedia-operations) [2022-11-08T21:26:41Z] <urbanecm@deploy1002> urbanecm and kemayo and matmarex: Backport for [[gerrit:851182|Bump sampling rate to 0.2 for various editing schemas on a/b test wikis (T321734)]], [[gerrit:854592|ThreadItemStore: Fix setting parent IDs when parent already existed (T322599)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-11-08T21:32:07Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:851182|Bump sampling rate to 0.2 for various editing schemas on a/b test wikis (T321734)]], [[gerrit:854592|ThreadItemStore: Fix setting parent IDs when parent already existed (T322599)]] (duration: 05m 45s)

Moving this to Blocked while @MNeisler completes QA.

Doing what you suggested, and what @DavidL mentioned above, works for me:

You probably meant DLynch, but completion suggestion listed my name I guess.

Oops, yep. Thank you for letting me know ^ _ ^

I reviewed the recorded data in MobileWebUIActions, DesktopWebUIActions, and EditAttemptStep and confirmed the sampling rate increase and synchronization across the schemas.

Summary of checks:

✅ All EditAttemptStep sessions on the identified AB Test wikis have an associated page token in MobileWebUIActions and DesktopWebUIActions that can be used to join those events, with the exception of some server-side events (Wikitext init events and save events), as expected.
✅ Sampling rates appear to have increased based on a review of events logged before and after the deployment of this patch.
✅ All expected associated event data is logged
✅ There can be one or multiple edit sessions logged per page token as expected. The typical number of sessions per pageToken appeared as expected.
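For anyone repeating this QA, the join and sessions-per-token checks above could be sketched roughly as follows. This is not the query that was actually run; the field names (`session_id`, `page_token`) and the sample rows are made up for illustration.

```python
from collections import Counter


def join_coverage(edit_events, ui_events):
    """Split edit sessions into those with and without a matching UI page token."""
    ui_tokens = {e["page_token"] for e in ui_events}
    session_tokens = {}
    for e in edit_events:
        session_tokens.setdefault(e["session_id"], set()).add(e["page_token"])
    matched = {s for s, toks in session_tokens.items() if toks & ui_tokens}
    unmatched = set(session_tokens) - matched
    return matched, unmatched


def sessions_per_token(edit_events):
    """Count distinct edit sessions logged under each page token."""
    pairs = {(e["page_token"], e["session_id"]) for e in edit_events}
    return Counter(token for token, _ in pairs)


# Made-up rows: s1 and s2 share a page token, s3 has no UI counterpart.
edit_events = [
    {"session_id": "s1", "page_token": "p1"},
    {"session_id": "s2", "page_token": "p1"},
    {"session_id": "s3", "page_token": "p9"},
]
ui_events = [{"page_token": "p1"}, {"page_token": "p2"}]

matched, unmatched = join_coverage(edit_events, ui_events)
print(sorted(matched), sorted(unmatched), dict(sessions_per_token(edit_events)))
```

Sessions that end up in `unmatched` would then be inspected to confirm they are the expected server-side events rather than a sampling mismatch.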

@ppelberg - Reassigning to you for sign-off