
Deploy ToC A/B test to remainder of desktop improvements pilot wikis
Closed, Resolved | Public | 2 Estimated Story Points

Description

Background

This task covers the remaining deployments of the A/B test of the new table of contents (ToC) for the desktop improvements project. The first deployment is tracked in T306606: Deploy ToC A/B test to euwiki, hewiki.

Acceptance criteria

  • Check the mediawiki_reading_depth schema is enabled for these pilot wikis
  • Deploy the A/B test for the ToC to anonymous and logged-in users on all remaining pilot wikis except frwiki and ptwiki
  • Review the number of events we are getting for this schema on Grafana and determine an appropriate sampling rate for frwiki and ptwiki
  • Deploy to frwiki and ptwiki

Signoff criteria

  • Ensure data is being logged correctly

Event Timeline

ovasileva set the point value for this task to 2. (Apr 21 2022, 5:21 PM)
ovasileva renamed this task from "Deploy ToC to remainder of desktop improvements pilot wikis" to "Deploy ToC A/B test to remainder of desktop improvements pilot wikis". (May 9 2022, 5:47 PM)

@jwang @ovasileva we should be able to do this during the week.
One open question is the sampling rate for $wgVectorWebABTestEnrollment.
When we enabled the A/B test for the EU and HE wikis, we did so at 100%.
This leads to an average of 7.5 events a second.
For comparison, desktop click tracking is 100 events per second (at a 20% sampling rate).
Virtual page views are at 1k a second (we shouldn't really exceed this rate).

With the above in mind I think we could sample 100% of events in desktop improvements wikis, and see about 500 events a second.

However, a more cautious approach (particularly during an offsite week) would be to either

  1. Set the sampling rate to 50% (25% in treatment, 25% not) for ptwiki and frwiki given their size (100% for all other wikis)
  2. Exclude frwiki and ptwiki from this initial deployment.

Would one of those options be okay? If not, the risk of going to 100% is that we could be setting someone up for a stressful day dealing with an unbreak-now task during the offsite week.

@Jdlrobson, I agree with you. Since we decided to observe impact rather than run a statistical analysis, removing two data points (2 wikis) won't affect the observational analysis. Both options are OK. If we decide to go with the 2nd option, we can consider deploying to frwiki and ptwiki in stage 2, together with itwiki, Arabic wiki, and cawiki, for logged-in users only.

Checked in with Olga, and we'll deploy to all pilot wikis except French and Portuguese to begin with. :)

Change 792272 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/mediawiki-config@master] Deploy TOC A/B test to pilot wikis except frwiki, ptwiki

https://gerrit.wikimedia.org/r/792272

cjming moved this task from Doing to Code Review on the Web-Team-Backlog (Kanbanana-FY-2021-22) board.
cjming subscribed.

Change 792272 merged by jenkins-bot:

[operations/mediawiki-config@master] Deploy TOC A/B test to pilot wikis except frwiki, ptwiki

https://gerrit.wikimedia.org/r/792272

Mentioned in SAL (#wikimedia-operations) [2022-05-17T20:11:46Z] <cjming@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792272|Deploy TOC A/B test to pilot wikis except frwiki, ptwiki (T306607)]] (duration: 00m 53s)

I'll keep an eye on https://grafana.wikimedia.org/d/000000566/overview?viewPanel=30 over the next few days to make sure things aren't melting

We're averaging about 25-30 events/sec so far since deploying to pilot wikis.

Looking good! Going to move this one back to ready for the frwiki and ptwiki deployment. I think we can proceed with that one on Monday

@cjming @Jdlrobson

As we discussed in the meeting, we want a simple estimate of the sampling rate for ptwiki and frwiki based on their pageview traffic level.

Here are the monthly pageviews for April 2022 for the pilot wikis, by access method, excluding spider traffic.

| project | desktop_pageviews | mobileweb_pageviews | apps_pageviews | total pageviews |
| --- | --- | --- | --- | --- |
| fr.wikipedia | 267852972 | 404548894 | 11481045 | 683882911 |
| pt.wikipedia | 87841262 | 160166636 | 1456398 | 249464296 |
| tr.wikipedia | 45210461 | 90770686 | 2183463 | 138164610 |
| ko.wikipedia | 41823221 | 42359583 | 469937 | 84652741 |
| vi.wikipedia | 24309748 | 38778059 | 352258 | 63440065 |
| fa.wikipedia | 21112994 | 165351867 | 1546062 | 188010923 |
| he.wikipedia | 18216589 | 43352226 | 1507945 | 63076760 |
| th.wikipedia | 13417919 | 30754584 | 128647 | 44301150 |
| fr.wiktionary | 10957762 | 5632485 | NULL | 16590247 |
| sr.wikipedia | 7487154 | 19358875 | 187915 | 27033944 |
| foundation.wikimedia | 3813768 | 206341 | NULL | 4020109 |
| bn.wikipedia | 2508609 | 15067322 | 262230 | 17838161 |
| eu.wikipedia | 2066872 | 512304 | 11128 | 2590304 |
| incubator.wikimedia | 904740 | 128683 | NULL | 1033423 |
| vec.wikipedia | 661221 | 41135 | 686 | 703042 |
| de.wikivoyage | 514131 | 293993 | NULL | 808124 |
| ary.wikipedia | 158145 | 40270 | 1113 | 199528 |
| pl.wikinews | 147739 | 20781 | NULL | 168520 |
| fr.wikiquote | 138620 | 89607 | NULL | 228227 |
| pt.wikinews | 127671 | 36229 | NULL | 163900 |
| pt.wikiversity | 97793 | 79170 | NULL | 176963 |
| vi.wikibooks | 48987 | 18183 | NULL | 67170 |

@Jdlrobson @jwang based on the pageviews that Jennifer just provided, does it still make sense to apply the same sampling to frwiki + ptwiki as to the rest of the pilot wikis?

fawiki (the 3rd highest, and already deployed as part of the first set of pilot wikis) at ~188M isn't too far below ptwiki at ~249M, but frwiki is more than 3x fawiki. We could lump ptwiki in with the other pilot wikis and maybe do the following for frwiki?

	'frwiki' => [
		'name' => 'skin-vector-toc-experiment',
		'enabled' => true,
		'buckets' => [
			'unsampled' => [
				'samplingRate' => 0.5
			],
			'control' => [
				'samplingRate' => 0.25
			],
			'treatment' => [
				'samplingRate' => 0.25
			],
		]
	],

and/or do the same for ptwiki?
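To illustrate what the proposed `samplingRate` values mean in practice, here is a minimal Python sketch of splitting users across the three buckets. It is not Vector's actual enrollment code: the hashing scheme, the function names (`token_to_unit_interval`, `bucket_for`, `assign`) and the `session-…` tokens are all assumptions made for illustration. Only the bucket names and rates come from the config above.

```python
import hashlib

# Sampling rates copied from the frwiki proposal above.
BUCKETS = {
    "unsampled": 0.5,
    "control": 0.25,
    "treatment": 0.25,
}


def token_to_unit_interval(token: str) -> float:
    """Map a token to a stable pseudo-random number in [0, 1] (hypothetical scheme)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then normalise.
    return int.from_bytes(digest[:8], "big") / 2**64


def bucket_for(value: float, buckets: dict = BUCKETS) -> str:
    """Pick the bucket whose cumulative sampling-rate range contains value."""
    if abs(sum(buckets.values()) - 1.0) > 1e-9:
        raise ValueError("sampling rates must sum to 1")
    upper = 0.0
    for name, rate in buckets.items():
        upper += rate
        if value < upper:
            return name
    # Only reached if rounding pushes value to the very top edge.
    return name


def assign(token: str) -> str:
    """Deterministically assign a token (e.g. a session ID) to a bucket."""
    return bucket_for(token_to_unit_interval(token))


if __name__ == "__main__":
    counts = {name: 0 for name in BUCKETS}
    for i in range(10_000):
        counts[assign(f"session-{i}")] += 1
    print(counts)
```

With these rates, roughly half of the tokens end up in `unsampled` and the rest split about evenly between `control` and `treatment`, which is the "50% sampling" referred to in this thread.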

I erred on the side of setting frwiki + ptwiki to 50% sampling, per T306607#7931749.

Feel free to let me know if it should be different and I can adjust patch accordingly.

Change 797424 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/mediawiki-config@master] Deploy TOC A/B test to frwiki, ptwiki at 50%

https://gerrit.wikimedia.org/r/797424

cjming moved this task from Doing to Code Review on the Web-Team-Backlog (Kanbanana-FY-2021-22) board.

@cjming, are you talking about total pageviews? I think we should look at desktop_pageviews instead, where fawiki : ptwiki : frwiki is roughly 1 : 4 : 12.

@jwang oh right - makes sense. Then I think 50% sampling makes sense for frwiki + ptwiki given the 4x, 12x numbers
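The desktop ratio quoted above can be reproduced from the April 2022 pageview table with a few lines of Python. The second helper rests on an assumption not stated in the thread: that event volume scales linearly with desktop pageviews times the sampling rate. The function names are made up for this sketch.

```python
# Monthly desktop pageviews for April 2022, copied from the table in this task.
DESKTOP_PAGEVIEWS = {
    "fawiki": 21_112_994,
    "ptwiki": 87_841_262,
    "frwiki": 267_852_972,
}

# Sampling rates: fawiki is already deployed at 100%; 50% is the proposal
# for ptwiki and frwiki.
SAMPLING = {"fawiki": 1.0, "ptwiki": 0.5, "frwiki": 0.5}


def ratio_to_baseline(pageviews, baseline="fawiki"):
    """Express each wiki's pageviews as a multiple of the baseline wiki."""
    base = pageviews[baseline]
    return {wiki: views / base for wiki, views in pageviews.items()}


def relative_load(pageviews, sampling, baseline="fawiki"):
    """Estimate event load relative to the baseline.

    Assumes (hypothetically) that load is proportional to
    desktop pageviews multiplied by the sampling rate.
    """
    ratios = ratio_to_baseline(pageviews, baseline)
    return {wiki: ratios[wiki] * sampling[wiki] for wiki in ratios}


if __name__ == "__main__":
    for wiki, ratio in ratio_to_baseline(DESKTOP_PAGEVIEWS).items():
        print(f"{wiki}: {ratio:.2f}x fawiki desktop pageviews")
    for wiki, load in relative_load(DESKTOP_PAGEVIEWS, SAMPLING).items():
        print(f"{wiki}: {load:.2f}x fawiki load at proposed sampling")
```

This gives about 1 : 4.16 : 12.69 for fawiki : ptwiki : frwiki. Under the linearity assumption, a 50% sampling rate would still leave frwiki at roughly 6x the fawiki load and ptwiki at roughly 2x.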

Change 797424 merged by jenkins-bot:

[operations/mediawiki-config@master] Deploy TOC A/B test to frwiki, ptwiki at 50%

https://gerrit.wikimedia.org/r/797424

Mentioned in SAL (#wikimedia-operations) [2022-05-23T20:47:31Z] <cjming@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:797424|Deploy TOC A/B test to frwiki, ptwiki at 50% (T306607)]] (duration: 00m 52s)

Checking on the state of this one - have we confirmed the data looks okay after the deployment? @jwang, @cjming, @Jdlrobson

@ovasileva I will do this as part of sign-off. Ran out of time today.

Sorry for the delay on this one.

Checks:

  • Ran a query to check the bucketing. The groups look evenly distributed. For French for example 1591291 events were logged for treatment bucket, and 1537230 for control. [1]
  • Checked ReadingDepth was logging events [2]. For French I saw 274477 events.
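As a quick sanity check on "evenly distributed", the frwiki enrollment counts above can be turned into shares. This is only a sketch of the arithmetic, not a statistical test, and `bucket_shares` is a name made up for this example.

```python
# frwiki enrollment event counts for 2022-05-30, as reported above.
FRWIKI_EVENTS = {"treatment": 1_591_291, "control": 1_537_230}


def bucket_shares(counts):
    """Return each bucket's share of the total event count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    for name, share in bucket_shares(FRWIKI_EVENTS).items():
        print(f"{name}: {share:.2%}")
```

The split comes out at about 50.9% treatment versus 49.1% control.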

I think this should suffice for now. Do we have an analysis task for Jennifer, so that we have a single place to discuss anything the deeper analysis flags?

[1]

SELECT `group`, experiment_name, wiki, COUNT(*)
FROM event.mediawiki_web_ab_test_enrollment
WHERE year = 2022 AND month = 5 AND day = 30
GROUP BY `group`, experiment_name, wiki;

[2]

SELECT normalized_host, COUNT(*)
FROM mediawiki_reading_depth
WHERE year = 2022 AND month = 5 AND day = 30
GROUP BY normalized_host;