
Relaunch page previews a/b test on en and de wiki
Closed, Resolved · Public · 1 Estimated Story Points

Description

Background

Due to a bug in EventLogging (see T175918), we were not able to obtain correct session data for the previous Hovercards A/B test on the English and German Wikipedias.

Pre-deployment checklist:

  • Ensure data is removed from MySQL.
    • See T174815#3650853 onwards.
  • The fix for T175918 is deployed to the group2 wikis.
    • See T175918#3645697.
  • Analytics Engineering confirms that the backend for Popups events has been switched to Hadoop from MySQL.
    • See T176469#3688309.

Acceptance Criteria

Questions

  • What are the bucket sizes for each group per project?

The bucket sizes that we used in T172291: Launch page previews A/B test on enwiki and dewiki resulted in an average overall rate of roughly 200 events/second, but 100 events/second had already been deemed sufficient (cf. T176469#3688649). Thus we should cut the bucket sizes in half for the new experiment:

wiki   | on:off:control (old → new)        | $wgPopupsAnonsExperimentalGroupSize (old → new)
enwiki | 0.03:0.03:0.94 → 0.015:0.015:0.97 | 0.06 → 0.03
dewiki | 0.08:0.08:0.84 → 0.04:0.04:0.92   | 0.16 → 0.08
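
For illustration only, here is a minimal sketch (not the actual Gerrit change 383378) of how the halved $wgPopupsAnonsExperimentalGroupSize values from the table above might look in the style of wmf-config/InitialiseSettings.php; the 'default' entry and the surrounding structure are assumptions.

```php
<?php
// Sketch only, not the actual Gerrit change 383378: per-wiki overrides for the halved
// $wgPopupsAnonsExperimentalGroupSize values from the table above, written in the
// style of wmf-config/InitialiseSettings.php. The 'default' value is an assumption.
return [
    'wgPopupsAnonsExperimentalGroupSize' => [
        'default' => 0,    // assumption: no anonymous experiment on other wikis
        'enwiki'  => 0.03, // was 0.06 in the previous test
        'dewiki'  => 0.08, // was 0.16 in the previous test
    ],
];
```

The on:off:control split in the table follows from this value: the experimental group is divided evenly between the "on" and "off" buckets, and the remainder serves as control.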

Event Timeline


FYI the EL bug affected all instrumentation that uses the subscriber protocol (ReadingDepth, RelatedArticles, Popups).

phuedx updated the task description.
Jdlrobson subscribed.

It doesn't look like anything needs analysis here (if there is, can you please update the description with those specific questions?)

Needs analysis based on the pre-deployment checklist - we're not sure if we currently have a good place for the data. More info in T174815: Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana

(Following up on T174815#3665062:)

@Nuria suggested in T174815#3627902 that this new test should directly use the Hadoop/Hive backend without going through the MySQL (master) table, to avoid the MySQL replication issues. We should discuss here what kind of preparations are necessary to do this. I assume there is nothing we need to change directly in the client side EL code?

phuedx updated the task description.

Change 383378 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[operations/mediawiki-config@master] pagePreviews: Restart A/B test on enwiki and dewiki

https://gerrit.wikimedia.org/r/383378

Blocked on Analytics giving us the green light.

ovasileva changed the task status from Open to Stalled. Oct 10 2017, 5:05 PM
ovasileva moved this task from 2017-18 Q1 to Upcoming on the Web-Team-Backlog board.

> (Following up on T174815#3665062:)
>
> @Nuria suggested in T174815#3627902 that this new test should directly use the Hadoop/Hive backend without going through the MySQL (master) table, to avoid the MySQL replication issues. We should discuss here what kind of preparations are necessary to do this. I assume there is nothing we need to change directly in the client side EL code?

Ping @Nuria - are there any preparations the web team needs to take care of to implement your suggestion for this test?

Change 383389 had a related patch set uploaded (by Nuria; owner: Nuria):
[operations/puppet@production] Do not store PopUps events on MySQL

https://gerrit.wikimedia.org/r/383389

> Ping @Nuria - are there any preparations the web team needs to take care of to implement your suggestion for this test?

No, the change I have just made in puppet needs to be merged, cc @mforns and @Ottomata

>> Ping @Nuria - are there any preparations the web team needs to take care of to implement your suggestion for this test?
>
> No, the change I have just made in puppet needs to be merged, cc @mforns and @Ottomata

OK, thanks! What about making the resulting Hadoop data accessible via Hive - is this going to happen automatically, or will this involve a manual step on your side (such as for the nuria.popups table in T174815)? We would want to be able to do some initial checks right after the launch.

The first import will be manual; subsequent ones will be automated via cron. cc @mforns

Change 383389 abandoned by Nuria:
Do not store PopUps events on MySQL

https://gerrit.wikimedia.org/r/383389

Change 384542 had a related patch set uploaded (by Nuria; owner: Nuria):
[operations/puppet@production] Do not store PopUps events on MySQL

https://gerrit.wikimedia.org/r/384542

Change 384542 merged by Ottomata:
[operations/puppet@production] Do not store PopUps events on MySQL

https://gerrit.wikimedia.org/r/384542

@Tbayer @MBinder_WMF and all in this task,

The merged Gerrit change above blacklists the Popups schema for MySQL insertion, so from this moment on Popups events are no longer inserted into MySQL and are only stored in HDFS [1]. This allows the A/B test to be launched without risk to the MySQL database hosts.

Also, this Popups data in HDFS is now being refined into Hive periodically (each hour), meaning the Popups JSON data ingested from EventLogging is parsed and copied into a partitioned Hive table [2][3] (Parquet format) that can be queried normally. Note that there is a 3-hour lag until data is available.

So, we think the A/B experiment can start :]

[1] HDFS path for Popups JSON source data: /wmf/data/raw/eventlogging/eventlogging_Popups
[2] HDFS path for Popups Parquet refined data: /user/tbayer/eventlogging_refine_test/Popups
[3] Corresponding Popups Hive table: tbayer.Popups
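
As a rough sketch of the kind of initial check mentioned above (not taken from the task), hourly event counts could be pulled from the Hive table named in [3]; the year/month/day/hour partition columns are an assumption based on the usual EventLogging refine layout.

```sql
-- Sketch of a post-launch sanity check (not from the task): hourly counts of refined
-- Popups events from the Hive table in [3]. The year/month/day/hour partition columns
-- are an assumption based on the usual EventLogging refine layout.
SELECT year, month, day, hour, COUNT(*) AS events
FROM tbayer.Popups
WHERE year = 2017 AND month = 10 AND day >= 18
GROUP BY year, month, day, hour
ORDER BY year, month, day, hour;
```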

Jdlrobson changed the task status from Stalled to Open. Oct 16 2017, 6:48 PM

@phuedx, @Niedzielski: The currently stated sampling rates should work, but I sense some confusion behind T176469#3626758 and T176469#3672668:

> @phuedx: The bucket sizes that we used in T172291: Launch page previews A/B test on enwiki and dewiki resulted in rates circa 300 events/second, which led to replication issues between the EventLogging MySQL databases (see T174815#3574066 onwards).

> Please target 100 events per second! (Via @phuedx)

To briefly recap:

  • We already targeted 100 events/second (on average) for the previous test, per the consultation at T172322.
  • However, as first pointed out by Analytics Engineering at T174815#3589983, the actual average rate turned out to be higher than that - around 200 events/second over the entire two weeks. That means that the projection at T172291#3500535 had not quite been correct. In fairness to @phuedx, his calculation there had been marked as a "strawman" only, and there seem to be real differences between the wikis.
  • There's no guarantee that these new MySQL replication issues (not anticipated during the consultation) would have been avoided even if our estimates had been correct and we had only generated around 100 events/sec. With the new test, we are going to sidestep that issue altogether by deactivating recording of the data in MySQL and relying on the Hive version only, which is understood to allow much higher event rates. In any case, we can and should stick with the rate that was already anticipated earlier to give us enough data for the planned duration of two weeks.
  • Also, the "circa 300 events/second" link above points to a Grafana dashboard for a different schema (NavigationTiming), and what's more, those numbers are per minute, not per second: 300 per minute = 5 per second. The correct link is always accessible via the talk page of the schema on Meta. As mentioned, the average was roughly 200 events/second in the previous test, so cutting the sampling rates in half should still work.

I'm updating the task description accordingly.
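
As a quick back-of-the-envelope check of the recap above (a sketch that assumes the event rate scales linearly with the sampled share of readers):

```php
<?php
// Back-of-the-envelope check, assuming the event rate scales linearly with the
// sampled share of readers: halving the bucket sizes should roughly halve the rate.
$previousAverageRate = 200; // events/second, measured over the previous two-week test
$bucketScale = 0.5;         // new bucket sizes are half of the previous ones
echo ( $previousAverageRate * $bucketScale ) . " events/second expected\n"; // ~100
```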

To confirm: the Hadoop backend has no issues with 300 events per second for this schema, so the calculations do not need to be fine-grained. Anything between 100 and 300 is a fine interval.

> Also, the "circa 300 events/second" link above points to a Grafana dashboard for a different schema (NavigationTiming), and what's more, those numbers are per minute, not per second: 300 per minute = 5 per second. The correct link is always accessible via the talk page of the schema on Meta. As mentioned, the average was roughly 200 events/second in the previous test, so cutting the sampling rates in half should still work.

Sorry about the confusion with the link to the Grafana dashboard. The link was missing the schema variable. I've updated the link in the description to match the original time range for posterity (better to link to an absolute time range than a relative one!)

IIRC the 300/second number came from the peak rate on that graph, ~18k events/minute, so thanks for changing it to the average rate too.

Sam will swat this tomorrow during the European SWAT window. https://gerrit.wikimedia.org/r/383378 can be used or abandoned - whatever makes most sense :)

Just to confirm T176469#3691348, I will be deploying @Jdlrobson's change during tomorrow's European SWAT window (1-2 PM UTC, Wednesday, 18th October) /cc @Nuria @mforns

Awesome, will check for incoming data during the afternoon.

Change 383378 merged by jenkins-bot:
[operations/mediawiki-config@master] pagePreviews: Restart A/B test on enwiki and dewiki

https://gerrit.wikimedia.org/r/383378

Mentioned in SAL (#wikimedia-operations) [2017-10-18T13:15:14Z] <zfilipin@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:383378|pagePreviews: Restart A/B test on enwiki and dewiki (T176469)]] (duration: 00m 51s)

I've deliberately skipped QA as we did a thorough test before and during the deploy of the first A/B test (see T172291#3560020 onwards). I'll make sure to highlight this during standup.

Tested the logged-in preferences and logged-out opting in/out - looks good!