
Stop sending data for Page Previews enwiki and dewiki A/B test (again)
Closed, Resolved · Public · 1 Story Points

Description

NOTE: This is blocked until Wednesday, 15 November.

v2 of the Page Previews enwiki and dewiki A/B test was started at roughly 1 PM UTC on Wednesday, 18th October. It should be stopped on Thursday, Nov 16th, after we have collected four full weeks of data (see T178500#3729981).

Per T178500#3728898, we'd prefer not to disable the feature for those logged-in users who currently have it enabled via their preferences. This is actually fairly simple as @Jdlrobson added an instrumentation killswitch to the Page Previews codebase.

Plan

  • Deploy a change setting $wgPopupsEventLogging to false.

Details

Related Gerrit Patches:
operations/mediawiki-config : master · Disable EventLogging for popups

Event Timeline

phuedx created this task. · Oct 18 2017, 1:48 PM
Restricted Application added subscribers: TerraCodes, Aklapper. · Oct 18 2017, 1:48 PM
ovasileva triaged this task as Normal priority. · Oct 31 2017, 2:23 PM
ovasileva added subscribers: Tbayer, ovasileva.

We were discussing extending the test to gather more data. @Tbayer, @phuedx - do you think there would be an issue with storage?

From my understanding, there is no space issue on Hadoop, so it would be no problem to continue the test for say a week or two. But that's the area of expertise of Analytics Engineering and/or Ops - I'll ask in #wikimedia-analytics to confirm.

[15:47:59] <ottomata> HaeB: no storage issues in hadoop. we are maintaining a temporary custom import/refine job for this schema, while we work on more generically supporting eventlogging data in hadoop
[15:48:23] <ottomata> i think we can keep running the custom job for yall a while longer, seems fine with me

Another option would be to get a couple more weeks of data and/or drop the sampling rate for the test.

phuedx added a comment. · Nov 1 2017, 5:33 PM

How much data is enough data? I feel like we should have a stopping criterion, or is there more nuance to this?

My concern here is more from a product perspective. While running the test we have, basically, deployed the feature for logged-in users - having it appear in the user preferences rather than in beta. I think the best way to go would be to get 1-2 more weeks of data (@Tbayer can confirm if this is enough for analysis) and then continue running the test until full feature deployment, but with lowered sampling rates.

phuedx added a comment. · Nov 2 2017, 2:12 PM

@ovasileva: It seems like what you're looking for is a way to disable the A/B test without disabling the feature for logged-in users. I'll update the description accordingly.

Can you confirm that, from a product perspective, disabling the feature for a much larger cohort of logged-out users is OK?

phuedx updated the task description. · Nov 2 2017, 2:21 PM
phuedx added a subscriber: Jdlrobson.

@phuedx - yup, that's correct.

Consulted with @Tbayer, and we think Thursday, Nov 16th would be a good day to drop the sampling rate to (virtually) zero. This would give us one month of data.

phuedx updated the task description. · Nov 2 2017, 5:07 PM
ovasileva updated the task description. · Nov 2 2017, 5:20 PM
Pcoombe added a subscriber: Pcoombe. · Nov 7 2017, 4:13 PM
Tbayer updated the task description. · Nov 13 2017, 7:07 PM

Note that we may still want to reactivate it, probably with a lower rate, to measure a new thing after T180036: Instrument time to first user link interaction is implemented.

> @ovasileva: It seems like what you're looking for is a way to disable the A/B test without disabling the feature for logged-in users. I'll update the description accordingly.
> Can you confirm that, from a product perspective, disabling the feature for a much larger cohort of logged-out users is OK?

To add to @ovasileva's response: We should also keep in mind that since this is only per browser session, we are actually not disabling / keeping it enabled for a fixed cohort of users. Instead, keeping the bucketing as is means constantly adding and dropping users, anyway.

Separately though, we may need to keep in mind the tests that Fundraising plans to run.

ovasileva set the point value for this task to 1.

Change 391615 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[operations/mediawiki-config@master] Disable EventLogging for popups

https://gerrit.wikimedia.org/r/391615

Change 391615 merged by jenkins-bot:
[operations/mediawiki-config@master] Disable EventLogging for popups

https://gerrit.wikimedia.org/r/391615

Mentioned in SAL (#wikimedia-operations) [2017-11-15T19:40:20Z] <catrope@tin> Synchronized wmf-config/InitialiseSettings.php: Disable EventLogging for Popups (T178500) (duration: 00m 49s)

Calendar has been updated:
https://www.mediawiki.org/wiki/Reading/Web/Release_timeline#November

The rate has decreased to about 400-700 events per minute, down from 14.2k, so I would assume the change had the desired impact and the remaining events come from cached pages and open tabs :)
https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?orgId=1&from=now-24h&to=now&var-schema=Popups
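
As a rough check of the figures quoted above, the drop from 14.2k to 400-700 events per minute can be expressed as a percentage. This is only a minimal sketch: the function name is hypothetical, and the inputs are the approximate rates quoted in the comment, not values read from Grafana.

```python
def percent_reduction(before_per_min: float, after_per_min: float) -> float:
    """Return the percentage drop from before_per_min to after_per_min."""
    if before_per_min <= 0:
        raise ValueError("before_per_min must be positive")
    return (1 - after_per_min / before_per_min) * 100


# "14.2k" events per minute, as quoted in the comment (approximate).
BEFORE = 14_200

# Upper and lower ends of the quoted "400-700 events per minute" range.
for after in (700, 400):
    print(f"{after} events/min: {percent_reduction(BEFORE, after):.1f}% lower")
```

With these approximate inputs the reduction comes out at roughly 95-97%, which fits the assumption that only cached pages and open tabs are still sending events.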

Wait, this was meant to be deployed tomorrow, not today. See task description.

@Jdlrobson and I discussed this a bit more on IRC right after T178500#3764662. Apparently the "blocked until Wednesday" note in the task description had caused some confusion, although I'm not seeing anything unclear about the subsequent, more specific wording "It should be stopped on Thursday, Nov 16th, after we have collected four full weeks of data". (For those unfamiliar with the rationale: it is much preferable to do analysis for timespans of entire weeks, because the strong weekly (and daily) seasonality of reader behavior might otherwise distort results. And after launch, the experiment took at least a day to reach the full event rate, clearly an effect of caching, which we had similarly observed in the previous iteration.) Jon and I briefly discussed re-enabling it at the next opportunity a few hours afterwards, but that would not have served the purpose of addressing seasonality.

We'll survive, though; we just need to make do with what we have, either restricting queries to three weeks or accepting the inaccuracies that may be caused by the missing day.
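
The "three weeks" fallback can be illustrated with a small sketch that finds the longest whole-week window between the test start and the actual stop. It assumes a start of 13:00 UTC on 18 October (the description says "roughly 1 PM UTC"), takes the stop time from the SAL entry, and treats the caching ramp-up as exactly one day (the comment says "at least a day"). The function name and the one-day default are hypothetical, not part of any analysis actually run for this task.

```python
from datetime import datetime, timedelta, timezone

WEEK = timedelta(weeks=1)


def full_week_window(start, stop, ramp_up=timedelta(days=1)):
    """Return (window_start, window_end, weeks) for the longest span of
    whole weeks that fits between start + ramp_up and stop."""
    window_start = start + ramp_up
    if stop <= window_start:
        return window_start, window_start, 0
    # timedelta // timedelta yields an int: the number of complete weeks.
    weeks = (stop - window_start) // WEEK
    return window_start, window_start + weeks * WEEK, weeks


# Assumed start: "roughly 1 PM UTC on Wednesday, 18th October".
start = datetime(2017, 10, 18, 13, 0, tzinfo=timezone.utc)
# Actual stop, from the SAL entry (2017-11-15T19:40:20Z).
actual_stop = datetime(2017, 11, 15, 19, 40, 20, tzinfo=timezone.utc)
# Hypothetical stop one day later, i.e. on the planned Thursday, Nov 16th.
planned_stop = actual_stop + timedelta(days=1)

print(full_week_window(start, actual_stop))
print(full_week_window(start, planned_stop))
```

Under these assumptions the early stop leaves three complete weeks after ramp-up, whereas stopping a day later would have left four.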

Still, on a general note, this is another reminder (of many, quite a few in the history of this schema alone) that instrumentation changes should not be rushed. Trial and error may work better for UI changes that can easily be rolled back without lasting effects, but corrupted or missing data cannot be regenerated.

BTW, as noted by @Jdlrobson over at T179914#3764603, there was a weird gap followed by a spike earlier on Nov 15, a few hours before the stop of the test, which similarly happened for the Print schema.


I followed up on #wikimedia-analytics; this was caused by yesterday's migration of the EventLogging master database. Fortunately it does not seem to have affected the timestamps in the eventual Hive table, i.e. that gap is affecting the Grafana graph only: https://www.irccloud.com/pastebin/HNLBpSrs/hourly%20event%20rates%20for%20Popups%20on%202017-11-15

@Tbayer - looks like all is okay. Good to resolve?


Yes, the event rate has dropped to less than 20 per minute as of today, November 17. (And as mentioned above, it's not really possible to fix the data issues resulting from the date mixup, so we'll leave it be.) I'll tick the deployment checkbox and close the task now.

Tbayer renamed this task from Stop Page Previews enwiki and dewiki A/B test (again) to Stop sending data for Page Previews enwiki and dewiki A/B test (again). · Nov 17 2017, 11:57 PM
Tbayer closed this task as Resolved.
Tbayer updated the task description.