Page MenuHomePhabricator

Remove A/B testing instrumentation code
Closed, ResolvedPublic3 Estimated Story Points

Description

Background

Readers Web have run numerous A/B tests for Page Previews and has been reporting their results as part of T182315: Write and publish report on enwiki and dewiki a/b test analysis for page previews and T182314: Analyze results of enwiki and dewiki page previews a/b test.

The bucketing code for A/B testing will likely not be used any more now that the feature is live for all anonymous users on all Wikipedias. The code

AC

  • The config option PopupsAnonsExperimentalGroupSize and related code is removed.
  • When PopupsEventLogging is enabled it will continue to log events for all users (we don't plan on using this, but a decision still remains about how we might use this again (see T193051)
  • The ./getBucket module is removed.
  • The corresponding unit tests

Testing criteria

  • Generic page previews testing should be done on the beta cluster.
  • Check both anonymous and a logged in user can opt in and out of page previews with predictable results

Sign off steps

The following should be done as part of the sign off process:

  • Record the assets size change by disabling the code.

The asset change was negligible. This sign off step is a bit redundant given we descoped this task to be about the A/B testing code NOT the instrumentation code.

Related Objects

StatusSubtypeAssignedTask
Resolvedovasileva
ResolvedJdlrobson
DuplicateNone
DuplicateNone
Resolvedovasileva
Resolvedovasileva
Resolvedovasileva
Resolvedphuedx
Resolvedphuedx
DuplicateNone
ResolvedJdlrobson
ResolvedJdlrobson
DuplicateNone
Duplicateovasileva
Resolvedovasileva
DuplicateNone
DeclinedNone
DuplicateJdlrobson
ResolvedMhurd
Declined JMinor
Resolvedphuedx
Resolved Pchelolo
ResolvedJdlrobson
Declined Pchelolo
Resolvedphuedx
DeclinedJdlrobson
DuplicateNone
Resolved Fjalapeno
Resolvedphuedx
Declinedpmiazga
DeclinedNone
Resolvedphuedx
DeclinedNone
Resolved Pchelolo
Resolved bearND
Resolved Mholloway
ResolvedMSantos
Resolved Mholloway
InvalidNone
ResolvedJdlrobson
InvalidNone
DuplicateNone
ResolvedJdlrobson
ResolvedJdlrobson
ResolvedJdlrobson
ResolvedJdlrobson
Resolvedphuedx
Resolved bearND
Resolved Mholloway
DuplicateNone
ResolvedJdlrobson
ResolvedJdlrobson
Resolvedphuedx
ResolvedJdlrobson
ResolvedJdlrobson
Resolved bearND
ResolvedJdlrobson
Resolved Mholloway
Resolved Mholloway
ResolvedJdlrobson
ResolvedJdlrobson
Resolved bearND
Resolved Tbayer
ResolvedNone
Resolvedovasileva
Resolved Tbayer
ResolvedNone
Resolvedmforns
Resolvedphuedx
DeclinedNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Jdlrobson changed the task status from Stalled to Open.Mar 29 2018, 11:57 PM
Jdlrobson added a subscriber: ovasileva.

Sounds like this is not stalled any more...

@ovasileva it looks like the EventLogging code when enabled (it's currently disabled everywhere) runs on 100% of traffic in existing form so it doesn't feel like we'll be enabling this any time soon (and if we are it will need rewriting).

Removing it will drastically cut the size of the repo and improve performance for editors. If we're done with it, I'd suggest carefully removing it shortly after deployment. If not, we should set an expectation of how much longer it will be needed for and why.

ovasileva triaged this task as Medium priority.Apr 2 2018, 10:36 AM

Sounds good to me. @Tbayer - any concerns?

Jdlrobson set the point value for this task to 3.Apr 3 2018, 4:58 PM

From discussion, it seems like this would entail:

  • Removing:
    • reducers/eventLogging
    • changeListeners/eventLogging
    • the wiring code in index.js
    • the corresponding unit tests
  • Other ammends
    • See description AC for more items
    • Check for opportunities on simplification on the actions as they are probably adding things (like timedAction) just for the eventLogging reducer
      • This can be done by checking which action types the reducers/eventLogging was acting on, and check where they are triggered in the actions.js, and seeing if the info there is needed in other usages in other reducers.

Added the extra details to AC

Sounds good to me. @Tbayer - any concerns?

No concerns - I assume that we'll be able to re-add the instrumentation later (sans the A/B logic) without too much effort in case we want to do any further measurements (say about the use by logged-in editors, once previews have been made the default for new accounts).

BTW, why is T184793: [EPIC] Instrument page interactions marked as a subtask of this one?

(Sorry, missed this ping while preparing to head off for vacation two weeks ago.)

No concerns - I assume that we'll be able to re-add the instrumentation later (sans the A/B logic) without too much effort in case we want to do any further measurements (say about the use by logged-in editors, once previews have been made the default for new accounts).

If that is the plan, then this task needs to change shape, since we wouldn't actually be removing instrumentation code, only the AB testing part.

To be clear before we edit the task. I understand that:

  • we will need the same instrumentation and schema that we have now, with sampling rate
  • we won't need the AB testing code that buckets people into control/on/off

If it gets confusing we should discuss this in grooming or planning or some meeting.

BTW, why is T184793: [EPIC] Instrument page interactions marked as a subtask of this one?

I think that's just marking that this task is blocked by page interactions being instrumented appropriately. Saying that we wouldn't remove the event logging code without the virtual page views instrumentation being in place.

No concerns - I assume that we'll be able to re-add the instrumentation later (sans the A/B logic) without too much effort in case we want to do any further measurements (say about the use by logged-in editors, once previews have been made the default for new accounts).

If that is the plan, then this task needs to change shape, since we wouldn't actually be removing instrumentation code, only the AB testing part.

To be clear before we edit the task. I understand that:

  • we will need the same instrumentation and schema that we have now, with sampling rate
  • we won't need the AB testing code that buckets people into control/on/off

I think what Tilman was saying is that we can go ahead and remove all of the instrumentation but to make sure that we can still bring back on at some point in the future. Since the feature is already out everywhere, even in this future scenario we won't be doing any a/b tests. @Tbayer - correct me if I'm interpreting that incorrectly!

No concerns - I assume that we'll be able to re-add the instrumentation later (sans the A/B logic) without too much effort in case we want to do any further measurements (say about the use by logged-in editors, once previews have been made the default for new accounts).

If that is the plan, then this task needs to change shape, since we wouldn't actually be removing instrumentation code, only the AB testing part.

To be clear before we edit the task. I understand that:

  • we will need the same instrumentation and schema that we have now, with sampling rate
  • we won't need the AB testing code that buckets people into control/on/off

I think what Tilman was saying is that we can go ahead and remove all of the instrumentation but to make sure that we can still bring back on at some point in the future. Since the feature is already out everywhere, even in this future scenario we won't be doing any a/b tests. @Tbayer - correct me if I'm interpreting that incorrectly!

Yes, that's what I meant. (I.e. "re-add" was referring to the code.)

make sure that we can still bring back on at some point in the future.

Is it worth removing then? I worry that if removal is 3 points, adding it back will be 5 or beyond.

I'd day yes. It's a depreciating asset if it's not deployed.
https://twitter.com/ritam/status/986994497325359104?s=19

We can of course bring it back but we should be mindful that it will cost us.

If we are planning to use it in the next month I'd say it makes sense to keep it but if we are hoping to use it in say a year that's very unrealistic.

Note as the code is written we have no way to run a/b tests without disabling the feature for at least 50% of the codebase so if we do want to use this again that will require substantial effort. Collecting data is a little less effortless.

I want to make absolutely clear that it is not effortless to readd this and it is not responsible to keep this lying around.

@Tbayer with what percentage confidence can you say we are going to need to use this again say within the next year? If so what form can we expect this? A/b test? Collecting a week of data? Other?

https://twitter.com/ritam/status/986994497325359104?s=19

@Jdlrobson, I hear you but I'll also feel like Sisyphus if we put it back in and since it's already written, it's not a case of YAGNI in considering what to build. I don't object necessarily to removing it but what do you think about conditionalizing the code instead so that Uglify can identify it as dead and remove it in the build product for us? We'll still have the cognitive overhead but users will get the size benefit. The other caveat would be that we should re-enable the code in anticipation of wanting instrumentation since the size increase will affect the numbers in itself.

I'd day yes. It's a depreciating asset if it's not deployed.
https://twitter.com/ritam/status/986994497325359104?s=19

We can of course bring it back but we should be mindful that it will cost us.

I think everyone knows that this would involve some work, but it has been my assumption that it would be orders of magnitude less effort than coding a new instrumentation usually requires. Does that sound plausible?

If we are planning to use it in the next month I'd say it makes sense to keep it but if we are hoping to use it in say a year that's very unrealistic.

Unrealistic to keep the code or unrealistic to re-add it? (And is my understanding correct that the code has been live and unused on all Wikipedias except dewiki and enwiki for more than eight months already?)

Note as the code is written we have no way to run a/b tests without disabling the feature for at least 50% of the codebase

Yes, this was already discussed by @Jhernandez, @ovasileva and myself above.

so if we do want to use this again that will require substantial effort.

As noted above, we would not be bringing back the A/B test bucketing part.

Collecting data is a little less effortless.

Less effortless compared to what?

I want to make absolutely clear that it is not effortless to readd this and it is not responsible to keep this lying around.

@Tbayer with what percentage confidence can you say we are going to need to use this again say within the next year? If so what form can we expect this? A/b test? Collecting a week of data? Other?

Please (re)read the comments above, where I already described a possible example situation for illustration and said it wouldn't be an A/B test. As another example, I would also like to point out that we know very little so far about how usage of page previews differs between languages.

I totally hear you on the cognitive overhead for developers caused by unused code, but I also want us to be mindful of the cognitive overhead that has already been caused by this discussion itself and would be increased further by doing the probability estimation work suggested here (percentage of reusage likelihood over time), which would require more planning and discussion between (at least) @ovasileva and myself.

I think @Niedzielski makes a great point about separating the cognitive overhead for developers from the resource overhead for users.

I have taken a closer look at the task description, and updated and clarified it a bit.

I couldn't quite make sense of the questions, so I'm leaving them here for now with some comments:

Questions

  • When can we archive the Popups tables?

I'm not familiar with a standard process or practice of archiving EventLogging tables, can the person who added this question give some more background? (There was a one-time effort sometime last year where some large EL tables were moved to Hadoop to save size, but that doesn't really apply here considering that the recent Popups tables are already exclusively in Hadoop.)

  • What does the EventLogging instrumentation tell us that the performance instrumentation doesn't?

We can spend some time going into detail about this, but isn't it a bit late to ask that question? If it had been possible to use the performance instrumentation to answer all research questions that Popups was created for, we could have done so much earlier.

I think everyone knows that this would involve some work, but it has been my assumption that it would be orders of magnitude less effort than coding a new instrumentation usually requires. Does that sound plausible?

We have git history now, so restoring the current setup will be a lot easier than writing from scratch, yes, but I wanted to make clear that we should expect getting it up and running again to not be trivial regardless of whether we keep it or not - maybe 2-4 weeks. It sounds like that's clear so we're good.

I don't object necessarily to removing it but what do you think about conditionalizing the code instead so that Uglify can identify it as dead and remove it in the build product for us? We'll still have the cognitive overhead but users will get the size benefit. The other caveat would be that we should re-enable the code in anticipation of wanting instrumentation since the size increase will affect the numbers in itself.

From my experience, I wouldn't recommend this. Removing the code actually provides a better path, as restoring the code allows you to revert the patchset that removed it (and understand and address where code has changed that may have broken it). I honestly don't see any advantage to leaving the code in the codebase switched off and unmaintained.

@Tbayer thanks for pointing out the questions and your edits. It seems those questions date back to a year ago hence the confusion: https://phabricator.wikimedia.org/T173952#3547965 - personally this is why I don't like creating a task prematurely as it leads to confusion.

We'll split out the removal of the A/B testing code and the instrumentation code. We are still undecided about the latter.

Jdlrobson renamed this task from Remove A/B testing and instrumentation code to Remove A/B testing instrumentation code.Apr 25 2018, 4:28 PM
Jdlrobson updated the task description. (Show Details)

Change 430816 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/Popups@master] Remove A/B testing code

https://gerrit.wikimedia.org/r/430816

Change 431759 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[operations/mediawiki-config@master] Remove unused PopupsAnonsExperimentalGroupSize config variable

https://gerrit.wikimedia.org/r/431759

Change 430816 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Remove A/B testing code

https://gerrit.wikimedia.org/r/430816

Jdlrobson updated the task description. (Show Details)

@pmiazga can you go through the acceptance criteria and check off the boxes?

Over to you @ABorbaWMF for testing.

Working on it now. Created a new sheet in the previews smoke test: Smoke Test - Previews - May 18 Beta
https://docs.google.com/spreadsheets/d/1CdHhFDoTOfbLDCuwVViAWSFlduNE88h3mCbSKIRbKcs/edit?usp=sharing

I have gone through 2 browsers so far and mostly everything is working. Here are some issues that I am seeing.

Non-free image - on the test URL below the link to the non-free image is showing a preview with no image. However, it does seem to work elsewhere, for example, hovering over the Wikimedia Foundation link in a search results page.

Parenthesis - Again on the URL below, the link called 'page previews test ()' shows the text within the parenthesis. However, hovering over the 'PreviewsNonFreeImage/sandbox' link does not show the text within the parenthesis.

Span - Same URL, the link called 'page previews test span' shows the code in the preview (<a href="/foo"> Foo </a>)

https://en.wikipedia.beta.wmflabs.org/wiki/Category:Page_Previews

Feedback below:

Non-free image - on the test URL below the link to the non-free image is showing a preview with no image. However, it does seem to work elsewhere, for example, hovering over the Wikimedia Foundation link in a search results page.

The image is too small and purposely being excluded

Parenthesis

Have opened up T194314

Span

This is working correctly. It matches what I see on https://en.wikipedia.beta.wmflabs.org/wiki/Page_previews_test_span

ovasileva added a subscriber: ABorbaWMF.

thanks @ABorbaWMF! @Jdlrobson - over to you for some of the signoff steps.

SWAT scheduled for an hour from now... then I'll resolve.

Change 431759 merged by jenkins-bot:
[operations/mediawiki-config@master] Remove unused PopupsAnonsExperimentalGroupSize config variable

https://gerrit.wikimedia.org/r/431759

Mentioned in SAL (#wikimedia-operations) [2018-05-30T13:44:43Z] <pmiazga@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:431759|Remove unused PopupsAnonsExperimentalGroupSize config variable (T173952)]] (duration: 01m 02s)

Mentioned in SAL (#wikimedia-operations) [2018-05-30T13:46:36Z] <pmiazga@tin> Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:431759|Remove unused PopupsAnonsExperimentalGroupSize config variable (T173952)]] (duration: 01m 01s)

Jdlrobson updated the task description. (Show Details)