
[SPIKE 4hrs] What is technically feasible in terms of logged-in/logged-out users?
Closed, Resolved, Public, Spike

Description

We would like to explore the feasibility of providing opt-out and A/B testing capabilities to logged-out users throughout the course of the desktop improvements project

User stories

As a product team, we would like to A/B test desktop improvements changes (such as changing the location of the language switcher) on all users (logged-in and logged-out), so that we can determine whether the change has any positive or negative impacts

As a product team, we would like to allow all users (logged-in and logged-out) to be able to opt-out of the new experience, so that we can measure the retention rate for the new treatment

Acceptance criteria

Research the following questions (and consult with relevant parties such as the Traffic team):

  • Is it possible and what is the relative difficulty of providing all logged-out users with an opt-out option for the proposed improvements
    • Does this change based on how many wikis we are testing on? (i.e. what is the maximum number of pages that we can split the cache for at any given time, and for how long?)
  • Is it possible and what is the relative difficulty of performing per-user or per-session A/B test on logged-out users?
    • Does this change based on how many wikis we are testing on? (i.e. what is the maximum number of pages that we can split the cache for at any given time, and for how long?)
  • Is it possible to do A/B/C testing for logged-out users?
  • Is it possible to do A/B/C testing for logged-in users?

Event Timeline

Restricted Application added a subscriber: Aklapper.Sep 23 2019, 10:57 AM
ovasileva triaged this task as Medium priority.Sep 23 2019, 10:57 AM
ovasileva updated the task description.Sep 25 2019, 4:19 PM
Jdlrobson added subscribers: Nuria, BBlack.EditedSep 25 2019, 4:28 PM

Historically, serving different HTML in Varnish for different anonymous users has not been possible and not very advisable. I'm not sure if that's changed. I think the only place we vary in the site is the mobile beta mode, which varies by a cookie for anonymous users but on a very tiny audience (in the thousands).

To provide different experiences for different anonymous users we'd need to split the Varnish cache for each change - @BBlack would probably be the person to talk to about what's feasible and what's not and provide you background on why. In respect to A/B testing, @Nuria did some exploratory work with Brandon on A/B tests on anonymous users a while back, so I'd advise chatting to her also (see T135762). I'd suggest meeting with them both outlining what we'd like to do, and reporting back here what you've learned.

I'd suggest some reading of https://www.mediawiki.org/wiki/Manual:Varnish_caching and https://wikitech.wikimedia.org/wiki/Varnish (and anything else you can dig up) before going into that meeting.

ovasileva renamed this task from [SPIKE] What is technically feasible in terms of logged-in/logged-out users? to [SPIKE 4hrs] What is technically feasible in terms of logged-in/logged-out users?.Sep 25 2019, 4:29 PM
ovasileva updated the task description.Oct 10 2019, 9:21 AM

@ovasileva

  • do we want to A/B test it per page or per the whole wiki?
  • do we know what is going to be our population size/which wikis do we want to A/B test? (like do we want to test on 100 users, or do we want to test on multiple wikis on 100 thousand anon users/sessions).
CDanis added a subscriber: CDanis.Oct 10 2019, 7:07 PM
Nuria added a comment.Oct 10 2019, 7:45 PM

I am not sure I get the premise of @ovasileva's request. Do we want to serve different treatments to logged-in and logged-out users, and thus run two A/B tests, one for logged-in users and one for logged-out users (four treatments in total)? Is that the question?

Some things:

The ability to do A/B testing does not depend on whether you are logged in. Sampling for A/B tests can be per page (best) or per MediaWiki session ID (second best), but in no case would you want an A/B test in which the A group is logged-in users and the B group is logged-out users. That would not work statistically because assignment to the A and B groups would not be random.

Is it possible and what is the relative difficulty of performing per-user or per-session A/B test on logged-out users?

Equal to performing AB testing on logged in users, see comment above.

Is it possible and what is the relative difficulty of providing all logged-out users with an opt-out option for the proposed improvements

For logged out users you will likely be asking them more than once as the opt-out cannot be persisted forever (it can be a cookie or a setting on local storage, both of which can be deleted by the user)
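
To make the persistence point concrete, here is a minimal sketch in Python of an opt-out stored in a cookie. The cookie name `skin_opt_out` and the 30-day lifetime are assumptions made up for this illustration, not anything MediaWiki defines. Once the cookie expires or the user deletes it, the opt-out is gone and the user would have to be asked again.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name and lifetime, chosen only for this illustration.
OPT_OUT_COOKIE = "skin_opt_out"
THIRTY_DAYS = 30 * 24 * 60 * 60


def build_opt_out_header() -> str:
    """Return the Set-Cookie header value that records an opt-out."""
    cookie = SimpleCookie()
    cookie[OPT_OUT_COOKIE] = "1"
    cookie[OPT_OUT_COOKIE]["max-age"] = THIRTY_DAYS
    cookie[OPT_OUT_COOKIE]["path"] = "/"
    return cookie[OPT_OUT_COOKIE].OutputString()


def has_opted_out(cookie_header: str) -> bool:
    """Check an incoming Cookie header for the opt-out marker."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return OPT_OUT_COOKIE in cookie and cookie[OPT_OUT_COOKIE].value == "1"
```

A request that arrives without the cookie is indistinguishable from one by a user who never opted out, which is why this can only ever be a temporary measure.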

Is it possible to do A/B/C testing for logged-out users?
Is it possible to do A/B/C testing for logged-in users?

Yes and yes

@Nuria to be clear, I believe the question being asked here, is can we serve two different sets of HTML to users in desktop

Is it possible and what is the relative difficulty of providing all logged-out users with an opt-out option for the proposed improvements
Does this change based on how many wikis we are testing on? (i.e. what is the maximum number of pages that we can split the cache for at any given time, and for how long?)

This implies opted-out anonymous users would be served different HTML from normal anonymous users.

Is it possible and what is the relative difficulty of performing per-user or per-session A/B test on logged-out users?

Does this change based on how many wikis we are testing on? (i.e. what is the maximum number of pages that we can split the cache for at any given time, and for how long?)

This implies serving different HTML to logged out users in the A/B test. As you point out this could be done on a per page basis but I'm not sure it can be done per session unless you know something I don't?

Nuria added a comment.Oct 10 2019, 8:34 PM

I believe the question being asked here, is can we serve two different sets of HTML to users in desktop

oohhhhh, I see. Let me rephrase: we will be serving treatment A to logged-in and logged-out users and treatment B to logged-in and logged-out users, so whether a user is logged in has little to do with the test.

we would like to A/B test desktop improvements changes (such as changing the location of the language switcher

Going with this example, in our ecosystem the only way I see is to load the language switcher lazily and position it once the treatment has been determined. So, before doing the A/B test you need to refactor how the language switcher works, so that you serve the same "core" page to both groups but "dress" it differently for either treatment. This has a lot more to do with MediaWiki client-side code than A/B testing, so you all probably have thought about this already.

This has a lot more to do with MediaWiki client-side code than A/B testing, so you all probably have thought about this already.

Totally. My understanding of this spike, however, was to investigate whether we can make server-side changes to avoid the complexity that comes with this kind of client change and split the Varnish cache to serve different HTML, and that @pmiazga is going to reach out to yourself and @BBlack to see what's feasible (or how far we are from that being feasible). While moving a language switcher on the client side might be possible for a temporary A/B test, if the idea is to allow users to opt out of this change permanently it would be better done on the server.

Nuria added a comment.EditedOct 10 2019, 8:56 PM

if the idea is to allow users to opt out of this change permanently it would be better done on the server.

That is not doable at any layer of our stack. For it to be permanent, users would need to be authenticated and this setting would need to be one on their account. Cookies and local storage are both temporary measures.

@ovasileva

  • do we want to A/B test it per page or per the whole wiki?

per the whole wiki would be preferable, but if we're restricted to per page, we can work with that as well

  • do we know what is going to be our population size/which wikis do we want to A/B test? (like do we want to test on 100 users, or do we want to test on multiple wikis on 100 thousand anon users/sessions).

Similar to above - we'd like to begin testing on smaller wikis with a limited size, but would like to get an idea of the maximum number of users we can test on for any wiki

Nuria added a comment.Oct 11 2019, 5:02 PM

@Jdlrobson I wouldn't consider Varnish your "server" here. Varnish is the caching layer, and as such I would make application changes in the application layer, namely MediaWiki.

@Jdlrobson I wouldn't consider Varnish your "server" here. Varnish is the caching layer, and as such I would make application changes in the application layer, namely MediaWiki.

Sure. However if we are making changes in mediawiki for anonymous users that will generate HTML that is then cached by varnish. So we'd need some way of splitting the cache via a header for example to allow that. As you confirm in
https://phabricator.wikimedia.org/T233609#5564523 this is currently not possible. We do however do this on mobile in the beta mode so it's doable just at certain size audiences...

Nuria added a comment.Oct 11 2019, 5:17 PM

To clarify: anonymous users' requests for pages are cached by Varnish; logged-in users' requests are not.

phuedx added a comment.EditedOct 14 2019, 4:17 PM

We do however do this on mobile in the beta mode so it's doable just at certain size audiences...

Thanks for all the clarification both 🙂

If you're wondering, Varnish will pass requests from users who've opted into the mobile site's beta mode regardless of whether they're logged in or out: https://gerrit.wikimedia.org/g/operations/puppet/+/9bd38222816a1e8a252ace2687aaae4a94531b16/modules/varnish/templates/text-common.inc.vcl.erb#170

As @Jdlrobson alludes to, the size of that cohort is vanishingly small, e.g. on 3rd October 2019 some 0.04% of all pageviews on the mobile site were by users who'd opted into the mobile site's beta mode:

+---------------+-----------+--------+
| access_method |     n     | n_beta |
+---------------+-----------+--------+
| desktop       | 337026632 |     72 |
| mobile web    | 325173634 | 144836 |
+---------------+-----------+--------+

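-- Count pageviews per access method, and how many of them came from
-- users who had opted into the mobile site's beta mode.
select
    access_method,
    sum(1) as n,
    -- 'b%2Camc' is the URL-encoded form of 'b,amc'
    sum(if(x_analytics_map['mf-m'] = 'b' or x_analytics_map['mf-m'] = 'b%2Camc', 1, 0)) as n_beta
from
    wmf.webrequest
where
    -- 3rd October 2019
    year = 2019
    and month = 10
    and day = 3
    and is_pageview
    and (access_method = 'mobile web' or access_method = 'desktop')
group by
    access_method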
;
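
For anyone who wants to check the percentage quoted above, this is a small sketch of the arithmetic in Python. The counts are copied from the result table; the function name is just an illustrative choice.

```python
# Counts copied from the query result table above.
pageviews = {
    "desktop": {"n": 337026632, "n_beta": 72},
    "mobile web": {"n": 325173634, "n_beta": 144836},
}


def beta_share_percent(row: dict) -> float:
    """Share of pageviews made in beta mode, as a percentage."""
    return 100.0 * row["n_beta"] / row["n"]


for access_method, row in pageviews.items():
    print(f"{access_method}: {beta_share_percent(row):.5f}%")
```

The mobile web share comes out a little above 0.04%, which matches the figure given above.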
Nuria added a comment.Oct 14 2019, 4:34 PM

If you're wondering, Varnish will pass requests from users who've opted into the mobile site's beta mode regardless of they're logged in or out:

Right, that is a cookie-based check, which means it is really not permanent. It will be "as permanent" as those cookies are on the user's device; I think everyone here is aware of this fact, but just restating. For a change to be truly permanent it needs to be linked to the user's account. I think completely bypassing Varnish (with the perf implications that it entails) on a user interface project would be unwise. I would encourage the team to think of solutions that can be driven from the UI entirely.

MBinder_WMF changed the subtype of this task from "Task" to "Spike".Oct 15 2019, 5:10 PM

Logged-in users

A/B testing logged-in users is straightforward. We can easily do that on the server side, and there is no need for any caching layer changes, as all logged-in users' requests skip the cache.
The only required work is backend (PHP) work.

Logged-out users

This case is pretty complicated, as anon requests are cached in Varnish. Backend servers generate the response once, and all users browsing the same page see the same response.

To A/B test logged-out users, we need to use some mechanism, e.g. a cookie, to skip the caching or split the cached responses. There are three main problems with A/B testing anon users:

  • privacy concerns, as we need to mark unique user sessions with some token through all requests
  • additional complexity on the varnish side
  • a successful A/B test requires that a single user stays assigned to the same group during the whole experiment. This is not possible when the A/B testing is executed on the Varnish/backend side.

There is an additional way we could test logged-out users: switch the default skin to the new one and monitor their behavior for some short period. After changing the skin, we would need to wipe the entire Varnish cache for the given wiki. On small wikis that's not a problem, but on English Wikipedia it requires a bit more planning, as wiping the Varnish cache can lead to service unavailability.

In 2016 we already tried to build an A/B testing framework. However, the idea was abandoned, mostly because it's not possible to pin a given user to one group during the experiment and because of the additional complexity on the caching layer. For more information, please refer to [T135762]
Also https://docs.google.com/document/d/1jRGjVAthJXoCovxyvXWyg07R1POb8zvD_n8IlJXrPVM/

All previous approaches to server-side A/B testing were a "seasonal issue that's come up every few months for the past couple of years" [https://phabricator.wikimedia.org/T135762], and it's a pretty deep and unfruitful rabbit hole.

If possible, we should do client-side A/B testing, which removes the complexity of dealing with the Varnish caching layer. However, the problem of pinning a single user to the same bucket through the entire experiment remains.

Answers:

Is it possible and what is the relative difficulty of providing all logged-out users with an opt-out option for the proposed improvements

It is possible, although very difficult, and it's recommended not to do that.

Does this change based on how many wikis we are testing on? (i.e. what is the maximum number of pages that we can split the cache for at any given time, and for how long?)

No, as long as we test on small wikis with a small number of visits.

Is it possible, and what is the relative difficulty of performing per-user or per-session A/B test on logged-out users?

Similar to A/B testing all users. We need to write pretty much the same code to support the A/B testing. Also, it's pretty tricky to pin the user to the same group through the entire experiment. The only way to pin users is to use cookies, and those might not be present across multiple sessions. A/B testing a single session is easier, although serving a different experience in each browsing session is something we should avoid.

Does this change based on how many wikis we are testing on? (i.e. what is the maximum number of pages that we can split the cache for at any given time, and for how long?)

To my knowledge, no, as long as we test on small wikis.

Is it possible to do A/B/C testing for logged-out users?

Yes, but it's not recommended.

Is it possible to do A/B/C testing for logged-in users?

Yes, this is the only recommended way, as all requests are served by the backend and we can properly assign users to buckets through the entire experiment.
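
As a rough illustration of why logged-in users are the easy case, here is a minimal Python sketch of stable A/B/C assignment keyed on a user ID. This is not the MediaWiki implementation; the function name, the bucket labels, and the choice of SHA-256 are all assumptions for the example. The point is only that a stable identifier gives a stable bucket on every request.

```python
import hashlib

# Illustrative bucket labels for an A/B/C test.
BUCKETS = ("A", "B", "C")


def bucket_for_user(user_id: int, experiment: str) -> str:
    """Map a logged-in user to one bucket, stably across requests."""
    # Including the experiment name means different experiments
    # do not assign the same users to the same buckets.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(BUCKETS)
    return BUCKETS[index]
```

Because the input is the account's ID rather than a cookie, the assignment survives cleared cookies and new devices, which is what cannot be guaranteed for logged-out users.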

pmiazga removed pmiazga as the assignee of this task.Oct 17 2019, 5:10 PM
pmiazga added a subscriber: pmiazga.
Nuria added a comment.Oct 17 2019, 6:47 PM

If possible, we should do client-side A/B testing, this removed the complexity of dealing with the Varnish caching layer.

+10,0000

However, the problem with pinning single user to the same bucket through the entire experiment remains.

Just like with any experiment, you need to rely on mediawikiSessionId. As long as experiments are run for a few days this would be perfectly fine. Also, this is only necessary if you want to A/B test segmenting on users.

If you are segmenting on pageviews not even cookies are needed and thus no issue with bucketing being permanent.
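
The difference between the two segmenting approaches can be sketched in a few lines of Python. This is an assumption-laden illustration, not the actual sampling code: `session_id` stands in for whatever session token is available (such as the session ID mentioned above), and the hashing scheme and function names are invented for the example.

```python
import hashlib
import random


def in_sample_by_session(session_id: str, sampling_rate: float) -> bool:
    """Deterministic: the same session token always gets the same answer."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Compare the first 8 bytes, read as an integer in [0, 2**64),
    # against the matching fraction of that range.
    return int.from_bytes(digest[:8], "big") < sampling_rate * 2**64


def treatment_by_pageview(rng: random.Random) -> str:
    """Independent draw per pageview: nothing is stored, nothing persists."""
    return "A" if rng.random() < 0.5 else "B"
```

Segmenting on sessions is only as stable as the token itself, whereas segmenting on pageviews needs no token at all because each pageview is assigned on its own.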

Nuria added a comment.Oct 23 2019, 5:29 PM

I corrected a couple of things in the doc, mostly around the fact that the means by which users are assigned to treatments is a cookie for logged-out users, no matter where you are segmenting (server or client). The client gives you additional possibilities with local storage as well.

Demian added a subscriber: Demian.EditedFeb 8 2020, 12:40 AM

@Jdlrobson please confirm we use the same terminology:

  • Splitting the cache is to keep two (or more) different versions of the same page in Varnish's memory, thus doubling the memory (and disk swap) usage for the affected pages regardless of whether one or many users request the alternative (though an alternative is presumably purged from the cache if only a few users use it). Splitting results in +1 application layer request for each alternative.
  • Passing the cache is to avoid caching altogether and pass every request to the application layer (MediaWiki).

@Demian yes same terminology
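
The two terms agreed on above can be illustrated with a toy model in Python. This is only a sketch of the concept and says nothing about how Varnish is actually configured; the class and method names are made up for the example.

```python
from collections import Counter


class ToyCache:
    """Toy model of a caching layer in front of an application backend."""

    def __init__(self):
        self.store = {}
        self.backend_hits = Counter()

    def _render(self, page: str, variant: str) -> str:
        # Stands in for a request to the application layer.
        self.backend_hits[(page, variant)] += 1
        return f"<html>{page} [{variant}]</html>"

    def get_split(self, page: str, variant: str) -> str:
        """Split: keep one cached copy per (page, variant)."""
        key = (page, variant)
        if key not in self.store:
            self.store[key] = self._render(page, variant)
        return self.store[key]

    def get_pass(self, page: str, variant: str) -> str:
        """Pass: never cache, always go to the backend."""
        return self._render(page, variant)
```

In this model, 100 requests each for variants A and B of one page through `get_split` cost two backend renders and two stored copies, while 100 requests through `get_pass` cost 100 backend renders and store nothing.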