[Spike 16hrs] Investigate opt-in audience and instrumentation
Closed, Resolved, Public

Description

Background

We would like to be able to serve AMC to both logged-in and logged-out users with and without JS enabled. The priority of this is as follows:
Low: non-JS logged-out users
Medium: JS logged-out users, non-JS logged-in users
High: JS logged-in users

Questions to answer:

  • How difficult is it to serve the treatment and preserve the setting across wikis and sessions for the user groups mentioned above?
    • How would this be done?
  • How difficult is it to instrument the opt-in button for the user groups mentioned above? (Probably needs to use EventLogging in order to enable the data analysis planned for T210660: [EPIC] AMC Metrics.)
    • How would this be done?

Additional questions
  1. Is there a way to store information on the client side for all wikis? (probably not)
  2. Do we know how large the non-JS, 100+ edit / month audience is?
  3. Do we know how large the advanced logged-out audience is?
  4. How do we pass AMC opt-in to the special pages?

Notes

Traffic / caching concerns for:

  • anon non-JS

No traffic / caching concerns for:

  • logged in JS / non-JS
  • anon JS

Idea: Combine mobile beta opt-in with AMC opt-in?

Event Timeline

ovasileva created this task.
ovasileva renamed this task from [Spike] Investigate opt-in audience and instrumentation to [Spike 16hrs] Investigate opt-in audience and instrumentation. Dec 5 2018, 5:16 PM

Is the opt-in/out status going to be stored in the user preferences (for logged-in users)? In that case we could first look at what the existing PrefUpdate schema can give us regarding the instrumentation.

@pmiazga and I discussed various aspects of this today, he is going to write up some things here, and I will follow up with other details. But to note one thing already as a direct followup on today's meeting:

We think that the existing PrefUpdate schema could work well for the purpose of instrumenting opt-ins/outs for logged-in users. (It should be noted though that it currently does not record mobile beta opt-ins/outs, which could otherwise be a model here.)

Summary

There is no difference between JS and non-JS users, as the Special:MobileOptions page works for non-JS users. Storing the AMC preference can be done on the server side. The system can track analytics events on the server side (see the Important notes section). The Advanced Mobile Contributions mode can be handled the same way as Mobile Beta mode, but there are some concerns (please see the Concerns section at the end of this comment).

How difficult is it to serve the treatment and preserve the setting across wikis and sessions for the user groups mentioned above?
How would this be done?

We can store a user preference for logged-in users and keep the cookie for anonymous users. MediaWiki and MobileFrontend provide us with everything we need.
I would say that preserving user settings via user preferences is easy to do.
To handle anonymous users we need to use a cookie. I would advise reusing the optin cookie and storing additional information in it (a pipe-separated set of enabled features). Instead of using MobileContext, we need to provide a new class that handles both AMC and Beta modes, and the Varnish config has to be updated. I would say that the difficulty is medium.
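To make the pipe-separated cookie idea concrete, here is a minimal sketch. It is only an illustration: the function names and the feature keys 'beta' and 'amc' are assumptions, not existing MobileFrontend code. An existing cookie value of "beta" would still parse correctly under this scheme.

```javascript
// Hypothetical helpers: names and feature keys are illustrative only and
// are not taken from MobileFrontend.
const KNOWN_FEATURES = ['beta', 'amc'];

// Parse a pipe-separated cookie value such as "beta|amc" into a Set,
// ignoring empty or unknown entries.
function parseOptinCookie(value) {
  const enabled = new Set();
  for (const part of String(value || '').split('|')) {
    const name = part.trim();
    if (KNOWN_FEATURES.includes(name)) {
      enabled.add(name);
    }
  }
  return enabled;
}

// Serialize in a fixed order so that equal feature sets always produce the
// same cookie value (which matters if the value ends up in a cache key).
function serializeOptinCookie(enabled) {
  return KNOWN_FEATURES.filter((name) => enabled.has(name)).join('|');
}

// Return the new cookie value after switching one feature on or off.
function setFeature(cookieValue, feature, on) {
  const enabled = parseOptinCookie(cookieValue);
  if (on) {
    enabled.add(feature);
  } else {
    enabled.delete(feature);
  }
  return serializeOptinCookie(enabled);
}
```

Serializing in a fixed order is a deliberate choice here: "amc|beta" and "beta|amc" would otherwise count as two different values.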

How difficult is it to instrument the opt-in button for the user groups mentioned above? (Probably needs to use EventLogging to enable the planned data analysis.)
How would this be done?

We can track user opt-ins/opt-outs on the server side. We need to call EventLogging::logEvent( $schema, $revision, $event );. We want to check user retention, and that can easily be done for logged-in users. Furthermore, we can reuse the PrefUpdate schema, but first we need to fix the WikimediaEventsHooks class (see the Important notes section).
Tracking anonymous users is a bit more difficult but still doable (please check the *Tracking anonymous users* section for more information).
I would say that instrumentation is of easy-to-medium difficulty.

  1. Is there a way to store information on the client side for all wikis? (probably not)

We can set a cookie for the top domain (same as beta mode)

  4. How do we pass AMC opt-in to the special pages?

Same as beta mode: use user options or the cookie (please check MobileContext::isBetaGroupMember()). On the client side we can pass a JS config var.

Stats

All mobile beta daily pageviews (T182235#4752852)

There are currently around 130k mobile beta pageviews per day, corresponding to 0.5% of all mobile web page views.

Logged in mobile beta users (T182235#4752911)

There are currently around 60k logged-in mobile beta pageviews per day, corresponding to 7% of all logged-in mobile web page views. A bit over half of all mobile beta pageviews are by anonymous users.

Logged in browsing - (T211142#4799703)

On enwiki, only 0.83% of MAIN_NS (main namespace) browsing is performed by logged-in users.

We can assume that the AMC user group will not be bigger than the logged-in user group (which is still less than 1% of all page views).

Beta mode

First, let me explain how beta mode works:
The Mobile Beta Mode is a particular mode in which users get access to exclusive features that are not part of the default experience. The Beta Mode is available for both anonymous and logged-in users. The selection is available on the Special:MobileOptions page. MobileFrontend stores the user's selection in a cookie and in the user options table.

Beta mode cache avoidance

We use Varnish to cache and serve HTML responses, and Varnish does not cache requests from logged-in users. Each of these requests goes to Varnish and then on to the PHP server. We add optin=beta when building the request key hash, but later we serve the non-cached response anyway. In short, if the optin cookie is set, the servers return a non-cached page. If the user is logged in, we also serve a non-cached page.
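The "in short" rule above can be written down as a tiny predicate. This is only a sketch of the decision, not the actual Varnish VCL; the shape of the request object (isLoggedIn, cookies.optin) is an assumption made for the example.

```javascript
// Sketch of the cache-bypass rule described above. The request shape is
// hypothetical; the real logic lives in the Varnish configuration.
function shouldBypassCache(request) {
  const cookies = request.cookies || {};
  // Any non-empty optin cookie means the user opted in to a non-default mode.
  const hasOptin = typeof cookies.optin === 'string' && cookies.optin !== '';
  return Boolean(request.isLoggedIn) || hasOptin;
}
```

Under this rule, every additional opt-in mode stored in the optin cookie adds to the share of requests that reach the PHP servers.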

Browsing mobile pages

The beta mode is detected by performing checks:

  • if mobileaction is defined as beta or stable - selected mode will be used
  • if a user is anonymous, the value from optin cookie will be used
  • if a user is logged in, mfMode user option will be used
  • if the mfMode option is not defined, system will fall back to the cookie check
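
The precedence in the list above could be sketched like this. The function name, the parameter names, and the assumption that mfMode holds 'beta' or 'stable' are mine; the real checks live in MobileContext.

```javascript
// Sketch of the beta-mode detection order. All names are hypothetical.
function resolveMobileMode({ mobileaction, isLoggedIn, mfMode, optinCookie }) {
  // 1. An explicit mobileaction=beta|stable parameter always wins.
  if (mobileaction === 'beta' || mobileaction === 'stable') {
    return mobileaction;
  }
  // 2. Logged-in users: use the mfMode user option when it is defined.
  const mfModeDefined = typeof mfMode === 'string' && mfMode !== '';
  if (isLoggedIn && mfModeDefined) {
    return mfMode === 'beta' ? 'beta' : 'stable';
  }
  // 3. Anonymous users, or logged-in users without mfMode: use the cookie.
  return optinCookie === 'beta' ? 'beta' : 'stable';
}
```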

Storing beta mode

When a user enables beta mode, the Special:MobileOptions form is submitted, which triggers a request to the PHP server.
PHP sets the mfMode user option (for logged-in users only) and sets the optin cookie (for both anons and logged-in users).

Tracking anonymous users

To track user retention we need some way to identify which events come from the same user. We cannot use the client pageToken or sessiontoken, as those stay in the browser only for the current pageview/session. When a user enables AMC, we could store a unique identifier in local storage and then send that identifier with every opt-in/opt-out request (it would have to be passed to the server on the Special:MobileOptions page). But that value might be identifying. We don't want to assign any identifiers to users.
Instead, we can store the date of the last AMC opt-in/opt-out in local storage. When a user opts in for the first time, we send an event with lastActionDate=null and store the current date in local storage. Then on every opt-in/opt-out we send lastActionDate=localStorage.getItem('amc.lastactiondate') with the event, and then overwrite amc.lastactiondate in local storage with the current date.
Each event will carry the current date and the date of the last action, which should allow us to reconstruct the chain of events (when a given browser opted in/opted out). Checking the retention rate for anon users is going to be difficult for the analyst (it requires a query that takes the dates into consideration), but it's possible.
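
A minimal sketch of the lastActionDate idea, assuming a storage object with getItem/setItem (window.localStorage in the browser, an in-memory stand-in elsewhere). The function names and the event field names other than lastActionDate are hypothetical, and actually sending the event is left out.

```javascript
const STORAGE_KEY = 'amc.lastactiondate';

// Build the event for an opt-in or opt-out and remember its date for next time.
// `storage` is anything with getItem/setItem, e.g. window.localStorage.
function buildAmcEvent(storage, action, now) {
  // getItem returns null when nothing was stored yet (first opt-in on this browser).
  const lastActionDate = storage.getItem(STORAGE_KEY);
  const event = { action, date: now.toISOString(), lastActionDate };
  // Overwrite the stored date so the next event can point back at this one.
  storage.setItem(STORAGE_KEY, event.date);
  return event;
}

// In-memory stand-in for localStorage, useful outside the browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => {
      data.set(key, String(value));
    },
  };
}
```

No per-user identifier is stored: the only thing that links two events is that one event's lastActionDate equals the previous event's date.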

Important notes:

  • beta/stable mode can be overridden by passing mobileaction={MODE} where the mode is one of beta, stable.
  • Because the server stores beta mode via a user option, we can use the PrefUpdate schema: User::saveSettings() triggers the UserSaveSettings hook, and WikimediaEvents listens to this hook and triggers a PrefUpdate analytics event. The problem is that WikimediaEventsHooks::onUserSaveOptions() does an extra check and sends the event only when the preferences save comes from the Special:Preferences page or from an API action=options call.
  • If we decide to support anon users, we need to provide a safety switch to disable opt-in for anon users, just in case the feature becomes super popular. The more users enable the feature, the more requests will skip the Varnish cache and will have to be handled by the PHP servers.

Concerns

Enabling AMC as another opt-in feature for logged-in users looks like an easy thing to do. Even doing that for anonymous users doesn't look that difficult. The real problem is the maintenance of the new feature. If AMC becomes an opt-in feature for both logged-in and anon users, these are the main concerns:

  • there are way too many versions of mobile pages to test (anon, anon beta, anon AMC, anon beta+amc, logged-in, logged-in beta, logged-in AMC, logged-in beta+amc) - 8 different versions of the same page
  • what if this becomes super popular? (scaling risk; not sure how many people are going to use it)
  • It looks like this is another beta mode? We had an alpha mode, and we removed that.
  • Maybe we can ship it as a beta feature? That approach solves the 'way too many versions' problem. Additionally, almost no work would be required to enable opt-in. The only required action is to fix the beta opt-in/opt-out instrumentation, which is relatively easy to do.
  • cache fragmentation (most probably we are not going to modify the article content, only the mobile UI; what if we decide to store that in the cache because too many users use this feature?)
  • complexity of switching between workflows (beta/AMC/standard; the more modes we have, the easier it is to miss something)
  • Shared devices. When someone enables the feature (doesn't matter if anon/logged-in), the setting is stored in the cookie. It means that every person who is going to use that device will have the AMC feature enabled. This might be confusing, the beta mode has the same problem.
  • It looks like AMC is all or nothing (there is no granularity when it comes to feature selection). If we decide to allow users to opt-in for some features, this will double the complexity.
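
The count of 8 page versions in the first concern comes from three independent switches (anon or logged-in, beta on or off, AMC on or off). A small sketch that enumerates the matrix, with a hypothetical anonAmc flag to show how the list shrinks if AMC is not offered to anonymous users:

```javascript
// Enumerate the page variants that would need testing. The function and the
// anonAmc flag are illustrative only.
function listModes({ anonAmc }) {
  const modes = [];
  for (const user of ['anon', 'logged-in']) {
    for (const beta of [false, true]) {
      for (const amc of [false, true]) {
        // Skip AMC variants for anonymous users when anon AMC is not offered.
        if (amc && user === 'anon' && !anonAmc) {
          continue;
        }
        modes.push([user, beta && 'beta', amc && 'amc'].filter(Boolean).join(' + '));
      }
    }
  }
  return modes;
}
```

With anonAmc set to true this yields 8 variants; with it set to false, the two anonymous AMC variants drop out and 6 remain.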

There is no difference between js and non-js users as Special:MobileOptions page works for non-js users.

Does this mean there is no JavaScript executed on Special:MobileOptions? If there is JavaScript, it would seem to me that we have different options available to us on the server and client.

We can store a user preference for logged-in users and keep the cookie for anonymous users. MediaWiki and MobileFrontend provide us with everything we need.
I would say that preserving user settings via user preferences is easy to do.
To handle anonymous users we need to use a cookie. I would advise reusing the optin cookie and storing additional information in it (a pipe-separated set of enabled features). Instead of using MobileContext, we need to provide a new class that handles both AMC and Beta modes, and the Varnish config has to be updated. I would say that the difficulty is medium.

I want to summarize this in my own words to make sure I understand it. If you would correct any mistakes, I'd appreciate it:

Revise(?) [[ https://phabricator.wikimedia.org/diffusion/EMFR/browse/master/includes/MobileContext.php | MobileContext ]] to handle AMC. The existing optin cookie referenced in MobileContext can be repurposed (use pipe-delimited values for the enabled features) to support AMC *for anon users only*, with a user option for logged-in users only. If AMC is enabled per the cookie / user option OR a URL parameter, render advanced mode. Otherwise, render standard mode. Both AMC and standard modes must retain beta compatibility. On the JavaScript side, I guess we just reference the cookie via mw.storage and the option via mobileoptions.

What needs to be updated in the varnish config? Something in here?

Were there any alternatives considered, additional MediaWiki references to use, or similar code in our extensions / Core? Soon, we'll implement this functionality and want these notes if they exist.

We can track user opt-in/opt-outs on the server side.

Can events logged on the server AND on the client be tied together? For example, if I logged that a user visits the mobile options page on the server and makes a change on the client, can we recognize in the logs that both events are from the same user?

Beta

As I understand from your comment, the following pages are uncached:

  • Any special pages
  • Any page viewed as a logged in user
  • Any page viewed with beta mode enabled

And you propose that any AMC page also be uncached.

Does this mean there is no JavaScript executed on Special:MobileOptions? If there is JavaScript, it would seem to me that we have different options available to us on the server and client.

There is JavaScript on Special:MobileOptions (to make the toggles look nice and to autosubmit the form when you toggle beta mode). What I meant is that no JavaScript is required to opt in to beta, and there is no JS tracking: both the opt-in/opt-out itself and the AnalyticsEvents tracking are done on the server side.

Revise(?) MobileContext to handle AMC. The existing optin cookie referenced in MobileContext can be repurposed (use pipe delimited values for features enabled) to support AMC *for anon users only* and a User option for logged in users only.

The cookie is set for both anons and logged-in users. This is the way to preserve settings when you log out (this is my assumption). A user option is stored for logged-in users only.

If AMC is enabled per the cookie / user option OR a URL parameter, render advanced mode. Otherwise, render standard mode. Both AMC and standard modes must retain beta compatibility.

Correct

On the JavaScript side, I guess we just reference the cookie via mw.storage and the option via mobileoptions.

On the JavaScript side, we refer to the config var (passed from the server). It's done that way because the beta functionality can be disabled. If we disable beta mode (via MFEnableBeta), we ignore both the user option and the cookie and always return "stable" mode.

What needs to be updated in the varnish config? Something in here?

We need to update /text-common.inc.vcl.erb#L170-L178 to match the new logic.

Can events logged on the server AND on the client be tied together? For example, if I logged that a user visits the mobile options page on the server and makes a change on the client, can we recognize in the logs that both events are from the same user?

Yes, events can be logged on both sides, but I'm not sure whether it is possible to identify that both events (server and JS) come from the same user, as we try to make events non-identifying.

As I understand from your comment, the following pages are uncached:

We do not cache requests for:

  • any page viewed as a logged in user
  • any page viewed with beta mode enabled

Yes, I propose that any AMC page will also skip the cache. I talked with Traffic and it looks like they will be OK with a ~1% traffic increase (but we need to provide better numbers; I only verified that it is possible to skip the cache for a small share of requests).

@alexhollender @ovasileva
I think the devs have a good understanding of this now, and we need to work out a direction before we start cutting user stories. We have several options on the table. It's worth noting that the anon mode can be added at any time in this project, so there are likely different flavours of these options. It's feasible but comes with a cost.

  1. Use beta for AMC mode.

We can either rebrand beta as AMC mode OR we can use it to start shipping features early.
PROS: Very little work on our side; no need for server side preferences; all users benefit from the new mode; can start building features right away; compatible with all 3 other options (we can change our mind at any point); only 4 testing modes: anon, anon+AMC, logged in, loggedin+AMC
CONS: No beta mode (could lose experimentation ground for non-AMC things); Need to work out what to do with existing beta features (remove or keep)

  2. Build anon-AMC mode alongside beta (now)

PROS: Both anons and logged in users can benefit from both a beta and AMC mode; difficult problems ironed out early in project;
CONS: High levels of complexity; Lots of edge cases; Big maintenance cost with 8 testing modes (anon, anon beta, anon+amc, anon+beta+amc, logged in, logged-in + beta, logged-in + amc, logged-in + beta + amc); not done before, risk of unknown unknowns; every release will have a higher testing cost due to the 8 different modes; Start of project likely to suffer from delays; risky given we will need to update instrumentation and manage existing beta opt-ins

  3. Build anon-AMC alongside beta (later)

PROS: We can do an extended QA on all features in one big go; can ship quicker and faster (due to lower QA cost);
CONS: Anon users do not benefit from mode until later; maintenance cost of testing 8 modes (see #2);

  4. Don't build anon mode at all - logged in only

PROS: reduced maintenance cost (fewer modes to test);
CONS: anons do not benefit from mode; Need to maintain 6 modes (unlike #1): anon, anon beta, logged in, logged-in + beta, logged-in + amc, logged-in + beta + amc

@Niedzielski @pmiazga @nray @Jdrewniak have I missed any other options? If so let me know on Slack and I'll update this comment (so it's retained at the bottom of the task as a summary)

@Jdlrobson sounds good to me. The only thing I see concerns point 4:

4: Don't build anon mode at all

We will have to support: anon, anon beta, logged in, logged-in + beta, logged-in + amc, logged-in + beta + amc. This approach reduces the cost a bit (by not having to support anon amc and anon amc + beta).

...

Can events logged on the server AND on the client be tied together? For example, if I logged that a user visits the mobile options page on the server and makes a change on the client, can we recognize in the logs that both events are from the same user?

Yes, events can be logged on both sides, but I'm not sure whether it is possible to identify that both events (server and JS) come from the same user, as we try to make events non-identifying.

To clarify, the PrefUpdate schema does log the user ID (see documentation). (@Niedzielski , by "makes a change on the client", did you refer to making an edit to a wiki page, or were you talking about a hypothetical new schema logging preference changes on the client side?)

Summary

...

Tracking anonymous users

To track user retention we need some way to identify which events come from the same user. We cannot use the client pageToken or sessiontoken, as those stay in the browser only for the current pageview/session. When a user enables AMC, we could store a unique identifier in local storage and then send that identifier with every opt-in/opt-out request (it would have to be passed to the server on the Special:MobileOptions page). But that value might be identifying. We don't want to assign any identifiers to users.

Just to avoid confusion, this sentence refers to *anonymous* editors (we do of course assign identifiers to logged-in users, namely their public user name and ID).

Instead, we can store the date of the last AMC opt-in/opt-out in local storage. When a user opts in for the first time, we send an event with lastActionDate=null and store the current date in local storage. Then on every opt-in/opt-out we send lastActionDate=localStorage.getItem('amc.lastactiondate') with the event, and then overwrite amc.lastactiondate in local storage with the current date.
Each event will carry the current date and the date of the last action, which should allow us to reconstruct the chain of events (when a given browser opted in/opted out). Checking the retention rate for anon users is going to be difficult for the analyst (it requires a query that takes the dates into consideration), but it's possible.

It's not terribly difficult per se, assuming that every opt-out event comes with the date of the preceding opt-in. But the resulting data is going to be more brittle than for logged-in users, for example because we have no way to distinguish between retained anonymous users and those who lost their cookie/amc.lastactiondate value and opted in again with lastActionDate=null.
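
As a rough illustration of that analysis, assuming each event is an object with action, date and lastActionDate fields (only lastActionDate is a name from this discussion; the rest is hypothetical):

```javascript
// Summarize anonymous opt-in/opt-out events.
// events: [{ action: 'opt-in' | 'opt-out', date: ISO string, lastActionDate: ISO string | null }]
function summarizeAnonEvents(events) {
  const durationsMs = [];
  let ambiguousOptIns = 0;
  for (const event of events) {
    if (event.action === 'opt-out' && event.lastActionDate !== null) {
      // Time between the preceding action (assumed to be the opt-in) and this opt-out.
      durationsMs.push(Date.parse(event.date) - Date.parse(event.lastActionDate));
    } else if (event.action === 'opt-in' && event.lastActionDate === null) {
      // Either a genuinely new browser or one that lost its stored date:
      // the two cases cannot be told apart from the event alone.
      ambiguousOptIns += 1;
    }
  }
  return { durationsMs, ambiguousOptIns };
}
```

The ambiguousOptIns counter is where the brittleness shows up: it mixes first-time opt-ins with returning browsers whose local storage was cleared.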

Let's discuss this in grooming today. To satisfy the goals of the project, I think our best bet is to go with either option 3 or 4 from T211195#4822270.

(@Niedzielski , by "makes a change on the client", did you refer to making an edit to a wiki page, or were you talking about a hypothetical new schema logging preference changes on the client side?)

I suppose either. For example, can we track a workflow that starts on the server and concludes on the client?

Technically, it is possible, but AFAIK we shouldn't track events in a way that can identify the user. The backend tracks user preference changes by passing the userId property to the PrefUpdate schema. We could do the same on the client side (for logged-in users). I think @Tbayer can answer your question much better.

I wouldn't focus on anon users as it's a bit more difficult and most probably we will abandon supporting anons.

We're doing logged in only. No changes to beta mode. Kind of 3 and 4. Needs more detail in task description.

We will begin with option 4 and reevaluate at a later time whether we want to continue with option 3 (adding anonymous users). Resolving this for now. Thanks for all your great research here!

  4. Don't build anon mode at all - logged in only

PROS: reduced maintenance cost (fewer modes to test);
CONS: anons do not benefit from mode; Need to maintain 6 modes (unlike #1): anon, anon beta, logged in, logged-in + beta, logged-in + amc, logged-in + beta + amc

@Jdlrobson, do we need to build server and client implementations for the above configurations?