
Enable lazy loaded images for 50% of users in production
Closed, Declined · Public · 3 Story Points

Description

We would like to lazy load images for 50% of users.

  • Ensure pageviews are split into two buckets: 50% of users enter the lazy loading images experiment and 50% serve as a control group with the normal experience.
  • Vary the HTML cache on the lazy load cookie's presence
  • Update NavigationTiming so it records whether the user is in the lazy load experiment or not
  • Enable lazy loading images when the user is in the experiment bucket


Event Timeline


@Krinkle @Peter good to talk today.
As discussed I've made the required change on our side in MobileFrontend: https://gerrit.wikimedia.org/r/279076
Next steps would be

  1. to get this reviewed/tested and merged (@Krinkle )
  2. Update NavigationTiming to pass this property (I'll do that)
  3. @Peter.Hedenskog setup the required webpagetests
  4. We then need to talk to @BBlack about randomly assigning the LazyLoadImages cookie to all our traffic. 98% of users would get value 'C' (no impact on traffic), 1% value 'A' (the lazy loaded group) and 1% value 'B' (control group). @BBlack we would want to vary the cache in a similar way to how we varied on the NetSpeed cookie. We're not sure whether we can randomly assign cookies in this way, so we would appreciate your input on whether this is possible and, if so, how difficult it would be to do.

@Krinkle @Peter anything I've missed?
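The bucket assignment described in step 4 can be sketched in Python (a minimal illustration of the proposed 1%/1%/98% split; the function name and the use of Python rather than VCL are my own, not the production implementation):

```python
import random

def assign_bucket(rand=random.random):
    """Assign a lazy-load bucket per the proposed split:
    1% 'A' (lazy loaded group), 1% 'B' (control group),
    98% 'C' (no impact on traffic)."""
    r = rand()
    if r < 0.01:
        return 'A'
    if r < 0.02:
        return 'B'
    return 'C'
```

Passing the random source in as a parameter keeps the split testable; in production the draw would happen once per client, with the result persisted in the cookie so repeat visits stay in the same bucket.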

Setting a cookie for 100% of users and then varying on it does not sound like a good plan on a few different levels. Also, what's the diff between groups C and B, and what does "randomly assign" really mean? Cookies are per user agent/device, not per-request. A random-but-fixed subset of IP addresses?

Hi @BBlack let me attempt to explain a little better.

We need to set the cookie for 1% of users to value 'A' (different HTML markup allowing lazy loaded images). We'd want to set the same cookie for another 1% of users with a different value ('B') so they form a control group. The other 98% of users would see the same experience as the control group. I suggested all users as I was assuming we'd need to, but if we can achieve the above some other way...

The change we hope to show improves performance for all users (specifically targeting low-bandwidth 2G connections).

Ideally a random subset of IP addresses is needed to represent a broad spectrum of connections.
That said @Krinkle @Peter would it be better to fence this to a certain country?

@phuedx @Krinkle are you able to review https://gerrit.wikimedia.org/r/279076 to move this along?
Thanks in advance.

Change 279076 merged by jenkins-bot:
Allow opt in to lazy loaded images via cookie

https://gerrit.wikimedia.org/r/279076

Jdlrobson updated the task description. Mar 29 2016, 11:25 PM

Change 280354 had a related patch set uploaded (by Jdlrobson):
Pass lazy load image cookie value to NavigationTiming schema

https://gerrit.wikimedia.org/r/280354

Regardless of how and when we randomly assign this to 1% of users, we first need to make sure this doesn't pollute production cache layers when people use this in private testing on production (beta users and WebPageTest).

It seems the MobileFrontend code does not emit a Vary header for this cookie. Should it?

(I'm not too sure with this part of our stack and whether we need to do anything anywhere else...)

@phuedx are you able to take a pass on https://gerrit.wikimedia.org/r/#/c/280354/ (+1ed by Timo)?

Change 280354 merged by jenkins-bot:
Pass lazy load image cookie value to NavigationTiming schema

https://gerrit.wikimedia.org/r/280354

@BBlack, have any suggestions on how this 1% sampling notion might be done or a simpler alternative approach?

I was wondering if the following pseudocode might do the trick in VCL, confining cache variance to just one subdomain instead of all of them. Crew, is there a problem isolating to one domain name for starters?

on request:
  if subdomain == 'hi.m.wikipedia.org'
      && date < <date set at a specific datetime 14 days from the initiation of the change>
      && cookies don't contain mfLazyLoadImages {
    let randomNumber = random()
    if randomNumber < 0.01: set inbound cookie mfLazyLoadImages to 'B'
    else if randomNumber < 0.02: set inbound cookie mfLazyLoadImages to 'C'
    else: noop
  }

on response:
  relay the mfLazyLoadImages cookie to the browser (if it was set inbound),
  with expiration <date set at a specific datetime 14 days from the initiation of the change>

With this approach I would anticipate a little bit of the numbers creeping up in buckets B & C as users make repeat visits.

For rollback in case we don't like it: I would think we'd mainly want to have VCL readied (maybe commented out and documented as the rollback strategy?) to actively strip the inbound cookie if it exists, and of course delete the old VCL. This would let the old varied objects fall out of cache naturally.

Jdlrobson moved this task from To classify to sprint 71 on the Reading-Web-Planning board.
BBlack added a comment. Edited Apr 14 2016, 5:02 PM

Yes, we have to do something more specific, and it's probably not a Vary header. For MW+Varnish here, Vary:Cookie is "special": in practice it only applies to login session/token cookies, and only in the sense that it makes those (logged-in) requests bypass the cache for content that is Vary:Cookie.

I'm mostly questioning (a) whether cookies are the ideal way to do this at all (haven't we done A/B tests without them before using javascript random functions and local storage and such?), and (b) whether (if cookies) we really need to set Cookies for all the uninvolved users, and (c) how we'd go about defining percentages of "users" - we don't really track users, so we need some better definition that works, like percentage of IP addresses or something?

a) For this change, given that the HTML is radically different with lazy loaded images, JavaScript could not be used.
b) We do not need to set cookies for all uninvolved users.
c) Yeah, this bit is tricky... a percentage of IP addresses is probably enough.

Tgr added a comment. Apr 14 2016, 7:12 PM

If you do use a Vary header, the correct format is $out->addVaryHeader( 'Cookie', [ 'param=mfLazyLoadImages' ] ) so that the Key header will include the cookie name. Not sure if Varnish reads that or some list needs to be updated on that side as well.

I use the onPageRenderingHash hook - is that not enough?
https://gerrit.wikimedia.org/r/#/c/279076/6/includes/MobileFrontend.hooks.php (L386)
This is what we did for the NetSpeed work.

Tgr added a comment. Apr 15 2016, 3:24 PM

PageRenderingHash is about splitting the parser cache, addVaryHeader is about splitting the Varnish cache. Those are fully independent.

Thanks for clarifying @Tgr

If you do use a Vary header, the correct format is $out->addVaryHeader( 'Cookie', [ 'param=mfLazyLoadImages' ] ) so that the Key header will include the cookie name. Not sure if Varnish reads that or some list needs to be updated on that side as well.

For MediaWiki stuff, our Varnishes handle Vary:Cookie in a very non-standard way. It really just means what I said above - if an MW URL outputs "Vary:Cookie", our varnishes will ensure that authenticated users (users with Cookies matching the [Ss]ession|Token= regex) get cache-bypassed fetches of that URL. It doesn't do any of the other things you'd expect of Vary:Cookie.

I use the onPageRenderingHash hook - is that not enough?
https://gerrit.wikimedia.org/r/#/c/279076/6/includes/MobileFrontend.hooks.php (L386)
This is what we did for the NetSpeed work.

For NetSpeed, part of the solution was a varnish hack to vary the cache's hashing. You can see that in text_common_hash here: https://github.com/wikimedia/operations-puppet/blob/production/templates/varnish/text-common.inc.vcl.erb#L95

Change 284080 had a related patch set uploaded (by BBlack):
Split mobile text cache for lazyloadimages testing

https://gerrit.wikimedia.org/r/284080


@Jdlrobson - can you confirm correct cookie-matching in https://gerrit.wikimedia.org/r/#/c/284080 ? This is to get the cache split so that manual tests of the cookie don't pollute the cache for users without the cookie. The VCL is splitting the cache between requests with a Cookie header that contains mfLazyLoadImages=A and those that do not.
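For illustration, the cache split described here can be sketched in Python (the real change is VCL in https://gerrit.wikimedia.org/r/284080; this hashing scheme and function name are hypothetical, just to show the idea that cookie presence becomes part of the cache key):

```python
import hashlib

def cache_key(url: str, cookie_header: str) -> str:
    """Requests whose Cookie header contains mfLazyLoadImages=A hash
    to a different cache object than all other requests, so manual
    tests of the cookie can't pollute the default users' cache."""
    variant = 'lazy' if 'mfLazyLoadImages=A' in cookie_header else 'default'
    return hashlib.sha1(f'{url}|{variant}'.encode()).hexdigest()
```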

Change 284080 merged by BBlack:
Split mobile text cache for lazyloadimages testing

https://gerrit.wikimedia.org/r/284080

Jdlrobson updated the task description. Apr 20 2016, 5:38 PM
Jdlrobson renamed this task from "Enable lazy loaded images for 1% of users in production" to "Enable lazy loaded images for 50% of users in production". May 11 2016, 10:38 PM
Jdlrobson updated the task description.
Jdlrobson added a subscriber: leila.

@BBlack with the revised description - a 50% split - does this make things more complicated or easier?

About the same. We can guess with an IP address regex, unless analytics was able to provide some last-digit stats for more accuracy.

About the same. We can guess with an IP address regex, unless analytics was able to provide some last-digit stats for more accuracy.

I had sent around last-digit stats in the existing email thread yesterday (Brandon, you were on CC too). To recap:

  1. obviously the last digit idea excludes IPv6 traffic, where there are more than 10 possibilities for the last character in the IP. Should hopefully not be a big deal, but I wanted to call it out - I checked one recent day, where the IPv6 percentage of mobile web human traffic was 12.3%.
  2. As promised, I ran a quick check (data) for some recent days. Again looking at human mobile web traffic, the last digits of IPv4 IPs are not quite randomly distributed (even if one were to take Benford's law into account): 6 seems consistently the most frequent with around 11%, and even numbers are slightly more frequent than odd ones. But I guess it's possible to control for that. Hashing the IP would still be the cleanest option though.
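Hashing the entire IP, as suggested above, could look like the following sketch (the choice of md5 and the 50% threshold are illustrative assumptions, not what was ultimately deployed):

```python
import hashlib

def in_experiment(ip: str, fraction: float = 0.5) -> bool:
    """Bucket a client by hashing its full IP address; this gives a
    near-uniform split, sidesteps the skewed last-digit distribution,
    and works for IPv6 as well as IPv4."""
    digest = hashlib.md5(ip.encode('ascii')).hexdigest()
    # Map the first 8 hex digits onto [0, 1).
    value = int(digest[:8], 16) / 0x100000000
    return value < fraction
```

The same IP always lands in the same bucket, which gives the per-device consistency the experiment needs without setting a cookie for uninvolved users.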

Unfortunately, 12.3% is going to be too low.
@BBlack so my question is how soon can we reserve some time from you/@faidon or @ori to provide this as an option?

Another thought - we could potentially in PHP add a random number chooser that makes 50% of the pages lazy loaded and 50% without.

This saves the delay/pain of a Varnish change but would mean the experiment bucket would not be consistent for the same user. I might view page A with it enabled and then click through to page B without it enabled.

Thoughts @phuedx ? @leila ?

You can't really do this in PHP randomly, as most fetches will be cache hits and only a small percentage hit PHP (ideally!).

@BBlack but if the rendered result is cached in the HTML... either with lazy loaded images enabled or disabled (and recording a boolean for which it is), won't that still give 50% of page views?

@BBlack and if PHP is not an option - to allow Leila and I to work out a timeline - how soon can we reserve some time from you/@faidon or @ori to provide this as an option in the Varnish layer?

Unfortunately, 12.3% is going to be too low.
@BBlack so my question is how soon can we reserve some time from you/@faidon or @ori to provide this as an option?

To clarify, 12.3% is the ratio of IPv6 pageviews, meaning that we could still apply the last digit logic as envisaged to 87.7% of pageviews. That said, I agree that hashing the entire IP (the idea put forth earlier) would be preferable. Also, we might be able to reuse this in many future A/B tests.

@BBlack but if the rendered result is cached in the HTML... either with lazy loaded images enabled or disabled (and recording a boolean for which it is), won't that still give 50% of page views?

That sounds a bit like https://xkcd.com/221/ ;)

Yeah, the 50% app-layer idea doesn't really work :)

We can take a set of IPv4 last-digits that add up to ~50%, and only apply the regex to IPv4 client addresses. This seems simple enough if, when evaluating the effects, you can also leave out IPv6 users on that end. We'd basically be doing pseudocode like: if (is_ipv4 and ipv4_string ~ '[0-4]$') { turn_on_lazy_loading }. That's trivial - please just give me a week or so of lead time to make some small room in my schedule to get it ready. If you want hashed IPs, that's not trivial, even though it's far more useful in the long term to get that ready and working.
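BBlack's pseudocode translates almost directly into Python (a sketch only; the real check would live in VCL, and the naive IPv4 detection here is an assumption):

```python
import re

LAST_DIGIT_0_TO_4 = re.compile(r'[0-4]$')

def turn_on_lazy_loading(client_ip: str) -> bool:
    """IPv4 clients whose address ends in 0-4 (roughly half of
    traffic) get lazy loading; IPv6 clients are left out entirely."""
    is_ipv4 = ':' not in client_ip and client_ip.count('.') == 3
    return bool(is_ipv4 and LAST_DIGIT_0_TO_4.search(client_ip))
```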

Another thought - we could potentially in PHP add a random number chooser that makes 50% of the pages lazy loaded and 50% without.
This saves the delay/pain of a Varnish change but would mean the experiment bucket would not be consistent for the same user. I might view page A with it enabled and then click through to page B without it enabled.
Thoughts @phuedx ? @leila ?

One of your hypotheses is that we will have more pageviews if we offer lazy loading. To be able to say whether this hypothesis is true or not, it is important that we keep the treatment (lazy loading) the same for a relatively long period of time for the user/device. Moving back and forth between the two options at the device/user level will not be desirable for the effect you want to show.

Yeah, the 50% app-layer idea doesn't really work :)
We can take a set of IPv4 last-digits that add up to ~50%, and only apply the regex to IPv4 client addresses. This seems simple enough if, when evaluating the effects, you can also leave out IPv6 users on that end. We'd basically be doing pseudocode like: if (is_ipv4 and ipv4_string ~ '[0-4]$') { turn_on_lazy_loading }. That's trivial - please just give me a week or so of lead time to make some small room in my schedule to get it ready. If you want hashed IPs, that's not trivial, even though it's far more useful in the long term to get that ready and working.

That sounds perfect. Thanks @BBlack for all your help with this!
Note I'm off on vacation 1st June, so I'm keen to get the test running before I go - that's my only real blocker.

Thanks @leila for the thoughts on the PHP idea. Glad we ruled that out.

Jdlrobson reassigned this task from Jdlrobson to BBlack. May 17 2016, 1:01 AM
Nuria added a subscriber: Nuria. Edited May 19 2016, 6:43 PM

My 2 cents here: it seems overkill to run this experiment on our entire user base. We can do a 50% split among 10% of users (or fewer), and that will probably give us a large enough sample to calculate the increase in pageviews. We can run precise numbers for computational power and sample size, but our user base is large enough that we do not need to affect hundreds of millions of users to measure an effect. If lazy loading decreases pageviews, or is unusable on mobile 2G connections (making this up), we want to be able to find that out before we roll it out to 50% of our user base.

I understand that this implies slightly more complex changes in Varnish than a yes/no choice for every incoming request, but it seems the safest path. Happy to work with @BBlack on whichever choice he thinks is best.

@BBlack, I think keeping it simple with that sort of approach is the best way to go.

As for the header we would expect to see in the origin server (for the purpose of varying the output), what did you have in mind?

I believe the approach is as follows:

1a. @BBlack: Ready Varnish patch.
1b. @Jdlrobson: Ready MobileFrontend patch to vary content on the specified header construct.

  2. Verify MobileFrontend patch is in production.
  3. Enable Varnish patch.

Presently bn.m.wikipedia.org has image lazy loading enabled by way of a $wg variable (so as new pages are generated they will have it by default). We plan to add it for fa.m.wikipedia.org and uk.m.wikipedia.org shortly as well. Just wanted to mention that in case you want to exclude them from cache splits.

@Nuria, apologies, I was composing my response and didn't see yours. I thought I had heard that doing a smaller sample would not actually work. I don't know if that means just that 1% was too small (whereas, for example, approximately 10% might be sufficient).

leila added a comment. May 19 2016, 8:03 PM

@Nuria I'm with you if the test were to be run in a language such as en, but in this specific case we're talking about languages with very low traffic, such as fawiki. To be able to measure the impact, we need to provide lazy loading to a larger % of the population than we normally would.

Nuria added a comment. May 19 2016, 8:47 PM

@leila : I see, that is how we are controlling the size.

We have been working on a better scheme that would allow us to launch features for a small percentage of users in a big wiki easily. See: https://phabricator.wikimedia.org/T135762#2310419

Nuria added a comment. May 20 2016, 3:06 PM

I would like to proceed with the more thought-out implementation for the lazy loading bucketing, rather than our initial thoughts about splitting by IP.
The strategy is described here: https://phabricator.wikimedia.org/T135762

I am getting worried about concurrent experiments taking place and data getting muddled as a result. At this time we have both lazy loading and Hovercards happening; they are not conflicting quite yet, but it is easy to see how that could happen.

Note that the changes we want to do in varnish will give us the ability to launch a feature to a given percentage of any wiki. Thus we could run experiments in bigger wikis rather than having to launch features to small wikis to control sample size.

Not detracting from the other conversation, which if it's viable soon is great, but lazy loading is on mobile web and Hovercards is on desktop web. How would they conflict?

Nuria added a comment. May 20 2016, 3:28 PM

@dr0ptp4kt: these experiments are not in conflict yet, correct, but the code that decides which user is in which experiment, or what is enabled where, is scattered all over the place, which is bound to cause problems.

Centralizing changes in varnish has several benefits:

  1. one stop shop to look at experiments enabled
  2. doesn't require local storage (which Hovercards requires to enable the feature right now)
  3. Ability to launch a feature to a percentage of the user base in larger wikis
  4. experiment info copied by default to x-analytics and ready for analysis

@Nuria, I agree centralization would be nice.

@BBlack / @Nuria, any update? I'd like to queue up the remainder of origin- & client-side work before @Jdlrobson goes on vacation.

Would it make more sense to do a one-off here while pursuing the more generic solution?

Nuria added a comment. May 24 2016, 3:59 PM

Would it make more sense to do a one-off here while pursuing the more generic solution?

No, I do not think so; the one-off doesn't seem much faster.

What's the ETA?

It occurred to us that we could actually roll out lazy loading and then apply the A/B enrollment so that a small percentage of users do not get the lazy loading treatment (that is, basically giving them the same non-lazy-loaded experience that has been customary for so long), in the vein of http://engineroom.wpengine.com/2016/04/04/a-faster-ft-com/.

Jdlrobson closed this task as Declined. Aug 29 2016, 9:57 PM

In the end we did this for everyone.