
According to our instrumentation Opera Mini page size increased with lazy loading of images
Closed, Declined (Public)

Description

For a while we have been able to measure SpeedIndex, FirstPaint, and the total amount of downloaded data for UC Mini and Opera Mini (running in extreme mode, meaning as a proxy browser). We only do it for one page (https://en.m.wikipedia.org/wiki/Facebook), so it isn't perfect, but we can at least see what happens for that page.

When we released the lazy loading of images, the total page weight (everything we send) increased for Opera Mini, meaning we made things worse for people who try to minimize the bytes sent:

opera-mini.png (496×2 px, 98 KB)

For other browsers it looks ok (UC mini too):
https://grafana.wikimedia.org/dashboard/db/mobile-webpagetest?panelId=49&fullscreen

I've pinged Opera dev rel on Twitter, but maybe someone already has a contact that we can talk to?

Event Timeline

Since we only send the non-JavaScript version to Opera Mini, it seems we added some overhead to the plain HTML, or have we changed something else?

OK, I learn new things every day :) We do send JavaScript, and Opera executes it on the server side, so the extra payload comes from the lazy loading.

Interesting. I'm a little confused how it could jump so high. Why would the number of bytes be so much higher?

We need to work out what solution there is here (if any).

@Jdlrobson I'll ping you with an email address to Opera.

It's hard to see since it's a black box, but since they execute the JavaScript server side, they somehow send some extra things in the blob to the browser :)

@Peter, sounds like you found a contact. I have one if you need as well - please let me know if you'd like me to link you and @Jdlrobson up with that person.

As I understand it, the inlined JS below (which gets minified) is added to the HTML and, practically speaking, executed on the compression server. Its purpose is to ensure that compression-proxy UAs, and UAs that support JS but are jQuery-incompatible, continue to have the images in the page as always (which naturally implies continued full-page image transfer).

https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/d971e896bded5bdef033c42a45b2ff7963a16ca5/includes/MobileFrontend.skin.hooks.php#L23-L37

(window.NORLQ = window.NORLQ || []).push( function () {
	var ns, i, p, img;
	ns = document.getElementsByTagName( 'noscript' );
	for ( i = 0; i < ns.length; i++ ) {
		p = ns[i].nextSibling;
		if ( p.className.indexOf( 'lazy-image-placeholder' ) > -1 ) {
			img = document.createElement( 'img' );
			img.setAttribute( 'src', p.getAttribute( 'data-src' ) );
			img.setAttribute( 'width', p.getAttribute( 'data-width' ) );
			img.setAttribute( 'height', p.getAttribute( 'data-height' ) );
			img.setAttribute( 'alt', p.getAttribute( 'data-alt' ) );
			p.parentNode.replaceChild( img, p );
		}
	}
} );
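One thing to note about the loop above: it relies on every <noscript> being immediately followed by an element node. If nextSibling is null or a text node, p.className is not a string and the indexOf call throws. A defensive variant could look roughly like the sketch below. This is purely illustrative and not what MobileFrontend ships; the function name `restoreLazyImages` and the `doc` parameter are my own assumptions.

```javascript
// Hypothetical defensive variant of the inline loader above.
// `doc` is expected to behave like a DOM document.
function restoreLazyImages(doc) {
  var names = ['src', 'width', 'height', 'alt'],
    ns = doc.getElementsByTagName('noscript'),
    restored = 0,
    i, j, p, img;
  for (i = 0; i < ns.length; i++) {
    p = ns[i].nextSibling;
    // Skip missing siblings and nodes without a string className (e.g. text nodes).
    if (!p || typeof p.className !== 'string' ||
        p.className.indexOf('lazy-image-placeholder') === -1) {
      continue;
    }
    img = doc.createElement('img');
    // Copy data-src, data-width, data-height and data-alt onto the new <img>.
    for (j = 0; j < names.length; j++) {
      img.setAttribute(names[j], p.getAttribute('data-' + names[j]));
    }
    p.parentNode.replaceChild(img, p);
    restored++;
  }
  // Number of placeholders that were swapped for real images.
  return restored;
}
```

In a page this would be called as restoreLazyImages(document); the return value is only there to make the behaviour easy to check.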

What gets run via NORLQ is determined here, as I understand it:

https://github.com/wikimedia/mediawiki/blob/3cfcd55011244c0767079bf4dbeb0dcc2345d34c/resources/src/startup.js#L50-L101

I'm unsure whether that JS is itself delivered down to the client (granted, it would not be executed at the client, but that's a different question).

I'm under the impression that the ResourceLoader JavaScript modules don't get downloaded at all for RL-suppressed browsers, and thus would be out of scope for what ultimately gets sent down to the browser. That is to say, while ResourceLoader JavaScript modules will get sent for other browsers, they won't get sent for full page compression contexts of Opera Mini.

Now, when we look at the HTML the browser receives for images in a page like the one describing the thriller Prometheus it looks like the following. Note I've added extra newlines for legibility; go look at the HTML source for a cleaner example.

<div class="thumbimage">
<a href="/wiki/File:Noomi_Rapace_2007.jpg" class="image"><noscript>
<img alt=""
src="//upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Noomi_Rapace_2007.jpg/175px-Noomi_Rapace_2007.jpg"
width="175" height="197" data-file-width="472" data-file-height="530">
</noscript>
<span class="lazy-image-placeholder"
style="width: 175px;height: 197px;"
data-src="//upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Noomi_Rapace_2007.jpg/175px-Noomi_Rapace_2007.jpg"
data-alt="" data-width="175" data-height="197">
</span>
</a>
</div>
</div>

In the tradeoff over fragmenting the cache to optimize for the three classes of browser (noscript, compression proxy, full jQuery compatibility/RL) versus not fragmenting the cache (which is what we're doing in our code), this seems to be a relatively lightweight way to deliver the payload. That said, is there anything in it that can be trimmed down?

When we think about both the binary-level compression Opera Mini servers apply as part of delivery of the full page payload to the Opera Mini client, as well as any HTTP gzip compression when not using Opera's wire protocol (although I doubt that does anything significant on top of the Opera Mini compression for such cases)...

Is there any way to ensure that the bytestream will better contain repeating character sequences, both within a given sequence of tags for one image, as well as across the full collection of images for a page? Are there byte sequences that can be removed? For example, are the data-* fields necessary in the <noscript> wrapped <img>?
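One way to put rough numbers on the data-* question is to gzip a synthetic fragment with and without the data-file-width / data-file-height attributes. This is only a sketch under stated assumptions: gzip stands in for whatever Opera's servers really do (which we can't see), the image count of 50 and the Example_N.jpg file names are made up, and it is meant to run under Node.js.

```javascript
const zlib = require('zlib');

// The attributes in question, as they appear on the <noscript>-wrapped <img> above.
const FILE_DIMS = ' data-file-width="472" data-file-height="530"';

// Markup for one image, modelled on the example above.
// The file names (Example_<i>.jpg) are invented for illustration.
function imageMarkup(i, keepFileDims) {
  const src = '//upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Example_' + i +
    '.jpg/175px-Example_' + i + '.jpg';
  return '<noscript><img alt="" src="' + src + '" width="175" height="197"' +
    (keepFileDims ? FILE_DIMS : '') + '></noscript>' +
    '<span class="lazy-image-placeholder" style="width: 175px;height: 197px;" ' +
    'data-src="' + src + '" data-alt="" data-width="175" data-height="197"></span>';
}

// Raw and gzip-compressed byte counts for a fragment with `imageCount` images.
function fragmentSizes(imageCount, keepFileDims) {
  let html = '';
  for (let i = 0; i < imageCount; i++) {
    html += imageMarkup(i, keepFileDims);
  }
  return {
    raw: Buffer.byteLength(html, 'utf8'),
    gzipped: zlib.gzipSync(html).length
  };
}

const withDims = fragmentSizes(50, true);
const withoutDims = fragmentSizes(50, false);
console.log('with data-file-*:   ', withDims);
console.log('without data-file-*:', withoutDims);
```

The raw difference is exactly the attribute length times the number of images; the gzipped difference should be much smaller because the attribute text repeats verbatim, which is the point of asking about repeating sequences.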

This is an aside, but are there places newlines can be removed within tags to simply send less data down? It doesn't amount to much specifically in the case of these images, but it is at least several bytes per image.

Now, given that the document format sent across the wire to the Opera Mini clients when in full page compression is binary, it does sound like it would be worthwhile to determine if the engineers building the binary document could remove superfluous non-presentational bytes if they're at play here. <noscript> tags, for example, aren't interesting at all. Similarly, if it's actually presentational bytes that have grown and not pure HTML markup, it would be nice to know if there's any micro-optimization that can be made in the HTML that would play nicely with all the browsers but also be picked up by the compression proxy middleware and result in lighter-weight delivery.

@Peter is the webpagetest probe measuring the data transfer using HTTP delivery (e.g., through an HTTP proxy on the infrastructure), Opera's wire protocol (e.g., picking up traffic measured across a binary port during document delivery), or something like that?

So we have it here, here's the graph from above zoomed out to look at the broader trends across the several major browsers to add some additional context (global and regional browser trends obviously differ - thanks for filing the task). As mentioned above, this is with respect to the Facebook article specifically. We did know we'd be inflating the HTML somewhat with the expectation of less image bandwidth, so it's worth figuring out what things we might be able to do further to reduce (compressed) HTML and, more specifically, in the context of Opera Mini.

dulles_page_weight.jpg (542×1 px, 95 KB)

Another note I mentioned the other day: we should consider removal of the NORLQ code block if the page doesn't bear images, at least if that's technically easy to do.
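A minimal sketch of that idea, written in JavaScript purely for illustration. The real hook lives in MobileFrontend's PHP, so `maybeAppendLoader`, its arguments, and the string check are hypothetical and only show the shape of the condition:

```javascript
// Class name used by the placeholder spans in the markup above.
const PLACEHOLDER_CLASS = 'lazy-image-placeholder';

// Hypothetical helper: append the inline loader only when the page
// actually contains at least one lazy-image placeholder.
function maybeAppendLoader(html, loaderScript) {
  if (html.indexOf(PLACEHOLDER_CLASS) === -1) {
    // No placeholders, so the loader would be dead weight.
    return html;
  }
  return html + '<script>' + loaderScript + '</script>';
}

console.log(maybeAppendLoader('<p>No images here</p>', 'loader()'));
console.log(maybeAppendLoader('<span class="lazy-image-placeholder"></span>', 'loader()'));
```

A plain substring check is cheap but could give a false positive if the class name shows up in article text; a flag set while the images are being transformed would be more precise.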

WebPageTest captures the packets sent to the device, so we can get a summary, like the total bytes sent to the device generated for that page.

I think it would be good to reach out and explain our implementation, since the Opera Mini server is a black box for us. I've only told them about the problem; I think @Jdlrobson can better explain the exact implementation. Best case, we don't need to do anything and they can tweak something on their side :)

We only have it up and running for the Facebook page, but I think it affects all pages since it correlates with the release. All other browsers we test look great though :)

I've sent an email to an Opera contact to see if we can shed some light on this.

What Opera Mini version is installed on the test device? The two operation modes available on Android devices differ a lot when it comes to functionality and data usage. Can you say which mode is turned on?

It's running extreme mode, from Dulles USA, Android Motorola G. I don't have the exact Android version.

And FYI, the total size of the HTML (compressed) increased by 2-3 kb when we released.

I looked deeper into the change to have exact figures. For Chrome (the same Facebook page), the HTML increased by 1 kb and JS also by 1 kb.

Screen Shot 2016-08-30 at 8.40.09 PM.png (514×2 px, 80 KB)

And for Opera the total size increased from 367 kb to 423 kb, i.e. by 56 kb.

So I looked at access logs for Indonesian Wikipedia on the Opera Mini user agent to see how many bytes of images were requested before and after the change.

On the 17th we served 23911333049 bytes (23.9GB)
On the 19th we served 19770281102 bytes (19.77GB)

We made the change on the 18th so this suggests to me that there is a problem in the grafana graphs.
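For reference, the difference between those two days works out to about 4.14 GB less, roughly a 17% drop. A quick check of the arithmetic (the variable names are mine; the byte counts are the ones quoted above):

```javascript
// Image bytes served to Opera Mini user agents on Indonesian Wikipedia.
const bytesAug17 = 23911333049; // day before the change
const bytesAug19 = 19770281102; // day after the change

// Negative values mean fewer bytes were served after the change.
const deltaBytes = bytesAug19 - bytesAug17;
const deltaPercent = (deltaBytes / bytesAug17) * 100;

console.log('delta: ' + (deltaBytes / 1e9).toFixed(2) + ' GB (' + deltaPercent.toFixed(1) + '%)');
```

Both values are far below 2^53, so plain JavaScript numbers represent them exactly.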

On Opera Mini we currently don't run JavaScript so I would expect no shift in these numbers after the change. If your graphs were accurate 56kb per Opera Mini user would be a significant increase and I'd expect to see the number on the 19th much higher - not less. Of course, if Opera Mini serves the images from its own server then you can ignore this assessment. Can dig deeper if need be.

I used the following query:

hive -e "use wmf; select month, day, substr(referer,1,26), sum(response_size) from webrequest where year = 2016 and month = 8 and day = 17 and uri_host = 'upload.wikimedia.org' and user_agent rlike 'Opera Mini' and referer rlike '^https://id\.m\.wikipedia\.org/wiki/([^:])+$' and content_type rlike '^image' and agent_type = 'user' and http_status = '200' group by month, day, substr(referer,1,26);" > opmini-8-17.tsv
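To make the referer filter in that query easier to read, here is the same pattern as a JavaScript regex, together with the 26-character prefix that substr(referer,1,26) groups by. This is a sketch only: the sample URLs are made up, and Hive's rlike uses Java regex semantics, which agree with JavaScript for a simple anchored pattern like this one.

```javascript
// Same pattern as in the Hive query: article pages on id.m.wikipedia.org
// whose title contains no colon (which excludes namespaced pages such as Special:).
const articleReferer = /^https:\/\/id\.m\.wikipedia\.org\/wiki\/([^:])+$/;

// Hive's substr is 1-indexed, so substr(referer, 1, 26) is the first 26 characters.
function refererPrefix(referer) {
  return referer.substring(0, 26);
}

const samples = [
  'https://id.m.wikipedia.org/wiki/Facebook',       // matches
  'https://id.m.wikipedia.org/wiki/Special:Search', // rejected: colon in title
  'https://en.m.wikipedia.org/wiki/Facebook'        // rejected: different host
];

samples.forEach(function (url) {
  console.log(articleReferer.test(url), refererPrefix(url), url);
});
```

Since every matching referer starts with the same host, the group-by collapses to a single row per day.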

I don't think the amount of image bytes served is the right way to check. WebPageTest is measuring the amount sent from Opera servers to the browser. We are only able to see the amount sent from our servers to Opera servers. What Opera does with the data is a black box for us.

And since we don't lazy load images on Opera Mini, that number shows that we sent fewer images that day, nothing more, right?

@Jdlrobson @Peter are we waiting for someone from Opera to weigh in? Since this is a high priority task, and a month old, if we are not waiting I'd suggest a formal spike to dedicate resources to this and knock it out. :) CC @ovasileva

Jdlrobson lowered the priority of this task from High to Medium. Oct 3 2016, 4:52 PM

@Peter based on https://phabricator.wikimedia.org/T143663#2603240 is this not a problem that lies in Opera's ecosystem? We probably caused it, but I suspect there is some code on Opera's side that needs to change if this is indeed happening.

@Smyru is there anything strange in your server logs that suggest Opera is serving more images to users on Wikipedia mobile after Thursday, Aug 18 2016?

I don't think there is much we can do on this side right now, unfortunately.

@Jdlrobson The average size of the output page for Wikipedia actually dropped from ~112kb to ~105kb, so there is no penalty for our end users.

@Smyru with the same number of page views? When @Jdlrobson did the check, it seemed like we served less to your servers, meaning fewer users. For a browser that shows all the images (not loading them async), the change shouldn't result in fewer bytes being shipped, but rather an increase of 1 kb or so for the larger HTML?

@Peter, the summer is not a representative period for trends on our side, but I don't see a sudden drop in page transcodes per second.

OK, thanks @Smyru. And the data is only for Opera Mini running in extreme mode, right?

Thanks @Smyru for the quick response. I'm glad everything seems to be working as expected.

@Peter it sounds like we may have a problem with our instrumentation?

@Jdlrobson do you mean the numbers in WebPageTest? We could test more pages. Is there an easy way to turn lazy loading on and off, just to verify?

I don't think the numbers match up. With the changes we made, we should still ship the same number of kb to Opera Mini in extreme mode (or maybe 1 kb more per request for the extra HTML). Opera Mini should still download all the images in extreme mode. So we shouldn't expect 7 kb less to be sent; that could only come from fewer users/page views.

Jdlrobson renamed this task from Opera Mini page size increased with lazy loading of images to According to our instrumentation Opera Mini page size increased with lazy loading of images. Nov 2 2016, 8:35 PM
Jdlrobson lowered the priority of this task from Medium to Low.
Jdlrobson added a project: Performance Issue.

@Peter what do you suggest we do here? It's possible to disable lazy loading images with the X-Debug headers and changing the config to:

"MFLazyLoadImages": {
	"base": false,
	"beta": false
},

I've not done that before, but Timo should be able to help you.

The slight increase in HTML (image + placeholder + noscript, plus the NORLQ/Grade C loader) is imho negligible and not a problem in this context.

The first graph @Peter posted showed a 100-200KB increase, which is fairly significant. I can't see this in the graphs now (it seems we lost data from before August 2016 in Graphite for most metrics?).

However, assuming that it is true that end-user bandwidth usage for Opera Mini went up, then I would guess the following: Opera might have its own lazy-loading system for native images in the HTML. And maybe our JS code, which runs on their server side, doesn't just create the images, but also causes them to bypass their lazy-load system? Or perhaps not lazy-load, but some kind of extra compression step that only applies to images originally found in the HTML?

Yes, I removed them some time ago since it wasn't working and was constantly failing (the same with iOS). Let's close it; we really needed help from Opera to be able to pinpoint what happened, but I feel we didn't get that.

Peter changed the task status from Resolved to Declined. Jun 21 2017, 9:46 AM