
Write up performance state-of-the-union statements for mobile web and ResourceLoader
Closed, ResolvedPublic

Description

For the next performance team meeting (Wednesday 5/6), have a short (~1-paragraph) summary of the current performance profile of these two projects and indicate where the big performance gains are to be had (if there are any).

Event Timeline

ori created this task.Apr 29 2015, 7:40 PM
ori raised the priority of this task from to Needs Triage.
ori updated the task description.
ori added a project: Performance-Team.
ori added subscribers: ori, Gilles, Krinkle.
Restricted Application added a subscriber: Aklapper. Apr 29 2015, 7:40 PM

@Jdlrobson is this a fair assessment of the performance situation for mobile web?

Currently, the mobile web enwiki front page loads 29kb of compressed CSS in the head. An initial look at the CSS already revealed sizable gains (at least 27%) to be had from obvious modules that can simply be moved out of the head. On the JS front, things look excessive as well, with 86.5kb of compressed JS loaded before firstPaint. Considering that mobile web usage is often associated with poor bandwidth and unreliable connections, the gains to be had on both fronts by cleaning up the CSS/JS loaded ahead of firstPaint look quite significant.
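For illustration, one client-side way to keep non-critical JS out of the pre-firstPaint queue is to defer loading it until after the page has rendered. This is only a minimal sketch using MediaWiki's mw.loader.using; the module name is made up, and moving modules out of the head itself is a server-side ResourceLoader change rather than this:

```javascript
// Defer a non-critical module until after the window 'load' event so it
// does not compete with content rendering before firstPaint.
// 'mobile.hypothetical.enhancements' is a made-up module name.
window.addEventListener( 'load', function () {
	mw.loader.using( 'mobile.hypothetical.enhancements', function () {
		// Non-critical enhancements run here once the module has loaded.
	} );
} );
```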

@Gilles Personally I think the biggest problems on mobile are images and the HTML. JS and CSS can be micro-optimised and moved out of the head, but it's through rethinking HTML and images that we'll make the most gains... Is that fair?

Gilles added a comment.May 5 2015, 9:59 PM

Makes sense; the size of everything is what matters for mobile. Where do you see the biggest gains to be had in the areas of HTML and image weight?

Jdlrobson added a subscriber: Mhurd.May 6 2015, 6:51 AM
  • I think there are small gains to be had by transforming HTML via Parsoid to make it cleaner. For example, @Mhurd discovered that navboxes were contributing a high percentage of the page HTML.
  • I would like us to get to a stage where images are only loaded when they are scrolled into view.
  • One idea we've played with is to simply serve the lead section in the HTML and then pull the rest in via JavaScript when the user scrolls (with a fallback link to the full article), but I'm not sure if this is viable... would it scale?
Gilles added a comment.May 6 2015, 3:52 PM
  • I think there are small gains to be had by transforming HTML via Parsoid to make it cleaner. For example, @Mhurd discovered that navboxes were contributing a high percentage of the page HTML.

That sounds very interesting, any idea what gain in % of page weight we'd be looking at for navboxes as an example?

  • I would like us to get to a stage where images are only loaded when they are scrolled into view.

At my previous workplace smart scrolling for images like this was implemented for desktop and I recall that it was very difficult to pull off, as browsers aren't used to their DOM being constantly fiddled with during scroll. It's doable, but difficult, and your mileage may vary in terms of how smooth it really is depending on the browser at hand. I'd be worried that it makes phones with limited computing power sluggish. That being said, the pages I'm talking about were 100% thumbnails, so maybe it's not a fair comparison to articles that usually only have a few images here and there.

Another reason I think it's a risky proposition for mobile is that people with slow bandwidth will suffer a double whammy: it delays even further image requests that were already going to take a while.

That being said, even with the above constraints it might be worth pursuing, applying it only to clients that prove to have the required capabilities after a measurement or mini-benchmark of computation speed and bandwidth/latency. The devil is in the details, but it's definitely something worth looking into.

Is it going to be a huge performance gain, though? I'm not sure. Most mobile browsing is done looking at a single page at a time, images already only load after the article at the moment, and their requests would be aborted when navigating away. What advantage do you think this will give users? Bandwidth savings from not downloading images that are never looked at? Faster scrolling because the DOM is lighter?
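As a rough sketch of the scroll-driven image loading being discussed, the snippet below assumes placeholder <img> elements carry a data-src attribute; that convention is an assumption for illustration, not existing MobileFrontend behaviour:

```javascript
// Lazy-load images whose placeholders carry a data-src attribute.
// The real src is only set once the element scrolls near the viewport.
function loadVisibleImages() {
	var images = document.querySelectorAll( 'img[data-src]' );
	Array.prototype.forEach.call( images, function ( img ) {
		var rect = img.getBoundingClientRect();
		// Start loading a bit before the image enters the viewport.
		if ( rect.top < window.innerHeight + 200 ) {
			img.src = img.getAttribute( 'data-src' );
			img.removeAttribute( 'data-src' );
		}
	} );
}

// Throttle the scroll handler so slower devices are not overwhelmed,
// which is the sluggishness concern raised above.
var scrollTimer = null;
window.addEventListener( 'scroll', function () {
	if ( scrollTimer === null ) {
		scrollTimer = setTimeout( function () {
			scrollTimer = null;
			loadVisibleImages();
		}, 200 );
	}
} );

loadVisibleImages();
```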

  • One idea we've played with is to simply serve the lead section in the html and then pull the rest of it in via JavaScript when the user scrolls (and have a fallback link to the full article) but I'm not sure if this is viable... would this scale?

If I'm following, this is an extension of the image idea, but for the DOM itself, right? This sounds very difficult to pull off, since auto-balancing of DOM nodes might result in broken-looking content. It's very hard to cut the DOM into pieces and make sure that the layout stays the same, even more so on a website that displays user-created content.

Deskana added a subscriber: Deskana.May 6 2015, 4:02 PM
  • I think there are small gains to be had by transforming HTML via Parsoid to make it cleaner. For example, @Mhurd discovered that navboxes were contributing a high percentage of the page HTML.

That sounds very interesting, any idea what gain in % of page weight we'd be looking at for navboxes as an example?

Cutting navboxes and some other things that are never displayed on either mobile web or mobile apps cut HTML payload by around 50% on [[Barack Obama]]. Obviously that's a rather extreme example, but on the other hand the articles that are edited more (and therefore have more redundant HTML) tend to also be amongst the most viewed articles.

  • One idea we've played with is to simply serve the lead section in the HTML and then pull the rest in via JavaScript when the user scrolls (with a fallback link to the full article), but I'm not sure if this is viable... would it scale?

If I'm following, this is an extension of the image idea, but for the DOM itself, right? This sounds very difficult to pull off, since auto-balancing of DOM nodes might result in broken-looking content. It's very hard to cut the DOM into pieces and make sure that the layout stays the same, even more so on a website that displays user-created content.

The Android app actually does this right now. It loads the lead section, then loads the rest of the sections immediately afterwards. It caused the app to perform noticeably better than the iOS app under low bandwidth conditions. Because the lead section and the rest of the sections tend to be treated as very discrete chunks by editors, I've never seen any display issues as a result of this approach, so I was totally happy with it from a product standpoint.

Gilles added a comment.May 6 2015, 4:20 PM

Lead section on pageload + pulling the rest right after (or waiting for scroll if dealing with a low latency/fast bandwidth situation) definitely sounds like a great win to be had. I forgot that the structure of our content helps here.

It does sound like HTML pageload weight reduction, in the form of Parsoid simplification and/or only serving the lead section with the initial pageload, is probably the biggest performance gain to be had at the moment.

Here's an updated state-of-the-union paragraph on mobile web performance, then:

The size of the HTML served at pageload seems to be where the biggest performance gains are to be had on mobile web at the moment. One area to explore is the automated size reduction of the HTML via Parsoid. Another is to only serve the lead section on pageload and load the rest of the page asynchronously. A secondary improvement-worthy area is the current firstPaint performance, hampered by the amount of CSS and JS served in the head, some of which has already been identified as low-hanging fruit. Lastly, the idea of loading images on demand when they are scrolled into view could be experimented with.

@Gilles, in the app service the size reduction is achieved by a DOM transformation on Parsoid HTML. The semantic RDFa information in the Parsoid output helps to reliably identify specific content.

In RESTBase we are working on exposing a fairly generic section retrieval and edit API for HTML content (T94890). We'll keep the lead section use case in mind for that.

Looks good to me @Gilles! Thanks for investing your time in identifying our main problems!
Only serving the lead section would do wonders for images on demand.

The Android app actually does this right now

Is it using an API call to achieve this at the moment?

The Android app actually does this right now

Is it using an API call to achieve this at the moment?

Indeed, it requests the first section, then requests the rest afterwards.

https://github.com/wikimedia/apps-android-wikipedia/blob/master/wikipedia/src/main/java/org/wikipedia/page/PageViewFragmentInternal.java#L1064-L1152

The code is a bit obscure and not so readable though. If you're interested in learning more, you can grab me at the Hackathon and we can tease out the actual queries that are used. :-)

Thanks for the pointer; that led me to the API being used: http://en.wikipedia.org/w/api.php?action=mobileview&page=Barack_Obama&sections=0&prop=text|sections&format=json which comes from MobileFrontend itself. And looking at the mobile web DOM, it seems like it would be very straightforward to only serve section 0 with the pageload and request the rest through the API on DOM readiness.

The Barack Obama article might be a bit of an extreme example due to its length, but in that case the API data needed for section 0's text + the list of sections is almost 30 times smaller than the data needed for all sections' text (5.9kb gzipped versus 173.8kb gzipped).
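For illustration, a minimal sketch of that flow against the action=mobileview API quoted above: serve only section 0 as HTML, then fetch the remaining sections once the DOM is ready. The 'content' container id, the 'sections=1-' range syntax for "everything after the lead", and the response shape are assumptions:

```javascript
// Hypothetical progressive loading of an article: the lead section first,
// then the remaining sections once the DOM is ready.
function mobileview( params, callback ) {
	// Parameters other than `params` are taken from the URL quoted above.
	var url = '/w/api.php?action=mobileview&format=json&prop=text%7Csections' +
		'&page=Barack_Obama&' + params;
	var xhr = new XMLHttpRequest();
	xhr.open( 'GET', url );
	xhr.onload = function () {
		callback( JSON.parse( xhr.responseText ) );
	};
	xhr.send();
}

// 1. Lead section only (what the initial pageload would ship as HTML).
mobileview( 'sections=0', function ( data ) {
	// Response shape assumed from prop=text|sections.
	document.getElementById( 'content' ).innerHTML =
		data.mobileview.sections[ 0 ].text;

	// 2. Remaining sections, appended once they arrive.
	mobileview( 'sections=1-', function ( rest ) {
		rest.mobileview.sections.forEach( function ( section ) {
			var div = document.createElement( 'div' );
			div.innerHTML = section.text;
			document.getElementById( 'content' ).appendChild( div );
		} );
	} );
} );
```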

phuedx added a subscriber: phuedx.May 21 2015, 5:47 PM

Just a question to think about, too. I think the goal is to minimize the HTML delivered by PHP and load the rest of the sections via JavaScript through the API or RESTBase, right? What about users without JavaScript enabled? Do they need to click a link to load the page a second time (e.g. with a new URL parameter "?format=full" or something else)?

@Florian I think that's the simplest solution for no-JS clients, yes. A "read more" link of sorts.
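A minimal sketch of that fallback as progressive enhancement: without JavaScript the server-rendered link simply reloads the full page, and with JavaScript the click is intercepted and the rest of the article is loaded in place. The element id, the "?format=full" parameter, and loadRemainingSections are hypothetical:

```javascript
// Progressive enhancement of a hypothetical no-JS fallback link.
var readMore = document.getElementById( 'read-more' ); // hypothetical element id
if ( readMore ) {
	readMore.addEventListener( 'click', function ( event ) {
		event.preventDefault();
		// Load the remaining sections, e.g. via the action=mobileview
		// request sketched in the earlier example, then drop the link.
		loadRemainingSections();
		readMore.parentNode.removeChild( readMore );
	} );
}

function loadRemainingSections() {
	// Placeholder: see the action=mobileview sketch above.
}
```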

Krinkle triaged this task as Normal priority.Jul 9 2015, 6:19 PM
Krinkle set Security to None.
Krinkle moved this task from Inbox to Backlog: Small & Maintenance on the Performance-Team board.
Krinkle closed this task as Resolved.Dec 5 2016, 11:35 PM
Krinkle claimed this task.