Remove `navboxes` from HTML in mobile web beta and show the impact
Closed, Resolved | Public | 5 Estimated Story Points

Description

As a user I want larger chunks of hidden HTML to not be loaded at all so that I can save bandwidth.

(Feel free to split this into subtasks during sprint)

We would like to get a sense of how reducing HTML size impacts first paint.
We have identified navboxes as representing 10% of the HTML. Although this is minimal compared to reference HTML (see T123328) it may provide us a cheap way to verify if reducing HTML does indeed impact the first paint for our users significantly (with whatever instrumentation is at our disposal), and the implicit benefit of ensuring we can do a good job of measuring and showing said impact.

  • A patch should be prepared for MobileFrontend (beta only) that strips navboxes from the HTML. It should be configurable so we can turn it on and off.
  • This should be first deployed on beta cluster in the beta mode channel and we should verify and record there how we expect this to impact first paint, HTML size and server response time in a controlled fashion.

Specifically:

  • How does this impact TTFB for the Barack Obama article? (Expressed in relative terms, e.g. TTFB halved.)
  • How does this impact first paint for the Barack Obama article? (Expressed in relative terms, e.g. first paint halved.)
  • How does this impact fully loaded time for the Barack Obama article? (Expressed in relative terms, e.g. fully loaded time halved.)
  • How does this impact HTML bytes for the Barack Obama article? (Expressed in relative terms, e.g. HTML bytes halved.)
  • Now enable on the beta cluster in mobile web stable and measure the impact on the above as before.
  • Patch should now be deployed to the production cluster on enwiki (beta only)
  • We should verify and record the degree to which the result in mobile web beta matches our expectations from the beta cluster (it is understood there are some differences in cache hits; cachebusting parameters may be one way to reduce such confounding variables, even if imperfectly)
  • Send a report to team detailing lessons learned and how we can improve this implement/measure process in future and what next steps should be (configure off/continue experimentation in this field)
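
The cachebusting idea mentioned above could look roughly like the sketch below. It is only an illustration: the helper name `withCacheBuster`, the parameter name `cachebust` and the example.org URL are all made up here, and nothing in this task says the test harness works this way.

```javascript
// Hypothetical helper (not part of any existing tool): append a unique query
// parameter so that a test request is less likely to be answered from a cache.
function withCacheBuster(pageUrl, token = Date.now().toString()) {
  const url = new URL(pageUrl);
  // "cachebust" is an arbitrary parameter name chosen for this sketch.
  url.searchParams.set('cachebust', token);
  return url.toString();
}

// Example with a placeholder URL and a fixed token.
console.log(withCacheBuster('https://example.org/wiki/Barack_Obama', 'run1'));
```

Note that a request with a unique parameter would most likely measure an uncached response, which is itself different from what most anonymous readers get, so this reduces one confounding variable while introducing another.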

Next:

  • Pushing to stable. If the impact is not generally harmful on cached pages, it should likely be promoted to production on the mobile web. A cache purge is not necessary as the change can fade in. However, this does raise the point of how our instrumentation can know when the page has navboxes and when they're hidden.
  • In future we may aim to lazy load the navbox content. However, this is out of scope at the moment as the content is already CSS hidden and this would be a new feature.

Event Timeline

Jdlrobson raised the priority of this task from to Medium.
Jdlrobson updated the task description.
Jdlrobson added subscribers: Peter, phuedx, Niedzielski and 18 others.

To get realistic metrics, you'll probably want to primarily test with inline CSS. The results in https://phabricator.wikimedia.org/T113066#1893866 strongly suggest that first paint is primarily driven by CSS availability, and less by page size.

My intention was for us to start small - inlining CSS is going to be much trickier. It's fine if we find there are no noticeable impacts of our changes as we'd fail fast. Do we have a separate task for discussing inlining CSS? I would like to be talking about that in parallel as how to overcome the caching is not clear to me.

I have created T124966 for inlining of CSS. I think it's actually easier to prototype than extracting navboxes, considering that the minimal implementation just inlines the RL response.

dr0ptp4kt renamed this task from Remove `navboxes` from HTML in the mobile beta and show the impact to Remove `navboxes` from HTML in mobile web beta and show the impact. Jan 30 2016, 3:10 PM
dr0ptp4kt updated the task description.
dr0ptp4kt updated the task description.

Change 267812 had a related patch set uploaded (by Jdlrobson):
Experiment one: Labs stripping HTML in beta

https://gerrit.wikimedia.org/r/267812

Change 267813 had a related patch set uploaded (by Jdlrobson):
Allow configuration of mobile formatter in beta only

https://gerrit.wikimedia.org/r/267813

Passing the baton. Someone should grab this on Tuesday. @Jhernandez / @phuedx, are you interested?
After swatting https://gerrit.wikimedia.org/r/267812 you should be able to see some kind of results after 3hrs on https://grafana.wikimedia.org/dashboard/db/mobile-2g - you can then SWAT a change for stable and see if you get similar results (labs so no SWAT windows necessary)

Change 267813 merged by jenkins-bot:
Allow configuration of mobile formatter in beta only

https://gerrit.wikimedia.org/r/267813

I've reviewed, tested and merged the patches.

I can't merge the config change, so I can't swat this. Will ping people around.

Moving to needs more work until we get it deployed in beta cluster.

No deploys for the moment as per #wikimedia-operations:

<phuedx> jynus: no, sync-common breaks the production machines atm
<phuedx> :/
<phuedx> 10:53:08 <_joe_> I'm going to read the scap source to find out what is happening but please no deploys atm

T125506 tracks the issue with the deployment server. Per T125506#1989711, today's SWATs are cancelled.

Change 267812 merged by jenkins-bot:
Experiment one: Labs stripping HTML in beta

https://gerrit.wikimedia.org/r/267812

Change 268202 had a related patch set uploaded (by Jdlrobson):
Config change 2: Suppress HTML from initial stable views on BC

https://gerrit.wikimedia.org/r/268202

According to https://grafana.wikimedia.org/dashboard/db/mobile-2g

  • HTML Bytes dropped from 187.7k to 152.5k (19% decrease)

Screen Shot 2016-02-03 at 11.33.24 AM.png (280×710 px, 36 KB)

According to https://grafana-admin.wikimedia.org/dashboard/db/mobile-2g?panelId=46&fullscreen the impact on render, fully loaded time and TTFB is less conclusive.
Baseline:

Screen Shot 2016-02-03 at 11.43.30 AM.png (80×184 px, 10 KB)

Test run 1:
Screen Shot 2016-02-03 at 11.43.33 AM.png (84×190 px, 10 KB)

Test run 2:
Screen Shot 2016-02-03 at 2.25.23 PM.png (105×181 px, 13 KB)

Test run 3:
Screen Shot 2016-02-03 at 5.32.17 PM.png (87×157 px, 13 KB)

First paint

| Test run | First paint | Improvement on baseline |
| --- | --- | --- |
| Baseline (2016-02-03 16:18) | 18.3 | 0% |
| 2016-02-03 19:00 | 5.9 | 68% |
| 2016-02-03 22:09 | 16.59 | 9% |
| 2016-02-04 01:18 | 17.09 | 7% |

Fully loaded

| Test run | Fully loaded (s) | Improvement on baseline |
| --- | --- | --- |
| Baseline (2016-02-03 16:18) | 36 | 0% |
| 2016-02-03 19:00 | 24 | 33% |
| 2016-02-03 22:09 | 35.02 | 2% |
| 2016-02-04 01:18 | 37.82 | -2% |

TTFB

| Test run | TTFB (s) | Improvement on baseline |
| --- | --- | --- |
| Baseline (2016-02-03 16:18) | 14.7 | 0% |
| 2016-02-03 19:00 | 3.9 | 73% |
| 2016-02-03 22:09 | 13.09 | 11% |
| 2016-02-04 01:18 | 9.42 | 36% |

Will continue to update this as the day progresses. Next result in 3 hours...

@phuedx @Jhernandez I suggest you take over from me when I finish for the day.
I would suggest SWATing https://gerrit.wikimedia.org/r/268202 and continuing the analysis in the beta cluster stable channel to see if the impacts there are in any way consistent with beta.

Before pushing this to production I'd suggest we get the fix for T125260 merged so we can see the results on beta across all page views.

@Jdlrobson: I've converted your tables into a spreadsheet and added two more entries based on the following:

Screen Shot 2016-02-04 at 12.04.49.png (153×228 px, 13 KB)

Screen Shot 2016-02-04 at 12.05.12.png (142×221 px, 13 KB)

Change 268202 merged by jenkins-bot:
Config change 2: Suppress HTML from initial stable views on BC

https://gerrit.wikimedia.org/r/268202

BC stable HTML size was 188k beforehand. I completely forgot to purge the page (the mobile stable channel does not get purged), so I just did that now, 9:20pm PST. Hopefully we'll start collecting data for stable that will be more reliable. :)

BC stable HTML is now 152kb which is consistent with https://phabricator.wikimedia.org/T124959#1995877.
I've updated the spreadsheet with the early first results.

Interestingly, from the first result, first paint seems to have taken a hit, but TTFB has gone up and fully loaded time has gone down, as might be expected. It goes without saying that we need more data points.

I've added more entries to the spreadsheet throughout the day. I'd like to keep on doing so until close to the end of the sprint before we start drawing conclusions.

I came to the conclusion that comparing to one arbitrary result that happened exactly before the change was not such a good idea so I've added tabs that basically take the average value before and average value after. Unfortunately, we are a little limited with results from before the change. It leads to values more consistent with what I envisioned so I'd suggest adding results to these tabs from now on.
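
For anyone wanting to reproduce the "average before vs average after" comparison outside the spreadsheet, a minimal sketch follows. It assumes the samples are plain arrays of numbers; the function names and the sample values are made up for illustration and are not taken from the spreadsheet or from any script in this task.

```javascript
// Arithmetic mean of a non-empty list of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Compare the average of the samples taken before a change with the average
// of the samples taken after it. A positive delta means the value went down.
function compareAverages(before, after) {
  const beforeAvg = mean(before);
  const afterAvg = mean(after);
  const delta = beforeAvg - afterAvg;
  return {
    beforeAvg,
    afterAvg,
    delta,
    percentDecrease: (delta * 100) / beforeAvg,
  };
}

// Illustrative numbers only, not real measurements.
console.log(compareAverages([200, 180, 220], [150, 170, 160]));
```

With only a handful of samples before the change, the "before" average stays sensitive to a single outlier, which is the limitation noted above.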

I would suggest we run the test on production beta next week so we can see if there are any parallels between our beta cluster environment and production cluster. Then yes let's draw conclusions near the end of the sprint.

I set up a script:
https://gist.github.com/jdlrobson/a358f9ed079fbc9ebf18

It includes calculations of medians before and after too.

Results as follows:

enwiki-bc-mobile-2gslow.anonymous.Barack_Obama :

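| Property | Before (avg) | After (avg) | Delta (avg) | % decrease (avg) | Before (median) | After (median) | Delta (median) | % decrease (median) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| html.bytes | 192215.4 | 162168.1 | 30047.3 | 15.63% | 192216.0 | 156159.0 | 36057.0 | 18.76% |
| TTFB.median | 3912.8 | 3915.2 | -2.4 | -0.06% | 3911.0 | 3911.0 | 0.0 | 0.00% |
| fullyLoaded.median | 24939.9 | 24970.0 | -30.1 | -0.12% | 24226.0 | 25178.0 | -952.0 | -3.93% |
| render.median | 5811.6 | 5707.6 | 104.0 | 1.79% | 5884.0 | 5884.0 | 0.0 | 0.00% |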

x ~/git/measureperfchange $ node index.js "enwiki-bc-mobile-beta-2gslow.anonymous.Barack_Obama" 4 2 2016 22 16
enwiki-bc-mobile-beta-2gslow.anonymous.Barack_Obama :

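| Property | Before (avg) | After (avg) | Delta (avg) | % decrease (avg) | Before (median) | After (median) | Delta (median) | % decrease (median) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| html.bytes | 181695.5 | 156211.2 | 25484.3 | 14.03% | 192157.5 | 156209.0 | 35948.5 | 18.71% |
| TTFB.median | 11904.0 | 12438.4 | -534.4 | -4.49% | 12952.5 | 13468.5 | -516.0 | -3.98% |
| render.median | 15527.0 | 15844.5 | -317.5 | -2.04% | 16688.0 | 17137.0 | -449.0 | -2.69% |
| fullyLoaded.median | 33748.3 | 34493.5 | -745.1 | -2.21% | 35999.0 | 35843.0 | 156.0 | 0.43% |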

[re-run after Baha pointed out issue with median calculation]

I think we should prepare a test for production beta for Tuesday or Wednesday to start collecting data there to see if there is any pattern between there and beta cluster.

Change 269341 had a related patch set uploaded (by Jdlrobson):
Test HTML stripping in production mobile beta

https://gerrit.wikimedia.org/r/269341

@Jdlrobson,

I setup a script: https://gist.github.com/jdlrobson/a358f9ed079fbc9ebf18

There is a typo in your script on line 18. The '/' should be '%'.
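
For reference, here is a minimal median sketch showing where `/` and `%` are easy to mix up. This is not a copy of the gist, and it only assumes the typo sat in the even/odd length check; the actual line 18 may differ.

```javascript
// Median of a list of numbers. This is an independent sketch, not the gist's code.
function median(values) {
  if (values.length === 0) {
    throw new Error('median of an empty list is undefined');
  }
  // Sort a copy numerically; the default sort compares values as strings.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // `%` tests whether the sample count is even. Writing `/` here instead would
  // not test parity at all, so the wrong branch could be taken.
  if (sorted.length % 2 === 0) {
    return (sorted[mid - 1] + sorted[mid]) / 2;
  }
  return sorted[mid];
}

console.log(median([3, 1, 2]));
console.log(median([4, 1, 3, 2]));
```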

Scheduled a SWAT for this afternoon. I don't expect to see much impact on the Barack Obama render/fully loaded/TTFB times; however, I'm curious to see if there is any impact on global beta traffic, or any correlation with the beta cluster beta or stable tests.

I'll send a mail on Friday wrapping up this work.

Change 269341 merged by jenkins-bot:
Test HTML stripping in production mobile beta

https://gerrit.wikimedia.org/r/269341

So this is now live and verified so will get picked up in next run and you'll be able to generate results via:

node index.js "enwiki-mobile-beta-2gslow.anonymous.Barack_Obama" 9 2 2016 22 0

We should expect about 18% change in bytes
In beta currently:
Average median TTFB is 4953
Average median render time is 10609
Average median fully load time is 24154

I expect these figures to barely change over the course of the day but potentially fully load time should decrease, render time barely change and TTFB increase.

@Jdlrobson, great write up.

As you mention on the wiki page:

TTFB, first render and fully loaded time all took a hit in the beta cluster. All showed a negative impact post change.

Do we know why? Also, why do we think the beta numbers may not be a good indicator when in fact beta articles are effectively static? I'd say they are the truer measure of the change, rather than the production numbers. I also find it hard to understand how the structured language overlay work may have impacted the results.

Seconded. It's a good write up @Jdlrobson.

I also find it hard to understand how the structured language overlay work may have impacted the results.

Seconded. I don't understand why the language switcher work, which was merged yesterday, affected the measurements.

It might be worth mentioning that the Beta Cluster isn't an isolated environment. It's used for manual and automated testing of all of WMF's deployed extensions – probably more. There's no one time when the cluster isn't under some load. This isn't true in production as most (all?) requests from anonymous users are served from a cache.

Sorry, it wasn't specific to the language switcher. I was mostly keen to get across the cluster's unstable nature: it changes hourly. Sam's comment above probably captures it a little better. I'll make some edits today.

We can get a feel for the impact of this change on enwiki with an insource:navbox search (~250K pages).
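
If someone wants that page count programmatically rather than from the search UI, a sketch of building the equivalent MediaWiki Action API request is below. The helper name is made up, the default `api.php` endpoint is an assumption about where enwiki's API lives, and the sketch only builds the URL without sending it. My understanding is that the count comes back under `query.searchinfo.totalhits`, but treat that as something to verify against the API documentation.

```javascript
// Hypothetical helper: build a search API URL that asks only for the total
// number of pages matching "insource:navbox".
function navboxSearchUrl(apiBase = 'https://en.wikipedia.org/w/api.php') {
  const url = new URL(apiBase);
  url.searchParams.set('action', 'query');
  url.searchParams.set('list', 'search');
  url.searchParams.set('srsearch', 'insource:navbox');
  url.searchParams.set('srinfo', 'totalhits'); // only the hit count is of interest
  url.searchParams.set('srlimit', '1');        // keep the result list itself small
  url.searchParams.set('format', 'json');
  return url.toString();
}

console.log(navboxSearchUrl());
```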

@phuedx can you be responsible for sign off on this one? Anything I missed?

Jdlrobson claimed this task.
  • I removed the config change on beta as I had a theory it might be causing T126700 (I would like to know one way or the other for future changes) and have updated the mediawiki page now that there is 7 days' worth of data.

Signing off.