
Reliable measure how fast a Wikipedia article would be without JavaScript
Closed, Resolved (Public)

Description

After a discussion today between me and @Krinkle, we think it would be valuable to measure how fast articles would be without any JavaScript, to create a baseline. There are a couple of ways we can do it:

  1. Use the speed-tests dir and create a test where we load just the minimal modules needed for an article (@Krinkle, I need your input on exactly what that should be).
  2. Use Chrome and just block the first JavaScript request (that will cause minimal overhead for the test, but will not test exactly what we want).
  3. Use Chrome again but manipulate the response, so that we change RLPAGEMODULES to include just the minimal modules we need. I'm not sure what overhead that will create, but I can have a go. I did some testing yesterday with https://chromedevtools.github.io/devtools-protocol/tot/Fetch/ and that should do the trick. We can run these tests easily in our infrastructure too.

Event Timeline

Change 891490 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/mobile-synthetic-monitoring-tests@master] Test Barack Obama without doing the first JavaScript request.

https://gerrit.wikimedia.org/r/891490

Change 891490 merged by jenkins-bot:

[performance/mobile-synthetic-monitoring-tests@master] Test Barack Obama without doing the first JavaScript request.

https://gerrit.wikimedia.org/r/891490

For reference, current values on desktop and mobile page views for enwiki Obama:

en.wikipedia
RLPAGEMODULES=["ext.cite.ux-enhancements","ext.tmh.player","ext.scribunto.logs","site","mediawiki.page.ready","jquery.makeCollapsible","mediawiki.toc","skins.vector.js","skins.vector.es6","mmv.head","mmv.bootstrap.autostart","ext.visualEditor.desktopArticleTarget.init","ext.visualEditor.targetLoader","ext.eventLogging","ext.wikimediaEvents","ext.navigationTiming","ext.cx.eventlogging.campaigns","ext.centralNotice.geoIP","ext.centralNotice.startUp","ext.gadget.ReferenceTooltips","ext.gadget.charinsert","ext.gadget.extra-toolbar-buttons","ext.gadget.switcher","ext.centralauth.centralautologin","ext.popups","ext.echo.centralauth","ext.uls.compactlinks","ext.uls.interface","ext.cx.uls.quick.actions","wikibase.client.vector-2022","ext.growthExperiments.SuggestedEditSession"];
en.m.wikipedia
RLPAGEMODULES=["ext.cite.ux-enhancements","ext.tmh.player","ext.scribunto.logs","site","mediawiki.page.ready","skins.minerva.scripts","mobile.init","ext.relatedArticles.readMore.bootstrap","ext.eventLogging","ext.wikimediaEvents","ext.navigationTiming","ext.cx.eventlogging.campaigns","ext.cx.entrypoints.languagesearcher.init","mw.externalguidance.init","ext.centralNotice.geoIP","ext.centralNotice.startUp","ext.gadget.EditNoticesOnMobile","ext.gadget.switcher","ext.centralauth.centralautologin","ext.popups","ext.echo.centralauth","ext.cx.entrypoints.mffrequentlanguages","ext.growthExperiments.SuggestedEditSession"];

I suggest the following:

RLPAGEMODULES=["ext.eventLogging"];

This suffices to effectively bring in a dozen or so indirect dependencies and base modules, and thus have the browser still go through the motions of having a JS pipeline like we do today (with a startup manifest, a script tag, a request going in the background, async evaluating module scripts as they come, and resolving/exporting/importing a dozen or so module definitions etc.) — yet without any specific page enhancements, DOM interactions, or active JS computations taking place. That should help create a realistic baseline that, unlike an artificial noscript mode, has the browser still engaged and behaving as it would normally on our page views.

Great, I will try to add that in the coming weeks. We need to upgrade the version of browsertime/sitespeed.io on BitBar to be able to manipulate the content.

I got hack number 3 to work today, but it adds a couple of hundred milliseconds to TTFB, so I need to make another version where we do the exact same thing in a script (like getting the HTML body) but do not change the HTML; hopefully that can give me a good baseline. Also, one problem with the desktop version is that when I removed all JS, it always rendered only the header (that started to happen with vector-2022), which makes it harder to verify that some metrics stay the same.

I pushed the test to the new bare metal server, where we use the same code for both runs, except that for one we don't change RLPAGEMODULES. That way we get almost the same TTFB. I'll keep the test running during the weekend and then we can have a look at the differences. It's a little hard to see exactly, since we are running "wiki loves".

Under the key minJS I pushed the test where we add minimal JS. Then, under the key JS, we test the same URL with the same script but without changing the ResourceLoader modules, to make our comparison fairer.

I did some work on this last week to try to get metrics after the page is loaded, measuring changes in interaction. I'm not sure, though, what the best metric is. I ran into some problems with how to measure things in a good way, since today we handle things differently when we don't have any JavaScript. For example, on mobile all content is folded by default and we unfold it using JavaScript, so running with only ext.eventLogging gives us a short page with everything folded.

I can at least push an example to the phones later this week so we can have a look.

Change 921879 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/mobile-synthetic-monitoring-tests@master] Run tests with and without JavaScript.

https://gerrit.wikimedia.org/r/921879

Change 921879 merged by jenkins-bot:

[performance/mobile-synthetic-monitoring-tests@master] Run tests with and without JavaScript.

https://gerrit.wikimedia.org/r/921879

Change 922066 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/mobile-synthetic-monitoring-tests@master] Update sitespeed.io.

https://gerrit.wikimedia.org/r/922066

Change 922066 merged by jenkins-bot:

[performance/mobile-synthetic-monitoring-tests@master] Update sitespeed.io.

https://gerrit.wikimedia.org/r/922066

Change 922069 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/mobile-synthetic-monitoring-tests@master] Add missing multi flag for test to run.

https://gerrit.wikimedia.org/r/922069

Change 922069 merged by jenkins-bot:

[performance/mobile-synthetic-monitoring-tests@master] Add missing multi flag for test to run.

https://gerrit.wikimedia.org/r/922069

I've set up two tests to run on a Moto G5 phone to try to compare the performance impact of the JavaScript we load. The method I used was Chrome with a specific CDP feature where you can change the response before it reaches the browser engine. That way I could change which modules are loaded and load only ext.eventLogging. Changing the responses adds some time to TTFB, so I added a test to run side by side that does the exact same thing except it does not switch out the modules in the ResourceLoader. That way the TTFB for both tests is the same.

I also tried to do some interaction with the page, but that is hard to do while making sure that we compare the same thing. For example, I wanted to scroll to the bottom of the article, but today in our mobile version the article is folded and opened up with JavaScript, so if that JavaScript does not run, the non-JavaScript version is really short. What I ended up doing was clicking on the menu to see if we could trigger first input delay.

In this test I ended up focusing on Total Blocking Time. According to the Chrome team, a good TBT score should be under 200 ms on average mobile hardware. For us, an average device in India is probably something like a Samsung A51, looking at the CPU benchmark scores we collect: https://grafana.wikimedia.org/d/cFMjrb7nz/cpu-benchmark?orgId=1

Screenshot 2023-05-31 at 13.30.13.png (screenshot of the CPU benchmark dashboard)

But with the setup we have at BitBar, those phones run tests with WebPageReplay, so I used the Moto G5 phones instead. They are slower than the average device, so our goal should probably not be getting under 200 ms.

First, here's what the script looks like (if we want to do the same test again):

module.exports = async function ( context, commands ) {
  const cdpClient = commands.cdp.getRawClient();
  // Intercept the main document response so we can rewrite it
  // before it reaches the browser engine.
  await cdpClient.Fetch.enable({
    handleAuthRequests: false,
    patterns: [
      {
        urlPattern: '*',
        resourceType: 'Document',
        requestStage: 'Response'
      }
    ]
  });

  cdpClient.Fetch.requestPaused(async reqEvent => {
    const { requestId } = reqEvent;

    // Get the original HTML body (base64-encoded by CDP).
    const myBody = await cdpClient.Fetch.getResponseBody({
      requestId
    });

    let text = Buffer.from(myBody.body, 'base64').toString('utf8');

    // Find the inline RLPAGEMODULES assignment and replace it so that
    // only ext.eventLogging (and its dependencies) is loaded.
    const rlmodules = text.substring(
      text.lastIndexOf('RLPAGEMODULES'),
      text.lastIndexOf('];') + 2
    );
    text = text.replaceAll(rlmodules, 'RLPAGEMODULES=["ext.eventLogging"];');

    // Hand the modified document back to the browser.
    return cdpClient.Fetch.fulfillRequest({
      requestId,
      responseCode: 200,
      body: Buffer.from(text, 'utf8').toString('base64')
    });
  });

  await commands.measure.start('barackWithoutJS');
  await commands.navigate('https://en.m.wikipedia.org/wiki/Barack_Obama');
  // Click the main menu button to try to trigger first input delay.
  await commands.click.byId('mw-mf-main-menu-button');
  await commands.wait.byTime(2000);
  return commands.measure.stop();
};

The total blocking time on the device running the default setup accessing the Barack Obama page is 715 ms, with 11 long tasks; the longest long task is 269 ms.

The same test with the modified version has a total blocking time of 83 ms, with 6 long tasks; the longest task is 198 ms. We still have long tasks, and half of them happen before first paint (parsing HTML/CSS).
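
For reference, Total Blocking Time sums the blocking portion (everything over 50 ms) of each long task, and only long tasks between First Contentful Paint and Time to Interactive count toward it, which is why the long tasks before first paint don't contribute to the 83 ms. A small illustration of the arithmetic:

```javascript
// Illustration of how TBT is computed from long task durations (in ms),
// given only the tasks that fall between FCP and TTI.
function totalBlockingTime(longTaskDurationsMs) {
  return longTaskDurationsMs
    .filter((duration) => duration > 50)
    .reduce((sum, duration) => sum + (duration - 50), 0);
}

// Example: long tasks of 269, 100 and 60 ms block for 219 + 50 + 10 = 279 ms.
console.log(totalBlockingTime([269, 100, 60])); // 279
```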

I kept the tests running for over a week and the metrics stayed the same over time.

Summary

We have room to improve our total blocking time metrics and make the user experience better for people on slow mobile devices.