
Investigate performance impact of full-page CSS invert filter for dark-mode
Closed, Resolved | Public | 5 Estimated Story Points | BUG REPORT

Description

Background: Users have complained that CSS invert impacts scrolling (See https://en.wikipedia.org/wiki/Wikipedia_talk:Dark_mode_(gadget)#Extreme_slowness_while_having_this_enabled for reference)

One of the options being considered for dark-mode, and the option currently implemented by Extension:DarkMode, is a full-page color invert using the CSS filter property.

Essentially: filter: invert(1) hue-rotate(180deg) on the whole page.
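
To see why hue-rotate(180deg) is paired with invert(1), the per-pixel math can be sketched roughly as below. This is only an illustration: it assumes the hue rotation matrix from the CSS Filter Effects specification and plain 0 to 255 sRGB channel values. The function names are made up for this example, and the blue sample colour is an arbitrary choice.

```javascript
// Sketch of what `filter: invert(1) hue-rotate(180deg)` does to one pixel.
// Assumes 0-255 sRGB channels; all function names here are hypothetical.

function invert( [ r, g, b ] ) {
  // invert(1) flips every channel.
  return [ 255 - r, 255 - g, 255 - b ];
}

function hueRotate( [ r, g, b ], degrees ) {
  const rad = degrees * Math.PI / 180;
  const c = Math.cos( rad );
  const s = Math.sin( rad );
  // Hue rotation matrix as given in the CSS Filter Effects specification.
  const m = [
    [ 0.213 + c * 0.787 - s * 0.213, 0.715 - c * 0.715 - s * 0.715, 0.072 - c * 0.072 + s * 0.928 ],
    [ 0.213 - c * 0.213 + s * 0.143, 0.715 + c * 0.285 + s * 0.140, 0.072 - c * 0.072 - s * 0.283 ],
    [ 0.213 - c * 0.213 - s * 0.787, 0.715 - c * 0.715 + s * 0.715, 0.072 + c * 0.928 + s * 0.072 ]
  ];
  // Apply the matrix, then clamp and round to a valid channel value.
  return m.map( ( row ) => {
    const v = row[ 0 ] * r + row[ 1 ] * g + row[ 2 ] * b;
    return Math.round( Math.min( 255, Math.max( 0, v ) ) );
  } );
}

function simulateDarkMode( rgb ) {
  return hueRotate( invert( rgb ), 180 );
}

console.log( simulateDarkMode( [ 255, 255, 255 ] ) ); // a white background
console.log( simulateDarkMode( [ 0, 0, 0 ] ) );       // black text
console.log( simulateDarkMode( [ 51, 102, 204 ] ) );  // a blue sample colour
```

White and black swap places, while the blue input stays predominantly blue, because the 180 degree rotation roughly undoes the hue flip that the inversion introduces.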

The goal of this ticket is to determine whether or not that approach would have a negative performance impact on end-users, and based on that, whether it's appropriate to use this approach for anonymous users.

TODO

  • Develop a synthetic testing scenario that can measure the impact of this change
  • Measure the impact of this approach on low-end devices
  • Measure the impact of this approach on memory
  • Measure the impact of this approach on very long pages

The hypothesis here is that the filter property might create a new CSS compositing layer over the whole page, which, depending on the size of the HTML, could be taxing on the device GPU and memory, which might lead to rendering slowdowns or bugs.
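
As a rough illustration of why page size matters under this hypothesis, the sketch below estimates the memory of one uncompressed RGBA layer covering a whole page. It is a worst-case back-of-the-envelope figure, not a measurement: the function name and the example dimensions are made up, and browsers normally tile layers and rasterize only near the viewport, so real usage may be far lower.

```javascript
// Rough upper bound for the memory of a single full-page RGBA layer.
// The helper name and all numbers are hypothetical, for illustration only.
function estimateLayerBytes( cssWidth, cssHeight, devicePixelRatio = 1, bytesPerPixel = 4 ) {
  const deviceWidth = cssWidth * devicePixelRatio;
  const deviceHeight = cssHeight * devicePixelRatio;
  return deviceWidth * deviceHeight * bytesPerPixel;
}

// Example: a 1280 px wide page that is 50,000 px tall (made-up dimensions).
const bytes = estimateLayerBytes( 1280, 50000 );
console.log( ( bytes / ( 1024 * 1024 ) ).toFixed( 1 ) + ' MiB' );
```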

Event Timeline

ovasileva triaged this task as Medium priority.Oct 19 2023, 4:22 PM
ovasileva raised the priority of this task from Medium to High.Oct 23 2023, 2:57 PM

Some information that may be helpful:

I think for this reason any synthetic test should check scrolling.

The following test might work:

  • Grab a long page with lots of DOM elements e.g. https://en.wikipedia.org/wiki/Tyrannosaurus
  • Clone the HTML and, in the second copy, add a style and a script inside the head element that simulate dark mode and scroll to the bottom:
<style>html { filter: invert( 1 ) hue-rotate( 180deg ); } </style>
<script>window.setInterval( function () { window.scrollTo(0, window.scrollY+ 100); }, 200 )</script>
  • Compare the performance of the 2 pages.
  • Run for both mobile and desktop skin(s).
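
The cloning step above could be sketched as a small Node.js helper like the one below. This is a minimal sketch under assumptions: injectDarkModeSimulation and its options are hypothetical names, and giving the baseline copy the same scroll script (so both pages do the same scrolling work) is a design choice of this example rather than something stated in the steps.

```javascript
// Sketch: build the baseline and "dark mode" copies of a page from its HTML.
// injectDarkModeSimulation is a hypothetical helper, not part of any library.
const DARK_MODE_STYLE = '<style>html { filter: invert( 1 ) hue-rotate( 180deg ); } </style>';
const SCROLL_SCRIPT = '<script>window.setInterval( function () { window.scrollTo( 0, window.scrollY + 100 ); }, 200 )</script>';

function injectDarkModeSimulation( html, { withFilter = true, withScroll = true } = {} ) {
  const extra = ( withFilter ? DARK_MODE_STYLE : '' ) + ( withScroll ? SCROLL_SCRIPT : '' );
  // Insert just before the head element closes.
  const index = html.indexOf( '</head>' );
  if ( index === -1 ) {
    throw new Error( 'No closing head tag found' );
  }
  return html.slice( 0, index ) + extra + html.slice( index );
}

// Usage with a tiny stand-in document instead of the real article HTML.
const original = '<html><head><title>T</title></head><body></body></html>';
const baseline = injectDarkModeSimulation( original, { withFilter: false } );
const darkMode = injectDarkModeSimulation( original );
console.log( baseline );
console.log( darkMode );
```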

Thank you @Jdlrobson. I'll start testing that next week when I'm back from vacation (back on Thursday the 2nd).

I'll be a little slow on this: our WebPageReplay tests with Chrome broke in our monitoring two days ago (T350105), and I need to understand the root cause of that before I can start with this.

I did some testing last week using Chrome and I'm going to document here how I did it. Using the Chrome DevTools Protocol you can change the server response before it reaches the browser, so manipulating the HTML is easy. With a script like this I'm adding the filter:

module.exports = async function ( context, commands ) {
  // Raw Chrome DevTools Protocol (CDP) client.
  const cdpClient = commands.cdp.getRawClient();

  // Pause every document response so the HTML can be rewritten
  // before it reaches the browser.
  await cdpClient.Fetch.enable({
    handleAuthRequests: false,
    patterns: [
      {
        urlPattern: '*',
        resourceType: 'Document',
        requestStage: 'Response'
      }
    ]
  });

  cdpClient.Fetch.requestPaused(async reqEvent => {
    const { requestId } = reqEvent;

    const myBody = await cdpClient.Fetch.getResponseBody({
      requestId
    });

    // The body is base64 encoded when base64Encoded is true.
    let text = myBody.base64Encoded ?
      Buffer.from(myBody.body, 'base64').toString('utf8') :
      myBody.body;

    // Inject the dark mode simulation just before the head element closes.
    text = text.replaceAll('</head>', '<style>html { filter: invert( 1 ) hue-rotate( 180deg ); } </style> </head>');
    // fulfillRequest expects the response body to be base64 encoded.
    return cdpClient.Fetch.fulfillRequest({
      requestId,
      responseCode: 200,
      body: Buffer.from(text, 'utf8').toString('base64')
    });
  });

  // Measure a navigation followed by a scroll to the bottom of the page.
  await commands.measure.start('Tyrannosaurus');
  await commands.navigate('https://en.wikipedia.org/wiki/Tyrannosaurus');
  await commands.scroll.toBottom(250);
  return commands.measure.stop();
};

And then run it like this (using the built-in CPU throttling in Chrome to simulate a slower computer):

sitespeed.io desktop.cjs --multi -n 5 --cpu --browsertime.chrome.CPUThrottlingRate 16

Then I look at long tasks and the DevTools timeline log and compare them to the exact same test run without the HTML change.

At a first look, I couldn't see anything that would suggest a slowdown. I'm going to push it to a test server just to be 100% sure, and then I'll look at pushing a test page so I can run the same test in Firefox (as far as I know, Firefox does not currently have the capability to change the HTML on the fly).

Change 972830 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/synthetic-monitoring-tests@master] Test dark mode on a dedicated server.

https://gerrit.wikimedia.org/r/972830

Change 972830 merged by jenkins-bot:

[performance/synthetic-monitoring-tests@master] Test dark mode on a dedicated server.

https://gerrit.wikimedia.org/r/972830

Change 972836 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/synthetic-monitoring-tests@master] Throttle the CPU for dark mode tests.

https://gerrit.wikimedia.org/r/972836

Change 972836 merged by jenkins-bot:

[performance/synthetic-monitoring-tests@master] Throttle the CPU for dark mode tests.

https://gerrit.wikimedia.org/r/972836

Change 972865 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/synthetic-monitoring-tests@master] Remove CPU throttling for dark mode tests.

https://gerrit.wikimedia.org/r/972865

Change 972865 merged by jenkins-bot:

[performance/synthetic-monitoring-tests@master] Remove CPU throttling for dark mode tests.

https://gerrit.wikimedia.org/r/972865

Change 973080 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/synthetic-monitoring-tests@master] Test CPU throttling 7.

https://gerrit.wikimedia.org/r/973080

Change 973080 abandoned by Phedenskog:

[performance/synthetic-monitoring-tests@master] Test CPU throttling 7.

Reason:

Lets run these tests with WebPageReplay and a physical throttled CPU.

https://gerrit.wikimedia.org/r/973080

Change 973121 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/synthetic-monitoring-tests@master] Update container to support user journeys tests with WebPageReplay.

https://gerrit.wikimedia.org/r/973121

Change 973121 merged by jenkins-bot:

[performance/synthetic-monitoring-tests@master] Update container to support user journeys tests with WebPageReplay.

https://gerrit.wikimedia.org/r/973121

We will start getting results tomorrow (Tuesday) and will be able to draw conclusions on Thursday.

Summary

I haven't seen anything that shows any regression. I've been running tests on a slowed-down desktop computer and a Moto G5.

More info

I've been running tests on a bare metal server for desktop and locally on a rooted Moto G5. It took some time because I'm also working on a better alert system for our monitoring, and I planned to use parts of that in the testing on the Moto G5; I'll explain that soon. When that project is finished, the idea is that anyone can run these tests and get reliable metrics and a reliable answer on whether there are any regressions.

For desktop I started by running some tests locally and couldn't see any difference in Chrome, but I pushed it to the bare metal server to be 100% sure. The thing is that the metrics we push to Graphite are a little bit hard to read because there are some fluctuations going on there, so I've changed the desktop tests to use the same model as on mobile.

To get a better signal on whether there is any difference in the measurements, I used the https://en.wikipedia.org/wiki/Mann–Whitney_U_test . That's something Timo implemented for Fresnel (I used a slightly modified version), and Mozilla uses it in its analysis when trying to find regressions. Mann-Whitney helps us answer the question of whether there is a significant difference between two datasets. I used one-way tests, looking for the answer to "Is there any significant difference between the default page and the modified version?"

The data sets need to be large for it to work, so for the tests I've been doing 31 runs against the default Tyrannosaurus page and then 31 runs against a modified version with the dark background, and then letting Mann-Whitney do the job. I've been looking at CPU long tasks (tasks longer than 50 ms) and some internal Chrome performance data that you can get through CDP.
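
For readers unfamiliar with the test, the comparison of two sets of runs can be sketched roughly as follows. This is not the Fresnel implementation or the modified version used here: it is a simplified two-sided Mann-Whitney U test using the normal approximation, without tie or continuity correction, and the function names and sample numbers are made up for illustration.

```javascript
// Sketch of a two-sided Mann-Whitney U test using the normal approximation.
// Simplified: no tie correction and no continuity correction.
// Function names are hypothetical; this is not the Fresnel implementation.

function normalCdf( z ) {
  // Abramowitz-Stegun approximation (formula 7.1.26) of erf.
  const x = Math.abs( z ) / Math.SQRT2;
  const t = 1 / ( 1 + 0.3275911 * x );
  const poly = t * ( 0.254829592 + t * ( -0.284496736 + t * ( 1.421413741 +
    t * ( -1.453152027 + t * 1.061405429 ) ) ) );
  const erf = 1 - poly * Math.exp( -x * x );
  return z >= 0 ? 0.5 * ( 1 + erf ) : 0.5 * ( 1 - erf );
}

function mannWhitneyU( baseline, current ) {
  const n1 = baseline.length;
  const n2 = current.length;
  const all = baseline.map( ( value ) => ( { value, group: 0 } ) )
    .concat( current.map( ( value ) => ( { value, group: 1 } ) ) )
    .sort( ( a, b ) => a.value - b.value );

  // Assign 1-based ranks; tied values share the average of their ranks.
  let rankSumBaseline = 0;
  for ( let i = 0; i < all.length; ) {
    let j = i;
    while ( j < all.length && all[ j ].value === all[ i ].value ) {
      j++;
    }
    const averageRank = ( i + 1 + j ) / 2;
    for ( let k = i; k < j; k++ ) {
      if ( all[ k ].group === 0 ) {
        rankSumBaseline += averageRank;
      }
    }
    i = j;
  }

  const u1 = rankSumBaseline - n1 * ( n1 + 1 ) / 2;
  const u = Math.min( u1, n1 * n2 - u1 );
  const mean = n1 * n2 / 2;
  const sd = Math.sqrt( n1 * n2 * ( n1 + n2 + 1 ) / 12 );
  const z = ( u - mean ) / sd;
  // Two-sided p-value: probability of a U at least this extreme.
  const p = Math.min( 1, 2 * normalCdf( z ) );
  return { u, z, p };
}

// Usage with made-up numbers: 31 runs per series, as in the tests above.
const baselineRuns = Array.from( { length: 31 }, ( _, i ) => 100 + i );
const shiftedRuns = baselineRuns.map( ( v ) => v + 100 );
console.log( mannWhitneyU( baselineRuns, baselineRuns ).p ); // identical series
console.log( mannWhitneyU( baselineRuns, shiftedRuns ).p < 0.05 ); // clearly shifted series
```

A p-value below 0.05 would be read as a significant difference between the two series; identical series give a p-value at or near 1.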

Mobile test

I've been running the tests multiple times to be sure that I get the same result. I've been using a rooted Moto G5 so I can pin the CPU speed to be the same throughout the tests. The CPU speed corresponds to the 90th percentile of users in India and 65+ in South Africa (maybe we should look at a slower device for South African users).

I also artificially slowed down one of the tests (by adding more trace categories to the Chrome trace log) to see that Mann Whitney would signal the change, and it does.

There are many metrics here because I'm building the measurements to be generic. Some of the numbers are there just so I can verify that the data is OK; we can focus on the Mann-Whitney U number (it should be less than 0.05 for a change to be significant). You can see that no metric has a significant change. The metrics I focused on are:

  • Total blocking time / total duration = how long the main thread is blocked by long tasks
  • Number of long tasks = how many long tasks happen
  • Last long task = when the last long task happens
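
As a rough sketch of how these three focus metrics could be derived from a list of main-thread tasks: the task shape ({ startTime, duration } in milliseconds), the function name, and the sample data below are assumptions for this example. Blocking time is assumed here to be the part of each long task beyond 50 ms, which may differ from how the tooling used in this task computes it.

```javascript
// Sketch: derive the focus metrics from a list of main-thread tasks.
// The task shape and the function name are hypothetical.
const LONG_TASK_THRESHOLD_MS = 50;

function summarizeLongTasks( tasks ) {
  // A long task is any task longer than 50 ms.
  const longTasks = tasks.filter( ( task ) => task.duration > LONG_TASK_THRESHOLD_MS );
  return {
    numberOfLongTasks: longTasks.length,
    totalDuration: longTasks.reduce( ( sum, task ) => sum + task.duration, 0 ),
    // Assumption: only the time beyond the threshold counts as blocking.
    totalBlockingTime: longTasks.reduce(
      ( sum, task ) => sum + ( task.duration - LONG_TASK_THRESHOLD_MS ), 0 ),
    // Start time of the last long task, or null when there were none.
    lastLongTask: longTasks.length ?
      Math.max( ...longTasks.map( ( task ) => task.startTime ) ) :
      null
  };
}

// Made-up sample: one short task and two long ones.
const summary = summarizeLongTasks( [
  { startTime: 0, duration: 30 },
  { startTime: 100, duration: 120 },
  { startTime: 400, duration: 60 }
] );
console.log( summary );
```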

Screenshot 2023-11-14 at 13.20.13.png (1×2 px, 434 KB)

Not all metrics here are self-explanatory, so please let me know if something is unclear.

I also graphed all the data for the runs to get a feeling for the metrics. The baseline is when we test the page as is, and the "current" is the modified version. When the dots are large, it means both series share the same data.

Screenshot 2023-11-14 at 13.21.11.png (544×2 px, 127 KB)

Screenshot 2023-11-14 at 13.21.04.png (540×2 px, 110 KB)

Screenshot 2023-11-14 at 13.21.16.png (606×2 px, 134 KB)

Screenshot 2023-11-14 at 13.20.30.png (560×2 px, 133 KB)

I also manually browsed a couple of trace logs from Chrome but found nothing strange there.

Please let me know if you think something is missing and I'll add that or rerun the tests.

I'll update the task later today with desktop data.

Desktop

For the desktop I've slowed down the CPU so we hit almost the 99th percentile of our users' CPU speed on desktop in India and South Africa. I ran the same test, hitting the desktop domain.

There's no significant change for the focus metrics:

Screenshot 2023-11-14 at 18.45.57.png (1×2 px, 460 KB)

And the raw data points look like this:

Screenshot 2023-11-14 at 18.46.25.png (534×2 px, 114 KB)

Screenshot 2023-11-14 at 18.46.31.png (598×2 px, 132 KB)

Screenshot 2023-11-14 at 18.46.35.png (642×2 px, 134 KB)

Screenshot 2023-11-14 at 18.46.09.png (564×2 px, 137 KB)

These tests were running on my own machine; it would have been preferable to run them on a standalone server, but I think this is good enough for now.

Thanks for the analysis so far!

One last addition we should test: we should apply all the rules in https://en.m.wikipedia.org/wiki/MediaWiki:Gadget-dark-mode.css to simulate community overrides and see if they impact the results.

I've added a test where I also inject https://en.m.wikipedia.org/wiki/MediaWiki:Gadget-dark-mode.css before the hue-rotate and the head tag. For people that already have that CSS, adding the hue-rotate on top of it didn't make any difference at all.

Screenshot 2023-11-15 at 16.15.09.png (1×2 px, 453 KB)

I also benchmarked the Gadget-dark-mode.css against the default page without any dark mode set and it actually picked up a small but significant change in time spent in recalculate styles and a couple of other layout metrics:

Screenshot 2023-11-15 at 16.17.12.png (1×2 px, 443 KB)

This is reassuring.

Given https://en.wikipedia.org/wiki/Wikipedia_talk:Dark_mode_(gadget)#Extreme_slowness_while_having_this_enabled is there also a way to run this with hardware acceleration disabled in Chrome (or is that captured in the above tests)?

way to run this with hardware acceleration disabled in Chrome (or is that captured in the above tests?) ?

No, it's not captured in the above tests. I wonder, though, how many users actually disable the GPU, and if you do, you probably have the same problem on other websites too? However, I did some testing: you can disable the GPU by adding --disable-gpu to Chrome when you start it. I then verified it by accessing chrome://gpu in the browser, where you can see whether it's turned off or not. Disabling worked, and when I ran the tests I couldn't get any significant change in the metrics, so I cannot reproduce what the reporters describe as "extreme slowness". I wonder if that is Firefox only?

After meeting with @Peter, we went through the tables and agreed there is no obvious regression in any case of the dark-mode on either the mobile or desktop sites.

@Peter will create a follow-up task to generalise the approach of these tests for future use. The newer way would let us test for performance changes from dark-mode or similar changes more easily and give us a faster feedback loop.

Thanks @Peter

Thanks @Peter (and @Mabualruz) for all the analysis! It's good to know that we couldn't find any performance issues and that we'll be able to generalize the approach.
I have created T351556 as a placeholder for re-running these tests on the implemented version of dark mode rather than a best guess as we have here.
Feel free to create any subtasks to that one relating to any generalization we'll need to do beforehand.