User Details
- User Since
- Aug 17 2015, 6:48 PM (448 w, 13 h)
- Availability
- Available
- IRC Nick
- phedenskog
- LDAP User
- Unknown
- MediaWiki User
- PHedenskog (WMF) [ Global Accounts ]
Sat, Mar 16
I think this is good enough now.
Fri, Mar 15
I did a first version at https://wikitech.wikimedia.org/wiki/Performance/Dictionary
Tue, Mar 12
@zeljkofilipin the first alpha of 9.0 was released earlier today (https://github.com/webdriverio/webdriverio/releases/tag/v9.0.0-alpha.0), so I think we should aim for upgrading to 9 instead.
Fri, Mar 8
I think Chrome and Firefox are enough for now. Adding Edge would need more machines.
I think T351929 fixes most of this for us.
This is done.
We don't have Android tests, so there's nothing to do right now.
I actually removed the Alexa tests. Let's sync with the web team and if we all think this is useful we can add it again.
I think we should focus on the Long Animation Frames API (LoAF) instead (T359286), since that can potentially give us actionable work. Long tasks show us that something is wrong but not what is wrong.
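To make that concrete, here is a minimal sketch (not our actual collection code) of observing long animation frames in the browser; the script attribution on each entry is what makes the data actionable compared to plain long tasks:

    // Observe long animation frames (LoAF) and log which scripts contributed.
    const observer = new PerformanceObserver((list) => {
      for (const entry of list.getEntries()) {
        console.log('LoAF duration:', entry.duration, 'ms');
        // entry.scripts points at the scripts that ran inside the frame,
        // which is the actionable part long tasks do not give us.
        for (const script of entry.scripts || []) {
          console.log('  script:', script.sourceURL || script.invoker, script.duration, 'ms');
        }
      }
    });
    observer.observe({ type: 'long-animation-frame', buffered: true });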
I guess this is because of the extra delay added on single clicks when the viewport is set to a specific size; when using Chromedriver the browser "knows" that it's a single click.
Thu, Mar 7
I got some feedback from Gilberto Cocchi that the delay comes from us setting width=1000 (meaning the browser waits for a double tap, see https://developer.chrome.com/blog/300ms-tap-delay-gone-away). I tried with touch-action: manipulation; and that fixed it for me.
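A minimal sketch of how that fix could be applied from a test script (a hypothetical injection, assuming we can run JavaScript in the page before the interaction); touch-action: manipulation turns off the double-tap-to-zoom heuristic, so single taps fire without the extra delay:

    // Inject a style rule so tap targets skip the double-tap delay.
    const style = document.createElement('style');
    style.textContent = '* { touch-action: manipulation; }';
    document.head.appendChild(style);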
I missed updating the issue: I can reproduce on my Moto G5 when connecting devtools to Chrome and manually clicking. For automation, I don't get the same input delays on the interactions as when I do it manually (I added T359506 for that). Right now we are stuck on understanding the input delay we get on the interactions.
Wed, Mar 6
It was me that disabled getting that data months ago. I have enabled it again.
Tue, Mar 5
I removed those tests since they were so noisy.
I moved the tests to a bare metal server.
Hi again, I couldn't fully understand the numbers, so I asked about it on the Web Performance Slack channel today and got an answer from Barry Pollard, who is a DevRel at Google. If you are on the channel you can read the conversation here: https://webperformance.slack.com/archives/C04BK7K1X/p1709629694106149
All the tests have been moved and the alerts have been updated.
Mon, Mar 4
Thanks @fgiunchedi, I'll turn them off. Actually that alert has been useful for me, since we run that Graphite instance ourselves and it has happened a couple of times that there's been a problem with it, but I guess I could do something smarter with some other alert. I'll look into that.
Hmm, the problem was the screen for Xvfb; setting it to 1 made it work. I'll look into it more tomorrow. Now the tests run at least, so I can verify that all tests work.
The server is installed, but I'm having some problems getting all tests up and running because of the screen size for Xvfb. Let me see how I can fix that.
Hi! I have a question:
Hmm, I haven't got access to the server yet. I got the email about the purchase but nothing in the GUI yet. I've turned off the tests and need to turn off the alarms too. The last time I added a server it took some time, so I ordered another one. But hopefully this will come through soonish.
I've just started the move, hopefully it will be done in a couple of hours.
Thu, Feb 22
This started to happen again today. I wonder, @fgiunchedi, can you point me in the right direction: how long is the timeout we have today for the alerts that go to the synthetic Graphite instance? I looked at our instance but couldn't see anything.
The cleanup is done, so let's do this on the 4th of March.
Cleanup done.
It seems like this was a temporary glitch.
This must be some glitch in Grafana. After many tries, and after deleting the rules and re-creating them, they work as expected.
So this is annoying. When I add a graph in Grafana to show the number of pages that have a regression:
Feb 16 2024
This has been changed. To close this, I need to remove the old data in Graphite.
When we do this change I want to do two more changes:
- remove the latency for the "old" tests so we run all the tests the same way. That will make our tests run faster.
- change group 1 tests to use it.wikipedia.org. Pages there have more content and that will make it easier to spot regressions.
Feb 13 2024
What I've done now is reconfigured a personal bare metal server, and I'm gonna keep that running for a couple of days and see how stable the metrics can be with direct tests.
I think this is showing when things started to go wrong. It's the same for all Chrome URLs that we test. What is strange though is that we don't see the same on Firefox (on the same server). I verified that it's the exact same Chrome version.
Feb 9 2024
This happened again last night. I've been comparing processes, and in the high states there is kworker (kernel worker threads), but I'm having a hard time digging deeper.
Feb 8 2024
Looking at the server stats I can see that every time this happens (increase in first visual change and total blocking time) we have a decrease in CPU idle time and a decrease in processes in the ps running state.
I cannot see any correlation for the CPU benchmark metric and the TBT.
All WebPageReplay tests have been updated to use Mann-Whitney. I've updated the documentation too; now I need to finish the blog post to close this task.
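For reference, here is a rough, self-contained sketch (not our actual implementation) of the kind of Mann-Whitney U check used to compare two sets of runs; it uses average ranks for ties and the normal approximation, so it is only approximate for small sample sizes:

    // Two-sided Mann-Whitney U test via the normal approximation.
    function mannWhitney(baselineRuns, currentRuns) {
      const all = baselineRuns
        .map((v) => ({ v, fromBaseline: true }))
        .concat(currentRuns.map((v) => ({ v, fromBaseline: false })))
        .sort((a, b) => a.v - b.v);

      // Assign 1-based ranks, averaging ranks for tied values.
      const ranks = new Array(all.length);
      for (let i = 0; i < all.length; ) {
        let j = i;
        while (j + 1 < all.length && all[j + 1].v === all[i].v) j++;
        const avgRank = (i + j + 2) / 2;
        for (let k = i; k <= j; k++) ranks[k] = avgRank;
        i = j + 1;
      }

      const n1 = baselineRuns.length;
      const n2 = currentRuns.length;
      let rankSumBaseline = 0;
      all.forEach((item, i) => { if (item.fromBaseline) rankSumBaseline += ranks[i]; });

      const u1 = rankSumBaseline - (n1 * (n1 + 1)) / 2;
      const u = Math.min(u1, n1 * n2 - u1);

      // Normal approximation (no tie correction of the variance).
      const mean = (n1 * n2) / 2;
      const sd = Math.sqrt((n1 * n2 * (n1 + n2 + 1)) / 12);
      const z = (u - mean) / sd;

      return { u, z, significant: Math.abs(z) > 1.96 }; // ~5% two-sided level
    }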
The numbers have gone back. We can see that long tasks changed, and that made first visual change better.
Feb 7 2024
I did some checks, and for linting I think it will only help us if we used TypeScript with rules like @typescript-eslint/await-thenable, @typescript-eslint/require-await and @typescript-eslint/no-floating-promises.
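A minimal sketch of what that could look like (a hypothetical .eslintrc.cjs, assuming the tests were converted to TypeScript); these three rules need type information, hence the parserOptions.project setting:

    module.exports = {
      parser: '@typescript-eslint/parser',
      parserOptions: { project: './tsconfig.json' },
      plugins: ['@typescript-eslint'],
      rules: {
        // Catch awaiting values that are not promises.
        '@typescript-eslint/await-thenable': 'error',
        // Catch async functions that never await.
        '@typescript-eslint/require-await': 'error',
        // Catch promises that are created but never awaited or handled.
        '@typescript-eslint/no-floating-promises': 'error',
      },
    };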
I'm updating the group 0 and group 1 alerts today for desktop. What's missing to close this is to do the blog post and to enable the significant change testing for the emulated mobile tests.
Feb 6 2024
I removed some old tests (beta cluster), increased the number of runs for groups 0 and 1, and then turned on Mann-Whitney tests for them too.
Feb 5 2024
I got some feedback from Greg:
- First let's run the exact same test against the exact same content throughout a day or two and see the variance we have. I need to make some changes in our implementation, because today we always re-record against a fresh version of the page. With the fix, we will run against the exact same version (the way we tried with the static Banksy page). Maybe we can use the Banksy page first.
- When we have a significant change, only alert if the change is larger than 2%. Today we alert on all significant changes. A rough sketch of that gate is below.
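This is a minimal sketch (a hypothetical helper, not our actual alerting code) of combining the significance test with the proposed 2% threshold, comparing medians of the baseline and current runs:

    // Only alert when the change is statistically significant AND the relative
    // change between the baseline and current medians is larger than 2%.
    function shouldAlert(baselineRuns, currentRuns) {
      const median = (values) => {
        const sorted = [...values].sort((a, b) => a - b);
        const mid = Math.floor(sorted.length / 2);
        return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
      };
      // mannWhitney is the sketch shown earlier in this feed.
      const { significant } = mannWhitney(baselineRuns, currentRuns);
      const relativeChange =
        Math.abs(median(currentRuns) - median(baselineRuns)) / median(baselineRuns);
      return significant && relativeChange > 0.02;
    }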
{ "android": true, "replay": true, "browsertime": { "android": { "rooted": true }, "connectivity": { "engine": "throttle", "throttle": { "localhost": true }, "profile": "custom", "alias:": "120rtt", "rtt": 120 }, "chrome": { "args": [ "host-resolver-rules=MAP *:80 127.0.0.1:8085,MAP *:443 127.0.0.1:8086,EXCLUDE localhost", "ignore-certificate-errors-spki-list=PhrPvGIaAMmd29hj8BCZOq096yj7uMpRNHpn5PDxI6I=", "user-data-dir=/data/local/tmp/chrome/" ] }, "firefox": { "preference": [ "network.dns.forceResolve:127.0.0.1", "security.OCSP.enabled:0", "network.socket.forcePort:80=8085;443=8086"
I just re-deployed the server direct1 to see if we get any difference in metrics. There's an annotation from when I deployed, so let's wait a couple of hours and check.
Feb 4 2024
Looking at the Chrome trace log, there's a layout phase before FCP that causes it, and that phase is something like 300 ms longer. It's the same number of elements though.
A next step tomorrow could be to just destroy that instance, deploy a new one, and run the tests there to see if it's dependent on the instance.
I've looked at the instances at https://grafana.wikimedia.org/d/Vikt09zIk/performance-synthetic-server-metrics and the instance that runs these tests seems to use more memory than the rest of the instances. I've rebooted the instance and will restart the tests. I'm just worried that these instabilities come from the Hetzner cloud instance.
Adding some more data over time. This is for Largest Contentful Paint:
Ok, this has been running for over a month now. I'm having a meeting with Gregory Mierzwinski tomorrow for some feedback. Since I upgraded to the latest WebPageReplay we have had no instability in metrics for Chrome. For Firefox we had one page that goes back and forth. Feedback from Greg is that if we only have one URL that fails, maybe we should skip that URL.
Feb 2 2024
I removed the alerts and will look at the root cause when I'm back from FOSDEM on Monday, I'll keep the task open so I don't forget.
Feb 1 2024
I verified using Chrome on my rooted Samsung A51 that the metrics are stable. I'll add my configuration here before I close the issue.
This has come back again for all articles tested using Chrome direct tests:
Jan 29 2024
This is done, it's much faster than the old 4.
The full trace log looks like this:
Running from the Raspberry Pi with 11 runs, one or two of the runs get ERROR: WebDriverError: Failed to decode response from marionette. To get something more useful out of that I need to turn on the Geckodriver log.
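One way to get that log (this is the plain WebDriver capability that Geckodriver reads; whether our test runner exposes it the same way is an assumption) is to raise the log level through moz:firefoxOptions:

    // Ask Geckodriver for trace-level logging so decode failures show context.
    const capabilities = {
      browserName: 'firefox',
      'moz:firefoxOptions': {
        log: { level: 'trace' },
      },
    };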
Jan 26 2024
This actually works. The first time I tried, it failed, but after that it just works.