
Investigate performance regression 2025-12-18 on it.wikipedia.org
Open, Needs Triage, Public

Description

Two alerts fired on the 19th of December:

  • group 1: First Visual Change Firefox desktop
  • group 1: Largest Contentful Paint Firefox desktop

All dashboards/graphs can be found at the page drill down dashboard.

Checking whether the change was significant (this is for https://it.wikipedia.org/wiki/Jannik_Sinner): yes, it was:

Screenshot 2026-01-08 at 07.43.57.png (782×3 px, 169 KB)
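For reference, a significance check like the one above can be sketched with a rank-based comparison of runs before and after the change. The samples below are made up, and the use of a Mann-Whitney-style U statistic is an illustrative assumption, not necessarily what the dashboard actually computes:

```python
# Hypothetical First Visual Change samples (ms) before/after Dec 18;
# the real numbers live in the drill-down dashboard.
before = [1180, 1205, 1190, 1220, 1175, 1210, 1195]
after = [1420, 1390, 1450, 1405, 1435, 1410, 1440]

def mann_whitney_u(a, b):
    """Rank-based U statistic: counts how often a value in `a`
    beats a value in `b` (ties count half)."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1
            elif x == y:
                u += 0.5
    return u

u = mann_whitney_u(after, before)
# If every 'after' run is slower than every 'before' run, U hits its
# maximum of len(after) * len(before): complete separation of the groups.
print(u, len(after) * len(before))  # 49.0 49
```

Complete separation like this is about as strong a signal as a handful of synthetic test runs can give.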

And looking at some metrics:

Screenshot 2026-01-08 at 07.49.03.png (1×3 px, 420 KB)

Also verified that there's no banners etc:
{F71467142 width=100%}
{F71467143 width=100%}

Is it the same for emulated mobile and using Chrome?

This is the same page for Chrome:

Screenshot 2026-01-08 at 07.51.36.png (1×3 px, 420 KB)

It also happens in Chrome, but no alert fired there. Looking at all URLs tested for Chrome, we can see that three URLs have a regression while the fourth actually has better performance; that's why those alerts never fired.

Screenshot 2026-01-08 at 07.56.23.png (718×2 px, 201 KB)

For emulated mobile (testing the mobile version) it's the same pattern for LCP. There's one URL that kept the alerts from firing.

Screenshot 2026-01-08 at 08.05.05.png (1×1 px, 277 KB)

It seems the alert tuning needs some fixing; let's create another task for that.
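The masking effect described above can be sketched in a few lines. The percentages, the threshold, and the assumption that the alert evaluates a group-level mean are all hypothetical; they just illustrate how one improving URL can cancel out three regressing ones:

```python
# Hypothetical per-URL LCP changes (%) in the Chrome test group;
# three pages regressed while one improved, as in the graphs above.
change_pct = {
    "url-1": +12.0,
    "url-2": +9.5,
    "url-3": +11.0,
    "url-4": -15.0,  # the page that got faster
}

THRESHOLD = 8.0  # hypothetical group-level alert threshold (%)

group_mean = sum(change_pct.values()) / len(change_pct)
group_alert = group_mean > THRESHOLD  # what a mean-based alert sees
per_url_alerts = [u for u, c in change_pct.items() if c > THRESHOLD]

print(group_mean, group_alert, per_url_alerts)
# 4.375 False ['url-1', 'url-2', 'url-3']
```

Alerting per URL (or on the worst URL in the group) rather than on the group mean would have caught this case.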

So what was the reason for the regression?

Event Timeline

ssastry added subscribers: cscott, MSantos, ssastry.

This is all related to the rollout of Parsoid read views on itwiki on Dec 18th. I had looked at these dashboards (sorry, this link is staff-only since it references a Slack discussion) on Dec 20th. My overall conclusion was that some metrics had degraded, but nothing too severe, and some metrics saw an improvement. So there was nothing immediate we needed to do.

But, if you see anything that is particularly concerning to you, please flag it here. Thanks!

Looking back at that Slack thread, I may have looked at Chrome results, not Firefox (since I didn't realize until today that there are two browsers in that drop-down). In any case, the behavior on Chrome appears to be better than on Firefox.

We did a "cold cache" roll out on itwiki as well, so I'm also interested to know if performance at present is consistent with the performance on Dec 18 or if it has improved as the caches have filled up.

Our plan for future rollouts to the top-50 wikis is to use an incremental rollout (which wasn't ready in time for itwiki), so knowing how long it took for performance to get back to 'normal' would help us figure out how slowly to do our incremental rollout.

I doubt it is related to cold cache (since it is the same page being tested repeatedly). I think this is HTML-related (HTML size, extra DOM nodes). In any case, I already checked a 30-day window and the effects linger across the whole window.

We can see the regression in the metrics that Google collects in the CrUX data (it's a 28-day rolling average, so it takes some time before changes become visible).

Screenshot 2026-01-09 at 08.14.25.png (826×3 px, 244 KB)
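To make the lag concrete, here is a sketch of how a step regression bleeds into a 28-day rolling average over roughly four weeks. The daily values and the step date are made up, not real CrUX data:

```python
# Step regression on "day 28": LCP jumps from 2000 ms to 2400 ms.
daily_lcp = [2000] * 28 + [2400] * 28

def rolling_mean(xs, window=28):
    """28-day trailing mean, one value per day once the window is full."""
    return [sum(xs[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(xs))]

avg = rolling_mean(daily_lcp)
# Day of the step, two weeks in, and four weeks in:
print(avg[0], avg[14], avg[-1])  # 2000.0 2200.0 2400.0
```

So the averaged metric only shows half the regression after two weeks and the full regression after a month, which is why a Dec 18 change is barely visible in CrUX by early January. By the same logic, the rolling average cannot start rising *before* the underlying change: the real onset is at or after the point where the curve starts trending up.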

Looking at the yearly graphs, though, those metrics seem to come and go (but I think it's possible to see the increase at the end of the year):

Screenshot 2026-01-09 at 08.16.22.png (1×3 px, 332 KB)

What's worrying (from another perspective) is that I cannot see it in our own real user metrics (here or here). Or at least it's hard to see; we should look into making those metrics/graphs more actionable.

Peter removed Peter as the assignee of this task. Jan 9 2026, 8:46 AM

There is a bug with tablets which might explain the layout shift, as it causes a flash of unstyled content (T414221: Frontend performance issue for tablet devices loading parsoid content), but that should only impact tablet devices (resolution between 640px and 1100px).

The *28 day average* in the yearly stats starts trending up in November in @Peter's graphs, while PRV didn't roll out to itwiki until Dec 18.

There is a modest increase in the daily graphs in December, which I think is attributable to Parsoid, but we have only measured a ~10% performance drop in average times from the legacy-to-Parsoid transition. (And we are working on this, because our average times should be mostly measuring caching, and that pipeline is unchanged between legacy and Parsoid. The increase might be due to different cache hit rates or some other factor.)

For the larger change shown in the yearly graphs, it seems like we should be looking for something that happened in mid- or late October, given where it seems to start, unless I'm misunderstanding how their rolling average is computed.