Benchmark baremetal vs k8s mediawiki perf (2023)
Open, Stalled, MediumPublic

Assigned To

Authored By

	Krinkle
	Mar 28 2023, 2:36 AM

Description

It's ~2 years after we completed the initial benchmarks and PHP/container tuning at T280497: Benchmark performance of MediaWiki on k8s.

Given that Kubernetes traffic is still relatively low (test wikis only), it does not really factor into any of our regularly monitored metrics, such as Backend Pageview Timing, load.php backend latency, and Parser performance. Two years is a long time, and with the wider rollout happening soon, I think it's a good time to take another look to avoid any surprises.

The stability and more statistically sound data from benchmw also make this a good candidate to do more regularly, e.g. perhaps perf team can do this once a quarter to have a more stable apples-to-apples comparison. There will still be differences, but those are exactly the differences we do not want to ignore, such as differences in our software, our hardware, and our PHP tuning. (This is not a neutral "MediaWiki software" benchmark, as for that we'd want to factor out hardware and system tuning. Rather, it is a "MediaWiki at WMF" benchmark.)

Additional reasons, beyond it being good to have benchmw data periodically:

A lot has changed in MediaWiki development in two years. If we've inadvertently put more pressure on baremetal strengths, it's better to know now, so that we can catch those regressions and fix them in core/extensions during the months ahead, instead of finding out months later / post-rollout and having to "react" to perf-team alerts in unplanned ways. Or worse, misattributing it to slow/unrelated environment changes. It's much more helpful to have the side-by-side comparison. I expect it to be next to impossible to confidently attribute any improvement or regression in end-user perf metrics to k8s, due to the many real-user variables.

Related Objects

Status	Assigned	Task
Stalled	None	T255792 Quibble runs core:unit tests twice!
Open	None	T328919 Upgrade to PHPUnit 10
Open	None	T338103 Micro-optimize ApiResult::isMetadataKey with str_starts_with once we support PHP8+
Open	None	T328921 Drop PHP 7.4 support from MediaWiki
Stalled	None	T334726 Use return type `never` in Wikibase
Open	None	T328922 Drop PHP 8.0 support from MediaWiki
Stalled	None	T319055 Upgrade to psr/container 2.x
Stalled	Krinkle	T319432 Migrate WMF production from PHP 7.4 to PHP 8.1
Open	None	T291916 Tracking task for Bullseye migrations in production
Stalled	None	T356293 Migrate MW appservers' base images to bullseye
Open	None	T290536 Serve production traffic via Kubernetes
Stalled	Krinkle	T333269 Benchmark baremetal vs k8s mediawiki perf (2023)

Event Timeline

Krinkle created this task.Mar 28 2023, 2:36 AM

Peter subscribed.Mar 28 2023, 8:49 AM

larissagaulia moved this task from Inbox, needs triage to To-do: Goals, prioritized next 4 Quarters on the Performance-Team board.Apr 17 2023, 6:40 PM

Krinkle updated the task description.May 4 2023, 4:58 PM

Krinkle claimed this task.May 4 2023, 5:37 PM

Krinkle triaged this task as Medium priority.

In order to do this properly, we need to do the following, IMHO:

Pick a k8s node or, even better, reimage one appserver to act as an additional k8s node with specific node taints, so that no "normal" pod can be scheduled on it.
Add a deployment of mw-on-k8s that tolerates those taints, with as many replicas as we can fit on that node; remember to also allow plain HTTP connections, as ab doesn't work well with TLS.
Run benchmw against this node (on the port you chose for HTTP) and against a depooled appserver of the same hardware generation.
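
The last step above ends with two sets of latency measurements, one per host. A minimal sketch of a side-by-side comparison could look like the following. It assumes nothing about the output format of benchmw: the function names `summarize` and `compare` are hypothetical, and the sample numbers are placeholders for illustration, not real measurements.

```python
import statistics


def summarize(samples_ms):
    """Return the median and 95th percentile of a list of latencies (ms)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(samples_ms, n=100, method="inclusive")[94]
    return {"median": statistics.median(samples_ms), "p95": p95}


def compare(baremetal_ms, k8s_ms):
    """Relative difference of k8s vs. bare metal, in percent, per statistic.

    Positive values mean the k8s host was slower for that statistic.
    """
    base = summarize(baremetal_ms)
    k8s = summarize(k8s_ms)
    return {
        name: round(100.0 * (k8s[name] - base[name]) / base[name], 1)
        for name in base
    }


if __name__ == "__main__":
    # Placeholder values only (hypothetical, not measured anywhere).
    baremetal = [100, 102, 98, 101, 99, 103, 100, 97, 105, 100]
    k8s = [110, 112, 108, 111, 109, 113, 110, 107, 115, 110]
    print(compare(baremetal, k8s))
```

Reporting both the median and a tail percentile is a design choice here: a container runtime could plausibly leave typical latency unchanged while shifting the tail, and a single mean would hide that.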

Krinkle moved this task from To-do: Goals, prioritized next 4 Quarters to To-do: Goals prioritized current Quarter on the Performance-Team board.Jun 15 2023, 5:41 PM

@Joe Thanks, that sounds good to me and would be more fair and representative indeed. However, that does of course place a dependency on additional work from your team. Is that something we can do in the next couple of weeks as part of the k8s goal?

@Krinkle, since we are approaching the end of Q4, we will discuss internally and schedule this work for Q1.

jijiki added a project: serviceops.Jun 27 2023, 8:23 AM

Clement_Goubert subscribed.Jun 27 2023, 4:23 PM

Clement_Goubert moved this task from Incoming 🐫 to this.quarter 🍕 on the serviceops board.Aug 1 2023, 11:51 AM

Krinkle edited projects, added MediaWiki-Platform-Team; removed Performance-Team.Aug 10 2023, 10:16 PM

Krinkle moved this task from Inbox, needs triage to Blocked/waiting on the MediaWiki-Platform-Team board.Aug 11 2023, 3:17 AM

ArielGlenn subscribed.Sep 25 2023, 1:21 PM

larissagaulia moved this task from Blocked/waiting to Soon on the MediaWiki-Platform-Team board.Oct 9 2023, 1:40 PM

Krinkle changed the task status from Open to Stalled.Nov 13 2023, 5:41 PM

Krinkle moved this task from Soon to Within 2 Qs on the MediaWiki-Platform-Team board.

Krinkle moved this task from Within 2 Qs to Blocked/waiting on the MediaWiki-Platform-Team board.Nov 16 2023, 11:25 PM
