
Benchmark baremetal vs k8s mediawiki perf (2023)
Open, Stalled, Medium, Public

Description

It's ~2 years after we completed the initial benchmarks and PHP/container tuning at T280497: Benchmark performance of MediaWiki on k8s.

Given that Kubernetes traffic is still relatively low (test wikis only), it does not really factor into any of our regularly monitored metrics, such as Backend Pageview Timing, load.php backend latency, and Parser performance. Two years seems like a long time, and with the wider rollout happening soon I think it's a good time to take another look to avoid any surprises.

The stability and more statistically sound data from benchmw also make it a good candidate to do this more regularly, e.g. perhaps the perf team can do this once a quarter to have a more stable apples-to-apples comparison. There will still be differences, but those are exactly the differences we do not want to ignore: differences in our software, in our hardware, and in PHP tuning. (This is not a neutral "MediaWiki software" benchmark, as for that we'd want to factor out hardware and system tuning. Rather, it is a "MediaWiki at WMF" benchmark.)

Additional reasons, beyond it being good to have benchmw data periodically:

A lot has changed in MediaWiki development in two years. If we've inadvertently put more pressure on baremetal strengths, it's better to know now so that we can catch those regressions and fix them in core/extensions during the months ahead, instead of finding out months later / post-rollout and having to "react" to perf-team alerts in unplanned ways, or worse, misattributing it to slow/unrelated environment changes. It's much more helpful to have the side-by-side comparison. I expect it to be next to impossible to confidently attribute any improvement or regression in end-user perf metrics to k8s, due to the many real-user variables.

Event Timeline

Krinkle triaged this task as Medium priority.

In order to do this properly, we need to do the following, IMHO:

  1. Pick a k8s node, or even better, reimage one appserver to act as an additional k8s node with specific node taints so that no "normal" pod can be executed
  2. Add a deployment of mw-on-k8s targeting those taints, add as many replicas as we can fit in that node; remember to also allow http connections as ab doesn't work well with TLS
  3. Use benchmw against this node (with the port you chose for http) and a depooled appserver with the same generation of hardware
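For illustration only, steps 1 and 2 above could be sketched as a Deployment manifest built in Python. This is not the actual mw-on-k8s chart or its values. The taint key/value, node name, image, port, and replica count are all hypothetical placeholders, and the sketch assumes the node was tainted beforehand with something like `kubectl taint nodes <node> dedicated=mw-benchmark:NoSchedule`.

```python
import json

# Hypothetical placeholders: none of these names come from the task.
TAINT_KEY = "dedicated"
TAINT_VALUE = "mw-benchmark"
NODE_NAME = "benchmark-node.example"
HTTP_PORT = 8080  # plain HTTP, since ab is said not to work well with TLS


def build_benchmark_deployment(replicas: int) -> dict:
    """Return a Deployment manifest pinned to the tainted benchmark node."""
    labels = {"app": "mw-benchmark"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "mw-benchmark"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Tolerate the taint that keeps "normal" pods off the node.
                    "tolerations": [{
                        "key": TAINT_KEY,
                        "operator": "Equal",
                        "value": TAINT_VALUE,
                        "effect": "NoSchedule",
                    }],
                    # A toleration only permits scheduling on the node;
                    # the node selector is what forces pods onto it.
                    "nodeSelector": {"kubernetes.io/hostname": NODE_NAME},
                    "containers": [{
                        "name": "mediawiki",
                        "image": "example.invalid/mediawiki:placeholder",
                        "ports": [{"containerPort": HTTP_PORT, "protocol": "TCP"}],
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so this output could be piped to
    # `kubectl apply -f -` in a test cluster.
    print(json.dumps(build_benchmark_deployment(replicas=4), indent=2))
```

The taint and the toleration do different jobs here: the taint keeps other workloads off the node, while the toleration plus node selector make sure the benchmark pods land there and nowhere else, so the node stays comparable to a depooled appserver.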

@Joe Thanks, that sounds good to me and would be more fair and representative indeed. However, that does of course place a dependency on additional work from your team. Is that something we can do in the next couple of weeks as part of the k8s goal?

@Krinkle, since we are approaching the end of Q4, we will discuss internally and schedule this work for Q1.

Krinkle changed the task status from Open to Stalled. Nov 13 2023, 5:41 PM
Krinkle moved this task from Soon to Within 2 Qs on the MediaWiki-Platform-Team board.