Make mw-experimental production ready
Closed, Resolved (Public)

Description

mw-experimental is currently deployed and working on wikikube-worker2100, but it does not yet meet our production standards.

Production Readiness Checklist

  • has a separate puppet profile 1156392
  • properly picks up the appropriate image version 1156410
  • systemd timer to update the code 1156410
  • it is accessible via XWD 1154070 1154069
  • one pod per DC running on specific VMs T397051 1159518
  • runs debug mw image 1159524
  • add CNAMEs for mw-experimental.(eqiad|codfw).wmnet pointing to wikikube-worker-exp1001 & wikikube-worker-exp2001
  • deployers can refresh the code on demand 1161469
  • basic functionality is documented: mw-experimental on wikitech

Event Timeline

jijiki changed the task status from Open to In Progress. (Jun 12 2025, 3:37 PM)
jijiki triaged this task as Medium priority.

Change #1156392 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] kubernetes: create mediawiki_experimental profile

https://gerrit.wikimedia.org/r/1156392

Change #1156410 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] profile::kubernetes::mediawiki_experimental: properly determine latest image

https://gerrit.wikimedia.org/r/1156410

Change #1159524 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] kubernetes.yaml: switch mw-experimental to debug image

https://gerrit.wikimedia.org/r/1159524

Change #1156392 merged by Effie Mouzeli:

[operations/puppet@production] kubernetes: create mediawiki_experimental profile

https://gerrit.wikimedia.org/r/1156392

Change #1156410 merged by Effie Mouzeli:

[operations/puppet@production] profile::kubernetes::mediawiki_experimental: properly update latest image

https://gerrit.wikimedia.org/r/1156410

Change #1159524 merged by Effie Mouzeli:

[operations/puppet@production] kubernetes.yaml: switch mw-experimental to debug image

https://gerrit.wikimedia.org/r/1159524

Change #1160158 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/dns@master] wmnet: add mw-exprimental CNAMES

https://gerrit.wikimedia.org/r/1160158

Change #1160158 merged by Effie Mouzeli:

[operations/dns@master] wmnet: add mw-exprimental CNAMES

https://gerrit.wikimedia.org/r/1160158

Change #1160173 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] otel: add tolerations for mw-experimental hosts

https://gerrit.wikimedia.org/r/1160173

Change #1160236 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] mediawiki_experimental: $kubernetes_release dir fix

https://gerrit.wikimedia.org/r/1160236

Change #1160236 merged by Effie Mouzeli:

[operations/puppet@production] mediawiki_experimental: $kubernetes_release dir fix

https://gerrit.wikimedia.org/r/1160236

Change #1161469 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] admin.yaml: allow deployers to run mw-experimental-mediawiki-image-update

https://gerrit.wikimedia.org/r/1161469

jijiki updated the task description.

Change #1161477 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] mediawiki_experimental: add motd

https://gerrit.wikimedia.org/r/1161477

Change #1161469 merged by Effie Mouzeli:

[operations/puppet@production] admin.yaml: allow deployers to run mw-experimental-mediawiki-image-update

https://gerrit.wikimedia.org/r/1161469

Change #1161477 merged by Effie Mouzeli:

[operations/puppet@production] mediawiki_experimental: add motd

https://gerrit.wikimedia.org/r/1161477

Change #1160173 merged by jenkins-bot:

[operations/deployment-charts@master] otel: add tolerations for mw-experimental hosts

https://gerrit.wikimedia.org/r/1160173

The easy way to make private security patches was to make the edit on the deployment host in /srv/mediawiki-staging/private and then use scap pull on a debug host to test it. (That avoids the complexity of having to ssh/scp file changes between production hosts, as the deployment host is where the code eventually needs to live, and you are not supposed to download /private files to your computer, even temporarily.) Will that still work?

For the time being this is not possible, as there is no scap integration with mw-experimental, and sadly it was not included in the requirements. However, we can work out a solution to address this.

It would be great to find a solution for it. I don't do security changes often though, so maybe worth asking someone from the Security team (@sbassett?) for confirmation that this is a real use case.

I also find scap pull more convenient for arbitrary debugging, mostly because git status / git diff make it easy to keep track of what I changed (and scap pull is so fast for PHP-only changes that running it barely makes a difference), but that's a minor thing.

The Security-Team does use the "scap pull to an mw-debug" method from time to time for security deployments, typically when it's a larger security patch or we're uncertain about what else it might break in production. The more typical case is to test security patches locally via MediaWiki-Docker et al. and then run some basic checks (php -l etc.) just prior to deployment, and of course to be ready to roll back quickly if we notice a big jump in error rates via logstash. So while it's not an emergency for the Security-Team, it would be nice to retain some comparable functionality for mw-experimental.
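
(Illustration only, not the Security-Team's actual tooling: a minimal sketch of the kind of "basic check" mentioned above, running php -l over a list of patched files before deployment. The function names lint_php_files and run_command and the runner parameter are hypothetical, introduced here for the example. It assumes a php binary on PATH and that php -l exits non-zero when it finds a syntax error.)

```python
import subprocess
from typing import Callable, Iterable, List, Sequence


def run_command(cmd: Sequence[str]) -> int:
    """Run a command, discard its output, and return its exit code."""
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode


def lint_php_files(
    paths: Iterable[str],
    php_binary: str = "php",
    runner: Callable[[Sequence[str]], int] = run_command,
) -> List[str]:
    """Return the paths for which the syntax check fails.

    `runner` is a hypothetical hook so the check can be exercised
    without a real PHP installation.
    """
    failed = []
    for path in paths:
        # `php -l FILE` only parses the file; it does not execute it.
        # Assumption: a non-zero exit code means a syntax error.
        if runner([php_binary, "-l", path]) != 0:
            failed.append(path)
    return failed
```

Calling lint_php_files with the files touched by a patch returns the subset that failed the syntax check; an empty list means the check passed. It catches parse errors only, so it complements rather than replaces testing on a debug host.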

scap pull is tied to the legacy bare metal PHP deployment process that is rapidly becoming obsolete. Improving workflows for testing security patch changes is something that we should figure out how to invest in, but we should also be thinking about how to do this without requiring that we retain legacy tooling and workflows as canonical solutions.

There will be work resuming in the July-September 2025 quarter related to https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Pretrain. That work is not expected to be directly about mw-experimental at this point, but we will be poking at scap things including figuring out how to make sure we apply security patch changes to the wmf/next containers as updates are made to those patches. We might be able to use some time to at least gather better requirements and wishes for security patch workflows at the same time and then try to figure out how to make some progress on those needs.

jijiki claimed this task.

Opened T397916 for the concerns raised; marking this task as done.

Change #1181673 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] mw_experimental: Fix PuppetConstantChange alert

https://gerrit.wikimedia.org/r/1181673

Change #1181673 merged by Clément Goubert:

[operations/puppet@production] mw_experimental: Fix PuppetConstantChange alert

https://gerrit.wikimedia.org/r/1181673