
Migrate Security-Team jobs to mw-cron
Closed, Resolved (Public)

Description

Migrate Security-Team periodic mediawiki jobs from mwmaint to mw-cron on kubernetes.

Job name | Criticality | Done?
mediawiki_job_generatecaptcha.timer | H | Y

Doc on the new platform

ServiceOps new will handle migrating the jobs, but would appreciate input from Security-Team on:

  • jobs that should be watched more
  • jobs that are low criticality and could be migrated first
  • outdated jobs that can be removed
  • any potential gotchas in the way these jobs use MediaWiki

Details

Related Changes in Gerrit:
Repo | Branch | Lines +/-
operations/puppet | production | +2 -2
operations/puppet | production | +1 -2
operations/deployment-charts | master | +10 -18
operations/deployment-charts | master | +5 -5
mediawiki/extensions/ConfirmEdit | master | +11 -0
mediawiki/extensions/ConfirmEdit | wmf/1.45.0-wmf.3 | +11 -0
mediawiki/extensions/ConfirmEdit | wmf/1.45.0-wmf.3 | +32 -17
mediawiki/extensions/ConfirmEdit | master | +33 -13
mediawiki/extensions/ConfirmEdit | REL1_43 | +32 -17
mediawiki/extensions/ConfirmEdit | REL1_44 | +32 -17
mediawiki/extensions/ConfirmEdit | REL1_42 | +32 -17
mediawiki/extensions/ConfirmEdit | master | +32 -17
operations/puppet | production | +34 -14
mediawiki/extensions/ConfirmEdit | REL1_43 | +1 -1
mediawiki/extensions/ConfirmEdit | wmf/1.45.0-wmf.3 | +1 -1
mediawiki/extensions/ConfirmEdit | wmf/1.45.0-wmf.2 | +1 -1
mediawiki/extensions/ConfirmEdit | REL1_44 | +1 -1
mediawiki/extensions/ConfirmEdit | master | +1 -1
operations/puppet | production | +14 -34
operations/deployment-charts | master | +52 -2
operations/puppet | production | +73 -23
operations/puppet | production | +21 -0

Event Timeline


Reverting for now, the font issue can be solved quickly, but fixing the shellout is going to be a little more complex. We basically need to do one of two things:

  1. Find a way around limit.sh in UnboxedCommand (which doesn't seem possible as is), as it's relying on cgroup functionality that isn't available inside kubernetes containers
  2. Convert the GenerateFancyCaptcha.php code to use BoxedCommand through one of the shellbox deployments (possibly need to change the captcha.py script as well to have it generate an archive we can easily pass between services rather than 10000 individual images)

cc @Krinkle would you have some time to help with one of these solutions?

Clement_Goubert changed the task status from Open to Stalled. May 22 2025, 4:48 PM

Change #1149438 merged by Clément Goubert:

[operations/puppet@production] Revert "mw::maintenance: Migrate generatecaptcha to mw-cron"

https://gerrit.wikimedia.org/r/1149438

There may be a third way of working around this, possibly quicker than both the above options, which would be to either add functionality to GenerateFancyCaptcha.php or create a new UploadFancyCaptcha.php script that gets pointed to a directory of captcha images, and have the Job run captcha.py --args --dir /tmp/whatever && mwscript UploadFancyCaptcha.php --dir /tmp/whatever.

Clement_Goubert changed the task status from Stalled to In Progress. May 26 2025, 10:21 AM

@Joe is going to try to make the approach outlined in my last comment work quickly for the sake of finishing up the migration, but we don't think it's a good long term solution.

@Reedy, could you have a look at a longer term proper solution like #2 from this comment, or an alternative proposition if you have one?

This is the last maintenance job left to migrate to mw-cron.

Slight side note (I mentioned this on the merged-in task, but I just wanted to make sure it's seen by ServiceOps new!): if/given that the generatecaptcha job is owned by the Security-Team, should any auto-created @phaultfinder tasks about it be tagged with Security-Team, rather than (or in addition to) the generic Security tag?

> Slight side note (I mentioned this on the merged-in task, but I just wanted to make sure it's seen by ServiceOps new!): if/given that the generatecaptcha job is owned by the Security-Team, should any auto-created @phaultfinder tasks about it be tagged with Security-Team, rather than (or in addition to) the generic Security tag?

Noted on my side, I'll let a member of Security-Team have the last word on how they want the alert tagged.

Change #1150624 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] alertmanager: adjust phab project to security-team rather than security tag

https://gerrit.wikimedia.org/r/1150624

this is blocking PHP upgrade and hence the mediawiki release. Could we get this looked into as high priority?
@Krinkle , @MSantos - FYI

> Find a way around limit.sh in UnboxedCommand (which doesn't seem possible as is)

This sounds like something that should be fixed generally, rather than hacking around it in a job-specific way.

Is there a different way to enforce limits that would be more appropriate for Kubernetes?

> this is blocking PHP upgrade and hence the mediawiki release. Could we get this looked into as high priority?
> @Krinkle , @MSantos - FYI

How does it block the MediaWiki release?

>> this is blocking PHP upgrade and hence the mediawiki release. Could we get this looked into as high priority?
>> @Krinkle , @MSantos - FYI
>
> How does it block the MediaWiki release?

Not necessarily a blocker to the release, but to OKR 5.4, which intends to sync PHP upgrades with the MW release.

>> Find a way around limit.sh in UnboxedCommand (which doesn't seem possible as is)
>
> This sounds like something that should be fixed generally, rather than hacking around it in a job-specific way.
>
> Is there a different way to enforce limits that would be more appropriate for Kubernetes?

limit.sh or firejail can't run inside a container because of the need for cgroup access. The general "fix" for this is to not shell out locally, but use a remote shellbox, or just let the container enforce its local limits in a case like this where we control inputs and outputs.

So if the goal is to just not use limit.sh (based on the presence of an env flag or whatever), that's a trivial code change. Converting to shellbox is IMO enough of an effort that even though it's nice to do it, it's not reasonable to make that a requirement for everything that needs to use ShellCommand on Kubernetes.

How come this didn't come up before? There's plenty of use of UnboxedCommand outside cron scripts.

Change #1151737 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@master] GenerateFancyCaptchas: Add option to run captcha.py va exec()

https://gerrit.wikimedia.org/r/1151737

> So if the goal is to just not use limit.sh (based on the presence of an env flag or whatever), that's a trivial code change. Converting to shellbox is IMO enough of an effort that even though it's nice to do it, it's not reasonable to make that a requirement for everything that needs to use ShellCommand on Kubernetes.

Are we falling over https://gerrit.wikimedia.org/g/mediawiki/core/+/59fb92f2112247c87948a76955bb65c16f4d601d/includes/shell/CommandFactory.php#119 ?

'useBashWrapper' => file_exists( '/bin/bash' ),

Similarly in charts...

There doesn't seem to be an obvious (runtime?) way to override that (which has its own potential issues)...

>> So if the goal is to just not use limit.sh […]
>
> Are we falling over https://gerrit.wikimedia.org/g/mediawiki/core/+/59fb92f2112247c87948a76955bb65c16f4d601d/includes/shell/CommandFactory.php#119 ?
>
> 'useBashWrapper' => file_exists( '/bin/bash' ),

Could be. In looking at it statically, I ended up in another rabbit hole:

Related prior commits:

The captcha script is already calling disableSandbox() and limits( [ 'time' => 0 ] ). It seems that it is still inheriting another default limit (either mem, cputime, or filesize), which still triggers the limit.sh wrapper, per https://gerrit.wikimedia.org/g/mediawiki/libs/Shellbox/+/7fd0b3d62189fb876b01f1e57246c2e7d3a5e050/src/Command/BashWrapper.php#35.

Do we log the commands run anywhere? As if that's the actual solution, to just explicitly set all the limits to 0...

public function wrap( Command $command ) {
		$time = intval( $command->getCpuTimeLimit() );
		$wallTime = intval( $command->getWallTimeLimit() );
		$mem = intval( $command->getMemoryLimit() );
		$filesize = intval( $command->getFileSizeLimit() );
		if ( $time > 0 || $mem > 0 || $filesize > 0 || $wallTime > 0 ) {
			$cmd = '/bin/bash ' . Shellbox::escape( __DIR__ . '/limit.sh' ) . ' ' .
				Shellbox::escape( $command->getCommandString() ) . ' ' .
				Shellbox::escape(
					"SB_INCLUDE_STDERR=" . ( $command->getIncludeStderr() ? '1' : '' ) . ';' .
					"SB_CPU_LIMIT=$time; " .
					'SB_CGROUP=' . Shellbox::escape( $this->cgroup ) . '; ' .
					"SB_MEM_LIMIT=$mem; " .
					"SB_FILE_SIZE_LIMIT=$filesize; " .
					"SB_WALL_CLOCK_LIMIT=$wallTime; " .
					"SB_USE_LOG_PIPE=yes"
				);
			$command->unsafeCommand( $cmd )
				->useLogPipe();
			if ( $command->getAllowedPaths() ) {
				// If specific paths have been allowed, make sure we explicitly
				// allow limit.sh. We don't do this unconditionally because it
				// doesn't work as expected in firejail, see T274474, T182486
				$command->allowPath( __DIR__ . '/limit.sh' );
			}
		}
	}
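
To make the condition in wrap() concrete, here is a minimal Python model of the same check. It is only a sketch of the PHP logic quoted above, not Shellbox code: the function name `needs_limit_wrapper` and the dictionary keys are invented for illustration, and the default values are the wgMaxShell* defaults seen elsewhere in this task.

```python
def needs_limit_wrapper(cpu_time, wall_time, memory, file_size):
    """Model of the check in BashWrapper::wrap(): any positive limit triggers limit.sh."""
    return cpu_time > 0 or wall_time > 0 or memory > 0 or file_size > 0


# Inherited defaults: seconds, seconds, bytes, bytes.
defaults = {
    "cpu_time": 180,
    "wall_time": 180,
    "memory": 314572800,
    "file_size": 104857600,
}

# limits( [ 'time' => 0 ] ) only zeroes the CPU and wall-clock limits.
time_only = {**defaults, "cpu_time": 0, "wall_time": 0}

# Zeroing every limit is what lets the command skip the wrapper.
all_zero = {key: 0 for key in defaults}

print(needs_limit_wrapper(**defaults))
print(needs_limit_wrapper(**time_only))
print(needs_limit_wrapper(**all_zero))
```

In this model, zeroing only the time limits leaves the memory and file size limits positive, so the wrapper is still used.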

Change #1151795 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@master] GenerateFancyCaptchas: Explicitly set all limits to 0

https://gerrit.wikimedia.org/r/1151795

> Do we log the commands run anywhere? As if that's the actual solution, to just explicitly set all the limits to 0...
>
> public function wrap( Command $command ) {
> 		$time = intval( $command->getCpuTimeLimit() );
> 		$wallTime = intval( $command->getWallTimeLimit() );
> 		$mem = intval( $command->getMemoryLimit() );
> 		$filesize = intval( $command->getFileSizeLimit() );

When I dump these locally, I get these values right before 'time' => 0 (walltime) is applied in the Command->limits() / UnboxedCommand->wallTimeLimit() call.

  ["cpuTimeLimit":"Shellbox\Command\Command":private]=>
  int(180)
  ["wallTimeLimit":"Shellbox\Command\Command":private]=>
  int(180)
  ["memoryLimit":"Shellbox\Command\Command":private]=>
  int(314572800) # 307,200K, 300M
  ["fileSizeLimit":"Shellbox\Command\Command":private]=>
  int(104857600) # 102,400K, 100M

# shell/Command::limits
array(2) {
  ["time"]=>
  int(0)
}
array(2) {
  ["time"]=>
  int(0)
  ["walltime"]=>
  int(0)
}
array(2) {
  ["cpuTimeLimit"]=>
  int(0)
  ["wallTimeLimit"]=>
  int(0)
}

A quick search reveals that these match these:

	wgMaxShellMemory = 307_200,
	wgMaxShellFileSize = 102_400,
	wgMaxShellTime = 180,
	wgMaxShellWallClockTime = 180,

I haven't checked how they're injected exactly, but it makes sense that those would be inherited by default.
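
The two views use different units: the configuration values are in KiB, while the dumped Command properties are in bytes. A quick arithmetic check, using only the numbers quoted above (the variable names are just for this sketch):

```python
KIB = 1024

# Configuration values, in KiB.
config_kib = {"wgMaxShellMemory": 307_200, "wgMaxShellFileSize": 102_400}

# Values seen in the dumped Command object, in bytes.
dumped_bytes = {"wgMaxShellMemory": 314_572_800, "wgMaxShellFileSize": 104_857_600}

for name, kib in config_kib.items():
    as_bytes = kib * KIB
    # Print the converted value, whether it matches the dump, and the size in MiB.
    print(name, as_bytes, as_bytes == dumped_bytes[name], f"{kib // KIB} MiB")
```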

Change #1151795 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@master] GenerateFancyCaptchas: Explicitly set all limits to 0

https://gerrit.wikimedia.org/r/1151795

Change #1152096 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@wmf/1.45.0-wmf.3] GenerateFancyCaptchas: Explicitly set all limits to 0

https://gerrit.wikimedia.org/r/1152096

Change #1152097 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@wmf/1.45.0-wmf.2] GenerateFancyCaptchas: Explicitly set all limits to 0

https://gerrit.wikimedia.org/r/1152097

Change #1152098 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@REL1_44] GenerateFancyCaptchas: Explicitly set all limits to 0

https://gerrit.wikimedia.org/r/1152098

Change #1152099 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@REL1_43] GenerateFancyCaptchas: Explicitly set all limits to 0

https://gerrit.wikimedia.org/r/1152099

Change #1152098 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@REL1_44] GenerateFancyCaptchas: Explicitly set all limits to 0

https://gerrit.wikimedia.org/r/1152098

Change #1152097 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@wmf/1.45.0-wmf.2] GenerateFancyCaptchas: Explicitly set all limits to 0

https://gerrit.wikimedia.org/r/1152097

Change #1152096 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@wmf/1.45.0-wmf.3] GenerateFancyCaptchas: Explicitly set all limits to 0

https://gerrit.wikimedia.org/r/1152096

Change #1152099 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@REL1_43] GenerateFancyCaptchas: Explicitly set all limits to 0

https://gerrit.wikimedia.org/r/1152099

One point of note while trying to get up to speed on what was learned from the last attempt (T388531#10849236):

I see that https://gerrit.wikimedia.org/r/1149351 specified a different --font than what's historically been used in the captchaloop script [0] - freefont/FreeMonoBoldOblique.ttf vs. dejavu/DejaVuSans.ttf. Is that intentional?

The reason I ask is that the latter is definitely already present in the mediawiki CLI image, so no changes to font packages (e.g., with something like [1]) should be required if we intend to keep that the same.

[0] https://gerrit.wikimedia.org/g/operations/puppet/+/refs/heads/production/modules/mediawiki/files/captchaloop

[1] https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/178

> I see that https://gerrit.wikimedia.org/r/1149351 specified a different --font than what's historically been used in the captchaloop script [0] - freefont/FreeMonoBoldOblique.ttf vs. dejavu/DejaVuSans.ttf. Is that intentional?

I couldn't see any reference to FreeMonoBoldOblique in Codesearch:
https://codesearch.wmcloud.org/search/?q=FreeMonoBoldOblique

But, I do find references in old Gerrit patches: https://gerrit.wikimedia.org/r/q/FreeMonoBoldOblique

I'm guessing this may've been accidentally restored from an old Git commit.

Anecdotally:

mwdebug
krinkle@mwdebug1002:~$ ls -halF /usr/share/fonts/truetype/dejavu/
drwxr-xr-x 2 root root 4.0K Apr 10 13:01 ./
drwxr-xr-x 3 root root 4.0K Apr 10 12:59 ../
-rw-r--r-- 1 root root   36 Apr 10 13:00 .uuid

No fonts here.

mwmaint
[18:11 UTC] krinkle at mwmaint1002.eqiad.wmnet in ~
$ ls -halF /usr/share/fonts/truetype/
drwxr-xr-x 3 root root 4.0K Jul 13  2021 ./
drwxr-xr-x 5 root root 4.0K Jul 13  2021 ../
-rw-r--r-- 1 root root   36 Jul 13  2021 .uuid
drwxr-xr-x 2 root root 4.0K Jul 13  2021 dejavu/

[18:11 UTC] krinkle at mwmaint1002.eqiad.wmnet in ~
$ ls /usr/share/fonts/truetype/dejavu/
DejaVuSans-Bold.ttf  DejaVuSans.ttf  DejaVuSansMono-Bold.ttf  DejaVuSansMono.ttf  DejaVuSerif-Bold.ttf  DejaVuSerif.ttf

One font: dejavu.

k8s-mw-script
> [18:51 UTC] krinkle at deploy1003.eqiad.wmnet in ~
$ mwscript-k8s --attach -- eval.php --wiki testwiki
> var_dump(glob('/usr/share/fonts/*'));
array(1) {
  [0]=>
  string(25) "/usr/share/fonts/truetype"
}
> var_dump(glob('/usr/share/fonts/truetype/**/*.ttf'));
array(6) {
  [0]=>
  string(52) "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf"
  [1]=>
  string(47) "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"
  [2]=>
  string(56) "/usr/share/fonts/truetype/dejavu/DejaVuSansMono-Bold.ttf"
  [3]=>
  string(51) "/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf"
  [4]=>
  string(53) "/usr/share/fonts/truetype/dejavu/DejaVuSerif-Bold.ttf"
  [5]=>
  string(48) "/usr/share/fonts/truetype/dejavu/DejaVuSerif.ttf"
}

One font: dejavu.
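
A check like the ones above could be scripted before pointing a job at a given --font. This is a rough sketch only: the helper names `find_ttf_fonts` and `font_available` are hypothetical, and the default root is simply the directory listed in the output above.

```python
from pathlib import Path

FONT_ROOT = "/usr/share/fonts/truetype"


def find_ttf_fonts(root=FONT_ROOT):
    """Return every .ttf file below root, or an empty list if root is missing."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(str(path) for path in base.rglob("*.ttf"))


def font_available(relative_path, root=FONT_ROOT):
    """Tell whether a font such as 'dejavu/DejaVuSans.ttf' exists below root."""
    return (Path(root) / relative_path).is_file()


for candidate in ("dejavu/DejaVuSans.ttf", "freefont/FreeMonoBoldOblique.ttf"):
    print(candidate, font_available(candidate))
```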

Re: font, I probably checked a local copy of the script that wasn't up to date, I'll fix the command used for the k8s job.
Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1152649 and doing a trial run now

Mentioned in SAL (#wikimedia-operations) [2025-06-02T11:32:47Z] <claime> Manual run of cronjobs/generatecaptcha on k8s - T388531

cgoubert@deploy1003:/srv/deployment-charts/helmfile.d/services/mw-cron$ kubectl logs -f generatecaptcha-manual-202506021132-f8qql mediawiki-main-app 
extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php: Start run
Generating 10000 new captchas.. Done.

Generated 10000 captchas in 0.1 seconds
Getting a list of old captchas to delete... Done.
Copying the new captchas to storage... Done.

Copied 0 captchas to storage in 0.0 seconds
Deleting 9900 old captchas...
^C^C

I killed the job before it deleted all captchas, I'll revert and rerun on mwmaint to restore all captcha files

Mentioned in SAL (#wikimedia-operations) [2025-06-02T11:39:52Z] <claime> cgoubert@mwmaint1002:~$ sudo systemctl restart mediawiki_job_generatecaptcha.service - T388531

Apparently I didn't kill it before it deleted all captchas, which led to the container being deleted from swift (!!!) and caused 500s for all captcha-protected actions.
Trying to restore the captchas by just running the script errored out:

Jun 02 12:03:20 mwmaint1002 mediawiki_job_generatecaptcha[5598]: Copying the new captchas to storage...Errored.
Jun 02 12:03:20 mwmaint1002 mediawiki_job_generatecaptcha[5598]: An unknown error occurred in storage backend "global-swift-eqiad".
Jun 02 12:03:21 mwmaint1002 mediawiki_job_generatecaptcha[5598]: Removing temporary files... Done.
Jun 02 12:03:21 mwmaint1002 mediawiki_job_generatecaptcha[5598]: Whole captchas generation process took 118.6 seconds

The container is being recreated by @MatthewVernon and I will run the script from mwmaint to restore captchas.
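
The failed run above reported 10000 captchas generated in 0.1 seconds, copied 0 of them, and then went on to delete the old ones. A guard against that sequence could look roughly like the sketch below. It is an illustration in Python, not the actual GenerateFancyCaptchas.php change, and the helper name `captchas_to_delete` is hypothetical.

```python
def captchas_to_delete(expected, copied, old_names):
    """Return the old captcha names that may be deleted after a run.

    Hypothetical helper: refuse to delete anything when no new captcha
    reached storage, so a silent generator failure cannot empty the pool.
    """
    if copied <= 0:
        raise RuntimeError(
            f"expected {expected} new captchas, copied {copied}; keeping old ones"
        )
    return list(old_names)


# The failed run: nothing was copied, so nothing may be deleted.
try:
    captchas_to_delete(expected=10000, copied=0, old_names=["old_1.png"])
except RuntimeError as error:
    print(error)
```

Raising rather than returning an empty list makes the run fail loudly, so the scheduler can surface the problem.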

Change #1152665 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@master] GenerateFancyCaptchas: Handle MW reporting success, but no files being stored

https://gerrit.wikimedia.org/r/1152665

Mentioned in SAL (#wikimedia-operations) [2025-06-02T12:11:49Z] <claime> cgoubert@mwmaint1002:~$ sudo systemctl restart mediawiki_job_generatecaptcha.service - T388531

Change #1152749 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@master] GenerateFancyCaptchas: Don't try and delete captchas if the filename is empty

https://gerrit.wikimedia.org/r/1152749

Change #1152665 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@master] GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring

https://gerrit.wikimedia.org/r/1152665

Change #1152763 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@REL1_44] GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring

https://gerrit.wikimedia.org/r/1152763

Change #1152764 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@REL1_43] GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring

https://gerrit.wikimedia.org/r/1152764

Change #1152765 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@REL1_42] GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring

https://gerrit.wikimedia.org/r/1152765

Change #1152765 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@REL1_42] GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring

https://gerrit.wikimedia.org/r/1152765

Change #1152763 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@REL1_44] GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring

https://gerrit.wikimedia.org/r/1152763

Change #1152764 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@REL1_43] GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring

https://gerrit.wikimedia.org/r/1152764

Change #1153591 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@wmf/1.45.0-wmf.3] GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring

https://gerrit.wikimedia.org/r/1153591

Change #1151737 abandoned by Reedy:

[mediawiki/extensions/ConfirmEdit@master] GenerateFancyCaptchas: Add option to run captcha.py va exec()

Reason:

For now...

https://gerrit.wikimedia.org/r/1151737

Change #1153591 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@wmf/1.45.0-wmf.3] GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring

https://gerrit.wikimedia.org/r/1153591

Change #1153598 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/ConfirmEdit@wmf/1.45.0-wmf.3] GenerateFancyCaptchas: Don't try and delete captchas if the filename is empty

https://gerrit.wikimedia.org/r/1153598

Change #1152749 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@master] GenerateFancyCaptchas: Don't try and delete captchas if the filename is empty

https://gerrit.wikimedia.org/r/1152749

Change #1153598 merged by jenkins-bot:

[mediawiki/extensions/ConfirmEdit@wmf/1.45.0-wmf.3] GenerateFancyCaptchas: Don't try and delete captchas if the filename is empty

https://gerrit.wikimedia.org/r/1153598

Mentioned in SAL (#wikimedia-operations) [2025-06-04T12:18:22Z] <reedy@deploy1003> Started scap sync-world: Backport for [[gerrit:rLPRI1153591423f9|GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531)]], [[gerrit:1153592|captcha.py: Expand variables and user in filenames (T395810)]], [[gerrit:1153593|captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804)]], [[gerrit:1153595|captcha.py: Bail out if no words were read

Mentioned in SAL (#wikimedia-operations) [2025-06-04T12:20:31Z] <reedy@deploy1003> reedy: Backport for [[gerrit:rLPRI1153591423f9|GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531)]], [[gerrit:1153592|captcha.py: Expand variables and user in filenames (T395810)]], [[gerrit:1153593|captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804)]], [[gerrit:1153595|captcha.py: Bail out if no words were read from wordlist (T3

Mentioned in SAL (#wikimedia-operations) [2025-06-04T12:28:13Z] <reedy@deploy1003> Finished scap sync-world: Backport for [[gerrit:rLPRI1153591423f9|GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531)]], [[gerrit:1153592|captcha.py: Expand variables and user in filenames (T395810)]], [[gerrit:1153593|captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804)]], [[gerrit:1153595|captcha.py: Bail out if no words were rea

Mentioned in SAL (#wikimedia-operations) [2025-06-04T13:51:40Z] <claime> Manual run of generatecaptcha on mw-cron, no delete - T388531

Change #1153634 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mediawiki: Fix captcha wordlists path

https://gerrit.wikimedia.org/r/1153634

Change #1153634 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: Fix captcha wordlists path

https://gerrit.wikimedia.org/r/1153634

Mentioned in SAL (#wikimedia-operations) [2025-06-04T14:28:52Z] <cgoubert@deploy1003> Started scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531

Mentioned in SAL (#wikimedia-operations) [2025-06-04T14:31:17Z] <cgoubert@deploy1003> Finished scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531 (duration: 02m 24s)

Mentioned in SAL (#wikimedia-operations) [2025-06-04T14:33:25Z] <cgoubert@deploy1003> Started scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531

Mentioned in SAL (#wikimedia-operations) [2025-06-04T14:35:59Z] <cgoubert@deploy1003> Finished scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531 (duration: 02m 33s)

Change #1153647 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mediawiki: Fix captcha configmap structure

https://gerrit.wikimedia.org/r/1153647

Change #1153647 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: Fix captcha configmap structure

https://gerrit.wikimedia.org/r/1153647

Mentioned in SAL (#wikimedia-operations) [2025-06-04T17:13:16Z] <cgoubert@deploy1003> Started scap sync-world: 1153647: mediawiki: Fix captcha configmap structure - T388531

Mentioned in SAL (#wikimedia-operations) [2025-06-04T17:15:56Z] <cgoubert@deploy1003> Finished scap sync-world: 1153647: mediawiki: Fix captcha configmap structure - T388531 (duration: 02m 39s)

extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php: Start run
Current number of captchas is 9898.
Generating 2102 new captchas.. Done.
Generation script for 2102 captchas ran in 2.0 seconds
Enumerated 1697 temporary captchas in 0.3 seconds
Expecting 2102 new captchas, only 1697 found on disk; continuing
.Copying the new captchas to storage... Done.
Copied 1697 captchas to storage in 42.2 seconds
Removing temporary files... Done.
Whole captchas generation process took 44.9 seconds
extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php: Finished run

Well, captcha generation seems to have worked. I'll investigate why it didn't create all of them tomorrow.

It is possible it has clashes (because the word list isn’t that big)…

And then doesn’t try and make another word pair instead

Same as why it seemingly had 9898 to begin with…
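
If clashes are the cause, the shortfall can be estimated. The sketch below assumes each captcha is a uniform draw from a pool of `pool_size` equally likely outcomes and that a clash overwrites the earlier file. Neither assumption has been checked against captcha.py, and `pool_size` is a hypothetical parameter, since the real word list size is not given here.

```python
def expected_unique(draws, pool_size):
    """Expected number of distinct outcomes after uniform draws from a pool."""
    return pool_size * (1 - (1 - 1 / pool_size) ** draws)


# Try a few hypothetical pool sizes for the 2102 captchas requested above.
for pool_size in (2_000, 5_000, 50_000):
    print(pool_size, round(expected_unique(2102, pool_size)))
```

Comparing these estimates with the 1697 files actually found on disk would show whether a small pool is a plausible explanation.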

swift stat global-data-captcha-render | grep Objects
               Objects: 11595

All generated captchas were correctly uploaded. I think we can try and run it with --delete now.
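
The object count lines up with the previous run: 9898 existing captchas plus the 1697 copied ones. A small sketch of that check, parsing the `swift stat` line quoted above (the helper name `parse_object_count` is made up for this example):

```python
import re


def parse_object_count(swift_stat_output):
    """Extract the integer from the 'Objects:' line of `swift stat` output."""
    match = re.search(r"^\s*Objects:\s*(\d+)\s*$", swift_stat_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Objects:' line found")
    return int(match.group(1))


before = 9898  # "Current number of captchas is 9898."
copied = 1697  # "Copied 1697 captchas to storage in 42.2 seconds"
after = parse_object_count("               Objects: 11595\n")

print(before + copied, after, before + copied == after)
```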

Change #1153964 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/puppet@production] mw::maintenance: Delete old captchas

https://gerrit.wikimedia.org/r/1153964

Change #1153964 merged by Clément Goubert:

[operations/puppet@production] mw::maintenance: Delete old captchas

https://gerrit.wikimedia.org/r/1153964

Mentioned in SAL (#wikimedia-operations) [2025-06-05T10:27:45Z] <claime> Manual run of generatecaptcha on mw-cron with delete - T388531

Clement_Goubert claimed this task.
extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php: Start run
Generating 10000 new captchas.. Done.

Generation script for 10000 captchas ran in 36.7 seconds

Enumerated 10000 temporary captchas in 0.7 seconds
Getting a list of old captchas to delete... Done.
Copying the new captchas to storage... Done.

Copied 10000 captchas to storage in 205.1 seconds
Deleting 11595 old captchas...
Done.

Deleted 11595 old captchas in 157.8 seconds
Removing temporary files... Done.

Whole captchas generation process took 401.0 seconds
extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php: Finished run

Special:CreateAccount shows captchas without errors.

Success \o/

>> Slight side note (I mentioned this on the merged-in task, but I just wanted to make sure it's seen by ServiceOps new!): if/given that the generatecaptcha job is owned by the Security-Team, should any auto-created @phaultfinder tasks about it be tagged with Security-Team, rather than (or in addition to) the generic Security tag?
>
> Noted on my side, I'll let a member of Security-Team have the last word on how they want the alert tagged.

Given that this task has been resolved, please could I gently bump this question to the secteam? :)
From my perspective, the problem with using the Security tag here is that it doesn't represent a team or a component — it's just used in Phabricator to indicate that a given task might have a security aspect to it.

> Slight side note (I mentioned this on the merged-in task, but I just wanted to make sure it's seen by ServiceOps new!): if/given that the generatecaptcha job is owned by the Security-Team, should any auto-created @phaultfinder tasks about it be tagged with Security-Team, rather than (or in addition to) the generic Security tag?

I don't know if the Security-Team officially owns this. According to our only canonical documentation (which is admittedly very flawed), ConfirmEdit is owned by the Editing-team and two volunteers: https://www.mediawiki.org/wiki/Developers/Maintainers. I think @Reedy has mostly jumped in to help address various issues as they have a lot of domain knowledge around the extension and its captcha bits. Though it's probably fine to tag the Security-Team for any @phaultfinder tasks. At the very least we'll triage them during our weekly clinic. But at this time, any tagging shouldn't be considered as triggering resource allocations or SLOs or anything like that.

Re the ownership question (& as a bit of a side note!) — at a guess (from searching for 'generatecaptcha' within #Security-Team—tagged tasks), potentially the fact that the Security-Team is documented (somewhere) to own this job might originate from T150029: Create cronjob for regular captcha regeneration?

Change #1150624 merged by Hnowlan:

[operations/puppet@production] alertmanager: adjust phab project to security-team rather than security tag

https://gerrit.wikimedia.org/r/1150624