Page MenuHomePhabricator

captcha.py needs to be ported to Python 3
Open, MediumPublic

Description

When updating the Puppet manifests for MediaWiki, I noticed that captcha.py is based on Python 2. That's not an immediate issue (the OS release we're migrating to (Debian Buster) still supports Python 2, but it will become an issue with the next OS release (which will no longer support Python 2).

Event Timeline

Dzahn triaged this task as Medium priority.Nov 23 2020, 11:01 PM
$ 2to3 captcha.py 
RefactoringTool: Skipping optional fixer: buffer
RefactoringTool: Skipping optional fixer: idioms
RefactoringTool: Skipping optional fixer: set_literal
RefactoringTool: Skipping optional fixer: ws_comma
RefactoringTool: Refactored captcha.py
--- captcha.py	(original)
+++ captcha.py	(refactored)
@@ -157,27 +157,27 @@
 			word = word + chr(97 + random.randint(0,25))
 
 	if verbose:
-		print("word is %s" % word)
+		print(("word is %s" % word))
 
 	if len(word) < min_length:
 		if verbose:
-			print("skipping word pair '%s' because it has fewer than %d characters" % (word, min_length))
+			print(("skipping word pair '%s' because it has fewer than %d characters" % (word, min_length)))
 		return None
 
 	if max_length > 0 and len(word) > max_length:
 		if verbose:
-			print("skipping word pair '%s' because it has more than %d characters" % (word, max_length))
+			print(("skipping word pair '%s' because it has more than %d characters" % (word, max_length)))
 		return None
 
 	if nonalpha.search(word):
 		if verbose:
-			print("skipping word pair '%s' because it contains non-alphabetic characters" % word)
+			print(("skipping word pair '%s' because it contains non-alphabetic characters" % word))
 		return None
 
 	for naughty in blacklist:
 		if naughty in word:
 			if verbose:
-				print("skipping word pair '%s' because it contains blacklisted word '%s'" % (word, naughty))
+				print(("skipping word pair '%s' because it contains blacklisted word '%s'" % (word, naughty)))
 			return None
 	return word
 
@@ -291,7 +291,7 @@
 
 	p = multiprocessing.Pool(threads);
 	data = []
-	print("Generating %s CAPTCHA images separated in %s image(s) per chunk run by %s threads..." % (count, chunks, threads))
+	print(("Generating %s CAPTCHA images separated in %s image(s) per chunk run by %s threads..." % (count, chunks, threads)))
 	for i in range(0, threads):
 		data.append([chunks, words, blacklist, opts, font, fontsize])
 
RefactoringTool: Files that need to be modified:
RefactoringTool: captcha.py
$ 2to3 captcha-old.py 
RefactoringTool: Skipping optional fixer: buffer
RefactoringTool: Skipping optional fixer: idioms
RefactoringTool: Skipping optional fixer: set_literal
RefactoringTool: Skipping optional fixer: ws_comma
RefactoringTool: Refactored captcha-old.py
--- captcha-old.py	(original)
+++ captcha-old.py	(refactored)
@@ -140,27 +140,27 @@
 			word = word + chr(97 + random.randint(0,25))
 
 	if verbose:
-		print("word is %s" % word)
+		print(("word is %s" % word))
 
 	if len(word) < min_length:
 		if verbose:
-			print("skipping word pair '%s' because it has fewer than %d characters" % (word, min_length))
+			print(("skipping word pair '%s' because it has fewer than %d characters" % (word, min_length)))
 		return None
 
 	if max_length > 0 and len(word) > max_length:
 		if verbose:
-			print("skipping word pair '%s' because it has more than %d characters" % (word, max_length))
+			print(("skipping word pair '%s' because it has more than %d characters" % (word, max_length)))
 		return None
 
 	if nonalpha.search(word):
 		if verbose:
-			print("skipping word pair '%s' because it contains non-alphabetic characters" % word)
+			print(("skipping word pair '%s' because it contains non-alphabetic characters" % word))
 		return None
 
 	for naughty in blacklist:
 		if naughty in word:
 			if verbose:
-				print("skipping word pair '%s' because it contains blacklisted word '%s'" % (word, naughty))
+				print(("skipping word pair '%s' because it contains blacklisted word '%s'" % (word, naughty)))
 			return None
 	return word
 
@@ -274,7 +274,7 @@
 
 	p = multiprocessing.Pool(threads);
 	data = []
-	print("Generating %s CAPTCHA images separated in %s image(s) per chunk run by %s threads..." % (count, chunks, threads))
+	print(("Generating %s CAPTCHA images separated in %s image(s) per chunk run by %s threads..." % (count, chunks, threads)))
 	for i in range(0, threads):
 		data.append([chunks, words, blacklist, opts, font, fontsize])
 
RefactoringTool: Files that need to be modified:
RefactoringTool: captcha-old.py
Reedy changed the task status from Open to Stalled.Dec 1 2020, 2:07 PM

Marking stalled as it's unclear what needs fixing (if anything)

MoritzMuehlenhoff changed the task status from Stalled to Open.Dec 1 2020, 2:11 PM

Well, at minimum the shebang needs to be switched to #!/usr/bin/python3. If that's all that's needed, even better.

Change 644528 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/extensions/ConfirmEdit@master] Update captch(-old)?.py shebang to python 3

https://gerrit.wikimedia.org/r/644528

I think that's possibly all that is needed at this point. 2to3 changes aren't necessary, I think?

I have tested (not extensively) on python3, and it seemed to work fine as of the last patches from earlier in the year. Always possible there's some edge cases (especially when running the larger batches) that I didn't find.

When it's merged and live in prod, I'll have another look at the logs and see

Change 644532 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Extend PIL Python package with Python 3 counterparts

https://gerrit.wikimedia.org/r/644532

Well, at minimum the shebang needs to be switched to #!/usr/bin/python3. If that's all that's needed, even better.

Actually, while this may help if it's run standalone (which we don't, and there's no +x on the scripts anyway)... It doesn't actually help with how MediaWiki executes the script, as it uses python not python3 in the eventual command run via PHP

Currently it does

		$cmd = sprintf( "python %s --key %s --output %s --count %s --dirs %s",
			wfEscapeShellArg( dirname( __DIR__ ) . '/' . $captchaScript ),
			wfEscapeShellArg( $wgCaptchaSecret ),
			wfEscapeShellArg( $tmpDir ),
			wfEscapeShellArg( (string)$countGen ),
			wfEscapeShellArg( $wgCaptchaDirectoryLevels )
		);

So as is, it would continue running under (potentially) 2.7 anyway:

$ cat test.py
#!/bin/python3
import sys
print("Python version")
print (sys.version)
$ python --version
Python 2.7.18
$ python test.py 
Python version
2.7.18 (default, Aug  4 2020, 11:16:42) 
[GCC 9.3.0]

I'll fix the usage in the maintenance script too

Change 644586 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/extensions/ConfirmEdit@master] Make GenerateFancyCaptchas.php execute captcha(-old)?.py via python3

https://gerrit.wikimedia.org/r/644586

I'm guessing there's no reason to actually make the python version (ie whether to use python or python3 configurable with a flag/argument to GenerateFancyCaptchas.php... Right?

I'm guessing there's no reason to actually make the python version (ie whether to use python or python3 configurable with a flag/argument to GenerateFancyCaptchas.php... Right?

No, I can't think of any reason. Python 2 is dead since almost a year and even super-old distros have some version of Python 3.

Change 644532 merged by Dzahn:
[operations/puppet@production] Extend PIL Python package with Python 3 counterparts

https://gerrit.wikimedia.org/r/644532

Change 644528 merged by jenkins-bot:
[mediawiki/extensions/ConfirmEdit@master] Update captch(-old)?.py shebang to python 3

https://gerrit.wikimedia.org/r/644528

Change 644586 merged by jenkins-bot:
[mediawiki/extensions/ConfirmEdit@master] Make GenerateFancyCaptchas.php execute captcha(-old)?.py via python3

https://gerrit.wikimedia.org/r/644586