HTMLForm hidden fields gone -- CAPTCHA failure rate at 100%
Closed, Resolved, Public

Description

Since the train deployment to group2 the CAPTCHA failure rate has increased to 100%.

18:16 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.23 refs T354441

https://grafana.wikimedia.org/d/000000370/captcha-failure-rates?orgId=1

Analysis shows that the hidden form field "captchaId" is missing. Hidden form fields are also missing from other HTMLForm forms.

Event Timeline

I'm regenerating the captchas as a classic "turn it off and on again" fix, in the hope that this resolves the issue.

I get this sometimes:

An error occurred when running captcha.py:
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/srv/mediawiki/php-1.42.0-wmf.23/extensions/ConfirmEdit/captcha.py", line 275, in run_in_thread
    subdir = gen_subdir(opts.output, md5hash, opts.dirs)
  File "/srv/mediawiki/php-1.42.0-wmf.23/extensions/ConfirmEdit/captcha.py", line 168, in gen_subdir
    os.mkdir(fulldir)
FileExistsError: [Errno 17] File exists: '/tmp/mw-fancycaptcha-1711055439-de0840/3/1'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/mediawiki/php-1.42.0-wmf.23/extensions/ConfirmEdit/captcha.py", line 426, in <module>
    p.map(run_in_thread, data)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
FileExistsError: [Errno 17] File exists: '/tmp/mw-fancycaptcha-1711055439-de0840/3/1'
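
The traceback above is consistent with two pool workers racing to create the same subdirectory, although that is a reading of the traceback rather than of captcha.py itself. A minimal sketch of a creation call that tolerates an existing directory follows; `ensure_subdir` and `demo` are hypothetical names for illustration, not the real `gen_subdir()`.

```python
import os
import tempfile
from multiprocessing.pool import ThreadPool


def ensure_subdir(base, parts):
    """Create base/parts[0]/parts[1]/... and return the full path.

    Hypothetical helper, not the real gen_subdir() from captcha.py.
    exist_ok=True makes the call succeed when another worker has
    already created the same directory, instead of raising
    FileExistsError as os.mkdir() does.
    """
    fulldir = os.path.join(base, *parts)
    os.makedirs(fulldir, exist_ok=True)
    return fulldir


def demo():
    # Many workers ask for the same subdirectory at the same time.
    with tempfile.TemporaryDirectory() as base:
        with ThreadPool(8) as pool:
            paths = pool.map(lambda _: ensure_subdir(base, ["3", "1"]), range(32))
        # Number of distinct paths returned, and whether the directory exists.
        return len(set(paths)), os.path.isdir(paths[0])
```

Whether this is the right fix for captcha.py depends on how the script chooses subdirectories, which the traceback alone does not show.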

Also this:

Generated 900 captchas in 1.1 seconds
Copying the new captchas to storage...Errored.
An unknown error occurred in storage backend "global-swift-codfw".

Removing temporary files... Done.

I'm running this in eqiad, so why does it try to push to codfw Swift?

https://www.mediawiki.org/wiki/MediaWiki_1.42/wmf.23/Changelog

Not blaming this patch, but the only obviously relevant change to file handling (and Swift) in MW core would be:

git #d23c1743 - filebackend: Retry Swift requests with new auth token on 401 (task T358830) by Tim Starling

and

The current token in eqiad expires Wednesday 2024-03-13 14:47:20. The codfw token will expire the next day at 16:03:02.

Regarding the question above about codfw Swift: there was also the DC switchover yesterday.

The ratio has recovered somewhat.

context: I rolled back the train to wmf.22 in group1. We probably should roll it back to group0 honestly but also it's going to be a mess debugging what's the underlying issue.

[22:05:01] <logmsgbot> !log ladsgroup@deploy1002 rebuilt and synchronized wikiversions files: (no justification provided)

Screenshot 2024-03-21 at 22.12.36.png (258×472 px, 17 KB)

Screenshot 2024-03-21 at 22.12.42.png (257×404 px, 16 KB)

Were there any user reports? What happened when you tried to do a captcha-protected action? Did the image load? Or was the answer supposedly incorrect?

The report came from me and @JJMC89 on the EN account creation team: we saw a rapid spike of requests complaining about the CAPTCHA. I tested it myself as well.

The image loaded, but the answer was always incorrect.

From excimer it seems that, since $wgCaptchaDeleteOnSolve is set, FancyCaptcha::passCaptcha() checks the captcha, sees that it passes, and deletes it. The check is then run again, can't find the file, and fails.

This means even if we set $wgCaptchaDeleteOnSolve to false, it'll ask the captcha again. Something is running the user creation check twice and that's the underlying issue.

nvm. I was looking at the wrong code. It's late here.

I loaded the account creation page on testwiki, got the captcha ID, dumped the stored info with eval.php, then submitted the form with XWD verbose logging. In the post request debug log, the ID it used in the memcached fetch did not match what I saw on the form.

There was no captchaId field in the HTML of the form. There's meant to be a hidden field identifying the captcha, but it was missing.
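
A check like the one described above can be automated by listing the hidden inputs in the rendered form HTML. This is a minimal sketch using only the Python standard library; `HiddenInputCollector` and `hidden_fields` are hypothetical names, and the field name `captchaId` is taken from this task rather than verified against the actual form markup.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <input ... /> tags, because the
        # default handle_startendtag() delegates to handle_starttag().
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.hidden[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of hidden input names to values found in html."""
    parser = HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.hidden
```

For example, `"captchaId" in hidden_fields(page_html)` would be false for a page rendered with the bug described here, where `page_html` is whatever HTML was fetched from the account creation form.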

MW in wmf.23 is eating all hidden HTML values from extensions, including campaigns, Growth experiments and more.

I can reproduce this locally so I suggest rolling back the train. We're not gaining anything by having this be deployed.

tstarling renamed this task from "CAPTCHA failure rate at 100%" to "HTMLForm hidden fields gone -- CAPTCHA failure rate at 100%". (Mar 21 2024, 11:38 PM)
tstarling updated the task description.

I think the likely cause is https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1008569 . Hidden form fields are weird -- getInputHTML() returns an empty string, and instead getDiv() or getTableRow() adds an item to the form's mHiddenFields. The patch added HTMLFormField::getCodex() which calls getInputCodex() which calls getInputHTML(), and none of this does the special hidden field side-effect.
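
The mechanism described above can be illustrated with a toy model. This is a Python sketch, not the MediaWiki PHP code: the class and method names only loosely mirror HTMLForm, and `get_codex_fixed` is an assumption about the general shape of a fix, not the content of any actual patch.

```python
import html


class Form:
    """Toy stand-in for HTMLForm; hidden_fields plays the role of mHiddenFields."""

    def __init__(self):
        self.hidden_fields = []

    def render_hidden(self):
        # The form emits hidden inputs itself, from its own list.
        return "".join(
            '<input type="hidden" name="%s" value="%s">'
            % (html.escape(name), html.escape(value))
            for name, value in self.hidden_fields
        )


class HiddenField:
    """Toy stand-in for a hidden form field."""

    def __init__(self, form, name, value):
        self.form, self.name, self.value = form, name, value

    def get_input_html(self):
        # A hidden field renders nothing inline.
        return ""

    def get_div(self):
        # Side effect: register the field with the form, render nothing.
        self.form.hidden_fields.append((self.name, self.value))
        return ""

    def get_codex_broken(self):
        # New code path that only asks for the input HTML, so the
        # registration side effect never happens and the field is lost.
        return self.get_input_html()

    def get_codex_fixed(self):
        # Assumed shape of a fix: the new path performs the same
        # registration as get_div().
        return self.get_div()
```

In this model, rendering through `get_codex_broken()` leaves `Form.render_hidden()` empty, while `get_div()` or `get_codex_fixed()` produces the hidden input.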

I confirm that's the cause. Locally, the hidden input is there one commit before it, and it stops showing up as soon as you switch to that commit. I know that's not much information, but it's better than nothing.

Change #1013425 had a related patch set uploaded (by Catrope; author: Catrope):

[mediawiki/core@master] HTMLHiddenField: Support CodexHTMLForm

https://gerrit.wikimedia.org/r/1013425

Not able to test locally whether this patch fixes it, because I don't have CAPTCHAs set up locally. I'll try to test this on patchdemo.

Thank you @tstarling for figuring out why this happened; my patch wrote itself after reading your explanation. It would have taken some digging to discover this for myself.

Test wiki created on Patch demo by Roan Kattouw (WMF) using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/ec0445078e/w

Testing this on PatchDemo, it appears to be working.

This PatchDemo with my patch has the hidden form fields: https://patchdemo.wmflabs.org/wikis/ec0445078e/w/index.php?title=Special:CreateAccount&returnto=Main+Page

This PatchDemo without my patch (just running master) does not have the hidden form fields: https://patchdemo.wmflabs.org/wikis/da77645662/w/index.php?title=Special:CreateAccount&returnto=Main+Page

Change #1013425 merged by jenkins-bot:

[mediawiki/core@master] HTMLHiddenField: Support CodexHTMLForm

https://gerrit.wikimedia.org/r/1013425

Change #1013258 had a related patch set uploaded (by Reedy; author: Catrope):

[mediawiki/core@wmf/1.42.0-wmf.23] HTMLHiddenField: Support CodexHTMLForm

https://gerrit.wikimedia.org/r/1013258

Ladsgroup assigned this task to Catrope.

Tested locally and the hidden element is there now. Thank you!

Change #1013258 merged by jenkins-bot:

[mediawiki/core@wmf/1.42.0-wmf.23] HTMLHiddenField: Support CodexHTMLForm

https://gerrit.wikimedia.org/r/1013258

Mentioned in SAL (#wikimedia-operations) [2024-03-22T12:03:15Z] <reedy@deploy1002> Synchronized php-1.42.0-wmf.23/includes/htmlform/fields/HTMLHiddenField.php: T360717 (duration: 13m 06s)

For the future: maybe we should add a regression test for hidden fields being present?

Yup that should be part of T359166; right now test coverage for HTMLForm as a whole is basically nonexistent.

I would just like an explicit, maximally integrated regression test for this bug. By maximally integrated, I mean testing as many layers as possible while still voting on MediaWiki core. Like this...

Change #1013725 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] HTMLForm: Add regression test for T360717

https://gerrit.wikimedia.org/r/1013725

Change #1013725 merged by jenkins-bot:

[mediawiki/core@master] HTMLForm: Add regression test for T360717

https://gerrit.wikimedia.org/r/1013725