Page MenuHomePhabricator

All cronjobs using foreachwikiindblist broken in production: Fatal error: Uncaught Exception: MWWikiversions::readDbListFile: unable to read . in /srv/mediawiki/multiversion/MWWikiversions.php:94
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error

This error is generated from maintenance/refreshLinkRecommendations.php in the GrowthExperiments repo, and the log file where the error is seen is here /var/log/mediawiki/mediawiki_job_growthexperiments-refreshLinkRecommendations/syslog.log

exception.trace
Jun 25 04:50:50 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations[83621]: Notice: Undefined offset: 0 in /srv/mediawiki/multiversion/MWWikiversions.php on line 71
Jun 25 04:50:50 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations[83621]: Fatal error: Uncaught Exception: MWWikiversions::readDbListFile: unable to read . in /srv/mediawiki/multiversion/MWWikiversions.php:94
Jun 25 04:50:50 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations[83621]: Stack trace:
Jun 25 04:50:50 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations[83621]: #0 /srv/mediawiki/multiversion/MWWikiversions.php(71): MWWikiversions::readDbListFile('')
Jun 25 04:50:50 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations[83621]: #1 /usr/local/bin/expanddblist(5): MWWikiversions::evalDbListExpression('')
Jun 25 04:50:50 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations[83621]: #2 {main}
Jun 25 04:50:50 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations[83621]:   thrown in /srv/mediawiki/multiversion/MWWikiversions.php on line 94
Impact

refreshLinkRecommendations.php isn't filling up task queues in production anymore.

Notes

The first entry in the log for this error is at 4:50:50 on 2021-25-06.

Event Timeline

The cron job calls foreachwikiindblist which calls expanddblist which calls MWWikiversions::evalDbListExpression() which is intended to be used for DB list expressions (like dblists/group2.dbexpr), not normal DB lists so it fails. (The exact way it fails is confusing to me - apparently preg_split returns false, so there is no 0 offset and MWWikiversions::readDbListFile will be called with the empty string as the filename. But preg_split only returns false if the pattern is incorrect, why would it do that here?)

Obviously foreachwikiindblist should work, and did work in the past, with normal dblists, so this seems to be a bug in the multiwiki code (which is split between operations/mediawiki-config and operations/puppet. Neither had any relavant changes recently, though. Not sure what's going on...

Even more confusingly, manually it seems to work:

tgr@mwmaint1002:~$ /usr/local/bin/expanddblist growthexperiments
arwiki
bgwiki
...

Oh, I see, expanddblist passes the DBlist name to MWWikiversions::evalDbListExpression(), not the contents of the list. I guess the intention is to support things like foreachwikiindblist 'growthexperiments - group2' ...?

expanddblist passes $argv[1] to MWWikiversions::evalDbListExpression() so I guess that somehow ends up as the empty string so preg_split returns an empty array?

Tgr raised the priority of this task from High to Unbreak Now!.Jun 25 2021, 11:54 AM
Tgr added a project: SRE.

It seems like everything that uses foreachwikiindblist is broken, see e.g. /var/log/mediawiki/mediawiki_job_growthexperiments-listTaskCounts/syslog.log or /var/log/mediawiki/mediawiki_job_echo_mail_batch/syslog.log. Escalating.

Tgr renamed this task from Fatal error: Uncaught Exception: MWWikiversions::readDbListFile: unable to read . in /srv/mediawiki/multiversion/MWWikiversions.php:94 to All cronjobs using foreachwikiindblist broken in production: Fatal error: Uncaught Exception: MWWikiversions::readDbListFile: unable to read . in /srv/mediawiki/multiversion/MWWikiversions.php:94.Jun 25 2021, 11:54 AM

Seems to have happened on June 24 between 18:30 and 19:15.

Nothing relevant in SAL. From the puppet logs, rOPUPfafc34c8f5be: mediawiki: Port mw-cli-wrapper to Python is the only thing that seems related - ping @Legoktm.

Affected jobs:

tgr@mwmaint1002:~$ ack 'foreachwikiindblist' /lib/systemd/system/ -l
/lib/systemd/system/mediawiki_job_startupregistrystats.service
/lib/systemd/system/mediawiki_job_growthexperiments-deleteOldSurveys.service
/lib/systemd/system/mediawiki_job_purge_old_cx_drafts.service
/lib/systemd/system/mediawiki_job_echo_mail_batch.service
/lib/systemd/system/mediawiki_job_growthexperiments-refreshLinkRecommendations.service
/lib/systemd/system/mediawiki_job_pageassessments_cleanup.service
/lib/systemd/system/mediawiki_job_growthexperiments-listTaskCounts.service

The Growth ones won't mind not running for a few days, not sure about the others. Just judging from their name they don't sound critical.

Even more confusingly, manually it seems to work:

tgr@mwmaint1002:~$ /usr/local/bin/expanddblist growthexperiments
arwiki
bgwiki
...

Not if you prefix it with mw-cli-wrapper as systemd does:

[urbanecm@mwmaint1002 ~]$ /usr/local/bin/mw-cli-wrapper /usr/local/bin/expanddblist growthexperiments
Notice: Undefined offset: 1 in /usr/local/bin/expanddblist on line 5
Notice: Undefined offset: 0 in /srv/mediawiki/multiversion/MWWikiversions.php on line 71
Fatal error: Uncaught Exception: MWWikiversions::readDbListFile: unable to read . in /srv/mediawiki/multiversion/MWWikiversions.php:94
Stack trace:
#0 /srv/mediawiki/multiversion/MWWikiversions.php(71): MWWikiversions::readDbListFile('')
#1 /usr/local/bin/expanddblist(5): MWWikiversions::evalDbListExpression('')
#2 {main}
  thrown in /srv/mediawiki/multiversion/MWWikiversions.php on line 94
Traceback (most recent call last):
  File "/usr/local/bin/mw-cli-wrapper", line 47, in <module>
    subprocess.run(sys.argv[1:], check=True, shell=True)
  File "/usr/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/bin/expanddblist', 'growthexperiments']' returned non-zero exit status 255

This clearly shows mw-cli-wrapper is at fault somehow.

...except foreachwiki also uses foreachwikiindblist and it's used by another couple jobs:

tgr@mwmaint1002:~$ ack -w foreachwiki /lib/systemd/system/ -l
/lib/systemd/system/mediawiki_job_cirrus_sanitize_jobs.service
/lib/systemd/system/mediawiki_job_initsitestats.service
/lib/systemd/system/mediawiki_job_purge_expired_userrights.service
/lib/systemd/system/mediawiki_job_purge_abusefilteripdata.service
/lib/systemd/system/mediawiki_job_purge_checkuser.service
/lib/systemd/system/mediawiki_job_update_special_pages.service
/lib/systemd/system/mediawiki_job_purge_securepollvotedata.service
/lib/systemd/system/mediawiki_job_cleanup_upload_stash.service

Those seem to be broken in a different way:

Jun 24 20:10:03 mwmaint1002 mediawiki_job_cirrus_sanitize_jobs[202808]: Usage: foreachwiki <command>

Some of them deal with purging PII so those might be more time-sensitive.

This is interesting... compare:

[urbanecm@mwmaint1002 ~]$ /usr/local/bin/mw-cli-wrapper /usr/local/bin/expanddblist growthexperiments
Notice: Undefined offset: 1 in /usr/local/bin/expanddblist on line 5
Notice: Undefined offset: 0 in /srv/mediawiki/multiversion/MWWikiversions.php on line 71
Fatal error: Uncaught Exception: MWWikiversions::readDbListFile: unable to read . in /srv/mediawiki/multiversion/MWWikiversions.php:94
Stack trace:
#0 /srv/mediawiki/multiversion/MWWikiversions.php(71): MWWikiversions::readDbListFile('')
#1 /usr/local/bin/expanddblist(5): MWWikiversions::evalDbListExpression('')
#2 {main}
  thrown in /srv/mediawiki/multiversion/MWWikiversions.php on line 94
Traceback (most recent call last):
  File "/usr/local/bin/mw-cli-wrapper", line 47, in <module>
    subprocess.run(sys.argv[1:], check=True, shell=True)
  File "/usr/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/bin/expanddblist', 'growthexperiments']' returned non-zero exit status 255
[urbanecm@mwmaint1002 ~]$
[urbanecm@mwmaint1002 ~]$ /usr/local/bin/mw-cli-wrapper '/usr/local/bin/expanddblist growthexperiments'
arwiki
bgwiki
bnwiki
cawiki
cswiki
[...]
[urbanecm@mwmaint1002 ~]$
[urbanecm@mwmaint1002 ~]$ python3
Python 3.5.3 (default, Nov 18 2020, 21:09:16)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> subprocess.run(['expanddblist', 's1'])
enwiki
CompletedProcess(args=['expanddblist', 's1'], returncode=0)
>>> subprocess.run(['expanddblist', 's1'], shell=True)
Notice: Undefined offset: 1 in /usr/local/bin/expanddblist on line 5
Notice: Undefined offset: 0 in /srv/mediawiki/multiversion/MWWikiversions.php on line 71
Fatal error: Uncaught Exception: MWWikiversions::readDbListFile: unable to read . in /srv/mediawiki/multiversion/MWWikiversions.php:94
Stack trace:
#0 /srv/mediawiki/multiversion/MWWikiversions.php(71): MWWikiversions::readDbListFile('')
#1 /usr/local/bin/expanddblist(5): MWWikiversions::evalDbListExpression('')
#2 {main}
  thrown in /srv/mediawiki/multiversion/MWWikiversions.php on line 94
CompletedProcess(args=['expanddblist', 's1'], returncode=255)
>>>

Without shell=True, it does what I would expect it to. I don't understand why omiting it works, unfortunately.

args is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names). If passing a single string, either shell must be True (see below) or else the string must simply name the program to be executed without specifying any arguments.

Not clear from the documentation, but it seems that with Shell=True it discards all the arguments (ie. if args is a list, it only uses the first element):

tgr@mwmaint1002:~$ mw-cli-wrapper echo 1

tgr@mwmaint1002:~$ mw-cli-wrapper ls /home/
bnwikibooks-namespaceDupes.txt		  deleteOldSurveys.log			       fixLinkRecommendationData.bnwiki.desc.log.3  fixLinkRecommendationData.cswiki.log.3	 fixStuckGlobalRename.php
...
Joe lowered the priority of this task from Unbreak Now! to Medium.
Joe subscribed.

The revert fixed the problem AFAICS. I'll work on the source of the issues.

Confirming that cronjobs are running again.

FTR, I fixed the issue as reported by @Tgr an d @Urbanecm and now all scripts seem to be running correctly. Boldly resolving this task.