labstore1006/1007: issue copying mediawiki_history_dumps files from Hadoop HDFS
Closed, Duplicate (Public)

Description

We got an icinga/VO notification today that labstore1006/1007 had a systemd unit failure.

The issue seems to be:

Jun 01 05:00:06 labstore1007 systemd[1]: Started Copy mediawiki_history_dumps files from Hadoop HDFS..
Jun 01 05:00:06 labstore1007 kerberos-run-command[12564]: User dumpsgen executes as user dumpsgen the command ['/bin/bash', '-c', '/usr/local/bin/hdfs-rsync -r -t --delete --exclude "readme.html" --chmod=go-w hdfs:///wmf/data/archive/mediawiki/history/{$(/bin/date --date="$(/bin/date +%Y-%m-15) -1 month" +"%Y-%m"),$(/bin/date --date="$(/bin/date +%Y-%m-15) -2 month" +"%Y-%m")} file:///srv/dumps/xmldatadumps/public/other/mediawiki_history/']
Jun 01 05:00:08 labstore1007 kerberos-run-command[12564]: Error: Argument parsing error:
Jun 01 05:00:08 labstore1007 kerberos-run-command[12564]:         Error validating src list:
Jun 01 05:00:08 labstore1007 kerberos-run-command[12564]:                 hdfs:///wmf/data/archive/mediawiki/history/2020-05 does not exist
Jun 01 05:00:08 labstore1007 kerberos-run-command[12564]: Try --help for more information.
Jun 01 05:00:08 labstore1007 systemd[1]: analytics-dumps-fetch-mediawiki_history_dumps.service: Main process exited, code=exited, status=1/FAILURE
Jun 01 05:00:08 labstore1007 systemd[1]: analytics-dumps-fetch-mediawiki_history_dumps.service: Failed with result 'exit-code'.
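To make the failing source path easier to follow, here is a minimal sketch of how the two `date` expressions in the logged command expand. It assumes GNU `date` and pins the run date to the day of the log above; `run_date`, `anchor`, `prev1` and `prev2` are illustrative names that do not appear in the unit itself.

```shell
#!/bin/bash
# Sketch only: reproduces the month arithmetic from the logged command
# for a fixed run date instead of "today". Assumes GNU date.
run_date="2020-06-01"   # assumption: taken from the log timestamp above

# The unit pins the date to the 15th before subtracting months.
anchor="$(/bin/date --date="$run_date" +%Y-%m-15)"

# Same relative-date expressions as in the logged command.
prev1="$(/bin/date --date="$anchor -1 month" +"%Y-%m")"
prev2="$(/bin/date --date="$anchor -2 month" +"%Y-%m")"

# These are the two directory names that the brace expansion
# {prev1,prev2} appends to hdfs:///wmf/data/archive/mediawiki/history/
echo "$prev1 $prev2"
```

On the 1st of June this yields `2020-05` as the first source directory, which matches the path reported as missing in the error above.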

Event Timeline

aborrero triaged this task as High priority. (Jun 1 2020, 5:35 AM)

Apparently this already happened last month:

May 19 05:00:00 labstore1007 systemd[1]: Started Copy mediawiki_history_dumps files from Hadoop HDFS..
May 06 05:00:10 labstore1007 kerberos-run-command[23838]: User dumpsgen executes as user dumpsgen the command ['/bin/bash', '-c', '/usr/local/bin/hdfs-rsync -r -t --delete --exclude "readme.html" --chmod=go-w hd
May 06 05:00:13 labstore1007 kerberos-run-command[23838]: Error: Argument parsing error:
May 06 05:00:13 labstore1007 kerberos-run-command[23838]:       Error validating src list:
May 06 05:00:13 labstore1007 kerberos-run-command[23838]:               hdfs:///wmf/data/archive/mediawiki/history/2020-04 does not exist
May 06 05:00:13 labstore1007 kerberos-run-command[23838]: Try --help for more information.
May 06 05:00:13 labstore1007 systemd[1]: analytics-dumps-fetch-mediawiki_history_dumps.service: Main process exited, code=exited, status=1/FAILURE
May 06 05:00:13 labstore1007 systemd[1]: analytics-dumps-fetch-mediawiki_history_dumps.service: Failed with result 'exit-code'.
May 07 05:00:05 labstore1007 systemd[1]: Started Copy mediawiki_history_dumps files from Hadoop HDFS..
May 07 05:00:05 labstore1007 kerberos-run-command[24469]: User dumpsgen executes as user dumpsgen the command ['/bin/bash', '-c', '/usr/local/bin/hdfs-rsync -r -t --delete --exclude "readme.html" --chmod=go-w hd
May 07 05:00:07 labstore1007 kerberos-run-command[24469]: Error: Argument parsing error:
May 07 05:00:07 labstore1007 kerberos-run-command[24469]:       Error validating src list:
May 07 05:00:07 labstore1007 kerberos-run-command[24469]:               hdfs:///wmf/data/archive/mediawiki/history/2020-04 does not exist
May 07 05:00:07 labstore1007 kerberos-run-command[24469]: Try --help for more information.
May 07 05:00:07 labstore1007 systemd[1]: analytics-dumps-fetch-mediawiki_history_dumps.service: Main process exited, code=exited, status=1/FAILURE
May 07 05:00:07 labstore1007 systemd[1]: analytics-dumps-fetch-mediawiki_history_dumps.service: Failed with result 'exit-code'.
May 08 05:00:10 labstore1007 systemd[1]: Started Copy mediawiki_history_dumps files from Hadoop HDFS..
May 08 05:00:10 labstore1007 kerberos-run-command[26489]: User dumpsgen executes as user dumpsgen the command ['/bin/bash', '-c', '/usr/local/bin/hdfs-rsync -r -t --delete --exclude "readme.html" --chmod=go-w hd
May 08 05:00:12 labstore1007 kerberos-run-command[26489]: Error: Argument parsing error:
May 08 05:00:12 labstore1007 kerberos-run-command[26489]:       Error validating src list:
May 08 05:00:12 labstore1007 kerberos-run-command[26489]:               hdfs:///wmf/data/archive/mediawiki/history/2020-04 does not exist
May 08 05:00:12 labstore1007 kerberos-run-command[26489]: Try --help for more information.
May 08 05:00:12 labstore1007 systemd[1]: analytics-dumps-fetch-mediawiki_history_dumps.service: Main process exited, code=exited, status=1/FAILURE
May 08 05:00:12 labstore1007 systemd[1]: analytics-dumps-fetch-mediawiki_history_dumps.service: Failed with result 'exit-code'.