hdfs-rsync of mediawiki history dumps fails due to source not present (yet)
Closed, Resolved, Public

Description

Because the new mediawiki history snapshot is delayed, the related dump fetches on the labstore nodes have been failing since May 1st with the following error:

May 05 05:00:01 labstore1006 kerberos-run-command[19045]: User dumpsgen executes as user dumpsgen the command ['/bin/bash', '-c', '/usr/local/bin/hdfs-rsync -r -t --delete --exclude "readme.html" --chmod=go-w hdfs:///wmf/data/archive/mediawiki/history/{$(/bin/date --date="$
May 05 05:00:07 labstore1006 kerberos-run-command[19045]: Error: Argument parsing error:
May 05 05:00:07 labstore1006 kerberos-run-command[19045]:         Error validating src list:
May 05 05:00:07 labstore1006 kerberos-run-command[19045]:                 hdfs:///wmf/data/archive/mediawiki/history/2020-04 does not exist
May 05 05:00:07 labstore1006 kerberos-run-command[19045]: Try --help for more information.

Desired solution: Check for folder existence in HDFS before launching hdfs-rsync (see comments below).
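
A minimal sketch of that guard, not the patch that was eventually merged. The function names, the destination argument, and the `HDFS_TEST_CMD` / `RSYNC_CMD` override variables are hypothetical and exist only for illustration; the existence check relies on `hdfs dfs -test -e`, which exits 0 when the path exists. The sketch assumes the whole run should be skipped if any source is missing, since hdfs-rsync rejects the full src list in that case.

```shell
#!/bin/bash
# Hypothetical guard: launch hdfs-rsync only when every HDFS source exists.

source_exists() {
    # `hdfs dfs -test -e PATH` exits 0 if PATH exists.
    # HDFS_TEST_CMD is an illustrative override, not a real setting.
    ${HDFS_TEST_CMD:-hdfs dfs -test -e} "$1"
}

sync_if_present() {
    local dest="$1"; shift
    local src
    for src in "$@"; do
        if ! source_exists "$src"; then
            # Return success so a delayed snapshot does not raise an alarm.
            echo "Ignoring missing hdfs source $src"
            return 0
        fi
    done
    # Flags copied from the failing command in the log; RSYNC_CMD is an
    # illustrative override, not a real setting.
    ${RSYNC_CMD:-/usr/local/bin/hdfs-rsync} -r -t --delete \
        --exclude "readme.html" --chmod=go-w "$@" "$dest"
}
```

Returning 0 on a missing source keeps the systemd unit green, so a missing snapshot has to be detected by a separate alarm.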

Event Timeline

elukey triaged this task as Medium priority. May 5 2020, 6:09 AM
elukey created this task.
JAllemandou renamed this task from Mediawiki history dumps fails due to source not present (yet) to hdfs-rsync of mediawiki history dumps fails due to source not present (yet). May 5 2020, 12:54 PM

Should we update the job to look at the new month only after a few days? Or shall we add an hdfs call so that hdfs-rsync is launched only if the source is present?

I would prefer to launch the rsync only if there is data, so that if we are delayed like this month no alarm will fire.

Makes sense @elukey - this also means that we'll never get an error if no data shows up. I assume it's ok, as we have SLA emails on the production side.

Yes, I think that if we want to alarm on data not showing up, we need a different alarm (like the SLA emails).

Well, here we are, the decision is made :) The task description will be updated with the desired solution.

mforns added a project: Analytics-Kanban.
mforns moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 594773 had a related patch set uploaded (by Mforns; owner: Mforns):
[operations/puppet@production] web::fetches::analytics::job: do not rsync mediawiki if missing source

https://gerrit.wikimedia.org/r/594773

Change 594773 merged by Elukey:
[operations/puppet@production] web::fetches::analytics::job: do not rsync mediawiki if missing source

https://gerrit.wikimedia.org/r/594773

Change 601316 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] dumps::web::fetches::analytics::job: fix rsync bash script

https://gerrit.wikimedia.org/r/601316

Change 601316 merged by Elukey:
[operations/puppet@production] dumps::web::fetches::analytics::job: fix rsync bash script

https://gerrit.wikimedia.org/r/601316

Seems to work!

Jun 01 09:51:22 labstore1006 kerberos-run-command[12033]: Ignoring missing hdfs source hdfs:///wmf/data/archive/mediawiki/history/{$(/bin/date --date="$(/bin/date +%Y-%m-15) -1 month" +"%Y-%m"),$(/bin/date --date="$(/bin/date +%Y-%m-15) -2 month" +"%Y-%m")}
Jun 01 09:51:22 labstore1006 systemd[1]: analytics-dumps-fetch-mediawiki_history_dumps.service: Succeeded.

Let's keep this open for a few days to check that the rsync does its job correctly once the source shows up.
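
For reference, the month arithmetic embedded in the source path of the log above can be sketched as follows. `months_ago` is a hypothetical helper, and the reference date is passed in explicitly so the result is reproducible; the job itself uses the current date. The reading that the anchor on the 15th is there to avoid end-of-month overflow is an assumption; the ticket does not state the reason. With GNU date, subtracting a month from the 31st can normalize into the wrong month.

```shell
#!/bin/bash
# Hypothetical helper mirroring the date expressions in the logged path.
# Requires GNU date (the --date option with relative items).

months_ago() {
    local today="$1" n="$2"
    local anchor
    # Pin the day to the 15th of the reference month before subtracting.
    anchor="$(date --date="$today" +%Y-%m-15)"
    date --date="$anchor -$n month" +"%Y-%m"
}

months_ago 2020-05-05 1   # prints 2020-04
months_ago 2020-05-05 2   # prints 2020-03
```

On May 5th the job therefore looks for the 2020-04 and 2020-03 snapshots, which matches the missing `2020-04` path in the original error.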