SystemdUnitDownForLong clouddumps1001:9100 Unit rsync_enterprise_htmldumps.service on node clouddumps1001 has been down for long.
Closed, ResolvedPublic

Description

Common information

  • alertname: SystemdUnitDownForLong
  • cluster: wmcs
  • instance: clouddumps1001:9100
  • job: node
  • name: rsync_enterprise_htmldumps.service
  • prometheus: ops
  • severity: task
  • site: eqiad
  • source: prometheus
  • state: failed
  • team: wmcs
  • type: simple

Firing alerts


Event Timeline

Restricted Application added subscribers: dcaro, Aklapper.
Mar 02 08:30:01 clouddumps1001 rsync[2392835]: usage: systemd-timer-mail-wrapper [-h] [-T MAIL_TO] -s SUBJECT
Mar 02 08:30:01 clouddumps1001 rsync[2392835]:                                   [--only-on-error]
Mar 02 08:30:01 clouddumps1001 rsync[2392835]:                                   ...
Mar 02 08:30:01 clouddumps1001 rsync[2392835]: systemd-timer-mail-wrapper: error: the following arguments are required: -s/--subject
Mar 02 08:30:01 clouddumps1001 systemd[1]: rsync_enterprise_htmldumps.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 02 08:30:01 clouddumps1001 systemd[1]: rsync_enterprise_htmldumps.service: Failed with result 'exit-code'.
Mar 21 08:30:01 clouddumps1001 systemd[1]: Started Twice monthly rsync after download of Wikimedia Enterprise HTML dumps.
Mar 21 08:30:01 clouddumps1001 rsync[392196]: usage: systemd-timer-mail-wrapper [-h] [-T MAIL_TO] -s SUBJECT
Mar 21 08:30:01 clouddumps1001 rsync[392196]:                                   [--only-on-error]
Mar 21 08:30:01 clouddumps1001 rsync[392196]:                                   ...
Mar 21 08:30:01 clouddumps1001 rsync[392196]: systemd-timer-mail-wrapper: error: the following arguments are required: -s/--subject
Mar 21 08:30:01 clouddumps1001 systemd[1]: rsync_enterprise_htmldumps.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 21 08:30:01 clouddumps1001 systemd[1]: rsync_enterprise_htmldumps.service: Failed with result 'exit-code'.
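Both runs fail the same way. The wrapper's argument parser rejects the command line before rsync ever starts. Python's argparse exits with status 2 on a usage error, and systemd reports exit status 2 as INVALIDARGUMENT. That fits a unit invoking the wrapper without -s/--subject. The sketch below reproduces the failure. It is a hypothetical stand-in modeled only on the usage line in the log, not the real wrapper: the long option --mail-to and the run() helper are assumptions.

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for systemd-timer-mail-wrapper's CLI. Only the
    # options visible in the logged usage line are modeled; the long name
    # "--mail-to" is an assumption.
    parser = argparse.ArgumentParser(prog="systemd-timer-mail-wrapper")
    parser.add_argument("-T", "--mail-to", dest="mail_to")
    parser.add_argument("-s", "--subject", required=True)
    parser.add_argument("--only-on-error", action="store_true")
    return parser


def run(argv: list[str]) -> tuple[int, str]:
    """Parse argv and return (exit status, captured stderr)."""
    stderr = io.StringIO()
    try:
        with contextlib.redirect_stderr(stderr):
            build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse calls sys.exit(2) after printing usage and the error.
        return exc.code, stderr.getvalue()
    return 0, stderr.getvalue()
```

Calling run([]) returns status 2 with the same "required: -s/--subject" error seen in the journal. Passing a subject with -s lets parsing succeed.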
Andrew added a subscriber: ArielGlenn.

@ArielGlenn, do you have time to take a look at this? If not, assign it back to me and I'll see what I can figure out.

Hey @Andrew, I'm guessing this patch is related: https://gerrit.wikimedia.org/r/c/operations/puppet/+/902833

I do see that the latest directory /srv/dumps/xmldatadumps/public/other/enterprise_html/runs/20230320 did not make it over to clouddumps1002 for whatever reason. In fact, nothing from 2023 has made it over, so this has been going on for a while.
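To confirm which runs never reached clouddumps1002, you can diff the run directory names under runs/ on the source host against the destination. The sketch below is a minimal version built on three assumptions. First, run directories are named YYYYMMDD, like 20230320. Second, you collect a listing on each host, for example with the hypothetical list_runs() helper. Third, you compare the two listings in one place. All function names here are illustrative.

```python
from collections.abc import Iterable
from pathlib import Path

# Path taken from the comment above; adjust if the layout differs.
RUNS_DIR = "/srv/dumps/xmldatadumps/public/other/enterprise_html/runs"


def list_runs(root: str = RUNS_DIR) -> list[str]:
    """Hypothetical helper: names of subdirectories under root on this host."""
    return [p.name for p in Path(root).iterdir() if p.is_dir()]


def run_dirs(listing: Iterable[str]) -> set[str]:
    """Keep only names that look like YYYYMMDD run directories."""
    return {name for name in listing if len(name) == 8 and name.isdigit()}


def missing_runs(source_listing: Iterable[str],
                 dest_listing: Iterable[str],
                 year_prefix: str = "") -> list[str]:
    """Run directories present on the source but absent on the destination,
    optionally restricted to names starting with year_prefix (e.g. "2023")."""
    missing = run_dirs(source_listing) - run_dirs(dest_listing)
    return sorted(name for name in missing if name.startswith(year_prefix))
```

Filtering on the 8-digit name skips entries such as symlinks or README files that may sit alongside the runs. Sorting the result makes the oldest missing run easy to spot.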

I'll look into this more tomorrow (feeling like crap today) unless Taavi's patch makes it obvious to you what's going on.

Thanks @ArielGlenn! I merged Taavi's patch. I'm happy to run systemctl reset-failed on those hosts and wait to see if things are fixed :)