Saw some weird behavior just now when running a couple of dumps recompress jobs from the command line on our test host. Went to dumpsdata1001 and saw some 'bash: ls: write error' messages in syslog. Killed the process with the relevant PID (the rolling rsync) and ran puppet to restart it. Here's the result:
Apr 1 20:06:19 dumpsdata1001 puppet-agent[32018]: Using configured environment 'production'
Apr 1 20:06:19 dumpsdata1001 puppet-agent[32018]: Retrieving pluginfacts
Apr 1 20:06:19 dumpsdata1001 puppet-agent[32018]: Retrieving plugin
Apr 1 20:06:20 dumpsdata1001 puppet-agent[32018]: Loading facts
Apr 1 20:06:27 dumpsdata1001 puppet-agent[32018]: Caching catalog for dumpsdata1001.eqiad.wmnet
Apr 1 20:06:28 dumpsdata1001 puppet-agent[32018]: (/Stage[main]/Base::Environment/Tidy[/var/tmp/core]) Tidying 0 files
Apr 1 20:06:28 dumpsdata1001 puppet-agent[32018]: Applying configuration version '1522613183'
Apr 1 20:06:29 dumpsdata1001 crontab[32246]: (root) LIST (root)
Apr 1 20:06:29 dumpsdata1001 crontab[32248]: (root) LIST (prometheus)
Apr 1 20:06:29 dumpsdata1001 crontab[32250]: (root) LIST (dumpsgen)
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Reloading.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Started ACPI event daemon.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Listening on ACPID Listen Socket.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Mounted /data.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Mounted /.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Created slice system-systemd\x2dfsck.slice.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Started File System Check on /dev/mapper/dumpsdata1001--vg-data.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Found device /dev/mapper/dumpsdata1001--vg-data.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Found device /sys/devices/virtual/block/dm-1.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Found device /dev/dm-1.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Found device /dev/disk/by-id/dm-name-dumpsdata1001--vg-data.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Found device /dev/disk/by-id/dm-uuid-LVM-VTvoMeEuXXfcVQYa4SA1qNFIFu1joqDnMoheU3qiU5HXrS778c0gCAo59ZvKeivQ.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Found device /dev/disk/by-uuid/c99c516f-72cd-4f53-83a9-31591a9d1b4a.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Found device /dev/dumpsdata1001-vg/data.
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Starting Dumps rsyncer service...
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Failed to reset devices.list on /system.slice: Invalid argument
Apr 1 20:06:32 dumpsdata1001 systemd[1]: Started Dumps rsyncer service.
Apr 1 20:06:32 dumpsdata1001 bash[32427]: ls: write error: Broken pipe
Apr 1 20:06:32 dumpsdata1001 puppet-agent[32018]: (/Stage[main]/Dumps::Generation::Server::Rsyncer/Base::Service_unit[dumps-rsyncer]/Service[dumps-rsyncer]/ensure) ensure changed 'stopped' to 'running'
Apr 1 20:06:32 dumpsdata1001 puppet-agent[32018]: (/Stage[main]/Dumps::Generation::Server::Rsyncer/Base::Service_unit[dumps-rsyncer]/Service[dumps-rsyncer]) Unscheduling refresh on Service[dumps-rsyncer]
Apr 1 20:06:36 dumpsdata1001 puppet-agent[32018]: Applied catalog in 7.60 seconds
I see nothing in dmesg, and no error emails either.