
Trash cleanup cron spams on an-test hosts
Open, Low, Public

Description

From: Cron Daemon <root@an-test-client1001.eqiad.wmnet>
To: root@an-test-client1001.eqiad.wmnet
Subject: Cron <root@an-test-client1001> for user in $(ls /srv/home); do rm -rf /srv/home/$user/.local/share/Trash/*; done
Date: Sun, 11 Jul 2021 00:00:01 +0000
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/root>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=root>
Message-Id: <E1m2Mt7-0006Yy-63@an-test-client1001.eqiad.wmnet>
Date: Sun, 11 Jul 2021 00:00:01 +0000

ls: cannot access '/srv/home': No such file or directory

Event Timeline

Is it coming from puppet? It should be migrated to systemd timer if that's the case: T273673: replace all puppet crons with systemd timers

Yes, https://gerrit.wikimedia.org/g/operations/puppet/+/4a3bf542618f4550dfbe450452ddc9e6294ed1d3/modules/profile/manifests/analytics/jupyterhub.pp#61 is the cron

But I'm not sure migrating it to a timer fixes the underlying issue, which is that sometimes(?) /srv/home is missing.

Actually it's not sometimes, it's always missing. We've been getting this since the end of June at least, which is when I last cleaned out my root@ folder.

Re "But I'm not sure migrating it to a timer fixes the underlying issue, which is that sometimes(?) /srv/home is missing":

Nvm, it would. Even though the ls inside the command substitution fails, the loop as a whole still exits with status code 0, so it's only the stderr output that's triggering the email.
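
The exit-status behaviour described above can be reproduced with a small sketch. It mirrors the shape of the cron command but swaps rm -rf for echo and uses a made-up path (`/nonexistent-srv-home-example`, an assumption standing in for the missing /srv/home), so it is safe to run anywhere:

```shell
#!/bin/sh
# Hypothetical path that does not exist, standing in for /srv/home.
missing_dir=/nonexistent-srv-home-example

# Same shape as the cron command, with echo in place of rm -rf.
# ls fails and complains on stderr, the substitution expands to nothing,
# and the loop body never runs.
for user in $(ls "$missing_dir"); do
    echo "would clean $missing_dir/$user/.local/share/Trash"
done

# A for loop over an empty list has exit status 0.
loop_status=$?
echo "loop exit status: $loop_status"
```

The only output besides the final status line is the ls error on stderr, and that stderr output is what cron mails to root.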

Change 708183 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] analytics: Migrate clean_jupyter_user_local_trash to systemd timer

https://gerrit.wikimedia.org/r/708183

I think it's just this single host an-test-client1001 that's sending this daily logspam now, isn't it?
To me, the issue looks like we should just be setting up /home as a symlink to /srv/home on this box, like we do on all of the other analytics-client (stat100[4-8]) boxes.

btullis@an-test-client1001:~$ ls -ld /home
drwxr-xr-x 240 root root 4096 Aug  3 15:26 /home
btullis@stat1008:~$ ls -ld /home
lrwxrwxrwx 1 root root 9 Mar 12  2020 /home -> /srv/home

I've been searching through puppet, but I can't seem to find where this symlink is configured. Once I find it, we can add an-test-client1001.eqiad.wmnet to the list and rebuild it, or point /home to /srv/home manually.

@BTullis we have been doing it manually for the stat100x boxes so far, there is nothing in puppet for it!

OK, in that case I've done the following to clear this bit of cron spam temporarily.

btullis@an-test-client1001:~$ ls -l /srv/home
ls: cannot access '/srv/home': No such file or directory
btullis@an-test-client1001:~$ sudo mkdir /srv/home
btullis@an-test-client1001:~$ ls -ld /srv/home
drwxr-xr-x 2 root root 4096 Aug 25 11:09 /srv/home

We could decide to move /home to /srv/home at another time, but this should stop the emails I think.

Could we just make the script use /home which is everywhere?
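
One way that suggestion might look, as a rough sketch rather than the actual puppet change: the function name `clean_trash` and its root-directory argument are hypothetical, introduced here so the same loop can be pointed at /home or /srv/home. It iterates with a glob instead of parsing ls, and stays silent when the root directory is missing:

```shell
#!/bin/sh
# clean_trash ROOT: empty .local/share/Trash under each home directory in ROOT.
# ROOT is a parameter so the same function works for /home or /srv/home.
clean_trash() {
    root=$1
    # Nothing to do, and nothing printed, if the root is missing.
    [ -d "$root" ] || return 0
    for trash in "$root"/*/.local/share/Trash; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$trash" ] || continue
        # ${trash:?} aborts instead of expanding to "/*" if trash were ever empty.
        rm -rf "${trash:?}"/*
    done
}

# Example invocation (commented out so the sketch has no side effects):
# clean_trash /home
```

Because nothing is written to stdout or stderr in the missing-directory case, cron would have no output to mail. Like the original command, the `*` glob does not remove dotfiles inside Trash.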

Change 708183 abandoned by Legoktm:

[operations/puppet@production] analytics: Migrate clean_jupyter_user_local_trash to systemd timer

Reason:

https://gerrit.wikimedia.org/r/708183