There is currently an issue with a systemd timer on stat1007:
elukey@stat1007:~$ sudo systemctl -a | grep failed
● archive-maxmind-geoip-database.service loaded failed failed Archives Maxmind GeoIP files

elukey@stat1007:~$ sudo journalctl -u archive-maxmind-geoip-database.service
-- Logs begin at Mon 2020-09-28 10:00:42 UTC, end at Wed 2020-09-30 06:06:27 UTC. --
Sep 29 05:30:00 stat1007 systemd[1]: Started Archives Maxmind GeoIP files.
Sep 29 05:30:00 stat1007 kerberos-run-command[119440]: The user keytab that you are trying to use (/etc/security/keytabs/analytics/analytics.keytab) doesn't exist or ..
Sep 29 05:30:00 stat1007 systemd[1]: archive-maxmind-geoip-database.service: Main process exited, code=exited, status=1/FAILURE
Sep 29 05:30:00 stat1007 systemd[1]: archive-maxmind-geoip-database.service: Unit entered failed state.
Sep 29 05:30:00 stat1007 systemd[1]: archive-maxmind-geoip-database.service: Failed with result 'exit-code'.
A while ago we decided not to deploy the analytics user's keytab on stat100x hosts, and to keep it only on analytics-admins-only nodes (like an-launcher1002). I didn't remove the files manually, and Puppet didn't either, so the keytab remained available until Tobias reimaged stat1007 to Buster.
There are some possibilities:
- We deploy the analytics user's keytab on all stat boxes again. This could be handy for the team, since we wouldn't need to remember that the analytics user can run only on an-launcher/coord nodes, and this timer would start working again.
- We run the timer as a different user, like analytics-privatedata, which is also present on the stat100x hosts.
- We do something more radical and refactor the script that the timer runs, so that it no longer keeps a huge backup (~80G) on the host where it runs and only uploads snapshots of the MaxMind db to HDFS. Then we move the timer to a host that is meant to execute jobs, like an-launcher1002. Timers on stat100x hosts are only present on stat1007 due to old use cases; in theory there shouldn't be any, and removing them would reduce tech debt.
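To make the third option a bit more concrete, here is a rough sketch of what the refactored upload step could look like. It is not the current script: the function name `snapshot_commands`, the dated directory layout, and any paths passed to it are assumptions made for illustration. It only builds the `hdfs dfs` commands for one dated snapshot, so that nothing needs to accumulate on the local host.

```python
import datetime
import pathlib


def snapshot_commands(local_dir, hdfs_base, day):
    """Build the hdfs commands that upload one dated snapshot.

    Hypothetical helper: the <hdfs_base>/<YYYY-MM-DD> layout is an
    assumption, not the layout used by the existing archive script.
    Only regular files directly inside local_dir are included.
    """
    dest = f"{hdfs_base.rstrip('/')}/{day.isoformat()}"
    files = sorted(
        str(p) for p in pathlib.Path(local_dir).iterdir() if p.is_file()
    )
    if not files:
        # Nothing to upload, so do not create an empty snapshot directory.
        return []
    return [
        # Create the dated destination directory (no error if it exists).
        ["hdfs", "dfs", "-mkdir", "-p", dest],
        # Copy the local files, overwriting a previous upload for that day.
        ["hdfs", "dfs", "-put", "-f", *files, dest],
    ]
```

Each returned command could then be run with `subprocess.run(cmd, check=True)` by a user that holds a valid Kerberos ticket on the target host. Returning the commands instead of executing them keeps the sketch easy to dry-run and review before the timer is moved.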
@razzi @fdans is this something that you could work on this week?