⚓ T210818 Move admin cron jobs to systemd timers

Subject	Repo	Branch	Lines +/-
wmflib: add systemd.timer OnCalendar support to cron_splay	operations/puppet	production	+38 -17
profile::base::labs - Fix timer definition	operations/puppet	production	+1 -1
profile::base::labs - Convert cronjobs to systemd timers	operations/puppet	production	+33 -7
openstack::keystone::cleanup - Do not hide `keystone-manage token_flush` output	operations/puppet	production	+1 -1
openstack::glance::image_sync - Fix systemd timer user	operations/puppet	production	+1 -1
openstack - Fix errors in timers definitions	operations/puppet	production	+15 -10
openstack - Convert cron jobs to systemd timers	operations/puppet	production	+211 -99
labstore: convert our first systemd timer to the new format	operations/puppet	production	+10 -15
wmcs::monitoring - Fix typo	operations/puppet	production	+1 -1
wmcs::monitoring - Convert cronjob to systemd timer	operations/puppet	production	+21 -10
toolforge::clush::master - Fix typo	operations/puppet	production	+1 -1
toolforge::clush::master - Fix systemd timer definition	operations/puppet	production	+2 -1
toolforge::clush::master - Convert cronjob to systemd timer	operations/puppet	production	+15 -4

Status	Assigned	Task
Open	None	T294906 Puppet Improvements
Duplicate	jbond	T265138 Work required to prepare for puppet 7
Resolved	SLyngshede-WMF	T273673 replace all puppet crons with systemd timers
Resolved	None	T210818 Move admin cron jobs to systemd timers

• Bstorm created this task. Nov 30 2018, 1:27 AM

Restricted Application added a subscriber: Aklapper. Nov 30 2018, 1:27 AM

systemd timers look like a good replacement for cron, if a bit more complex to set up (ignoring systemd's annoyances, since almost all Linux distros have decided to live with it). I like that stdout/stderr are captured and sent to journald, and that jobs become easier to monitor.

Would this be a good rough estimate of the number of places we would have to touch with this change? I'm surprised by the low number and feel like I'm missing something obvious.

$ find . -name '*.pp' | grep -E '(openstack|lab|wmcs|cloud|tool)' | xargs -i% grep -EH 'cron.*{' %  | wc -l
42
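For anyone who wants to cross-check that estimate, here is a minimal Python sketch of the same idea. It is not part of the task or of operations/puppet; the function and constant names are made up for illustration. It reuses the path keywords from the grep above but only counts lines that open a `cron { ... }` resource, so its total can differ from the 42 reported by the looser `cron.*{` pattern.

```python
import re
from pathlib import Path

# Path keywords copied from the grep in the comment above.
PATH_PATTERN = re.compile(r"(openstack|lab|wmcs|cloud|tool)")
# Start of a Puppet cron resource declaration, e.g. "cron { 'name':".
# Assumption: stricter than the original 'cron.*{' match.
CRON_PATTERN = re.compile(r"^\s*cron\s*\{")


def count_cron_resources(root):
    """Return {relative_path: count} for .pp files that declare cron resources."""
    root = Path(root)
    counts = {}
    for path in sorted(root.rglob("*.pp")):
        relative = path.relative_to(root).as_posix()
        # Only look at manifests whose path matches one of the keywords.
        if not PATH_PATTERN.search(relative):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = sum(1 for line in text.splitlines() if CRON_PATTERN.match(line))
        if hits:
            counts[relative] = hits
    return counts


if __name__ == "__main__":
    # Run from the root of a Puppet checkout.
    per_file = count_cron_resources(".")
    for name, hits in per_file.items():
        print(f"{hits:3d}  {name}")
    print("total:", sum(per_file.values()))
```

The per-file breakdown makes it easier to see which modules still carry cron resources than a single `wc -l` total does.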

Since I'm just thinking of crons we puppetize (and it's pretty easy to do timers using puppet now), that's probably it. We've tended to move things into services when they would have been a cron.

Throwing this in the discussion column, though I don't think it will be very controversial as a background activity.

zhuyifei1999 awarded a token. Nov 30 2018, 3:50 PM

zhuyifei1999 subscribed. Nov 30 2018, 4:35 PM

• Bstorm moved this task from Needs discussion to Epics on the cloud-services-team (Kanban) board. Dec 4 2018, 4:44 PM

• GTirloni unsubscribed. Dec 20 2018, 6:52 PM

• GTirloni claimed this task. Feb 7 2019, 5:07 PM

• GTirloni triaged this task as Medium priority.

Change 489393 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] toolforge::clush::master - Convert cronjob to systemd timer

https://gerrit.wikimedia.org/r/489393

gerritbot added a project: Patch-For-Review. Feb 9 2019, 7:08 PM

Change 489394 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] wmcs::monitoring - Convert cronjob to systemd timer

https://gerrit.wikimedia.org/r/489394

• GTirloni mentioned this in T215417: labmon1001: archive-instances not working. Feb 11 2019, 6:03 PM

Change 489393 merged by GTirloni:
[operations/puppet@production] toolforge::clush::master - Convert cronjob to systemd timer

https://gerrit.wikimedia.org/r/489393

Change 490052 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] toolforge::clush::master - Fix systemd timer definition

https://gerrit.wikimedia.org/r/490052

Change 490052 merged by GTirloni:
[operations/puppet@production] toolforge::clush::master - Fix systemd timer definition

https://gerrit.wikimedia.org/r/490052

Change 490056 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] toolforge::clush::master - Fix typo

https://gerrit.wikimedia.org/r/490056

Change 490056 merged by GTirloni:
[operations/puppet@production] toolforge::clush::master - Fix typo

https://gerrit.wikimedia.org/r/490056

cron { 'update_tools_clush':
    ensure  => absent,
}

systemd::timer::job { 'toolfoge_clush_update':
    ensure                    => present,
    description               => 'Update list of Toolforge servers for clush',
    command                   => "/usr/local/sbin/tools-clush-generator /etc/clustershell/tools.yaml --observer-pass ${observer_pass}",
    interval                  => {
        'start'    => 'OnCalendar',
        'interval' => '*-*-* *:00:00', # hourly
    },
    logging_enabled           => false,
    monitoring_enabled        => true,
    monitoring_contact_groups => 'wmcs-team',
    user                      => 'root',
}
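The `'*-*-* *:00:00'` interval above is the OnCalendar spelling of the cron schedule `0 * * * *`. As a rough illustration of that mapping, here is a minimal Python sketch. It is not the wmflib `cron_splay` code and not anything from the patches on this task; the function name is hypothetical, and it only handles `*` and plain numbers, with no range checks, step values, lists, or day-of-week restrictions.

```python
def cron_to_oncalendar(expr):
    """Convert a simple five-field cron schedule to a systemd OnCalendar string.

    Assumption: each field is either '*' or a plain integer, and the
    day-of-week field is '*'. Anything else raises ValueError.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    minute, hour, day_of_month, month, day_of_week = fields
    if day_of_week != "*":
        raise ValueError("day-of-week restrictions are not handled in this sketch")

    def fmt(value):
        # Keep wildcards as-is; zero-pad numbers to two digits.
        if value == "*":
            return "*"
        if not value.isdigit():
            raise ValueError(f"unsupported cron field: {value!r}")
        return f"{int(value):02d}"

    # OnCalendar layout: year-month-day hour:minute:second
    return f"*-{fmt(month)}-{fmt(day_of_month)} {fmt(hour)}:{fmt(minute)}:00"


if __name__ == "__main__":
    print(cron_to_oncalendar("0 * * * *"))   # *-*-* *:00:00 (hourly, as above)
    print(cron_to_oncalendar("30 2 * * *"))  # *-*-* 02:30:00 (daily at 02:30)
```

On a host with systemd, `systemd-analyze calendar '*-*-* *:00:00'` can be used to check how an expression is normalized and when it would next elapse.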

# systemctl status toolfoge_clush_update.timer --no-pager
● toolfoge_clush_update.timer - Periodic execution of toolfoge_clush_update.service
   Loaded: loaded (/lib/systemd/system/toolfoge_clush_update.timer; enabled; vendor preset: enabled)
   Active: active (waiting) since Tue 2019-02-12 13:49:51 UTC; 25min ago

Feb 12 13:49:51 tools-clushmaster-02 systemd[1]: Started Periodic execution of toolfoge_clush_update.service.

# systemctl status toolfoge_clush_update.service --no-pager
● toolfoge_clush_update.service - Update list of Toolforge servers for clush
   Loaded: loaded (/lib/systemd/system/toolfoge_clush_update.service; static; vendor preset: enabled)
   Active: inactive (dead) since Tue 2019-02-12 14:00:05 UTC; 15min ago
  Process: 17932 ExecStart=/usr/local/sbin/tools-clush-generator /etc/clustershell/tools.yaml --observer-pass Fs6Dq2RtG8KwmM2Z (code=exited, status=0/SUCCESS)
 Main PID: 17932 (code=exited, status=0/SUCCESS)

Feb 12 14:00:01 tools-clushmaster-02 systemd[1]: Started Update list of Toolforge servers for clush.

# ls -l /etc/clustershell/tools.yaml
-rw-r--r-- 1 root root 26977 Feb 12 14:00 /etc/clustershell/tools.yaml

Yay!

Change 490112 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] labstore: convert our first systemd timer to the new format

https://gerrit.wikimedia.org/r/490112

• GTirloni updated the task description. Feb 12 2019, 7:11 PM

Change 489394 merged by GTirloni:
[operations/puppet@production] wmcs::monitoring - Convert cronjob to systemd timer

https://gerrit.wikimedia.org/r/489394

Change 490137 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] wmcs::monitoring - Fix typo

https://gerrit.wikimedia.org/r/490137

Change 490137 merged by GTirloni:
[operations/puppet@production] wmcs::monitoring - Fix typo

https://gerrit.wikimedia.org/r/490137

• GTirloni updated the task description. Feb 12 2019, 8:10 PM

Change 490197 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] openstack - Convert cron jobs to systemd timers

https://gerrit.wikimedia.org/r/490197

Mentioned in SAL (#wikimedia-operations) [2019-03-06T18:04:46Z] <bstorm_> disabled puppet and downtimed labstore2004 while deploying a change for T210818

Change 490112 merged by Bstorm:
[operations/puppet@production] labstore: convert our first systemd timer to the new format

https://gerrit.wikimedia.org/r/490112

Mentioned in SAL (#wikimedia-operations) [2019-03-06T18:08:52Z] <bstorm_> re-enabled puppet after observing the change works well on the partner for labstore2004 and T210818

Change 490197 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] openstack - Convert cron jobs to systemd timers

https://gerrit.wikimedia.org/r/490197

Change 490197 merged by GTirloni:
[operations/puppet@production] openstack - Convert cron jobs to systemd timers

https://gerrit.wikimedia.org/r/490197

Mentioned in SAL (#wikimedia-operations) [2019-03-21T13:18:00Z] <gtirloni> downtimed cloudcontrol*, cloudservices*, labcontrol*, labweb* (T210818)

Change 498085 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] openstack - Fix errors in timers definitions

https://gerrit.wikimedia.org/r/498085

Change 498085 merged by GTirloni:
[operations/puppet@production] openstack - Fix errors in timers definitions

https://gerrit.wikimedia.org/r/498085

Mentioned in SAL (#wikimedia-cloud) [2019-03-21T13:49:10Z] <gtirloni> converted openstack cronjobs to systemd timers (T210818)

• GTirloni updated the task description. Mar 21 2019, 1:50 PM

Change 498141 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] profile::base::labs - Convert cronjobs to systemd timers

https://gerrit.wikimedia.org/r/498141

Change 498193 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] openstack::glance::image_sync - Fix systemd timer user

https://gerrit.wikimedia.org/r/498193

Change 498193 merged by GTirloni:
[operations/puppet@production] openstack::glance::image_sync - Fix systemd timer user

https://gerrit.wikimedia.org/r/498193

Change 498199 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] openstack::keystone::cleanup - Do not hide keystone-manage token_flush output

https://gerrit.wikimedia.org/r/498199

Change 498199 merged by GTirloni:
[operations/puppet@production] openstack::keystone::cleanup - Do not hide keystone-manage token_flush output

https://gerrit.wikimedia.org/r/498199

Mentioned in SAL (#wikimedia-operations) [2019-03-21T23:53:49Z] <gtirloni> downtimed systemd check in labwen1001 (T210818)

Change 498141 merged by GTirloni:
[operations/puppet@production] profile::base::labs - Convert cronjobs to systemd timers

https://gerrit.wikimedia.org/r/498141

Change 498358 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/puppet@production] profile::base::labs - Fix timer definition

https://gerrit.wikimedia.org/r/498358

Change 498358 merged by GTirloni:
[operations/puppet@production] profile::base::labs - Fix timer definition

https://gerrit.wikimedia.org/r/498358

• GTirloni updated the task description. Mar 22 2019, 12:21 PM

• GTirloni removed GTirloni as the assignee of this task. Apr 3 2019, 10:22 AM

• GTirloni subscribed.

• GTirloni unsubscribed. Apr 3 2019, 2:51 PM

• jcrespo mentioned this in T254127: peek is incorrectly configured to run every minute every 1st of the month, creating large amounts of cronspam. Jun 1 2020, 6:55 AM

Change 600928 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] wmflib: add systemd.timer OnCalendar support to cron_splay

https://gerrit.wikimedia.org/r/600928

Change 600928 merged by Cwhite:
[operations/puppet@production] wmflib: add systemd.timer OnCalendar support to cron_splay

https://gerrit.wikimedia.org/r/600928

Dzahn added a parent task: T273673: replace all puppet crons with systemd timers. Feb 11 2022, 7:17 PM

fnegri edited projects, added cloud-services-team; removed cloud-services-team (Kanban). Jan 18 2023, 6:45 PM

fnegri moved this task from Kanban to Epics on the cloud-services-team board.

fnegri moved this task to Inbox on the cloud-services-team board. Jan 18 2023, 9:58 PM

• taavi closed this task as Resolved. Feb 9 2023, 11:16 AM