
diamond crashing on hosts using systemd-timesyncd
Closed, Resolved (Public)

Description

The ntp package is removed from hosts that use systemd-timesyncd for clock synchronization. The Diamond ntpd collector still tries to run /usr/bin/ntpq and /usr/bin/ntpdc, which no longer exist, so diamond fails as follows:

Feb 06 14:41:47 cp4002 diamond[941]: Unable to run ['/usr/bin/ntpq', '-np']
Feb 06 14:41:47 cp4002 diamond[941]: Traceback (most recent call last):
Feb 06 14:41:47 cp4002 diamond[941]: File "/usr/share/diamond/collectors/ntpd/ntpd.py", line 50, in run_command
Feb 06 14:41:47 cp4002 diamond[941]: stdout=subprocess.PIPE).communicate()[0]
Feb 06 14:41:47 cp4002 diamond[941]: File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
Feb 06 14:41:47 cp4002 diamond[941]: errread, errwrite)
Feb 06 14:41:47 cp4002 diamond[941]: File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
Feb 06 14:41:47 cp4002 diamond[941]: raise child_exception
Feb 06 14:41:47 cp4002 diamond[941]: OSError: [Errno 2] No such file or directory
Feb 06 14:41:47 cp4002 diamond[941]: Unable to run ['/usr/bin/ntpdc', '-c', 'kerninfo']
Feb 06 14:41:47 cp4002 diamond[941]: Traceback (most recent call last):
Feb 06 14:41:47 cp4002 diamond[941]: File "/usr/share/diamond/collectors/ntpd/ntpd.py", line 50, in run_command
Feb 06 14:41:47 cp4002 diamond[941]: stdout=subprocess.PIPE).communicate()[0]
Feb 06 14:41:47 cp4002 diamond[941]: File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
Feb 06 14:41:47 cp4002 diamond[941]: errread, errwrite)
Feb 06 14:41:47 cp4002 diamond[941]: File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
Feb 06 14:41:47 cp4002 diamond[941]: raise child_exception
Feb 06 14:41:47 cp4002 diamond[941]: OSError: [Errno 2] No such file or directory

Should we remove /usr/share/diamond/collectors/ntpd/ if systemd-timesyncd is in use?

Event Timeline

Should we remove /usr/share/diamond/collectors/ntpd/ if systemd-timesyncd is in use?

If that isn't too messy on the puppet level, I think it'd make sense. The other option would be to make the ntpd collector fail gracefully when ntpd isn't installed at all.
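For illustration, the "fail gracefully" option could look roughly like the sketch below. This is not the actual Diamond code: the `run_command` name is borrowed from the traceback, but the body and the choice to return `None` are assumptions made for the example.

```python
import errno
import os
import subprocess


def run_command(command):
    """Return the stdout of `command`, or None if its binary is missing.

    Hypothetical helper for illustration; not the real Diamond collector code.
    """
    binary = command[0]
    # Skip quietly when the binary is absent or not executable,
    # e.g. because the ntp package was removed from the host.
    if not (os.path.isfile(binary) and os.access(binary, os.X_OK)):
        return None
    try:
        proc = subprocess.Popen(command, stdout=subprocess.PIPE)
        return proc.communicate()[0]
    except OSError as exc:
        # The binary can still disappear between the check and the call.
        if exc.errno == errno.ENOENT:
            return None
        raise
```

A caller would then treat `None` as "nothing to collect" and publish no metrics, e.g. `output = run_command(['/usr/bin/ntpq', '-np'])` followed by an early return when `output is None`.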

Change 337009 had a related patch set uploaded (by Muehlenhoff):
Don't enable the Diamond ntpd collector if systemd-timesyncd is used

https://gerrit.wikimedia.org/r/337009

FWIW, the diamond version in stretch (4.0.515-3) doesn't crash, but it complains every few minutes about being unable to connect to the NTP server.

Change 337009 merged by Muehlenhoff:
Only add the Diamond collector if ISC ntpd is used

https://gerrit.wikimedia.org/r/337009

Closing. The collector is no longer applied to systems using timesyncd (and this is fixed on the code level in diamond 4 as well).