ntp restart sometimes unreliable
Closed, ResolvedPublic

Description

Some weeks ago ntp failed to start on some servers. At the time I tracked it down to the fact that we disable installing "recommended" packages, so the ntp init script doesn't have lockfile-progs available and falls back to a racy startup. This leads to an error like this:

ntpd[850]: ntpd 4.2.6p5@1.2349-o Wed Oct 28 20:16:08 UTC 2015 (1)
ntp[834]: Starting NTP server: ntpd.
ntpd[874]: proto: precision = 0.267 usec
ntpd[874]: unable to bind to wildcard address 0.0.0.0 - another process may be running - EXITING
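
For illustration, the bind failure in the last log line can be reproduced in isolation. This is only a sketch of the general mechanism, not ntpd's code: ntpd binds UDP port 123, while the example uses a kernel-assigned ephemeral port so it runs without root. The helper names `try_bind_udp` and `demo_double_bind` are made up for this example.

```python
import errno
import socket


def try_bind_udp(port, host="0.0.0.0"):
    """Try to bind a UDP socket; return (socket, None) or (None, errno)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        return None, exc.errno
    return sock, None


def demo_double_bind():
    """Bind an ephemeral port, then attempt a second bind on the same port.

    The second bind stands in for a second ntpd instance starting while
    the first one still holds the wildcard address.
    """
    first, err = try_bind_udp(0)  # port 0: the kernel picks a free port
    if first is None:
        raise OSError(err, "initial bind failed")
    port = first.getsockname()[1]
    second, second_err = try_bind_udp(port)
    first.close()
    if second is not None:
        second.close()
    return second_err


if __name__ == "__main__":
    code = demo_double_bind()
    print("second bind failed with:", errno.errorcode.get(code, code))
```

If two start attempts overlap, whichever instance binds second hits this condition and exits, which matches the "another process may be running" message.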

This was fixed with https://phabricator.wikimedia.org/rOPUP823499e7f52eb8ab57f58869ed2c630f7a5e4b2d

However, the same error occurred when restarting ntp on a jessie system which had lockfile-progs installed. ntp doesn't have a systemd unit, but uses the sysv init script. My initial assumption was that this compat mode might somehow not use the codepath that relies on lockfile-progs, but according to my tests it is in fact used correctly.

For now this bug is mostly for tracking. The problem seems to be specific to restarting ntp on a running system; during the mass reboot caused by the "keyring Linux bug", ntp always started up properly.

Event Timeline

MoritzMuehlenhoff raised the priority of this task from to Needs Triage.
MoritzMuehlenhoff updated the task description.
MoritzMuehlenhoff subscribed.

ntpd restarts on trusty are also fairly unreliable: when restarting the ntp service on mw2*, 6 out of 213 hosts failed to restart ntpd because they had a stale PID in /var/run/ntpd.pid
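
A rough way to spot such hosts before a restart is to check whether the PID recorded in the file still belongs to a live process. The following is a minimal sketch for POSIX systems, not what the init script does; `pidfile_is_stale` is a hypothetical helper. It cannot detect PID reuse: if an unrelated process has since been assigned the same PID, the file is reported as not stale.

```python
import os


def pidfile_is_stale(path):
    """Return True if the PID file exists but names no running process."""
    try:
        with open(path) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return False  # no PID file at all, so nothing is stale
    except ValueError:
        return True  # file exists but does not contain a number
    if pid <= 0:
        return True  # not a valid process ID
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, sends nothing
    except ProcessLookupError:
        return True  # no process with that PID
    except PermissionError:
        return False  # process exists but belongs to another user
    return False


if __name__ == "__main__":
    print(pidfile_is_stale("/var/run/ntpd.pid"))
```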

This seems fixed; the error did not recur in the latest round of ntp updates.