Cannot verify NTP status asw1-b12-drmrs
Closed, Resolved · Public

Description

While rolling out the updated loopback filter in drmrs I discovered I could not display the status of NTP peers on asw1-b12-drmrs:

cmooney@asw1-b12-drmrs> show ntp associations no-resolve 
localhost: timed out, nothing received
***Request timed out

Knowing that in the background this is probably just running "ntpq -p", which connects to the local NTP daemon on port 123, I assumed the modifications to the loopback filter (T304553) had caused the issue. However, I was unable to work out what was wrong. A 'monitor traffic' on the loopback interface does show the query going out, and indeed no response is received:

12:41:06.335039  In IP 185.15.58.131.63975 > 185.15.58.131.123: NTPv2, Reserved, length 12
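
The captured packet (NTPv2, 12 bytes) appears consistent with an NTP mode 6 control query carrying only its 12-byte header, which is the kind of message ntpq uses. As a rough, hedged sketch, the same kind of probe can be reproduced outside ntpq. The function names, the choice of the "read status" opcode, and the 5-second timeout are assumptions for illustration; this is not necessarily byte-for-byte what Junos sends.

```python
import socket
import struct

NTP_PORT = 123
MODE_CONTROL = 6      # NTP mode 6: control messages, the kind ntpq uses
OP_READ_STATUS = 1    # control opcode "read status" (assumed for this sketch)


def build_control_request(sequence: int, version: int = 2) -> bytes:
    """Build a 12-byte NTP control header with no payload."""
    first_byte = (version << 3) | MODE_CONTROL  # LI=0, VN=version, mode=6
    # Fields: LI/VN/mode, opcode, sequence, status, association ID, offset, count
    return struct.pack("!BBHHHHH", first_byte, OP_READ_STATUS, sequence, 0, 0, 0, 0)


def query_ntpd(host: str = "127.0.0.1", timeout: float = 5.0):
    """Send one control request; return the reply bytes, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_control_request(sequence=1), (host, NTP_PORT))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            # Same symptom as "timed out, nothing received"
            return None
        return reply
```

If `query_ntpd()` returns `None` with no filter in the path, the daemon itself is not answering, which points away from the loopback filter.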

The source and destination IPs are both the loopback address, which is part of 185.15.56.0/22, which in turn is part of the "production4" prefix list the filter matches. So it should be allowed.
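
The prefix arithmetic can be double-checked with a few lines of Python. Note that a /24 starting at 185.15.56.0 would not contain 185.15.58.131, whereas 185.15.56.0/22 does. The actual entries in "production4" are not shown in this task, so the candidate list below is an assumption for illustration only.

```python
import ipaddress

# Loopback address seen in the 'monitor traffic' capture
loopback = ipaddress.ip_address("185.15.58.131")

# Candidate prefixes (assumed; the real prefix list is not quoted in the task)
candidates = [
    ipaddress.ip_network("185.15.56.0/24"),
    ipaddress.ip_network("185.15.56.0/22"),
]


def matching_prefixes(addr, prefixes):
    """Return the prefixes that contain addr."""
    return [p for p in prefixes if addr in p]


for prefix in candidates:
    print(prefix, loopback in prefix)
```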

In tests the loopback filter did not seem to be causing the issue. Even with an "allow all" on the loopback filter, which I temporarily configured after some other tests (making it more permissive each time), the command still fails. Same with the filter removed completely from lo0.

The updated loopback filter is now applied to asw1-b13-drmrs, and the command runs fine there:

cmooney@asw1-b13-drmrs> show ntp associations no-resolve 
   remote         refid           st t when poll reach   delay   offset  jitter
===============================================================================
+208.80.154.10    170.187.158.81   3 -    9   64    1   85.265   -0.475   0.031
-208.80.155.108   45.79.214.107    3 -    8   64    1   85.278    0.648   0.035
-208.80.153.77    38.229.56.9      3 -    7   64    1  116.169    0.193   0.054
+208.80.153.111   104.156.229.103  3 -    6   64    1  116.091   -0.191   0.862
+91.198.174.61    91.198.174.62    3 -    5   64    1  166.121   -0.400   0.044
*91.198.174.62    83.137.149.135   2 -    4   64    1  166.191   -0.423   0.031

Configuration on both devices, in terms of NTP, looks identical, so I'm a bit at a loss as to what to do. Potentially ntpd is in some crashed or otherwise bad state, and restarting it, or removing and re-adding the config, will clear it.

I don't think today's changes to the filter have affected the situation anyway; I think whatever it is was going on beforehand too. Will discuss in netops and decide on the best way forward.

Event Timeline

cmooney created this task.
Restricted Application added a subscriber: Aklapper.
Volans renamed this task from Cannot verify NTP satus asw1-b12-drmrs to Cannot verify NTP status asw1-b12-drmrs. (Apr 11 2022, 12:57 PM)

I had a quick look as well, but didn't make any progress.

I tried to bounce NTP with:

[edit system]
+   processes {
+       ntp disable;
+   }
!    inactive: ntp { ... }

I also tried deactivating the v4 and v6 loopback filters, and forcing the source address to be the loopback.

At this point I'd guess it's a Junos bug and suggest opening a low urgency JTAC case.

I've opened a case with Juniper, let's see what they say.

cmooney claimed this task.

After a bit of back-and-forth with Juniper, they eventually suggested just killing the ntpd process from a root shell, which has done the job.

% ps aux | grep ntp
root     8932   0.0  0.1    8584   2228  -  I    26Oct21      0:25.49 /usr/sbin/tnp.sntpd -Ji -N
root    11613   0.0  0.4   67828  16900  -  T    26Oct21      4:56.96 /usr/sbin/xntpd -j -N -g (ntpd)
cmooney 75777   0.0  0.1   16604   2280  0  S+   12:12        0:00.00 grep ntp
% 
% 
% kill 11613
11613: Operation not permitted
% su -
Password:
root@asw1-b12-drmrs:RE:0% 
root@asw1-b12-drmrs:RE:0% 
root@asw1-b12-drmrs:RE:0% kill 11613
root@asw1-b12-drmrs:RE:0% 
root@asw1-b12-drmrs:RE:0% 
root@asw1-b12-drmrs:RE:0% ps aux | grep ntp
root     8932   0.0  0.1    8584   2228  -  S    26Oct21      0:25.49 /usr/sbin/tnp.sntpd -Ji -N
root    76024   0.0  0.4   67828  16900  -  S    12:13        0:00.01 /usr/sbin/xntpd -j -N -g (ntpd)
root    76026   0.0  0.1   16604   2280  0  S+   12:13        0:00.00 grep ntp
root@asw1-b12-drmrs:RE:0% 
root@asw1-b12-drmrs:RE:0% 
root@asw1-b12-drmrs:RE:0% exit
logout
% exit
exit

{master:0}
cmooney@asw1-b12-drmrs> show ntp associations no-resolve    
   remote         refid           st t when poll reach   delay   offset  jitter
===============================================================================
+208.80.154.10    170.187.158.81   3 -   37   64    7   89.347    0.082   0.091
+208.80.155.108   45.79.214.107    3 -   41   64    7   89.260   -0.184   0.037
 208.80.153.77    162.159.200.1    4 -  117  128    7    0.000  -348.21 349.042
+208.80.153.111   162.159.200.1    4 -   42   64    7  119.829    0.666   0.050
 91.198.174.61    185.238.130.233  3 -  120  128    7    0.000  -349.84 349.207
*91.198.174.62    46.243.26.34     2 -   40   64    7  170.043   -0.580   0.038