
nfacctd segfaulting on netflow2001
Closed, Resolved (Public)

Description

This has been happening since 12:26 today. The first failure was a SIGABRT for some reason; since then nfacctd consistently segfaults after running for just a minute or two.

Jun 30 12:26:01 netflow2001 nfacctd[14318]: INFO ( default_kafka/kafka ): [/etc/pmacct/librdkafka.conf] Reading librdkafka global config.
Jun 30 12:26:01 netflow2001 nfacctd[14318]: INFO ( default_kafka/kafka ): *** Purging cache - START (PID: 14318) ***
Jun 30 12:26:04 netflow2001 nfacctd[14318]: INFO ( default_kafka/kafka ): *** Purging cache - END (PID: 14318, QN: 5061/5061, ET: 0) ***
Jun 30 12:26:47 netflow2001 systemd[1]: nfacctd.service: Main process exited, code=killed, status=6/ABRT
Jun 30 12:26:47 netflow2001 systemd[1]: nfacctd.service: Failed with result 'signal'.
/var/log/syslog:Jun 30 12:32:18 netflow2001 kernel: [4228819.586764] nfacctd[15310]: segfault at 7fcbcefaa008 ip 00007fcbe659f28a sp 00007fcbe568aa50 error 6 in libc-2.28.so[7fcbe653e000+148000]                
/var/log/syslog:Jun 30 13:02:42 netflow2001 kernel: [4230643.380476] nfacctd[18020]: segfault at 7fc0a95b8018 ip 00007fc0d00d028a sp 00007fc0cf1bb9f0 error 6 in libc-2.28.so[7fc0d006f000+148000]                
/var/log/syslog:Jun 30 13:32:54 netflow2001 kernel: [4232455.096526] nfacctd[20887]: segfault at 7fbc33d97048 ip 00007fbc49b9628a sp 00007fbc48c81aa0 error 6 in libc-2.28.so[7fbc49b35000+148000]
/var/log/syslog:Jun 30 14:03:17 netflow2001 kernel: [4234277.712698] nfacctd[23426]: segfault at 7f76a902a008 ip 00007f76c9baf28a sp 00007f76c8c9aae0 error 6 in libc-2.28.so[7f76c9b4e000+148000]
/var/log/syslog:Jun 30 14:33:31 netflow2001 kernel: [4236092.440528] nfacctd[26449]: segfault at 7f81e5234008 ip 00007f820c2cf28a sp 00007f820b3ba9f0 error 6 in libc-2.28.so[7f820c26e000+148000]
/var/log/syslog:Jun 30 15:03:54 netflow2001 kernel: [4237915.137993] nfacctd[28875]: segfault at 7fd3d9429028 ip 00007fd400be328a sp 00007fd3ffcce9f0 error 6 in libc-2.28.so[7fd400b82000+148000]
/var/log/syslog:Jun 30 15:34:13 netflow2001 kernel: [4239733.568781] nfacctd[31748]: segfault at 7fc22d097048 ip 00007fc24eee228a sp 00007fc24dfcd980 error 6 in libc-2.28.so[7fc24ee81000+148000]
/var/log/syslog:Jun 30 16:01:59 netflow2001 kernel: [4241399.977073] nfacctd[1898]: segfault at 7f33f91bb018 ip 00007f340a1bc28a sp 00007f34092a7a50 error 6 in libc-2.28.so[7f340a15b000+148000]
/var/log/syslog:Jun 30 16:32:22 netflow2001 kernel: [4243222.558480] nfacctd[4811]: segfault at 7f89fd273008 ip 00007f8a0852828a sp 00007f8a07613980 error 6 in libc-2.28.so[7f8a084c7000+148000]
/var/log/syslog:Jun 30 17:02:36 netflow2001 kernel: [4245037.119738] nfacctd[7443]: segfault at 7f7785422008 ip 00007f779ad2728a sp 00007f7799e129f0 error 6 in libc-2.28.so[7f779acc6000+148000]
/var/log/syslog:Jun 30 17:32:59 netflow2001 kernel: [4246860.013527] nfacctd[10245]: segfault at 7f8291712018 ip 00007f82a6f0428a sp 00007f82a5fefa80 error 6 in libc-2.28.so[7f82a6ea3000+148000]
/var/log/syslog:Jun 30 17:38:46 netflow2001 kernel: [4247206.419012] nfacctd[11368]: segfault at 7f2bf1007018 ip 00007f2c17b6528a sp 00007f2c16c50a80 error 6 in libc-2.28.so[7f2c17b04000+148000]
/var/log/syslog:Jun 30 17:46:44 netflow2001 kernel: [4247685.373720] nfacctd[11683]: segfault at 7f19e2964008 ip 00007f19fa68c28a sp 00007f19f9777980 error 6 in libc-2.28.so[7f19fa62b000+148000]

Judging from the logs, the crashes often seem to coincide with cr2-eqdfw specifically connecting as a BGP peer, but it is hard to be sure.
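The kernel lines above carry enough information to check whether every crash is the same kind of fault. As a rough sketch (the helper name `decode_segfault` and the regex are illustrative, not part of any tool used on this task), the following subtracts the mapping base from the instruction pointer to get the offset inside libc, and decodes the x86 page-fault error code, which the kernel prints in hex:

```python
import re

# One kernel line copied from the syslog excerpt above.
LINE = (
    "Jun 30 12:32:18 netflow2001 kernel: [4228819.586764] nfacctd[15310]: "
    "segfault at 7fcbcefaa008 ip 00007fcbe659f28a sp 00007fcbe568aa50 "
    "error 6 in libc-2.28.so[7fcbe653e000+148000]"
)

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the faulting offset inside the mapped object and the decoded error bits."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        # Offset of the faulting instruction relative to the start of the mapping.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),  # 0 means the page was not mapped at all
        "write": bool(err & 2),         # 1 means the access was a write
        "user_mode": bool(err & 4),     # 1 means the fault happened in user space
    }


info = decode_segfault(LINE)
print(f"{info['object']}+{info['offset']:#x}", info)
```

For the line shown this gives `libc-2.28.so+0x6128a` with a user-mode write to a page that was not mapped. The other segfault lines in the excerpt work out to the same offset, which is consistent with the process dying at the same instruction in libc every time.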

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jun 30 2020, 5:56 PM

Mentioned in SAL (#wikimedia-operations) [2020-06-30T18:05:30Z] <cdanis> installing libc6-dbg on netflow2001 T256790

Okay, here are some backtraces:

[New LWP 13072]
[New LWP 13069]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nfacctd: Core Process [default] '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 _int_malloc (av=av@entry=0x7f4904000020, bytes=bytes@entry=91) at malloc.c:4108
[Current thread is 1 (Thread 0x7f490a363700 (LWP 13072))]
#0 _int_malloc (av=av@entry=0x7f4904000020, bytes=bytes@entry=91) at malloc.c:4108
#1 0x00007f490b16956a in __GI___libc_malloc (bytes=91) at malloc.c:3057
#2 0x000055f446f4804e in ?? ()
#3 0x000055f446f48705 in aspath_dup ()
#4 0x000055f446f48769 in ?? ()
#5 0x000055f446f4a0bb in hash_get ()
#6 0x000055f446f4891d in aspath_parse ()
#7 0x000055f446f194ff in bgp_attr_parse_aspath ()
#8 0x000055f446f199d2 in bgp_attr_parse ()
#9 0x000055f446f1a951 in bgp_parse_update_msg ()
#10 0x000055f446f1ae6a in bgp_parse_msg ()
#11 0x000055f446f125b2 in skinny_bgp_daemon_online ()
#12 0x000055f446f3f5d2 in thread_runner ()
#13 0x00007f490b2adfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#14 0x00007f490b1de4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

[New LWP 15972]
[New LWP 15971]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nfacctd: Core Process [default] '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 _int_malloc (av=av@entry=0x7fa838000020, bytes=bytes@entry=24) at malloc.c:4108
[Current thread is 1 (Thread 0x7fa83d1ad700 (LWP 15972))]
#0 _int_malloc (av=av@entry=0x7fa838000020, bytes=bytes@entry=24) at malloc.c:4108
#1 0x00007fa83dfb356a in __GI___libc_malloc (bytes=24) at malloc.c:3057
#2 0x000055bf90edab7c in ?? ()
#3 0x000055bf90edac66 in ?? ()
#4 0x000055bf90edacc9 in ?? ()
#5 0x000055bf90edb6f4 in aspath_dup ()
#6 0x000055bf90edb769 in ?? ()
#7 0x000055bf90edd0bb in hash_get ()
#8 0x000055bf90edb91d in aspath_parse ()
#9 0x000055bf90eac4ff in bgp_attr_parse_aspath ()
#10 0x000055bf90eac9d2 in bgp_attr_parse ()
#11 0x000055bf90ead951 in bgp_parse_update_msg ()
#12 0x000055bf90eade6a in bgp_parse_msg ()
#13 0x000055bf90ea55b2 in skinny_bgp_daemon_online ()
#14 0x000055bf90ed25d2 in thread_runner ()
#15 0x00007fa83e0f7fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#16 0x00007fa83e0284cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

[New LWP 16298]
[New LWP 16297]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nfacctd: Core Process [default] '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 _int_malloc (av=av@entry=0x7f09c8000020, bytes=bytes@entry=72) at malloc.c:4108
[Current thread is 1 (Thread 0x7f09ce919700 (LWP 16298))]
#0 _int_malloc (av=av@entry=0x7f09c8000020, bytes=bytes@entry=72) at malloc.c:4108
#1 0x00007f09cf7201a2 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3428
#2 0x000055cdaaf34b72 in ?? ()
#3 0x000055cdaaf351eb in bgp_node_get ()
#4 0x000055cdaaf32b34 in bgp_process_update ()
#5 0x000055cdaaf33380 in bgp_nlri_parse ()
#6 0x000055cdaaf338bb in bgp_parse_update_msg ()
#7 0x000055cdaaf33e6a in bgp_parse_msg ()
#8 0x000055cdaaf2b5b2 in skinny_bgp_daemon_online ()
#9 0x000055cdaaf585d2 in thread_runner ()
#10 0x00007f09cf863fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#11 0x00007f09cf7944cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

[New LWP 16399]
[New LWP 16397]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nfacctd: Core Process [default] '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 _int_malloc (av=av@entry=0x7f9a64000020, bytes=bytes@entry=24) at malloc.c:4108
[Current thread is 1 (Thread 0x7f9a6c02d700 (LWP 16399))]
#0 _int_malloc (av=av@entry=0x7f9a64000020, bytes=bytes@entry=24) at malloc.c:4108
#1 0x00007f9a6ce3356a in __GI___libc_malloc (bytes=24) at malloc.c:3057
#2 0x0000561c7b108b7c in ?? ()
#3 0x0000561c7b108c66 in ?? ()
#4 0x0000561c7b108cc9 in ?? ()
#5 0x0000561c7b1096f4 in aspath_dup ()
#6 0x0000561c7b109769 in ?? ()
#7 0x0000561c7b10b0bb in hash_get ()
#8 0x0000561c7b10991d in aspath_parse ()
#9 0x0000561c7b0da4ff in bgp_attr_parse_aspath ()
#10 0x0000561c7b0da9d2 in bgp_attr_parse ()
#11 0x0000561c7b0db951 in bgp_parse_update_msg ()
#12 0x0000561c7b0dbe6a in bgp_parse_msg ()
#13 0x0000561c7b0d35b2 in skinny_bgp_daemon_online ()
#14 0x0000561c7b1005d2 in thread_runner ()
#15 0x00007f9a6cf77fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#16 0x00007f9a6cea84cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

[New LWP 17455]
[New LWP 17454]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nfacctd: Core Process [default] '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 _int_malloc (av=av@entry=0x7f8c1c000020, bytes=bytes@entry=40) at malloc.c:4108
[Current thread is 1 (Thread 0x7f8c21409700 (LWP 17455))]
#0 _int_malloc (av=av@entry=0x7f8c1c000020, bytes=bytes@entry=40) at malloc.c:4108
#1 0x00007f8c222101a2 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3428
#2 0x0000563bf2afc44a in bgp_info_new ()
#3 0x0000563bf2af7e5e in bgp_process_update ()
#4 0x0000563bf2af8380 in bgp_nlri_parse ()
#5 0x0000563bf2af88bb in bgp_parse_update_msg ()
#6 0x0000563bf2af8e6a in bgp_parse_msg ()
#7 0x0000563bf2af05b2 in skinny_bgp_daemon_online ()
#8 0x0000563bf2b1d5d2 in thread_runner ()
#9 0x00007f8c22353fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007f8c222844cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

When I saw crashes in malloc and then installed libc6-dbg to get arguments, I was hoping that the issue was malloc being invoked with a ridiculous parameter.

Seeing crashes for things like malloc(24) is much worse; that very likely means the heap got corrupted somewhere earlier in the program's execution...
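One way to make that pattern visible is to line up, for each core, the size being requested and the first named nfacctd function above the allocator. The sketch below does that for two of the backtraces (frames trimmed, paste line numbers removed); `summarize` and `ALLOCATOR_FRAMES` are names made up for this illustration.

```python
import re

# Top frames of two of the backtraces above, separated by a blank line.
BACKTRACES = """\
#0 _int_malloc (av=av@entry=0x7f4904000020, bytes=bytes@entry=91) at malloc.c:4108
#1 0x00007f490b16956a in __GI___libc_malloc (bytes=91) at malloc.c:3057
#2 0x000055f446f4804e in ?? ()
#3 0x000055f446f48705 in aspath_dup ()

#0 _int_malloc (av=av@entry=0x7f09c8000020, bytes=bytes@entry=72) at malloc.c:4108
#1 0x00007f09cf7201a2 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3428
#2 0x000055cdaaf34b72 in ?? ()
#3 0x000055cdaaf351eb in bgp_node_get ()
"""

# Frames to skip: libc allocator entry points and unresolved symbols.
ALLOCATOR_FRAMES = {"_int_malloc", "__GI___libc_malloc", "__libc_calloc", "??"}
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\S+) \(")
BYTES_RE = re.compile(r"bytes=bytes@entry=(\d+)")


def summarize(text):
    """Return (requested_bytes, first_named_caller) for each backtrace in text."""
    results = []
    for trace in text.strip().split("\n\n"):
        size = None
        caller = None
        for line in trace.splitlines():
            if size is None:
                m = BYTES_RE.search(line)
                if m:
                    size = int(m.group(1))
            m = FRAME_RE.match(line)
            if m and caller is None and m.group(1) not in ALLOCATOR_FRAMES:
                caller = m.group(1)
        results.append((size, caller))
    return results


summary = summarize(BACKTRACES)
for size, caller in summary:
    print(f"malloc of {size} bytes requested under {caller}")
```

Small, ordinary request sizes arriving from different callers, all dying at the same point inside `_int_malloc`, fit the reading that the allocator's own state was already damaged before these calls were made.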

Mentioned in SAL (#wikimedia-operations) [2020-06-30T18:31:25Z] <cdanis> T256790 ✔️ cdanis@netflow2001.codfw.wmnet ~ ☕ sudo apt install valgrind

Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== Thread 2:
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== Invalid write of size 2
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== at 0x1FF951: ecommunity_ecom2str (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1FFCDF: ecommunity_intern (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1CF59E: bgp_attr_parse_ecommunity (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1CF8D9: bgp_attr_parse (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1D0950: bgp_parse_update_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1D0E69: bgp_parse_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1C85B1: skinny_bgp_daemon_online (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1F55D1: thread_runner (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x573CFA2: start_thread (pthread_create.c:486)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x584F4CE: clone (clone.S:95)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== Address 0x33acd638 is 0 bytes after a block of size 56 alloc'd
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== at 0x4837D7B: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1FF9DD: ecommunity_ecom2str (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1FFCDF: ecommunity_intern (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1CF59E: bgp_attr_parse_ecommunity (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1CF8D9: bgp_attr_parse (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1D0950: bgp_parse_update_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1D0E69: bgp_parse_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1C85B1: skinny_bgp_daemon_online (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1F55D1: thread_runner (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x573CFA2: start_thread (pthread_create.c:486)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x584F4CE: clone (clone.S:95)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864==
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== Invalid write of size 1
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== at 0x1FF96D: ecommunity_ecom2str (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1FFCDF: ecommunity_intern (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1CF59E: bgp_attr_parse_ecommunity (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1CF8D9: bgp_attr_parse (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1D0950: bgp_parse_update_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1D0E69: bgp_parse_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1C85B1: skinny_bgp_daemon_online (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1F55D1: thread_runner (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x573CFA2: start_thread (pthread_create.c:486)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x584F4CE: clone (clone.S:95)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== Address 0x33acd639 is 1 bytes after a block of size 56 alloc'd
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== at 0x4837D7B: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1FF9DD: ecommunity_ecom2str (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1FFCDF: ecommunity_intern (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1CF59E: bgp_attr_parse_ecommunity (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1CF8D9: bgp_attr_parse (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1D0950: bgp_parse_update_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1D0E69: bgp_parse_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1C85B1: skinny_bgp_daemon_online (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x1F55D1: thread_runner (in /usr/sbin/nfacctd)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x573CFA2: start_thread (pthread_create.c:486)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864== by 0x584F4CE: clone (clone.S:95)
Jun 30 18:34:50 netflow2001 valgrind[19849]: ==19864==
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== Invalid write of size 2
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== at 0x1FFAD9: ecommunity_ecom2str (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1FFCDF: ecommunity_intern (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1CF59E: bgp_attr_parse_ecommunity (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1CF8D9: bgp_attr_parse (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1D0950: bgp_parse_update_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1D0E69: bgp_parse_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1C85B1: skinny_bgp_daemon_online (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1F55D1: thread_runner (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x573CFA2: start_thread (pthread_create.c:486)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x584F4CE: clone (clone.S:95)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== Address 0x33acd63a is 2 bytes after a block of size 56 alloc'd
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== at 0x4837D7B: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1FF9DD: ecommunity_ecom2str (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1FFCDF: ecommunity_intern (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1CF59E: bgp_attr_parse_ecommunity (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1CF8D9: bgp_attr_parse (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1D0950: bgp_parse_update_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1D0E69: bgp_parse_msg (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1C85B1: skinny_bgp_daemon_online (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x1F55D1: thread_runner (in /usr/sbin/nfacctd)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x573CFA2: start_thread (pthread_create.c:486)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864== by 0x584F4CE: clone (clone.S:95)
Jun 30 18:34:51 netflow2001 valgrind[19849]: ==19864==
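To read the valgrind report at a glance, it helps to reduce each error to how far past the end of the heap block the write reached. The following is a minimal sketch over the key lines of the three errors above (syslog prefix and most stack frames dropped); the function name `overruns` and the record fields are assumptions made for this illustration.

```python
import re

# Key lines of the three memcheck errors above, without the syslog prefix.
REPORT = """\
==19864== Invalid write of size 2
==19864== at 0x1FF951: ecommunity_ecom2str (in /usr/sbin/nfacctd)
==19864== Address 0x33acd638 is 0 bytes after a block of size 56 alloc'd
==19864== Invalid write of size 1
==19864== at 0x1FF96D: ecommunity_ecom2str (in /usr/sbin/nfacctd)
==19864== Address 0x33acd639 is 1 bytes after a block of size 56 alloc'd
==19864== Invalid write of size 2
==19864== at 0x1FFAD9: ecommunity_ecom2str (in /usr/sbin/nfacctd)
==19864== Address 0x33acd63a is 2 bytes after a block of size 56 alloc'd
"""

WRITE_RE = re.compile(r"Invalid write of size (\d+)")
ADDR_RE = re.compile(
    r"Address (0x[0-9a-f]+) is (\d+) bytes after a block of size (\d+)"
)


def overruns(report):
    """Pair each 'Invalid write' with its 'Address' line and measure the overrun."""
    records = []
    write_size = None
    for line in report.splitlines():
        m = WRITE_RE.search(line)
        if m:
            write_size = int(m.group(1))
            continue
        m = ADDR_RE.search(line)
        if m and write_size is not None:
            addr = int(m.group(1), 16)
            after = int(m.group(2))
            records.append({
                "block_end": addr - after,        # first byte past the block
                "block_size": int(m.group(3)),
                "past_end": after + write_size,   # bytes beyond the block touched
            })
            write_size = None
    return records


records = overruns(REPORT)
for r in records:
    print(f"block ending at {r['block_end']:#x} ({r['block_size']} bytes): "
          f"write reaches {r['past_end']} bytes past the end")
```

All three writes land just past the same 56-byte block returned by `realloc` inside `ecommunity_ecom2str`, reaching up to 4 bytes beyond its end in this excerpt. Writes like these can clobber the allocator's bookkeeping for the neighbouring chunk, which would fit the later crashes inside `_int_malloc`.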

Likely segfaulting on netflow1001 now too.

Upstream has a patch; I'll attempt to backport it to our version in the morning. If it works, I'll also file a bug with the patch in the Debian BTS and see about getting it backported to stable.

Mentioned in SAL (#wikimedia-operations) [2020-07-01T08:29:12Z] <XioNoX> disable BGP to nfacct in eqiad - T256790

Mentioned in SAL (#wikimedia-operations) [2020-07-01T12:55:36Z] <cdanis> T256790 ✔️ cdanis@apt1001.wikimedia.org ~ 🕘☕ sudo -E reprepro -C main include buster-wikimedia pmacct_1.7.2-3+wmf1_amd64.changes

Mentioned in SAL (#wikimedia-operations) [2020-07-01T12:58:50Z] <cdanis> T256790 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘☕ sudo debdeploy deploy -u 2020-07-01-pmacct.yaml -s netflow

Mentioned in SAL (#wikimedia-operations) [2020-07-01T13:03:25Z] <cdanis> T256790 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘☕ sudo cumin 'netflow[3-5]001*' 'systemctl restart nfacctd'

CDanis claimed this task.

Backport deployed, all seems well (fortunately netflow1001 was still receiving the triggering BGP data when I went to test, so verified old version still crashed, verified new version didn't, verified old version still still crashed, verified new version still didn't).

Also quilt patch submitted to the Debian maintainer at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=964083