I've seen some spikes in the DB errors on Kibana all related to db2055:
https://logstash.wikimedia.org/#/dashboard/elasticsearch/wfLogDBError
See attached screenshot:
I've found quite an amount of dropped packets on the network interface. For reference similar servers have ~10-100 dropped packets.
eth0 Link encap:Ethernet HWaddr 40:a8:f0:35:4d:1c
inet addr:10.192.48.7 Bcast:10.192.51.255 Mask:255.255.252.0
inet6 addr: 2620:0:860:104:42a8:f0ff:fe35:4d1c/64 Scope:Global
inet6 addr: fe80::42a8:f0ff:fe35:4d1c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:15868076040 errors:0 dropped:175349 overruns:0 frame:0
TX packets:2688172933 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3124888269011 (2.8 TiB) TX bytes:1069230584257 (995.7 GiB)
Interrupt:26From dmesg:
[Mon Apr 11 08:01:49 2016] Process accounting resumed [Tue Apr 12 08:03:17 2016] Process accounting resumed [Wed Apr 13 08:04:46 2016] Process accounting resumed [Thu Apr 14 08:06:14 2016] Process accounting resumed [Fri Apr 15 08:07:43 2016] Process accounting resumed [Sat Apr 16 08:09:11 2016] Process accounting resumed [Sun Apr 17 08:10:39 2016] Process accounting resumed [Mon Apr 18 08:12:08 2016] Process accounting resumed [Mon Apr 18 13:08:00 2016] systemd-sysv-generator[58201]: Overwriting existing symlink /run/systemd/generator.late/mysql.service with real service [Mon Apr 18 13:08:01 2016] systemd-sysv-generator[58769]: Overwriting existing symlink /run/systemd/generator.late/mysql.service with real service [Mon Apr 18 13:08:02 2016] systemd-sysv-generator[58853]: Overwriting existing symlink /run/systemd/generator.late/mysql.service with real service [Mon Apr 18 13:08:02 2016] systemd-sysv-generator[58883]: Overwriting existing symlink /run/systemd/generator.late/mysql.service with real service [Tue Apr 19 08:13:36 2016] Process accounting resumed [Tue Apr 19 18:26:13 2016] perf interrupt took too long (76972 > 71428), lowering kernel.perf_event_max_sample_rate to 1750 [Wed Apr 20 08:15:05 2016] Process accounting resumed [Wed Apr 20 10:32:56 2016] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.344 msecs [Wed Apr 20 19:43:35 2016] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.352 msecs [Thu Apr 21 08:16:33 2016] Process accounting resumed
I didn't see the dropped packets increase while checking the server, but we can check after the next peak (if any) to see if they increased.
Nothing on mysql error log and load of the server from grafana, although it's role is vslow hence has some long queries running from time to time.
