New cronspam from db clusters
Closed, Resolved (Public)

Description

Since Feb 13th, cron has been sending messages like the following from invocations of wmf-auto-restart on db1106 and db2085:

Cron <root@db2085> /usr/local/sbin/wmf-auto-restart -s ssh
---
Showing one /org/freedesktop/systemd1/unit/ssh_2eservice
Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 object=/org/freedesktop/systemd1/unit/ssh_2eservice interface=org.freedesktop.DBus.Properties member=GetAll cookie=1 reply_cookie=0 error=n/a
Got message type=method_return sender=n/a destination=n/a object=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 error=n/a

It appears a kernel update was applied on those two servers that has not been applied on any other database host:

Start-Date: 2019-02-13  07:51:49
Commandline: apt-get install linux-image-amd64
Requested-By: jmm (11984)
Install: linux-image-amd64:amd64 (4.9+80+deb9u6), linux-image-4.9.0-8-amd64:amd64 (4.9.130-2, automatic)
End-Date: 2019-02-13  07:52:05

The main difference I could find is that the affected servers are running the newer kernel, 4.9.130-2.

Event Timeline

We probably just need to reboot them without the kernel running in debug mode, as discussed on Friday.
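
For anyone wanting to check other hosts before rebooting, here is a minimal sketch of testing whether the running kernel was booted with the bare `debug` parameter. It rests on my assumption that "debug mode" here means `debug` on the kernel command line, which systemd documents as turning on debug-level logging. The helper name `has_debug_flag` is hypothetical and not part of wmf-auto-restart.

```python
from pathlib import Path


def has_debug_flag(cmdline: str) -> bool:
    """Return True if the bare `debug` parameter is on a kernel command line."""
    # Parameters are whitespace-separated; compare whole tokens only, so
    # values such as `systemd.debug-shell` or `loglevel=7` are not counted.
    return "debug" in cmdline.split()


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    cmdline = Path("/proc/cmdline").read_text()
    print("debug" if has_debug_flag(cmdline) else "no debug")
```

Run on db1106 or db2085, this would only confirm the boot parameter; whether that parameter is what triggers the extra cron output is what the reboots are meant to verify.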

Mentioned in SAL (#wikimedia-operations) [2019-02-18T06:49:31Z] <marostegui> Reboot db2085 to disable debug mode on kernel T216273

db2085 has been rebooted. Let's see if that stops the emails.

I re-ran the auto restarts manually on db2085 and that didn't lead to any new cron mails, so I think we can close this task once db1106 is also rebooted back to non-debug mode.

I will take care of db1106, as I need to depool it anyway today or tomorrow.

Mentioned in SAL (#wikimedia-operations) [2019-02-19T07:46:32Z] <marostegui> Reboot db1106 for kernel upgrade (and remove debug from kernel) T216240 T216273

I have rebooted db1106. I will give it some time to confirm the spam is gone before closing this task.

Sounds good. On db2085 there have been no further occurrences since the reboot.

Nothing has arrived since the restart without debug, so I think we are good.