
replica labsdb1001 down
Closed, Resolved, Public

Description

Reported on IRC. Some bots and tools are failing as a result.


Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.

It was swapping; we rebooted it and it is back up. This appears to be an XFS-related memory leak that @Dzahn remembers having struck elsewhere too. There were lots of repeated messages like this:

Jun  1 22:27:59 labsdb1001 kernel: [47163569.167879] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)

in kern.log.
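To gauge how often the warning fired, the kern.log lines can be tallied per allocation function and mode. Below is a minimal sketch that assumes the message wording matches the excerpt above. The function name `count_xfs_deadlock_warnings` is hypothetical, and the log path `/var/log/kern.log` is the usual Ubuntu location, not one confirmed in this task.

```python
import re
from collections import Counter

# Pattern built from the labsdb1001 kern.log excerpt; other kernel versions
# may word the warning differently (assumption).
XFS_DEADLOCK = re.compile(
    r"XFS: possible memory allocation deadlock in (\w+) \(mode:(0x[0-9a-f]+)\)"
)


def count_xfs_deadlock_warnings(lines):
    """Count XFS allocation-deadlock warnings per (function, mode) pair."""
    counts = Counter()
    for line in lines:
        match = XFS_DEADLOCK.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage on the host; adjust the path if logs live elsewhere.
    with open("/var/log/kern.log", errors="replace") as log:
        for (func, mode), n in count_xfs_deadlock_warnings(log).most_common():
            print(f"{func} mode={mode}: {n}")
```

A sustained, high count for `kmem_alloc` alongside heavy swapping would match the pattern seen here.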

After the reboot, MySQL was not yet running:

18:37 < mutante> root@labsdb1001:~# /etc/init.d/mysql status
18:37 < mutante> /opt/wmf-mariadb10 * MySQL is not running
18:37 < mutante> root@labsdb1001:~# /etc/init.d/mysql start
18:37 < mutante> /opt/wmf-mariadb10
18:37 < mutante> Starting MySQL
18:37 < mutante> ............

It took a while, and then:

18:39 < mutante> ok, it is done
18:39 < YuviPanda> yay
18:39 < mutante> * Manager of pid-file quit without updating file.
18:39 < YuviPanda> it seems back
18:39 < mutante> running now

This XFS bug was reported here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1382333
The fix landed in 3.13.0-40.69, but before the crash occurred labsdb1001 was still running an earlier Trusty kernel release. After the reboot it is running the 3.13.0-83 kernel, so this should not happen again.
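The reasoning above rests on comparing Ubuntu kernel versions: 3.13.0-83 is newer than 3.13.0-40.69 because its ABI number (83) is higher. Below is a minimal sketch of that check, assuming versions of the form `X.Y.Z-ABI[.upload][-flavour]`. The helper names are hypothetical. Because `uname -r` typically reports only the ABI number (e.g. `3.13.0-83-generic`), a running kernel with ABI 40 compares as lower than 40.69, so the check errs toward reporting the fix as missing.

```python
import re

# Ubuntu kernel version in which the XFS fix landed, per the Launchpad bug above.
FIXED_IN = "3.13.0-40.69"

# X.Y.Z-ABI with an optional .upload; any trailing flavour such as -generic is ignored.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)(?:\.(\d+))?")


def parse_ubuntu_kernel(version):
    """Turn '3.13.0-40.69' or '3.13.0-83-generic' into a comparable tuple of ints.

    A missing upload number yields a shorter tuple, which compares as lower
    than any upload of the same ABI.
    """
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def has_xfs_fix(running):
    """True if the running kernel is at or after the release carrying the fix."""
    return parse_ubuntu_kernel(running) >= parse_ubuntu_kernel(FIXED_IN)


if __name__ == "__main__":
    import platform

    # platform.release() returns the same string as `uname -r`.
    release = platform.release()
    print(release, "has fix" if has_xfs_fix(release) else "may lack fix")
```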

yuvipanda claimed this task.