
libup-db02 is in error state
Closed, Resolved (Public)

Description

rtriy5eslxo.svc.trove.eqiad1.wikimedia.cloud is in Operating Status Error

Cross-reference: possibly related to, but not necessarily the cause of, T345930: LibUp hasn't run since 5 June 2023.

Event Timeline

The database has filled up, which explains why it does not work:

/dev/sdb        9.8G  9.3G     0 100% /var/lib/mysql

But I can't tell if the resize failure is due to that or the guest agent being bad at keeping a RabbitMQ connection.

It looks like we might need a disk quota increase so that we can grow the volume on that host and try to get it operational again?

It's running again, but it would be nice to make sure this doesn't break everything (again?) in the near future :)
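One way to catch this before the volume fills up again would be a small periodic check on the data mount. The following is only a sketch, not anything that is deployed here: `usage_percent`, `check_mount` and the 80% threshold are hypothetical names and values chosen for illustration, and it relies solely on Python's standard `shutil.disk_usage`.

```python
import shutil
from typing import Tuple


def usage_percent(total: int, used: int) -> float:
    """Return used space as a percentage of total capacity."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def check_mount(path: str, warn_at: float = 80.0) -> Tuple[float, bool]:
    """Return (percent used, whether it is at or above the warning threshold).

    The 80% default is an arbitrary assumption, not a project policy.
    """
    usage = shutil.disk_usage(path)
    pct = usage_percent(usage.total, usage.used)
    return pct, pct >= warn_at


if __name__ == "__main__":
    # On the database host the path would be /var/lib/mysql;
    # "/" is used here only so the sketch runs anywhere.
    pct, warn = check_mount("/")
    print(f"{pct:.1f}% used{' (WARNING)' if warn else ''}")
```

Note that this measures used space against total capacity, so the number can differ slightly from the Use% column that `df` prints.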

So the instance seems to be in Active/Healthy again. The main thing consuming disk is the logs table.

root@libup-db02:/var/lib/mysql/data/libup# du -sh * | sort -hr
9.1G	logs.ibd

I wonder if we really need to store all of the logs forever. Anyhow, I think we can bump the quota for now to get things running again.

taavi claimed this task.
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         30G  9.3G   19G  33% /var/lib/mysql
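For anyone puzzled by the percentages: GNU `df` derives Use% from used / (used + available), rounded up, rather than from used / size. That is why the old volume showed 100% with 9.3G used on a 9.8G device (the gap is presumably filesystem overhead and reserved blocks, which is an assumption on my part), and why the resized one shows 33% rather than 31%. A rough sketch of the arithmetic, using the rounded human-readable figures above expressed in tenths of a GiB (`df_use_percent` is a hypothetical helper name):

```python
def df_use_percent(used: int, avail: int) -> int:
    """Approximate df's Use% column: used / (used + avail), rounded up.

    Arguments can be in any unit as long as both use the same one.
    """
    total = used + avail
    if total <= 0:
        raise ValueError("used + avail must be positive")
    # Integer ceiling division, avoiding floating-point rounding surprises.
    return -(-100 * used // total)


if __name__ == "__main__":
    # Values in tenths of a GiB, taken from the rounded df output.
    print(df_use_percent(93, 0))    # before the resize: 9.3G used, 0 available
    print(df_use_percent(93, 190))  # after the resize: 9.3G used, 19G available
```

Since the inputs are already rounded by `df -h`, the results are illustrative only; `df` itself computes the percentage from exact block counts.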