PostgreSQL is stuck on cloudbackup2001
Closed, Resolved · Public

Description

It seems like running any PostgreSQL command on cloudbackup2001 just gets stuck. This is also blocking Puppet runs.

Event Timeline

taavi triaged this task as High priority. Oct 9 2023, 7:30 AM
taavi created this task.

Not sure if related, but the host is also almost out of disk space:

/dev/mapper/backup-cinder--backups   80T   75T  1.5T  99% /srv/cinder-backups
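
For anyone wondering why the line above shows 99% when 75T out of 80T looks closer to 94%: GNU df derives Use% from used / (used + available), rounded up, and does not count space that is unavailable to ordinary users. A rough sketch of that arithmetic follows. The function name `df_use_percent` is made up for illustration, and the inputs are the rounded values from the output above, so this only approximates what df computes from exact block counts.

```python
import math


def df_use_percent(used, available):
    """Approximate df's Use% column: used / (used + available), rounded up.

    Hypothetical helper for illustration; inputs can be in any unit as long
    as both use the same one.
    """
    usable_total = used + available
    if usable_total == 0:
        return 0
    return math.ceil(100 * used / usable_total)


# Rounded figures from the df line above: 75T used, 1.5T available.
print(df_use_percent(75, 1.5))
```

The remaining gap between 80T and 75T + 1.5T would be space that df does not treat as available (for example reserved blocks), though the task does not say which filesystem is in use.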

The Postgres log is full of:

2023-10-09 01:44:03 GMT LOG:  using stale statistics instead of current ones because stats collector is not responding
2023-10-09 01:44:13 GMT LOG:  using stale statistics instead of current ones because stats collector is not responding
2023-10-09 01:44:23 GMT LOG:  using stale statistics instead of current ones because stats collector is not responding
2023-10-09 01:44:34 GMT LOG:  using stale statistics instead of current ones because stats collector is not responding
2023-10-09 01:44:44 GMT LOG:  using stale statistics instead of current ones because stats collector is not responding
2023-10-09 01:44:54 GMT LOG:  using stale statistics instead of current ones because stats collector is not responding
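
To get a feel for how long this has been going on, the repeated message can be counted and bracketed by its first and last timestamp. This is only a sketch: `summarize_stale_stats` is a hypothetical helper, and it assumes every matching line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpt above.

```python
from datetime import datetime

# Substring taken from the log lines quoted above.
MARKER = "stats collector is not responding"


def summarize_stale_stats(lines):
    """Return (count, span_seconds) for lines containing MARKER.

    Assumes each matching line begins with a 19-character
    'YYYY-MM-DD HH:MM:SS' timestamp. span_seconds is None if nothing matched.
    """
    stamps = []
    for line in lines:
        if MARKER in line:
            stamps.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
    if not stamps:
        return 0, None
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()


sample = [
    "2023-10-09 01:44:03 GMT LOG:  using stale statistics instead of current ones because stats collector is not responding",
    "2023-10-09 01:44:13 GMT LOG:  using stale statistics instead of current ones because stats collector is not responding",
]
print(summarize_stale_stats(sample))
```

Feeding it the whole log file (for example via `open(path)`, with the path depending on the local setup) would show when the messages started.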

Mentioned in SAL (#wikimedia-cloud) [2023-10-09T07:35:14Z] <taavi> restart postgresql on cloudbackup2001 T348431

I restarted Postgres. According to strace, it is clearly working through an 80G pgsql_tmp directory, but that is taking a while. I'll come back to it later.

It seems like Postgres is back up. Andrew also did something to the data directory and now it's about halfway full instead of 99% full.