This happened on toolsbeta again: the DB failed, so the core service was up but returning login errors.
Opening this task to record what I did.
From the harbor core service logs:
root@toolsbeta-harbor-1:/srv/ops/harbor# docker logs --tail 1000 -f harbor-core
...
2023-08-07T07:23:21Z [ERROR] [/lib/http/error.go:54]: {"errors":[{"code":"UNKNOWN","message":"unknown: deal with /service/notifications/tasks/41 request in transaction failed: failed to connect to `host=ttg4ncgzifw.svc.trove.eqiad1.wikimedia.cloud user=harbor database=harbor`: dial error (dial tcp 172.16.5.95:5432: connect: connection refused)"}]}
Then from the trove DB:
(guest-agent-venv) root@harbordb:/root# docker logs -n 100 -f database
...
2023-08-07 08:43:57.280 UTC [16] PANIC: could not write to file "pg_logical/replorigin_checkpoint.tmp": No space left on device
2023-08-07 08:43:57.305 UTC [1] LOG: startup process (PID 16) was terminated by signal 6: Aborted
2023-08-07 08:43:57.305 UTC [1] LOG: aborting startup due to startup process failure
2023-08-07 08:43:57.320 UTC [1] LOG: database system is shut down
There I looked for what was taking up the space; it turned out to be the pg_wal directory (not wal_archive):
(guest-agent-venv) root@harbordb:/root# df -h
...
/dev/sdb 4.9G 4.7G 0 100% /var/lib/postgresql
...
(guest-agent-venv) root@harbordb:/var/lib/postgresql/data/pgdata# du -hs *
...
4.6G pg_wal
...
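The df/du check above could be wrapped in two small helpers for the next time this happens. This is only a sketch: the function names `fs_usage_pct` and `largest_entries` are made up for illustration, and it assumes a POSIX shell with `df -P`, `du -sk`, `sort` and `awk` available on the host.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tooling.

# Print the usage percentage (integer, without the % sign) of the
# filesystem that holds the path given as $1.
fs_usage_pct() {
    # df -P prints a header plus one data line; column 5 is "Capacity".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the biggest entries directly under directory $1, largest first.
# $2 is the number of entries to show (defaults to 5). Sizes are in KiB.
largest_entries() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-5}"
}
```

For example, `fs_usage_pct /var/lib/postgresql` followed by `largest_entries /var/lib/postgresql/data/pgdata` would cover the same ground as the manual commands.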
So I manually changed the actual postgres configuration to set the WAL level to minimal and to disable WAL senders and archiving:
(guest-agent-venv) root@harbordb:/var/lib/postgresql/data/pgdata/pg_wal# grep wal_level /etc/postgresql/postgresql.conf
#wal_level = replica # minimal, replica, or logical
wal_level = minimal # minimal, replica, or logical
(guest-agent-venv) root@harbordb:/var/lib/postgresql/data/pgdata/pg_wal# grep max_wal_senders /etc/postgresql/postgresql.conf
#max_wal_senders = 10 # max number of walsender processes
max_wal_senders = 0 # max number of walsender processes
(guest-agent-venv) root@harbordb:/var/lib/postgresql/data/pgdata/pg_wal# grep archive_mode /etc/postgresql/postgresql.conf
archive_mode = on # enables archiving; off, on, or always
archive_mode = off # enables archiving; off, on, or always
That leaves us with no replication support (see https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LEVEL) and no archiving (point-in-time backup), but we are not using either anyhow, and it frees a lot of space:
# had to clean up a few wal files to have some space to start the db:
(guest-agent-venv) root@harbordb:/var/lib/postgresql/data/pgdata/pg_wal# rm 00000001000000010000001*
(guest-agent-venv) root@harbordb:/var/lib/postgresql/data/pgdata/pg_wal# docker start database
...
(guest-agent-venv) root@harbordb:/var/lib/postgresql/data/pgdata/pg_wal# df -h
...
/dev/sdb 4.9G 523M 4.1G 12% /var/lib/postgresql
...
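Since the grep output for the config shows both old and new lines for each setting, a small helper can report which value is actually in effect. PostgreSQL uses the last uncommented occurrence when a parameter appears more than once in the file. The sketch below is an illustration only: `active_setting` is a hypothetical name, and it only handles the simple `key = value # comment` form seen in this file (no quoted values containing `#`, no `include` directives).

```shell
#!/bin/sh
# Hypothetical helper: print the effective value of setting $1 in the
# postgresql.conf-style file $2 (last uncommented "key = value" wins).
active_setting() {
    awk -v key="$1" '
        /^[ \t]*#/ { next }              # skip fully commented-out lines
        {
            line = $0
            sub(/#.*/, "", line)         # drop any trailing comment
            n = index(line, "=")
            if (n == 0) next             # not a "key = value" line
            k = substr(line, 1, n - 1)
            v = substr(line, n + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim whitespace around key
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim whitespace around value
            if (k == key) val = v        # later occurrences override earlier
        }
        END { if (val != "") print val }
    ' "$2"
}
```

For example, `active_setting archive_mode /etc/postgresql/postgresql.conf` prints the last uncommented archive_mode value in the file. This reads the file only, so it does not confirm what the running server has loaded.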