There is an alert on PostgreSQL replication lag for maps1001 (POSTGRES_HOT_STANDBY_DELAY WARNING: DB "template1" (host:localhost) 1209248 and 0 seconds).
This needs to be investigated and fixed.
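The first figure in the alert is most likely the replay lag in bytes, assuming the alert comes from check_postgres's hot_standby_delay action (which reports byte lag and time lag). A WAL location (LSN) such as 240/A4EE7488 is the high and low 32 bits of a 64-bit byte position, both written in hex. On PostgreSQL 9.3 and later, the lag between two locations is therefore simply their difference. The minimal Python sketch below illustrates that arithmetic. On the server itself, pg_xlog_location_diff() (renamed pg_wal_lsn_diff() in PostgreSQL 10) does the same job. The standby location 240/A4DC00E8 is hypothetical, chosen only to reproduce the 1209248 figure from the alert.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a WAL location such as '240/A4EE7488' to an absolute byte position.

    Assumes PostgreSQL 9.3+ semantics: the part before the slash is the high
    32 bits and the part after it is the low 32 bits, both in hex.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(primary_lsn: str, standby_replay_lsn: str) -> int:
    """Bytes of WAL the standby still has to replay to catch up with the primary."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_replay_lsn)


# 240/A4EE7488 is the master's sent location reported in this task;
# 240/A4DC00E8 is a hypothetical standby replay location.
print(replay_lag_bytes("240/A4EE7488", "240/A4DC00E8"))  # 1209248
```

For scale, 1209248 bytes is about 1.15 MiB of WAL.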
Running select * from pg_stat_wal_receiver; on maps1001 returns no rows, which means the postgres slave is not receiving WAL updates from the master. Also, pg_stat_replication on the master shows only two standbys connected instead of three:
 pid   | usesysid | usename     | application_name | client_addr  | client_hostname | client_port | backend_start                 | backend_xmin | state     | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+-------------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 26380 |    17995 | replication | walreceiver      | 10.64.16.42  |                 |       60606 | 2019-07-18 12:30:00.972721+00 |              | streaming | 240/A4EE7488  | 240/A4EE7488   | 240/A4EE7488   | 240/A4EE7488    |             0 | async
 30016 |    17995 | replication | walreceiver      | 10.64.32.117 |                 |       48256 | 2019-07-18 14:55:59.900838+00 |              | streaming | 240/A4EE7488  | 240/A4EE7488   | 240/A4EE7488   | 240/A4EE7488    |             0 | async
(2 rows)
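The diagnosis above boils down to a set comparison. Each standby streaming from the master appears as one row in pg_stat_replication, so any expected standby without a row is disconnected. The minimal Python sketch below applies that check to the two rows shown and also flags connected standbys whose state is not "streaming". It assumes the rows have already been fetched as dicts. The address 192.0.2.1 is a hypothetical, documentation-only placeholder for maps1001, because its real address does not appear in this task.

```python
from typing import Dict, Iterable, List, Tuple


def replication_problems(
    expected: Iterable[str], rows: Iterable[Dict[str, str]]
) -> Tuple[List[str], List[str]]:
    """Compare pg_stat_replication rows against the standbys we expect.

    Returns (missing, not_streaming): expected addresses with no row at all,
    and connected addresses whose state is not 'streaming'.
    """
    row_list = list(rows)  # materialize so the rows can be scanned twice
    connected = {row["client_addr"] for row in row_list}
    missing = sorted(set(expected) - connected)
    not_streaming = sorted(
        row["client_addr"] for row in row_list if row["state"] != "streaming"
    )
    return missing, not_streaming


# The two rows the master reported (only the columns used here).
rows = [
    {"client_addr": "10.64.16.42", "state": "streaming"},
    {"client_addr": "10.64.32.117", "state": "streaming"},
]
# Hypothetical: 192.0.2.1 is a documentation-only address standing in for
# maps1001, whose real address is not given in this task.
expected = ["10.64.16.42", "10.64.32.117", "192.0.2.1"]

print(replication_problems(expected, rows))  # (['192.0.2.1'], [])
```

The same comparison could run as a monitoring check. It catches a standby that has silently dropped off, which a lag check on the remaining standbys would not reveal.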
We should reinitialize PostgreSQL on maps1001.
Mentioned in SAL (#wikimedia-operations) [2019-08-05T18:06:34Z] <onimisionipe> reinit postgres on maps1001 - T229788
Mentioned in SAL (#wikimedia-operations) [2019-08-06T07:10:47Z] <onimisionipe> pool maps1001. Postgres init complete - T229788
PostgreSQL on maps1001 was reinitialized to bring this slave back in sync with the master. I'll close this task for now and investigate further if it recurs.