
postgresql replication issues on maps1001
Closed, Resolved (Public)

Description

There is an alert on postgresql replication lag for maps1001 (POSTGRES_HOT_STANDBY_DELAY WARNING: DB "template1" (host:localhost) 1209248 and 0 seconds).

This needs to be investigated and fixed.

Event Timeline

Running select * from pg_stat_wal_receiver; on maps1001 returns an empty result. This means the postgres slave is not receiving updates from the master. The master also shows only two nodes connected instead of three:

pid  | usesysid |   usename   | application_name | client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state 
-------+----------+-------------+------------------+--------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 26380 |    17995 | replication | walreceiver      | 10.64.16.42  |                 |       60606 | 2019-07-18 12:30:00.972721+00 |              | streaming | 240/A4EE7488  | 240/A4EE7488   | 240/A4EE7488   | 240/A4EE7488    |             0 | async
 30016 |    17995 | replication | walreceiver      | 10.64.32.117 |                 |       48256 | 2019-07-18 14:55:59.900838+00 |              | streaming | 240/A4EE7488  | 240/A4EE7488   | 240/A4EE7488   | 240/A4EE7488    |             0 | async
(2 rows)
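
The check above can be sketched in Python as a rough illustration, not the tooling actually used on this task. It converts the location values from the output above into byte positions, computes how far each connected standby's replay_location trails sent_location, and lists expected standbys that are not connected. The helper names and the third address (192.0.2.10, a documentation placeholder, not the real address of maps1001) are assumptions.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL location such as '240/A4EE7488' to a byte position.

    The part before the slash is the high 32 bits and the part after it
    is the low 32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(sent_location: str, replay_location: str) -> int:
    """Bytes of WAL sent to a standby that it has not replayed yet."""
    return lsn_to_int(sent_location) - lsn_to_int(replay_location)


def missing_standbys(expected, connected):
    """Return the expected standby addresses that the master does not list."""
    return sorted(set(expected) - set(connected))


# Values copied from the two rows shown above.
rows = [
    {"client_addr": "10.64.16.42",
     "sent_location": "240/A4EE7488", "replay_location": "240/A4EE7488"},
    {"client_addr": "10.64.32.117",
     "sent_location": "240/A4EE7488", "replay_location": "240/A4EE7488"},
]

# Hypothetical list of expected standbys; 192.0.2.10 is a placeholder.
expected = ["10.64.16.42", "10.64.32.117", "192.0.2.10"]

lags = {r["client_addr"]: replay_lag_bytes(r["sent_location"], r["replay_location"])
        for r in rows}
missing = missing_standbys(expected, [r["client_addr"] for r in rows])

print(lags)
print(missing)
```

With the rows above, both connected standbys show zero replay lag, so the problem is limited to the standby that is missing from the list.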

We should reinitialize postgresql on maps1001.

Mentioned in SAL (#wikimedia-operations) [2019-08-05T18:06:34Z] <onimisionipe> reinit postgres on maps1001 - T229788

Mentioned in SAL (#wikimedia-operations) [2019-08-06T07:10:47Z] <onimisionipe> pool maps1001. Postgres init complete - T229788

Mathew.onipe claimed this task.

Postgres was reinitialized on maps1001 to bring this slave back up. I'll close this task for now and investigate further if the problem recurs.