
Puppet change at each run on postgres replicas
Open, Medium, Public

Description

On netboxdb2002 and puppetdb2002:

Jun 22 2022 - 17:33:39 	Puppet 	notice 	Replication not initialised please run: resync_replica 	
Jun 22 2022 - 17:33:39 	/Stage[main]/Postgresql::Slave/Notify[Replication not initialised please run: resync_replica]/message 	notice, notify, class, postgresql::slave, postgresql, slave, profile::netbox::db, profile, netbox, db, role::netbox::database, role, database 	defined 'message' as 'Replication not initialised please run: resync_replica' 	/etc/puppet/modules/postgresql/manifests/slave.pp:89

Adding @hnowlan based on rOPUPa9024b8175b71227c77893bab52dbd62b07f7d50

  • How to know if it's safe to run that script? Is the replica currently useless?
  • That warning should be moved to an Icinga or Prometheus alert instead, for better visibility and so that Puppet does not report a change on every run

Event Timeline

ayounsi triaged this task as Medium priority. Wed, Jun 22, 3:50 PM
ayounsi created this task.
Restricted Application added a subscriber: Aklapper.

Change 807553 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] C:postgresql: grab the data directory from postgresql

https://gerrit.wikimedia.org/r/807553

I think https://gerrit.wikimedia.org/r/c/operations/puppet/+/807553 should fix this issue.

How to know if it's safe to run that script? Is the replica currently useless?

This should only be needed when the server is first (re)imaged.
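
For anyone wanting to check by hand whether a replica has already been bootstrapped, a rough sketch follows. It is not the logic used in postgresql::slave; the function name is hypothetical, and it relies only on stock PostgreSQL behaviour: an initialised data directory contains PG_VERSION, and a standby is marked by standby.signal (PostgreSQL 12 and later) or recovery.conf (older releases).

```python
from pathlib import Path


def replica_looks_initialised(data_dir: str) -> bool:
    """Heuristic: does data_dir look like an initialised PostgreSQL standby?

    Hypothetical helper for illustration only; not what the Puppet module does.
    """
    root = Path(data_dir)
    # Every initialised data directory (initdb or pg_basebackup) has PG_VERSION.
    if not (root / "PG_VERSION").is_file():
        return False
    # PostgreSQL 12+ marks a standby with standby.signal,
    # earlier releases with recovery.conf.
    return (root / "standby.signal").is_file() or (root / "recovery.conf").is_file()
```

If this returns False on a host that is supposed to be a replica, the data directory has most likely never been synced from the primary. A True result only shows that the files are in place, not that replication is currently keeping up.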

That warning should be moved to an Icinga or Prometheus alert instead, for better visibility and so that Puppet does not report a change on every run

Although I think this can be improved somewhat in this instance, I'm not sure Icinga/Prometheus is the best place, as it's a bootstrapping step. I believe there is already a check to make sure the replicas don't lag too far behind the primary, which would also catch this issue (if there isn't, we should add that).
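
To illustrate what such a lag check could look like, here is a minimal sketch. It is not the existing production check; the function name and the 300-second threshold are assumptions. The two SQL strings use the standard PostgreSQL functions pg_is_in_recovery() and pg_last_xact_replay_timestamp(), and the classifier takes their results as plain values so the logic can be tried without a database.

```python
from typing import Optional

# Queries a check would run on the replica (standard PostgreSQL functions).
IN_RECOVERY_SQL = "SELECT pg_is_in_recovery()"
LAG_SQL = "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))"


def classify_replica(
    in_recovery: bool,
    lag_seconds: Optional[float],
    max_lag_seconds: float = 300.0,  # hypothetical threshold
) -> str:
    """Map the two query results to a monitoring status (hypothetical helper)."""
    if not in_recovery:
        # The host is accepting writes, so it is not acting as a replica.
        return "CRITICAL"
    if lag_seconds is None:
        # pg_last_xact_replay_timestamp() is NULL when nothing has been
        # replayed, which is what an uninitialised replica would show.
        return "CRITICAL"
    if lag_seconds > max_lag_seconds:
        return "WARNING"
    return "OK"
```

Note that this way of measuring lag grows whenever the primary is idle, since no new transactions arrive to be replayed, so a real check would need a generous threshold or a comparison of WAL positions instead.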