
Error in postgres puppettization for new installation (was Netbox: postgres cannot be restarted w/ current config)
Closed, ResolvedPublic

Description

With the current Puppetization for Netbox, Postgres is not able to start. This happened today after a reboot for the kernel security update.

To get it running again, I had to manually modify the configuration as follows:

  • comment out include 'tuning.conf' in /etc/postgresql/9.6/main/postgresql.conf
  • comment out checkpoint_segments = 64 in /etc/postgresql/9.6/main/master.conf
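
As a rough sketch of the manual workaround above (not the exact commands that were run), a small helper could comment out a setting in place. The function name comment_out is hypothetical, and the sketch assumes a sed that supports -i with a backup suffix:

```shell
#!/bin/sh
# comment_out FILE PATTERN
# Hypothetical helper: prefix every uncommented line that starts with PATTERN
# with '#'. PATTERN is used as a basic regular expression by sed.
comment_out() {
    file=$1
    pattern=$2
    # Keep a .bak copy next to the file so the change is easy to revert.
    sed -i.bak "s|^\([[:space:]]*\)\(${pattern}\)|\1#\2|" "$file"
}

# Usage on the affected host (run as root, then try starting Postgres again):
#   comment_out /etc/postgresql/9.6/main/postgresql.conf "include 'tuning.conf'"
#   comment_out /etc/postgresql/9.6/main/master.conf "checkpoint_segments"
```

Puppet will revert such edits on its next run unless the templates are fixed as well, so this is only a stopgap.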

A few action items here:

  • Puppet is broken on netmon2001 because postgres is not installed
  • Postgres DB was empty after I was able to have it restarted. No tables defined, but a netbox DB is defined.
  • Icinga doesn't have any check that Postgres is up and running, see T185504
  • Icinga doesn't have any check that Netbox is up and running, see T185505
  • The Postgres module uses a master.conf.erb file that defines checkpoint_segments = <%= @checkpoint_segments %>, but the checkpoint_segments setting was deprecated in 9.5 and removed in 9.6, see https://www.postgresql.org/docs/9.6/static/release-9-5.html.
  • The Netbox profile calls the Postgres one with includes => ['tuning.conf'], but it doesn't install the tuning.conf file, which our current puppetization expects the caller to provide. Does it need it? See modules/postgresql/manifests/server.pp:
#   includes
#       An array of files that will be included in the config. It is
#       the caller's responsibility to provide these
  • Slightly unrelated to this error, but it came up while debugging: modules/role/manifests/postgres/common.pp hardcodes version 9.4 of PostgreSQL.
  • Puppet was failing because the command defined in $userexists inside modules/postgresql/manifests/user.pp was failing, and the real reason (Postgres not started) is somewhat hidden in the middle of the message. I wonder whether it should have failed earlier. Here is the log:
Notice: /Stage[main]/Profile::Netbox/Postgresql::User[replication@netmon2001]/Exec[create_user-replication@netmon2001]/returns: could not change directory to "/root": Permission denied
Notice: /Stage[main]/Profile::Netbox/Postgresql::User[replication@netmon2001]/Exec[create_user-replication@netmon2001]/returns: createuser: could not connect to database postgres: could not connect to server: No such file or directory
Notice: /Stage[main]/Profile::Netbox/Postgresql::User[replication@netmon2001]/Exec[create_user-replication@netmon2001]/returns:        Is the server running locally and accepting
Notice: /Stage[main]/Profile::Netbox/Postgresql::User[replication@netmon2001]/Exec[create_user-replication@netmon2001]/returns:        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
Error: /usr/bin/createuser --no-superuser --no-createdb --no-createrole replication returned 1 instead of one of [0]
Error: /Stage[main]/Profile::Netbox/Postgresql::User[replication@netmon2001]/Exec[create_user-replication@netmon2001]/returns: change from notrun to 0 failed: /usr/bin/createuser --no-superuser --no-createdb --no-createrole replication returned 1 instead of one of [0]
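
One way to make the failure in the log above more obvious would be to check that the server accepts connections before running createuser. The following is only a sketch of that idea, not what the postgresql::user define does. It assumes pg_isready (part of the PostgreSQL client tools since 9.3) is on the PATH; the function name require_postgres and the PG_ISREADY override are hypothetical, and the socket directory and port are taken from the log:

```shell
#!/bin/sh
# require_postgres
# Hypothetical guard: return non-zero with a clear message when the local
# server is not accepting connections on the Unix domain socket.
# PG_ISREADY may be set to substitute another command for pg_isready.
require_postgres() {
    if ! "${PG_ISREADY:-pg_isready}" -q -h /var/run/postgresql -p 5432; then
        echo "PostgreSQL is not accepting connections; not creating users" >&2
        return 1
    fi
}

# Usage:
#   require_postgres && /usr/bin/createuser --no-superuser --no-createdb --no-createrole replication
```

With a guard like this, the first line of output names the real problem instead of leaving it in the middle of the createuser error.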

Event Timeline

Change 404504 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Netbox: remove tunning.conf include; PGSQL fix 9.6 deprecated option

https://gerrit.wikimedia.org/r/404504

Change 404504 merged by Ayounsi:
[operations/puppet@production] Netbox: remove tunning.conf include; PGSQL fix 9.6 deprecated option

https://gerrit.wikimedia.org/r/404504
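
For reference, if the old checkpoint_segments tuning needs to be carried over rather than dropped, the 9.5 release notes linked in the description give an approximate equivalence: max_wal_size = (3 * checkpoint_segments) * 16MB. The sketch below only does that arithmetic and is not necessarily what change 404504 implements; the function name max_wal_size_mb is hypothetical:

```shell
#!/bin/sh
# max_wal_size_mb SEGMENTS
# Hypothetical helper: print the approximate max_wal_size (in MB) that
# corresponds to an old checkpoint_segments value, using the rule of thumb
# max_wal_size = (3 * checkpoint_segments) * 16MB.
max_wal_size_mb() {
    echo $(( 3 * $1 * 16 ))
}

# Example for the value found in master.conf:
max_wal_size_mb 64   # prints 3072, i.e. max_wal_size = 3GB
```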

ayounsi updated the task description.

Change 404516 had a related patch set uploaded (by Ayounsi; owner: Ayounsi):
[operations/puppet@production] Postgres: remove hardcoded version

https://gerrit.wikimedia.org/r/404516

About:

Puppet is broken on netmon2001 because postgres is not installed

This is because package installation is done in https://github.com/wikimedia/puppet/blob/a867c8f6e95f5a6862bb911f76c54ac6449307ac/modules/postgresql/manifests/server.pp#L43

But https://github.com/wikimedia/puppet/blob/a867c8f6e95f5a6862bb911f76c54ac6449307ac/modules/postgresql/manifests/slave.pp#L70 says:

before  => Class['postgresql::server'],
require => Exec["pg_basebackup-${master_server}"],

So Puppet tries to execute a Postgres command before installing the packages, which fails.

From the file history, maybe @akosiaris can help here.

About:

Postgres DB was empty after I was able to have it restarted. No tables defined, but a netbox DB is defined.

I re-ran the scap script which recreated the tables properly. Will monitor to see if that happens again.

Alex found the issue!

The data was in /var/lib/postgres/9.6 (default location). The restart made postgres use the "proper" location (set by puppet) of /srv/postgresql/9.6.
I moved the data over and everything is back to normal.

Nice! So I guess that our puppetization is not correct and should restart Postgres after the first configuration change to ensure that the new data directory is used from the start.
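
A quick way to spot this kind of mismatch would be to compare the data directory in the configuration with the one the running server reports. This is only a sketch: the function name configured_data_dir is hypothetical, and which file actually carries the data_directory setting on these hosts is an assumption.

```shell
#!/bin/sh
# configured_data_dir FILE
# Hypothetical helper: print the single-quoted data_directory value set in a
# postgresql.conf-style file (the last uncommented occurrence wins).
configured_data_dir() {
    sed -n "s/^[[:space:]]*data_directory[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | tail -n 1
}

# Usage (the config path is an assumption):
#   running=$(sudo -u postgres psql -At -c 'SHOW data_directory')
#   wanted=$(configured_data_dir /etc/postgresql/9.6/main/postgresql.conf)
#   [ "$running" = "$wanted" ] || echo "data_directory mismatch: $running vs $wanted"
```

If the two differ, the server has not been restarted since the configuration changed, and data written in the meantime will be left behind in the old location.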

Change 407415 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] postgresql::slave: Granular relationships

https://gerrit.wikimedia.org/r/407415

Change 407415 merged by Alexandros Kosiaris:
[operations/puppet@production] postgresql::slave: Granular relationships

https://gerrit.wikimedia.org/r/407415

Change 407432 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] netbox: Allow slaves to to connect to master

https://gerrit.wikimedia.org/r/407432

Change 407432 merged by Alexandros Kosiaris:
[operations/puppet@production] netbox: Allow slaves to to connect to master

https://gerrit.wikimedia.org/r/407432

Change 407438 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] netbox: Add IPv6 ferm rules as well

https://gerrit.wikimedia.org/r/407438

Change 407438 merged by Alexandros Kosiaris:
[operations/puppet@production] netbox: Add IPv6 ferm rules as well

https://gerrit.wikimedia.org/r/407438

Change 407446 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] netbox: Add IPv6 postgresql::user resource

https://gerrit.wikimedia.org/r/407446

Change 407446 merged by Alexandros Kosiaris:
[operations/puppet@production] netbox: Add IPv6 postgresql::user resource

https://gerrit.wikimedia.org/r/407446

The above series of patches fixed netmon2001 not running puppet.

Dzahn triaged this task as High priority. Feb 1 2018, 9:45 PM

Change 404516 merged by Ayounsi:
[operations/puppet@production] Postgres: remove hardcoded version

https://gerrit.wikimedia.org/r/404516

Dzahn subscribed.

Checked the monitoring check box. We have that now. Details in subtask.

I rebooted the Netbox hosts (1002, 2001) for the MDS kernel issues this week, and that does not seem to be an issue anymore. Can this bug be closed, or is there anything else actionable?

Most things were indeed fixed. I'm not sure about the status of the last two items in the description's checkbox list, but they should no longer affect reboots/restarts, only new installations at most. I'll rename it.

Volans renamed this task from Netbox: postgres cannot be restarted w/ current config to Error in postgres puppettization for new installation (was Netbox: postgres cannot be restarted w/ current config). Jul 5 2019, 3:04 PM
Volans lowered the priority of this task from High to Low.
ayounsi claimed this task.

Talked with John, who is working on Postgres for PuppetDB; the last issue is not happening anymore.