postgresql::ganglia on puppetdb servers - authentication failed
Closed, Resolved · Public

Description

Faidon pointed this out:

10:43 < paravoid> faidon@nitrogen:~$ sudo grep -c gmond /var/log/syslog

10:43 < paravoid> Jul  6 17:38:17 nitrogen gmond[1773]: Could not connect to database FATAL:  password authentication failed for user "ganglia_stats"
10:43 < paravoid> Jul  6 17:38:17 nitrogen gmond[1773]: FATAL:  password authentication failed for user "ganglia_stats"

I found that we have the same issue on both nitrogen and nihal. Their role is puppetmaster::puppetdb.

puppetdb/database.pp sets up a postgresql database.

The postgresql module has manifests/ganglia.pp, which defines postgresql::ganglia with these parameters:

    $pgstats_user,
    $pgstats_pass,
    $pgstats_db = 'template1',
    $pgstats_host = '127.0.0.1',
    $pgstats_port = '5432',

The template from the puppet class gets realized as /etc/ganglia/conf.d/postgresql.pyconf

There is a (random-looking) password in there that is the same on both servers, but using it doesn't give you access, as the error says:

psql --username ganglia_stats -W template1
Password for user ganglia_stats: 
psql: FATAL:  Peer authentication failed for user "ganglia_stats"
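Note that the two errors differ: gmond reports "password authentication failed", while the manual test reports "Peer authentication failed". By default psql connects over a Unix socket when no host is given, whereas the plugin is configured with pgstats_host = '127.0.0.1'. Assuming the local pg_hba.conf (not shown in this ticket) uses peer authentication for socket connections, the manual test above never exercises the password at all. A minimal sketch of building a psql invocation that mirrors the plugin's TCP settings; the helper name psql_command is hypothetical:

```python
import shlex


def psql_command(user, db="template1", host="127.0.0.1", port="5432"):
    """Return a psql argv that connects over TCP using the plugin's defaults.

    The defaults mirror $pgstats_db, $pgstats_host and $pgstats_port from
    postgresql::ganglia; the function itself is only an illustration.
    """
    return [
        "psql",
        "--host", host,       # force TCP instead of the Unix socket
        "--port", str(port),
        "--username", user,
        "-W",                 # prompt for the password
        db,
    ]


if __name__ == "__main__":
    # Print the command line so it can be reviewed before running it by hand.
    print(shlex.join(psql_command("ganglia_stats")))
```

If this variant fails with "password authentication failed" as well, that would match what gmond logs and point at a missing role or wrong password rather than at the authentication method.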

I can see in syslog*.gz that this has been going on since June 30 or earlier (maybe always has been?).
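To check how far back the failures go, something like the following could scan the current and rotated syslog files. This is a rough sketch: the regular expression is modelled on the two log lines quoted above, and the function names and the /var/log default are assumptions for illustration.

```python
import gzip
import re
from pathlib import Path

# Modelled on lines such as:
# Jul  6 17:38:17 nitrogen gmond[1773]: FATAL:  password authentication failed for user "ganglia_stats"
FAILURE_RE = re.compile(
    r'^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) gmond\[\d+\]: '
    r'.*password authentication failed for user "(?P<user>[^"]+)"'
)


def auth_failures(lines):
    """Yield (timestamp, host, user) for each gmond authentication failure."""
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            yield match.group("stamp"), match.group("host"), match.group("user")


def read_log(path):
    """Yield text lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", errors="replace") as handle:
        yield from handle


def summarize(log_dir="/var/log"):
    """Print a per-file failure count and the first matching timestamp."""
    for path in sorted(Path(log_dir).glob("syslog*")):
        hits = list(auth_failures(read_log(path)))
        if hits:
            print(f"{path.name}: {len(hits)} failures, first at {hits[0][0]}")
```

Syslog timestamps carry no year, so the oldest rotated file only gives a lower bound on when the problem started.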

Do we still need this Ganglia plugin or should we simply remove it since Ganglia is deprecated?

Do you know if it has worked before and somehow changed or if it was simply never finished?

Has it maybe been replaced by Prometheus? Or should I change the password / try to fix the access to let it write stats?

# == Class postgresql::ganglia
# This installs a Ganglia plugin for postgresql

Event Timeline

Dzahn created this task. Jul 7 2017, 12:13 AM
Restricted Application added a subscriber: Aklapper. Jul 7 2017, 12:13 AM

Do we still need this Ganglia plugin or should we simply remove it since Ganglia is deprecated?

We should remove it.

Do you know if it has worked before and somehow changed or if it was simply never finished?

The plugin itself is working fine for the labsdb postgres databases. Since the puppetdb postgres setup uses the same classes, the plugin was probably pulled in by the puppetization, but the database user has to be created manually and never was. So in the puppetdb context it never worked.

Has it maybe been replaced by prometheus?

Partially. The replication lag metric has been implemented as a textfile collector and it's in modules/postgresql/files/prometheus/postgresql_replication_lag.sh. The other metrics have not been implemented.
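For anyone porting one of the remaining metrics, the textfile collector approach amounts to writing a small file in the Prometheus text exposition format into the directory node_exporter watches. The existing collector is a shell script; the sketch below is a Python illustration only, and the function names and the metric name in the usage note are hypothetical, not taken from postgresql_replication_lag.sh.

```python
import os
import tempfile


def render_gauge(name, value, help_text):
    """Render one gauge in the Prometheus text exposition format."""
    return f"# HELP {name} {help_text}\n# TYPE {name} gauge\n{name} {value}\n"


def write_textfile(directory, filename, content):
    """Write the metrics file atomically.

    The content goes to a temporary file in the same directory first and is
    then renamed into place, so a scrape never sees a half-written file.
    """
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_path, os.path.join(directory, filename))
```

A caller would run something like write_textfile(collector_dir, "postgresql_example.prom", render_gauge("postgresql_example_connections", 12, "Example metric")), where collector_dir is whatever directory the local node_exporter is configured to read.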

Or should I change the password / try to fix the access to let it write stats?

We've deprecated Ganglia, so no. We are better off removing it.

Dzahn claimed this task. Jul 7 2017, 2:50 PM

thanks @akosiaris gotcha!

Change 365887 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] puppetdb: remove postgres::ganglia from puppetdb role

https://gerrit.wikimedia.org/r/365887

Change 365887 merged by Dzahn:
[operations/puppet@production] puppetdb: remove postgres::ganglia from puppetdb role

https://gerrit.wikimedia.org/r/365887

Mentioned in SAL (#wikimedia-operations) [2017-07-18T02:13:19Z] <mutante> nitrogen/nihal - rm /usr/lib/ganglia/python_modules/postgresql.py ; rm /etc/ganglia/conf.d/* ; restart gmond (T169953)

Dzahn closed this task as Resolved. Jul 18 2017, 2:15 AM

Removed from nitrogen and nihal and cleaned up; I do not see it in the logs anymore. For the scope of this ticket, this should be done.

Dzahn added a subscriber: faidon. Jul 18 2017, 2:17 AM

@faidon You reported it and asked me to take a look. I saw you weren't on the ticket, so FYI now. Should be gone.