
Could not find class ::profile::swap for an-test-client1001.eqiad.wmnet
Closed, Resolved, Public

Description

profile::swap has been removed, but I noticed the following alert on alerts.wikimedia.org:

CRITICAL: the following (6) node(s) change every puppet run: an-test-client1001.eqiad.wmnet

Running sudo -i puppet agent -tv on an-test-client1001.eqiad.wmnet showed the error:

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Could not find class ::profile::swap for an-test-client1001.eqiad.wmnet (file: /etc/puppet/modules/role/manifests/analytics_test_cluster/client.pp, line: 27, column: 5) on node an-test-client1001.eqiad.wmnet

We should clean up any remaining profile::swap references.
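
One way to look for leftover references is to scan a checkout of the puppet repository for the class name. The following is only a minimal sketch: the function names `matching_lines` and `find_class_references` and the list of file suffixes are assumptions made for illustration, not part of operations/puppet.

```python
import re
from pathlib import Path


def matching_lines(text, class_name):
    """Return (line_number, stripped_line) pairs in text that mention class_name."""
    # Reject longer identifiers such as profile::swap_old, but still match
    # ::profile::swap and keys like profile::swap::some_key.
    pattern = re.compile(r"(?<!\w)" + re.escape(class_name) + r"(?!\w)")
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


def find_class_references(root, class_name, suffixes=(".pp", ".yaml", ".erb")):
    """Walk root and collect (path, line_number, line) for every match.

    The suffix list is an assumption about where references might live.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend(
                (str(path), number, line)
                for number, line in matching_lines(text, class_name)
            )
    return hits
```

Calling `find_class_references(".", "profile::swap")` from the root of a checkout would list each file and line that still mentions the class, which should include the `role/manifests/analytics_test_cluster/client.pp` line from the error above if it has not been fixed yet.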

Event Timeline

Change 685066 had a related patch set uploaded (by Razzi; author: Razzi):

[operations/puppet@production] swap: remove references to profile::swap

https://gerrit.wikimedia.org/r/685066

Change 685066 merged by Razzi:

[operations/puppet@production] swap: remove references to profile::swap

https://gerrit.wikimedia.org/r/685066

Hm, that patch fixed the underlying issue, and running the check manually produces the intended result:

razzi@puppetdb1002:~$ /usr/lib/nagios/plugins/check_puppet_run_changes
CRITICAL: the following (5) node(s) change every puppet run: snapshot1015.eqiad.wmnet, snapshot1014.eqiad.wmnet, maps1009.eqiad.wmnet, wdqs1011.eqiad.wmnet, webperf1001.eqiad.wmnet

However, on alerts.wikimedia.org / icinga the alert still shows 6 hosts, including an-test-client1001.eqiad.wmnet. @elukey, is there a way to make icinga notice the change in which hosts are alerting? Perhaps this will resolve itself soon enough.

@razzi each check has its own interval; check_puppet_run_changes might run every X hours, so it may be slow to update. If you want fresh results, you can force a reschedule of the check via the Icinga UI (you should find the option in the dropdown menu alongside Acknowledge and similar options).
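
As an alternative to the UI option, a Nagios-compatible Icinga 1.x instance typically also accepts forced checks through its external command file. This is a hedged sketch of that different route, not what was done in this task: the helper name `forced_check_command` is hypothetical, and the host name, service description, and command file location are not given here and would have to be looked up.

```python
import time


def forced_check_command(host, service, when=None):
    """Build one external-command line requesting a forced service check.

    Line format: [submit_time] SCHEDULE_FORCED_SVC_CHECK;host;service;check_time
    """
    # Use the same timestamp for submission and for the requested check time,
    # which asks the scheduler to run the check as soon as possible.
    now = int(time.time()) if when is None else int(when)
    return f"[{now}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{now}\n"
```

The returned line would be appended to the instance's command file; whether external commands are enabled, and where that file lives, depends on the installation.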

Hey, sorry y'all! I thought I had done a code search and removed all occurrences... I must not have noticed this one on an-test-client somehow. Thank you.

Ok, sure enough, the alert has removed an-test-client from its erroring nodes.
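
To double-check a result like this without reading the host list by eye, the status line can be parsed. A minimal sketch, assuming the output always has the shape of the two CRITICAL lines quoted in this task; the function name `parse_changed_nodes` is hypothetical and other output formats (for example the OK state) are not handled.

```python
import re


def parse_changed_nodes(output):
    """Return the hosts listed in a check_puppet_run_changes CRITICAL line."""
    match = re.search(
        r"the following \((\d+)\) node\(s\) change every puppet run: (.*)",
        output,
    )
    if match is None:
        # Unknown format: report no hosts rather than guessing.
        return []
    return [host.strip() for host in match.group(2).split(",") if host.strip()]
```

With the manual run quoted above, `"an-test-client1001.eqiad.wmnet" in parse_changed_nodes(line)` evaluates to `False`, matching what the alert eventually showed.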