
[ceph] make puppet catalog compiler pass for all ceph hosts
Closed, Resolved, Public

Description


Currently some hosts fail to run through the compiler, for different reasons:

  • Non-existing nodes (ex. cloudceph2001-dev.wikimedia.org)
  • Missing variables (ex. cloudcephosd2001-dev.codfw.wmnet)

See https://puppet-compiler.wmflabs.org/compiler1002/28941/

Event Timeline

dcaro triaged this task as High priority. Apr 8 2021, 10:32 AM
dcaro created this task.

This is working now :), thanks Jbond! He actually handled everything.

For the missing variables, they were just misplaced:

https://gerrit.wikimedia.org/r/c/labs/private/+/677848

For the non-existing nodes, Jbond handled it too.

Non-existing nodes (ex. cloudceph2001-dev.wikimedia.org)

tl;dr: nodes should get purged by running the fact sync script once the node has been purged from puppetdb. However, a stale directory had been left around; deleting it removed the old entries.

Several things are involved in completely purging a node from PCC, and which of them matter depends on the host variable override you use. The main things at play are:

  1. The puppetdb used by compiler-update-facts:
    • the compiler-update-facts script uses the puppetdb API to export a list of active nodes; a node is considered active if it has submitted a report to puppetdb in the last 14 days (this is the production value, at least)
  2. When you last ran compiler-update-facts:
    • compiler-update-facts exports the node data from puppetdb, then rsyncs it using --delete to /var/lib/catalog-differ/puppet/yaml/, thus purging any old nodes
  3. The puppetdb used by the compiler:
    • the puppet compiler also has a puppetdb instance, which expires nodes after 7 days of inactivity. However, cron jobs run every night to keep it fresh, so unless manually purged, nodes won't get removed from here until 7 days after step 2
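The three steps can be modelled in a few lines of Python to make the timing easier to reason about. This is a minimal sketch, not the actual compiler-update-facts code: `active_nodes`, `sync_with_delete` and `compiler_puppetdb_expiry` are hypothetical helpers, the real script queries the PuppetDB API rather than reading an in-memory dict, and the 14-day and 7-day values are simply the ones quoted above.

```python
from datetime import datetime, timedelta, timezone

# Values quoted in the steps above (14 days is the production value).
ACTIVE_WINDOW = timedelta(days=14)
COMPILER_PUPPETDB_TTL = timedelta(days=7)


def active_nodes(last_report, now, window=ACTIVE_WINDOW):
    """Step 1: keep only nodes that reported to puppetdb within `window`.

    `last_report` maps certname -> timezone-aware datetime of the last report.
    Hypothetical stand-in for the puppetdb API query the real script makes.
    """
    return {name for name, ts in last_report.items() if now - ts <= window}


def sync_with_delete(exported, on_disk):
    """Step 2: model of `rsync --delete` into the yaml directory.

    After the sync the directory holds exactly the exported nodes; anything
    that was on disk but not exported is purged.
    """
    purged = on_disk - exported
    return set(exported), purged


def compiler_puppetdb_expiry(removed_from_yaml_at):
    """Step 3: without a manual purge, the compiler's puppetdb keeps a node
    until 7 days after its facts stopped being refreshed (i.e. after step 2)."""
    return removed_from_yaml_at + COMPILER_PUPPETDB_TTL
```

For example, with hypothetical hosts where `node-b.example` last reported 30 days ago, `active_nodes` drops it from the export, the next `sync_with_delete` purges its facts file, and the compiler's puppetdb only forgets it a week after that sync.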

Which of these constraints you will hit depends on the host variable override you use:

  • If using an empty Hosts list, PCC calculates the hosts from the site.pp file, so it shouldn't have any issues with old nodes (as long as they have been removed from site.pp). However, you may need to run compiler-update-facts for new hosts.
  • If providing an explicit list of hosts, again none of this matters, but you may need to run compiler-update-facts for new hosts.
  • If using the re: selector, PCC scans /var/lib/catalog-differ/puppet/yaml/ for hosts matching the regex, so steps 1 and 2 are required to purge the node.
  • If using any of the other selectors, PCC queries puppetdb for matching hosts, so you need to complete all three steps to purge the node.
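For the re: selector, a rough sketch of the directory scan could look like the following. It assumes each host has a facts file named `<fqdn>.yaml` somewhere under /var/lib/catalog-differ/puppet/yaml/ and that the regex is applied with `re.search`; both are assumptions, and `hosts_matching` is a hypothetical helper rather than PCC's own code. It also illustrates why a stale file left behind keeps a dead host in the results until it is deleted.

```python
import re
from pathlib import Path

# Assumption: one "<fqdn>.yaml" facts file per host, possibly in subdirectories.
DEFAULT_YAML_DIR = "/var/lib/catalog-differ/puppet/yaml"


def hosts_matching(regex, yaml_dir=DEFAULT_YAML_DIR):
    """Return, sorted, the hosts whose facts file name matches `regex`.

    Only the files on disk are consulted, so any leftover file keeps its
    host matching until the file (or its directory) is removed.
    """
    pattern = re.compile(regex)
    hosts = {
        path.stem  # assumption: the file name (minus ".yaml") is the FQDN
        for path in Path(yaml_dir).rglob("*.yaml")
        if pattern.search(path.stem)
    }
    return sorted(hosts)
```

Usage: `hosts_matching(r"^cloudceph")` would list every host with a facts file whose name starts with `cloudceph`, including any that are only there because of a stale file.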

Missing variables (ex. cloudcephosd2001-dev.codfw.wmnet)

The missing key was in the private repo, but in the wrong location.

I re-opened this and it's working now, so closing; please re-open if you need more info.