update systems to use new puppetdb instance
Closed, Resolved, Public

Description

Now that PuppetDB 7 is being populated, we should move the services that use PuppetDB to the new instance, specifically:

  • cumin
  • spicerack
    • tested sre.hosts.reimage
  • netbox
  • puppetboard
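
As a rough illustration of what pointing a client at the new instance involves, the sketch below builds a query URL for PuppetDB's v4 nodes endpoint from a configurable host. The hostnames and the default port are placeholders, not the production values, and the function name is hypothetical; none of the services listed above necessarily builds its URLs this way.

```python
import json
from urllib.parse import urlencode, urlunsplit


def nodes_query_url(host, port=443, query=None):
    """Build a URL for the PuppetDB v4 nodes query endpoint on the given host.

    `query` is an optional PuppetDB AST query (a Python list) that is
    JSON-encoded into the `query` URL parameter.
    """
    params = urlencode({"query": json.dumps(query)}) if query is not None else ""
    return urlunsplit(("https", f"{host}:{port}", "/pdb/query/v4/nodes", params, ""))


# Placeholder hostnames (assumptions): only the host changes between instances.
old_url = nodes_query_url("puppetdb-old.example.org")
new_url = nodes_query_url(
    "puppetdb-new.example.org",
    query=["=", "certname", "sretest1002.eqiad.wmnet"],
)
print(old_url)
print(new_url)
```

Keeping the host as a single parameter means a service can be switched to the new instance, and back again if needed, with a one-line configuration change.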

Details

Repo                  Branch      Lines +/-
operations/puppet     production  +12 -19
operations/puppet     production  +15 -27
operations/puppet     production  +7 -0
operations/puppet     production  +6 -14
operations/puppet     production  +5 -0
operations/puppet     production  +2 -2
operations/puppet     production  +4 -4
operations/puppet     production  +4 -2
operations/cookbooks  master      +1 -1
operations/puppet     production  +27 -27
operations/cookbooks  master      +1 -0
operations/puppet     production  +1 -0
operations/puppet     production  +1 -1
operations/puppet     production  +0 -3
operations/puppet     production  +9 -2
operations/puppet     production  +1 -1
operations/dns        master      +2 -0
operations/puppet     production  +24 -1
operations/puppet     production  +2 -0
operations/puppet     production  +3 -0
operations/puppet     production  +1 -0
operations/puppet     production  +4 -5

Related Objects

Status    Assigned
Open      None
Resolved  jbond

Event Timeline

jbond changed the task status from Open to In Progress. (Jul 19 2023, 10:29 AM)
jbond triaged this task as Medium priority.
jbond created this task.

Change 939655 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] services: swap puppetboard and puppetboard-next

https://gerrit.wikimedia.org/r/939655

Change 939655 merged by Jbond:

[operations/puppet@production] services: swap puppetboard and puppetboard-next

https://gerrit.wikimedia.org/r/939655

Change 939675 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetboard: add -next domain to tls certs

https://gerrit.wikimedia.org/r/939675

Change 939675 merged by Jbond:

[operations/puppet@production] puppetboard: add -next domain to tls certs

https://gerrit.wikimedia.org/r/939675

Change 939678 had a related patch set uploaded (by Jbond; author: jbond):

[operations/dns@master] puppetdb-api-next: add new discovery record for testing puppetdb-api

https://gerrit.wikimedia.org/r/939678

Change 939679 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetdb-api-next: Add new puppetdb-api discovery record

https://gerrit.wikimedia.org/r/939679

Change 939685 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetdb: Set X-Client headers

https://gerrit.wikimedia.org/r/939685

Change 939685 merged by Jbond:

[operations/puppet@production] puppetdb: Set X-Client headers

https://gerrit.wikimedia.org/r/939685

Change 939689 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetdb: add allow-header-cert-info: true to auth.conf

https://gerrit.wikimedia.org/r/939689

Change 939689 merged by Jbond:

[operations/puppet@production] puppetdb: add allow-header-cert-info: true to auth.conf

https://gerrit.wikimedia.org/r/939689

Change 939679 merged by Jbond:

[operations/puppet@production] puppetdb-api-next: Add new puppetdb-api discovery record

https://gerrit.wikimedia.org/r/939679

Change 939678 merged by Jbond:

[operations/dns@master] puppetdb-api-next: add new discovery record for testing puppetdb-api

https://gerrit.wikimedia.org/r/939678

Change 939698 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetdb::microservice: add -next domain

https://gerrit.wikimedia.org/r/939698

Change 939698 merged by Jbond:

[operations/puppet@production] puppetdb::microservice: add -next domain

https://gerrit.wikimedia.org/r/939698

Change 939706 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] netbox: make the puppetdb microservic domain configurable

https://gerrit.wikimedia.org/r/939706

Change 939709 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] netbox: drop pupetdb_host as its not used

https://gerrit.wikimedia.org/r/939709

Change 939706 merged by Jbond:

[operations/puppet@production] netbox: make the puppetdb microservic domain configurable

https://gerrit.wikimedia.org/r/939706

Change 939709 merged by Jbond:

[operations/puppet@production] netbox: drop pupetdb_host as its not used

https://gerrit.wikimedia.org/r/939709

Change 939710 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] netbox::standalone: switch to using new puppetdb api

https://gerrit.wikimedia.org/r/939710

Change 939712 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] netbox: actully use puppetdb_microservice_fqdn

https://gerrit.wikimedia.org/r/939712

Change 939712 merged by Jbond:

[operations/puppet@production] netbox: actully use puppetdb_microservice_fqdn

https://gerrit.wikimedia.org/r/939712

Change 939710 merged by Jbond:

[operations/puppet@production] netbox::standalone: switch to using new puppetdb api

https://gerrit.wikimedia.org/r/939710

For puppetdb-api, I have updated netbox-next and tested the following:

  • Reports
  • Scripts

Change 939725 had a related patch set uploaded (by Jbond; author: jbond):

[operations/cookbooks@master] sre.discovery.datacenter: exclude puppetdb-api-next

https://gerrit.wikimedia.org/r/939725

Change 939726 had a related patch set uploaded (by Jbond; author: jbond):

[operations/cookbooks@master] DO NOT MERGE: Change to test new puppetdb-api-next

https://gerrit.wikimedia.org/r/939726

Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye executed with errors:

  • sretest1002 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Change 939725 merged by jenkins-bot:

[operations/cookbooks@master] sre.discovery.datacenter: exclude puppetdb-api-next

https://gerrit.wikimedia.org/r/939725

Change 939741 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] proifile::puppetdb::microservice: add allowed_roles

https://gerrit.wikimedia.org/r/939741

Change 939741 merged by Jbond:

[operations/puppet@production] proifile::puppetdb::microservice: add allowed_roles

https://gerrit.wikimedia.org/r/939741

Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye executed with errors:

  • sretest1002 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye executed with errors:

  • sretest1002 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye executed with errors:

  • sretest1002 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye executed with errors:

  • sretest1002 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye executed with errors:

  • sretest1002 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye executed with errors:

  • sretest1002 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye completed:

  • sretest1002 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202307201605_jbond_2598166_sretest1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye executed with errors:

  • sretest1002 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Change 939726 abandoned by Jbond:

[operations/cookbooks@master] DO NOT MERGE: Change to test new puppetdb-api-next

Reason:

Abandoned, confirmed working

https://gerrit.wikimedia.org/r/939726

Change 939726 restored by Jbond:

[operations/cookbooks@master] DO NOT MERGE: Change to test new puppetdb-api-next

https://gerrit.wikimedia.org/r/939726

Cookbook cookbooks.sre.hosts.reimage was started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by jbond@cumin1001 for host sretest1002.eqiad.wmnet with OS bullseye completed:

  • sretest1002 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202307201725_jbond_2620913_sretest1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
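
With this many reimage attempts in the timeline, a small tally of the per-host outcome lines can make the progression easier to follow. The following is a minimal sketch that assumes the summaries keep the "• host (STATUS)" shape seen above; the function name and the sample input are illustrative and not part of spicerack.

```python
import re
from collections import Counter

# Matches a per-host outcome line such as "  • sretest1002 (FAIL)".
# Step lines ("    • Host up (Debian installer)") do not match, because the
# parenthesised, all-uppercase status must directly follow a single token.
SUMMARY_RE = re.compile(r"^\s*•\s*(?P<host>\S+)\s+\((?P<status>[A-Z]+)\)\s*$")


def tally_outcomes(lines):
    """Count (host, status) pairs found in cookbook summary lines."""
    counts = Counter()
    for line in lines:
        match = SUMMARY_RE.match(line)
        if match:
            counts[(match.group("host"), match.group("status"))] += 1
    return counts


# Abbreviated sample input in the same shape as the summaries above.
sample = [
    "  • sretest1002 (FAIL)",
    "    • Downtimed on Icinga/Alertmanager",
    "    • Host up (Debian installer)",
    "    • The reimage failed, see the cookbook logs for the details",
    "  • sretest1002 (WARN)",
    "    • Removed previous downtime on Alertmanager (old OS)",
    "    • Icinga status is optimal",
]

for (host, status), count in sorted(tally_outcomes(sample).items()):
    print(host, status, count)
```

Applied to the ten run summaries in this timeline, the same tally would count eight FAIL and two WARN results for sretest1002.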

Change 940384 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetdb-api: swap the production and next environments

https://gerrit.wikimedia.org/r/940384

Change 940396 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] cumin::unprivmaster: Test using puppetdbapi-next

https://gerrit.wikimedia.org/r/940396

Change 940396 merged by Jbond:

[operations/puppet@production] cumin::unprivmaster: Test using puppetdbapi-next

https://gerrit.wikimedia.org/r/940396

Change 940384 merged by Jbond:

[operations/puppet@production] puppetdb-api: swap the production and next environments

https://gerrit.wikimedia.org/r/940384

jbond claimed this task.
jbond updated the task description.

This is now in place.

Change 954622 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetmaster: update to use new puppetdb servers

https://gerrit.wikimedia.org/r/954622

Change 954647 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetdb-api: switch dev sevices back to puppetdb-api

https://gerrit.wikimedia.org/r/954647

Change 954669 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetmaster: add parameter to change the port that puppetdb runs

https://gerrit.wikimedia.org/r/954669

Change 954647 merged by Jbond:

[operations/puppet@production] puppetdb-api: switch dev sevices back to puppetdb-api

https://gerrit.wikimedia.org/r/954647

Change 954669 merged by Jbond:

[operations/puppet@production] puppetmaster: add parameter to change the port that puppetdb runs

https://gerrit.wikimedia.org/r/954669

Change 954622 merged by Jbond:

[operations/puppet@production] puppetmaster: update to use new puppetdb servers

https://gerrit.wikimedia.org/r/954622

Change 955303 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] puppetdb: add a motd to inform users that theses serveres are no longer live

https://gerrit.wikimedia.org/r/955303

Change 955303 merged by Jbond:

[operations/puppet@production] puppetdb: add a motd to inform users that theses serveres are no longer live

https://gerrit.wikimedia.org/r/955303

Icinga downtime and Alertmanager silence (ID=90305a26-47b2-42a2-abe5-284f8035bf3b) set by jmm@cumin2002 for 3 days, 0:00:00 on 1 host(s) and their services with reason: Disable puppetdb/postgres on old nodes to ensure nothing hits them anyway

puppetdb1002.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=2af641c9-48a3-42b7-8c75-56c12506718a) set by jmm@cumin2002 for 3 days, 0:00:00 on 1 host(s) and their services with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway

puppetdb2002.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=11ec6d55-6d8f-4537-a398-4863d7f38c9c) set by jmm@cumin2002 for 5 days, 0:00:00 on 1 host(s) and their services with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway

puppetdb2002.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=708cd0d4-307e-4f35-acfa-ddae4ae88236) set by jmm@cumin2002 for 5 days, 0:00:00 on 1 host(s) and their services with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway

puppetdb1002.eqiad.wmnet

Change 959696 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] os-reports: Stop configuring a puppetdb server and switch to discovery record

https://gerrit.wikimedia.org/r/959696

Change 960015 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Move os-reports to the puppetdb host(s)

https://gerrit.wikimedia.org/r/960015

Change 960015 merged by Muehlenhoff:

[operations/puppet@production] Move os-reports to the puppetdb host(s)

https://gerrit.wikimedia.org/r/960015

Change 959696 abandoned by Muehlenhoff:

[operations/puppet@production] os-reports: Stop configuring a puppetdb server and switch to discovery record

Reason:

Obsoleted by https://gerrit.wikimedia.org/r/c/operations/puppet/+/960015

https://gerrit.wikimedia.org/r/959696

Icinga downtime and Alertmanager silence (ID=73525cca-1535-4d44-89d8-fcd584ea67a9) set by jmm@cumin2002 for 5 days, 0:00:00 on 1 host(s) and their services with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway

puppetdb1002.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=69921077-8a56-48de-9905-0d3d1b91d292) set by jmm@cumin2002 for 5 days, 0:00:00 on 1 host(s) and their services with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway

puppetdb2002.codfw.wmnet