
eqiad: 1 VM requested for karapace in support of datahub in staging
Closed, Resolved (Public)

Description

Site/Location: eqiad
Number of systems: 1
Service: datahub in staging
Networking Requirements: internal IP
Processor Requirements: 2 vCPUs
Memory: 2 GB of RAM
Disks: 20 GB disk
Other Requirements:

We have discovered that the production and staging deployments of datahub share a single karapace instance, which is proving problematic.
An additional VM would let the staging deployment use its own karapace instance, separating the two deployments.

Event Timeline

BTullis created this task.

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host karapace1002.eqiad.wmnet with OS bullseye

Change 936706 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add a second karapace VM

https://gerrit.wikimedia.org/r/936706

Change 936706 merged by Btullis:

[operations/puppet@production] Add a second karapace VM

https://gerrit.wikimedia.org/r/936706

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host karapace1002.eqiad.wmnet with OS bullseye completed:

  • karapace1002 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot media to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202307101339_btullis_4053396_karapace1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
BTullis moved this task from In Progress to Needs Reporting on the Data-Platform-SRE board.