
Site: eqiad 1 VM for Matomo
Closed, Resolved · Public

Description

Cloud VPS Project Tested: n/a
Site/Location: eqiad
Number of systems: 1
Service: Matomo
Networking Requirements: internal
Processor Requirements: 4
Memory: 8 GB
Disks: 40 GB root, 80 GB /var/lib/mysql
Other Requirements: This is a direct replacement for the existing virtual machine, matomo1002. The older machine will be decommissioned once the new server has been put into service.

Event Timeline

I'll add the second disk after the initial creation by the cookbook. This will allow us to retain the MariaDB data during an in-place reimage.

The Ganeti resource report shows that the cluster is fairly evenly balanced at the moment.

DRY-RUN: START - Cookbook sre.ganeti.resource-report
+-------+-------+-----------+----------+-----------+---------+-----------+
| Group | Nodes | Instances |  MFree   | MFree avg |  DFree  | DFree avg |
+-------+-------+-----------+----------+-----------+---------+-----------+
|   A   |   8   |     35    | 291.7GiB |  36.5GiB  | 16.6TiB |   2.1TiB  |
|   B   |   7   |     36    | 232.2GiB |  33.2GiB  | 11.9TiB |   1.7TiB  |
|   C   |   8   |     37    | 289.2GiB |  36.1GiB  | 15.6TiB |   1.9TiB  |
|   D   |   7   |     32    | 276.7GiB |  39.5GiB  | 13.1TiB |   1.9TiB  |
+-------+-------+-----------+----------+-----------+---------+-----------+

matomo1002 is currently in cluster group C, so if I create the new VM in the same group, the cluster should remain balanced after I decommission the old host.
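If I go with group C, the makevm invocation would look roughly like the sketch below, matching the CPU, memory and root disk figures from the description. The flag names, units and the way the group is specified are from memory rather than taken from the cookbook itself, so the cookbook's --help output should be checked for the exact syntax before running it:

sudo cookbook sre.ganeti.makevm --vcpus 4 --memory 8 --disk 40 --os bookworm eqiad_C matomo1003.eqiad.wmnet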

Change #1018270 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add puppet7 data for new host matomo1003.

https://gerrit.wikimedia.org/r/1018270

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1002 for host matomo1003.eqiad.wmnet with OS bookworm

Change #1018270 merged by Btullis:

[operations/puppet@production] Add puppet7 data for new host matomo1003.

https://gerrit.wikimedia.org/r/1018270

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1002 for host matomo1003.eqiad.wmnet with OS bookworm executed with errors:

  • matomo1003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" matomo1003.eqiad.wmnet to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1002 for host matomo1003.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1002 for host matomo1003.eqiad.wmnet with OS bookworm completed:

  • matomo1003 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404091545_btullis_1610211_matomo1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

I'm adding the second disk now.

btullis@ganeti1027:~$ sudo gnt-instance modify --disk add:size=80g matomo1003.eqiad.wmnet
Thu Apr 11 09:07:30 2024  - INFO: Waiting for instance matomo1003.eqiad.wmnet to sync disks
Thu Apr 11 09:07:30 2024  - INFO: - device disk/1:  0.10% done, 1h 7m 12s remaining (estimated)
Thu Apr 11 09:08:31 2024  - INFO: - device disk/1:  2.80% done, 35m 15s remaining (estimated)
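Once the sync completes, I'll double-check which device name the new 80G volume gets inside the guest before doing anything with it, e.g.:

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT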

I think that I will mount this as /srv and try to make the MariaDB configuration more like one of our standard server setups.
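As a minimal sketch of what that could look like, assuming the new disk shows up as /dev/sdb in the guest and the data ends up under /srv/sqldata (the device name and target path are assumptions, and in practice the datadir and config change would be managed via Puppet rather than by hand):

# format the new disk and mount it persistently at /srv
sudo mkfs.ext4 -L srv /dev/sdb
echo 'LABEL=srv /srv ext4 defaults 0 2' | sudo tee -a /etc/fstab
sudo mkdir -p /srv && sudo mount /srv
# stop MariaDB, copy the existing data across, then point the datadir at the new path
sudo systemctl stop mariadb
sudo rsync -a /var/lib/mysql/ /srv/sqldata/
# set datadir = /srv/sqldata in the MariaDB configuration before restarting
sudo systemctl start mariadb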