Upgrade Grafana Instances to Debian Bookworm
Overview
This task tracks the upgrade of our Grafana instances to Debian Bookworm and documents the upgrade steps.
- Active Host: grafana1002
- Standby Host: grafana2001
Package Upgrade Requirements
The following table lists the Grafana-related packages to be upgraded, including their current installed versions and the target versions available upstream:
Package | Installed Version | Upstream Version | Compatibility |
---|---|---|---|
grafana | v9.4.14 | v10.2.3 | Yes |
grafana-loki | v2.5.0 | v2.9.3 | Yes |
grafana-plugins | v0.6 | NA | Yes |
grizzly | v0.1.0 | v0.3.0 | Yes |
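The installed and upstream columns above can be compared mechanically. The following is a minimal sketch, assuming GNU `sort -V` is available; the function name `needs_upgrade` is hypothetical and not part of any existing tooling. A plain string comparison would rank v9.4.14 above v10.2.3, which is why a version-aware sort is used.

```shell
# Hypothetical helper: exit 0 when the installed version is older than upstream.
needs_upgrade() {
    installed="${1#v}"   # drop the leading "v" used in the table
    upstream="${2#v}"
    # Identical versions need no upgrade.
    [ "$installed" = "$upstream" ] && return 1
    # sort -V orders version strings numerically per component; oldest comes first.
    oldest=$(printf '%s\n%s\n' "$installed" "$upstream" | sort -V | head -n 1)
    [ "$oldest" = "$installed" ]
}
```

For example, `needs_upgrade v9.4.14 v10.2.3 && echo "grafana: upgrade available"` prints the message, while passing the same version twice does not.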
1. Prerequisites
- Set up a Bookworm host in Pontoon.
- Confirm the Puppet catalog compiles without errors and all packages are available.
- Validate general functionality of Grafana services.
2. Upgrade steps:
2.1 Reimage Standby Host (grafana2001)
- On cumin2002:
- Reimage:
- $ sudo cookbook sre.hosts.reimage --os bookworm -t T352665 grafana2001
- Verify services:
- $ sudo cumin 'grafana2001*' 'systemctl is-active grafana-server'
- $ sudo cumin 'grafana2001*' 'systemctl is-active grafana-loki'
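`systemctl is-active` prints the unit state (for example `active` or `inactive`), so the verification can be retried a few times if a unit is still starting. This is a minimal sketch under that assumption; `wait_active` and the check function passed to it are hypothetical names, and the retry count and delay are arbitrary defaults.

```shell
# wait_active CHECK_FN UNIT [TRIES] [DELAY]
# CHECK_FN is the name of a command or function that prints the unit state.
wait_active() {
    check="$1"; unit="$2"; tries="${3:-5}"; delay="${4:-2}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Succeed as soon as the check reports "active".
        if [ "$("$check" "$unit")" = "active" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1   # never became active within the allowed attempts
}

# Usage on the host itself (check_local is a hypothetical wrapper):
#   check_local() { systemctl is-active "$1"; }
#   wait_active check_local grafana-server && wait_active check_local grafana-loki
```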
2.2 Failover to grafana2001
- On cumin2002, stop services:
- $ sudo cumin 'grafana2001*' 'systemctl stop grafana-server'
- $ sudo cumin 'grafana2001*' 'systemctl stop grafana-loki'
- Sync data from active to passive host.
- $ sudo cumin 'grafana2001*' 'sudo systemctl start rsync-var-lib-grafana'
- $ sudo cumin 'grafana2001*' 'sudo systemctl start rsync-loki-data'
- On cumin2002, start services:
- $ sudo cumin 'grafana2001*' 'systemctl start grafana-server'
- $ sudo cumin 'grafana2001*' 'systemctl start grafana-loki'
- Merge patches for failover.
- grafana: Failover from grafana1002 to grafana2001 (Change 992710).
- grafana: Ensure user traffic goes to grafana2001 (Change 992719).
- Run Puppet on the Grafana hosts and verify service status:
- Run Puppet:
- $ sudo cumin 'A:grafana' 'run-puppet-agent'
- $ sudo cumin 'A:cp' 'run-puppet-agent'
- Verify services:
- $ sudo cumin 'A:grafana' 'systemctl is-active grafana-server'
- $ sudo cumin 'A:grafana' 'systemctl is-active grafana-loki'
- Access Grafana via web browser to confirm functionality.
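The stop, sync, start sequence at the beginning of this failover is order-sensitive, so it can help to generate the commands for review before running anything. The sketch below only prints the commands listed above for a given target host; it executes nothing. The function name `failover_commands` is hypothetical.

```shell
# failover_commands TARGET: print, in order, the cumin commands that prepare
# TARGET to take over. Nothing is executed.
failover_commands() {
    target="$1"
    services="grafana-server grafana-loki"
    syncs="rsync-var-lib-grafana rsync-loki-data"
    # 1. Stop Grafana and Loki on the target before syncing.
    for unit in $services; do
        printf "sudo cumin '%s*' 'systemctl stop %s'\n" "$target" "$unit"
    done
    # 2. Trigger the rsync units that pull data from the active host.
    for unit in $syncs; do
        printf "sudo cumin '%s*' 'sudo systemctl start %s'\n" "$target" "$unit"
    done
    # 3. Start the services again on the freshly synced data.
    for unit in $services; do
        printf "sudo cumin '%s*' 'systemctl start %s'\n" "$target" "$unit"
    done
}
```

Running `failover_commands grafana2001` prints six commands that can be pasted one at a time on cumin2002; the same function with `grafana1002` covers the failover back.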
2.3 Reimage Standby Host (grafana1002)
- Merge the following patch:
- grafana: Create the grafana sysuser with a reserved UID/GID (Change 990795).
- On cumin2002:
- Reimage:
- $ sudo cookbook sre.hosts.reimage --os bookworm -t T352665 grafana1002
- Verify services:
- $ sudo cumin 'grafana1002*' 'systemctl is-active grafana-server'
- $ sudo cumin 'grafana1002*' 'systemctl is-active grafana-loki'
2.4 Failover Back to grafana1002
- On cumin2002, stop services:
- $ sudo cumin 'grafana1002*' 'systemctl stop grafana-server'
- $ sudo cumin 'grafana1002*' 'systemctl stop grafana-loki'
- Sync data from active to passive host.
- $ sudo cumin 'grafana1002*' 'sudo systemctl start rsync-var-lib-grafana'
- $ sudo cumin 'grafana1002*' 'sudo systemctl start rsync-loki-data'
- On cumin2002, start services:
- $ sudo cumin 'grafana1002*' 'systemctl start grafana-server'
- $ sudo cumin 'grafana1002*' 'systemctl start grafana-loki'
- Merge patches for failover.
- Revert "grafana: Failover from grafana1002 to grafana2001" (Change 992710).
- Revert "grafana: Ensure user traffic goes to grafana2001" (Change 992719).
- Revert hieradata: move grafana-next from codfw to eqiad (Change 1002569, https://gerrit.wikimedia.org/r/c/operations/puppet/+/1002569).
- Run Puppet on the Grafana hosts and verify service status:
- Run Puppet:
- $ sudo cumin 'A:grafana' 'run-puppet-agent'
- $ sudo cumin 'A:cp' 'run-puppet-agent'
- Verify services:
- $ sudo cumin 'A:grafana' 'systemctl is-active grafana-server'
- $ sudo cumin 'A:grafana' 'systemctl is-active grafana-loki'
- Access Grafana via web browser to confirm functionality.
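The browser check can be complemented by a scripted one. Grafana exposes an `/api/health` endpoint that returns a small JSON document including a `database` field. This is a minimal sketch that only inspects such a payload; `health_ok` is a hypothetical helper and `GRAFANA_URL` is a placeholder, not a value taken from this task.

```shell
# health_ok JSON: succeed when the health payload reports "database": "ok".
health_ok() {
    printf '%s' "$1" | grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage (GRAFANA_URL is a placeholder; set it to the instance being checked):
#   health_ok "$(curl -fsS "$GRAFANA_URL/api/health")" && echo "Grafana healthy"
```

Keeping the parsing separate from the `curl` call means the check can be exercised against a saved response without touching the live service.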
3. Post-Upgrade Actions:
- Document the failover procedure on Wikitech ("Failing over from the active to the passive host").
- Re-enable stunnel for data migration.
- Upgrade grafana-loki to the latest version.
- Upgrade grafana to the latest version.
- Upgrade grizzly to the latest version.
4. Additional Notes
- Compatibility confirmed for all required packages on Debian Bookworm.
- rsync-var-lib-grafana.service fails on the standby host because SSL chain verification fails.
- Reported a packaging issue upstream, with a proposed patch so that the Debian package respects the GRAFANA_HOME variable.
- Observed grafana-loki.service failure on grafana2001: T357026.