Deploy wmfmariadbpy 0.7
Closed, Resolved, Public

Description

A new release has been created: https://gerrit.wikimedia.org/g/operations/software/wmfmariadbpy/+/refs/tags/v0.7
The packages are built and uploaded to apt.wm.o.

This release needs to be coordinated with a puppet change: https://gerrit.wikimedia.org/r/c/operations/puppet/+/665324

Prep:

  • Log that the upgrade is starting: !log Deploying wmfmariadbpy 0.7 T283228
  • Disable puppet on all db primaries: sudo cumin A:db-role-master 'disable-puppet "Deploying wmfmariadbpy 0.7 - kormat - T283228"'
  • Merge puppet CR

First primary:

  • Upgrade wmfmariadbpy packages on db2142 (x2 primary in codfw): sudo apt install wmfmariadbpy-common
  • Run puppet manually (puppet will not start the systemd service for pt-heartbeat itself, by design, so this should be safe)
  • pkill -f pt-heartbeat-wikimedia && systemctl start pt-heartbeat-wikimedia
  • Check the heartbeat table to ensure that writes are happening.
  • Check that sudo -u nagios db-check-health succeeds.
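
The "writes are happening" check can be made mechanical by comparing the heartbeat timestamp with the current time. A minimal sketch, assuming the lag is read with a query such as `select TIMEDIFF(UTC_TIMESTAMP(6), ts) from heartbeat`; the function name `heartbeat_is_fresh` and the 2-second default threshold are hypothetical and not part of wmfmariadbpy.

```shell
#!/bin/bash
# Hypothetical helper: succeed when a TIMEDIFF value such as
# "00:00:00.512345" is below a threshold given in whole seconds.
# Returns 0 if fresh, 1 if stale, 2 if the value cannot be parsed
# (this includes negative differences caused by clock skew).
heartbeat_is_fresh() {
    local diff="$1" max_seconds="${2:-2}"
    local h m s total
    [[ "$diff" =~ ^([0-9]+):([0-9]{2}):([0-9]{2})(\.[0-9]+)?$ ]] || return 2
    h=${BASH_REMATCH[1]}
    m=${BASH_REMATCH[2]}
    s=${BASH_REMATCH[3]}
    # Force base 10 so that fields like "08" are not read as octal.
    total=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
    (( total < max_seconds ))
}

# Example use on a primary (assumes the local mysql client can read the
# heartbeat database, as in the commands in this task):
#   diff=$(mysql heartbeat -BN -e 'select TIMEDIFF(UTC_TIMESTAMP(6), ts) from heartbeat order by ts desc limit 1')
#   heartbeat_is_fresh "$diff" 2 && echo "heartbeat is being written"
```

The fractional part is ignored on purpose, so a threshold of 2 accepts anything up to 1.999999 seconds old.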

cumin2001:

  • Upgrade wmfmariadbpy packages on cumin2001: sudo apt install wmfmariadbpy-common
  • Check that mysql.py and db-replication-tree work correctly.

At this point there should be a clear go/no-go decision on whether to continue the deployment. If everything looks good:

  • Use debdeploy to upgrade packages on all db primaries: sudo debdeploy deploy -u $SPECFILE -s db-role-master
  • Run puppet on all db primaries: sudo cumin A:db-role-master 'run-puppet-agent -e "Deploying wmfmariadbpy 0.7 - kormat - T283228"'

For each primary (in both DCs):

ps -wwwf `pgrep -f pt-heartbeat-wikimedia`; pkill -f pt-heartbeat-wikimedia && systemctl start pt-heartbeat-wikimedia; systemctl status pt-heartbeat-wikimedia
mysql heartbeat -e 'select * from heartbeat'

Everywhere else:

  • Use debdeploy to upgrade packages on all other machines that use them.

Event Timeline

Kormat moved this task from Triage to In progress on the DBA board.

pt-heartbeat-wikimedia fails to start on db2093 with:

DBD::mysql::st execute failed: Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.

This is due to the unusual config of the dbinventory section.
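
The error message names the failing combination: with BINLOG_FORMAT = STATEMENT, InnoDB cannot log writes made under READ COMMITTED or READ UNCOMMITTED. A small sketch of that rule as a shell check follows. The function name is hypothetical. The system variable names in the usage comment (`binlog_format`, `tx_isolation`) are what I believe MariaDB uses, so check them against the server version in use. Session settings may also differ from the global ones.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 when the given binlog format and isolation
# level match the combination rejected in the error message, 1 otherwise.
binlog_isolation_conflict() {
    local format="${1^^}" isolation="${2^^}"
    # Accept both "READ-COMMITTED" (variable value) and "READ COMMITTED".
    isolation="${isolation// /-}"
    [[ "$format" == "STATEMENT" ]] || return 1
    [[ "$isolation" == "READ-COMMITTED" || "$isolation" == "READ-UNCOMMITTED" ]]
}

# Example use on a database host (assumed variable names, see above):
#   read -r fmt iso < <(mysql -BN -e 'select @@global.binlog_format, @@global.tx_isolation')
#   binlog_isolation_conflict "$fmt" "$iso" && echo "this combination triggers the error"
```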

Heartbeat restarted on all primaries.

Regarding the pt-heartbeat-wikimedia startup failure on db2093: https://gerrit.wikimedia.org/r/c/operations/puppet/+/693162 was merged to fix this for both dbinventory hosts.

Kormat updated the task description.

Deployment complete:

kormat@cumin1001:~(0:0)$ sudo debdeploy deploy -u 2021-05-20-wmfmariadbpy.yaml -Q C:wmfmariadbpy
Rolling out wmfmariadbpy:
Non-daemon update, no service restart needed

The update spec doesn't apply to the OS of the following hosts:
  cumin2002.codfw.wmnet (1 hosts)

These hosts are already up-to-date:
  alert[1001,2001].wikimedia.org,
  an-coord[1001-1002].eqiad.wmnet,
  an-test-coord1001.eqiad.wmnet,
  clouddb[1013-1021].eqiad.wmnet,
  cloudservices[2002-2003]-dev.wikimedia.org,
  cloudservices[1003-1004].wikimedia.org,
  cumin2001.codfw.wmnet,
  cumin1001.eqiad.wmnet,
  db[2071-2150,2152].codfw.wmnet,
  db[1087,1096,1098-1124,1126-1175,1177-1184].eqiad.wmnet,
  dbprov[2001-2003].codfw.wmnet,
  dbprov[1001-1003].eqiad.wmnet,
  dbstore[1003-1005].eqiad.wmnet,
  es[2020-2034].codfw.wmnet,
  es[1020-1034].eqiad.wmnet,
  matomo1002.eqiad.wmnet,
  pc[2007-2010].codfw.wmnet,
  pc[1007-1010].eqiad.wmnet
(236 hosts)

For posterity, here's the script I used for the heartbeat changes:

#!/bin/bash
# Usage: pass the FQDN of one primary as the only argument.
fqdn="${1:?}"

# Restart pt-heartbeat-wikimedia under systemd unless the unit is already active.
sudo -H SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -t -o ConnectTimeout=3 "root@$fqdn" 'set -ex;
ps -wwwf `pgrep -P 1 -f pt-heartbeat-wikimedia`;
systemctl is-active pt-heartbeat-wikimedia && exit 0
pkill -P 1 -f pt-heartbeat-wikimedia;
systemctl start pt-heartbeat-wikimedia;
sleep 2;
systemctl is-active pt-heartbeat-wikimedia'
# Show how far each heartbeat row lags behind the current UTC time.
sudo -H mysql.py -h "$fqdn" heartbeat -e 'select TIMEDIFF(UTC_TIMESTAMP(6), ts) from heartbeat'
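
To run a per-host script like this across several primaries while keeping the go/no-go spirit of the plan, a wrapper can stop at the first host that fails. This is a sketch only: `for_each_host` and the script path `./restart-heartbeat.sh` in the usage comment are hypothetical names, and the host list is supplied by the caller.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command once per host, stop at the first failure.
# Usage: for_each_host <command> <host>...
for_each_host() {
    local cmd="$1"
    shift
    local host
    for host in "$@"; do
        echo "== $host" >&2
        if ! "$cmd" "$host"; then
            echo "FAILED on $host, stopping" >&2
            return 1
        fi
    done
}

# Example use (hypothetical script name):
#   for_each_host ./restart-heartbeat.sh db2142.codfw.wmnet db2093.codfw.wmnet
```

Hosts are passed as arguments rather than read from stdin because `ssh -t` reads stdin and could swallow the rest of a piped host list. The script exits with the status of its last command (the mysql.py query), so this wrapper only notices an ssh failure if the script is changed to propagate it.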