
Deploy wmfmariadbpy 0.7
Closed, Resolved, Public

Description

A new release has been created: https://gerrit.wikimedia.org/g/operations/software/wmfmariadbpy/+/refs/tags/v0.7
The packages are built, and uploaded to apt.wm.o.

This release needs to be coordinated with a puppet change: https://gerrit.wikimedia.org/r/c/operations/puppet/+/665324

Prep:

  • Log that the upgrade is starting: !log Deploying wmfmariadbpy 0.7 T283228
  • Disable puppet on all db primaries: sudo cumin A:db-role-master 'disable-puppet "Deploying wmfmariadbpy 0.7 - kormat - T283228"'
  • Merge puppet CR

First primary:

  • Upgrade wmfmariadbpy packages on db2142 (x2 primary in codfw): sudo apt install wmfmariadbpy-common
  • Run puppet manually (puppet will not start the systemd service for pt-heartbeat itself, by design, so this should be safe)
  • pkill -f pt-heartbeat-wikimedia && systemctl start pt-heartbeat-wikimedia
  • Check the heartbeat table to ensure that writes are happening.
  • Check that sudo -u nagios db-check-health succeeds.
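The heartbeat check above can be done by sampling the newest timestamp twice and confirming that it moved. This is a minimal sketch, not the procedure actually used: it assumes the heartbeat table has a ts column (as queried elsewhere in this task) and that mysql connects to the local instance without extra options. All function names are hypothetical.

```shell
#!/bin/bash
# Sketch only: the function names are made up for illustration.

# Print the newest heartbeat timestamp on the local instance.
# Assumes a ts column and that mysql needs no extra connection options.
latest_heartbeat() {
    mysql heartbeat --batch --skip-column-names -e 'SELECT MAX(ts) FROM heartbeat'
}

# Succeed if both samples are non-empty and the second differs from the first.
heartbeat_advanced() {
    local before="$1" after="$2"
    [ -n "$before" ] && [ -n "$after" ] && [ "$before" != "$after" ]
}

# Take two samples a couple of seconds apart and report the result.
check_heartbeat_writes() {
    local before after
    before="$(latest_heartbeat)"
    sleep 2
    after="$(latest_heartbeat)"
    if heartbeat_advanced "$before" "$after"; then
        echo "heartbeat is advancing: $before -> $after"
    else
        echo "heartbeat did NOT advance: '$before' -> '$after'" >&2
        return 1
    fi
}
```

Running check_heartbeat_writes on the primary after restarting the service should print the two timestamps; a non-zero exit means no new write was seen within the two-second window.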

cumin2001:

  • Upgrade wmfmariadbpy packages on cumin2001: sudo apt install wmfmariadbpy-common
  • Check that mysql.py and db-replication-tree work correctly.

At this point there should be a clear go/no-go decision on continuing the deployment. If everything looks good:

  • Use debdeploy to upgrade packages on all db primaries: sudo debdeploy deploy -u $SPECFILE -s db-role-master
  • Run puppet on all db primaries: sudo cumin A:db-role-master 'run-puppet-agent -e "Deploying wmfmariadbpy 0.7 - kormat - T283228"'

For each primary (in both DCs):

ps -wwwf `pgrep -f pt-heartbeat-wikimedia`; pkill -f pt-heartbeat-wikimedia && systemctl start pt-heartbeat-wikimedia; systemctl status pt-heartbeat-wikimedia
mysql heartbeat -e 'select * from heartbeat'

Everywhere else:

  • Use debdeploy to upgrade packages on all other machines that use them.

Event Timeline

Kormat updated the task description.
Kormat moved this task from Triage to In progress on the DBA board.
Kormat triaged this task as Medium priority. May 20 2021, 12:44 PM

pt-heartbeat-wikimedia fails to start on db2093 with:

DBD::mysql::st execute failed: Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.

This is due to the unusual config of the dbinventory section.
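The error says that a statement-logged write was attempted while the isolation level forces row-based logging for InnoDB. To see whether a host has that combination, the two settings can be read together. This is a rough sketch under assumptions: it looks at the global binlog_format and tx_isolation variables (the session pt-heartbeat uses could differ), it expects mysql to connect locally without extra options, and the function names are hypothetical.

```shell
#!/bin/bash
# Sketch only: the function names are made up for illustration.

# Succeed if the combination matches the error quoted above:
# statement-based binary logging with READ COMMITTED or READ UNCOMMITTED.
binlog_conflict() {
    local format="$1" isolation="$2"
    [ "$format" = "STATEMENT" ] || return 1
    case "$isolation" in
        READ-COMMITTED|READ-UNCOMMITTED) return 0 ;;
        *) return 1 ;;
    esac
}

# Read both global settings from the local instance and report.
# Assumes the global values are what the heartbeat connection sees.
check_binlog_settings() {
    local format isolation
    # Batch output is tab-separated, so word splitting yields the two values.
    set -- $(mysql --batch --skip-column-names \
        -e 'SELECT @@GLOBAL.binlog_format, @@GLOBAL.tx_isolation')
    format="$1"
    isolation="$2"
    if binlog_conflict "$format" "$isolation"; then
        echo "conflict: binlog_format=$format tx_isolation=$isolation" >&2
        return 1
    fi
    echo "ok: binlog_format=$format tx_isolation=$isolation"
}
```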

Kormat updated the task description.

Heartbeat restarted on all primaries.

Regarding pt-heartbeat-wikimedia failing to start on db2093: https://gerrit.wikimedia.org/r/c/operations/puppet/+/693162 was merged to fix this for both dbinventory hosts.

Kormat updated the task description.

Deployment complete:

kormat@cumin1001:~(0:0)$ sudo debdeploy deploy -u 2021-05-20-wmfmariadbpy.yaml -Q C:wmfmariadbpy
Rolling out wmfmariadbpy:
Non-daemon update, no service restart needed

The update spec doesn't apply to the OS of the following hosts:
  cumin2002.codfw.wmnet (1 hosts)

These hosts are already up-to-date:
  alert[1001,2001].wikimedia.org,
  an-coord[1001-1002].eqiad.wmnet,
  an-test-coord1001.eqiad.wmnet,
  clouddb[1013-1021].eqiad.wmnet,
  cloudservices[2002-2003]-dev.wikimedia.org,
  cloudservices[1003-1004].wikimedia.org,
  cumin2001.codfw.wmnet,
  cumin1001.eqiad.wmnet,
  db[2071-2150,2152].codfw.wmnet,
  db[1087,1096,1098-1124,1126-1175,1177-1184].eqiad.wmnet,
  dbprov[2001-2003].codfw.wmnet,
  dbprov[1001-1003].eqiad.wmnet,
  dbstore[1003-1005].eqiad.wmnet,
  es[2020-2034].codfw.wmnet,
  es[1020-1034].eqiad.wmnet,
  matomo1002.eqiad.wmnet,
  pc[2007-2010].codfw.wmnet,
  pc[1007-1010].eqiad.wmnet
  (236 hosts)

For posterity, here's the script I used for the heartbeat changes:

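#!/bin/bash
# Takes one argument: the FQDN of the primary to fix up.
fqdn="${1:?}"

# On the primary: show the pt-heartbeat process running under PID 1, stop here
# if the systemd unit is already active, otherwise kill the stray process and
# start the service.
sudo -H SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -t -o ConnectTimeout=3 "root@$fqdn" 'set -ex;
ps -wwwf `pgrep -P 1 -f pt-heartbeat-wikimedia`;
systemctl is-active pt-heartbeat-wikimedia && exit 0
pkill -P 1 -f pt-heartbeat-wikimedia;
systemctl start pt-heartbeat-wikimedia;
sleep 2;
systemctl is-active pt-heartbeat-wikimedia'
# Show how far the heartbeat timestamp is behind the current UTC time.
sudo -H mysql.py -h "$fqdn" heartbeat -e 'select TIMEDIFF(UTC_TIMESTAMP(6), ts) from heartbeat'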
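
The last command of the script prints a TIMEDIFF value that still has to be read by eye. If the check were to be automated, something like the following could turn that value into a pass/fail result. This is only a sketch: the function names and the one-second default threshold are assumptions, and it looks only at the last row printed, which is not enough if the heartbeat table holds several rows.

```shell
#!/bin/bash
# Sketch only: function names and the default threshold are hypothetical.

# Convert a TIMEDIFF value such as 00:00:01.250000 to seconds.
timediff_to_seconds() {
    printf '%s\n' "$1" | LC_ALL=C awk -F: '{
        sign = 1
        h = $1
        if (substr(h, 1, 1) == "-") { sign = -1; h = substr(h, 2) }
        printf "%.6f\n", sign * (h * 3600 + $2 * 60 + $3)
    }'
}

# Succeed if the absolute lag is at most the given number of seconds.
lag_within() {
    local seconds
    [ -n "$1" ] || return 1
    seconds="$(timediff_to_seconds "$1")"
    LC_ALL=C awk -v s="$seconds" -v max="$2" \
        'BEGIN { if (s < 0) s = -s; exit !(s <= max) }'
}

# Query one host the same way the script above does and judge the result.
# tail -n 1 skips the column header and keeps only the last row.
check_host_lag() {
    local fqdn="$1" max="${2:-1}" value
    value="$(sudo -H mysql.py -h "$fqdn" heartbeat \
        -e 'select TIMEDIFF(UTC_TIMESTAMP(6), ts) from heartbeat' | tail -n 1)"
    if lag_within "$value" "$max"; then
        echo "$fqdn OK ($value)"
    else
        echo "$fqdn LAGGING ($value)" >&2
        return 1
    fi
}
```

For example, check_host_lag db2142.codfw.wmnet 1 would report OK only if the heartbeat timestamp is within one second of the current UTC time.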