
Productionize db12[26-49]
Open, Medium, Public

Description

These are new hosts:

  • db1226
  • db1227
  • db1228
  • db1229
  • db1230
  • db1231
  • db1232
  • db1233 - Let's place it on s2 (it won't replace db1133)
  • db1234
  • db1235
  • db1236
  • db1237 - next (x1)
  • db1238
  • db1239
  • db1240
  • db1241 - done
  • db1242 - done
  • db1243 - repooling
  • db1244
  • db1245
  • db1246
  • db1247 - WIP
  • db1248 - Next
  • db1249 - Next
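
The host range in the task title can be expanded programmatically. The following is a minimal sketch, not part of any existing tooling: `expand_host_range` and `presumed_source` are hypothetical helper names, and the "subtract 100" mapping is only an assumption inferred from the pairs seen in this task (db1136 to db1236, db1138 to db1238, and so on). It does not hold for every host; db1233, for example, is noted above as not replacing db1133.

```python
import re


def expand_host_range(pattern: str) -> list[str]:
    """Expand a pattern such as 'db12[26-49]' into individual host names."""
    match = re.fullmatch(r"([a-z]+\d*)\[(\d+)-(\d+)\]", pattern)
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    prefix, start, end = match.groups()
    width = len(start)  # keep zero padding if the range uses it
    return [f"{prefix}{n:0{width}d}" for n in range(int(start), int(end) + 1)]


def presumed_source(host: str) -> str:
    """Assumed convention only: db12NN is cloned from db11NN."""
    number = int(host[2:])
    return f"db{number - 100}"


hosts = expand_host_range("db12[26-49]")
```

Each entry of `hosts` can then be paired with `presumed_source(host)` as a starting point, to be checked by hand against the notes in the list above.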

Event Timeline


Mentioned in SAL (#wikimedia-operations) [2023-11-06T09:56:21Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: provisionning db1236.eqiad.wmnet - T344036

Icinga downtime and Alertmanager silence (ID=fcaea851-63aa-4d0a-b132-b9abebd052fb) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db1236.eqiad.wmnet - T344036

db1136.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-06T09:56:36Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: provisionning db1236.eqiad.wmnet - T344036

Mentioned in SAL (#wikimedia-operations) [2023-11-06T09:56:39Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: provisionning db1236.eqiad.wmnet - T344036

Icinga downtime and Alertmanager silence (ID=260c9e59-c75a-4e00-921f-9d52edf8563e) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db1236.eqiad.wmnet - T344036

db1236.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-06T09:57:04Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: provisionning db1236.eqiad.wmnet - T344036

Mentioned in SAL (#wikimedia-operations) [2023-11-06T10:02:14Z] <arnaudb@cumin1001> dbctl commit (dc=all): 'Cloning db1136 in db1236 for T344036', diff saved to https://phabricator.wikimedia.org/P53139 and previous config saved to /var/cache/conftool/dbconfig/20231106-100213-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2023-11-06T10:06:26Z] <arnaudb@cumin1001> dbctl commit (dc=all): 'Cloning db1136 in db1236 for T344036', diff saved to https://phabricator.wikimedia.org/P53140 and previous config saved to /var/cache/conftool/dbconfig/20231106-100625-arnaudb.json
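
For anyone auditing this timeline, the SAL entries follow a regular shape: a UTC timestamp in brackets, then `<user@origin>`, then the message. The sketch below parses that shape. It assumes the format is exactly as it appears in the entries above; `parse_sal` and the field names are hypothetical and not part of any Wikimedia tool.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the SAL lines in this task, not from a formal spec.
SAL_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\] "
    r"<(?P<user>[^@>]+)@(?P<origin>[^>]+)> "
    r"(?P<message>.*)"
)


def parse_sal(line: str):
    """Return a dict with timestamp, user, origin and message, or None."""
    match = SAL_RE.search(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S")
    return {
        "timestamp": timestamp.replace(tzinfo=timezone.utc),
        "user": match["user"],
        "origin": match["origin"],
        "message": match["message"],
    }


entry = parse_sal(
    "Mentioned in SAL (#wikimedia-operations) [2023-11-06T10:02:14Z] "
    "<arnaudb@cumin1001> dbctl commit (dc=all): "
    "'Cloning db1136 in db1236 for T344036'"
)
```

Lines that do not match, such as the Gerrit change notices, return `None` and can simply be skipped.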

@jcrespo what are your plans with db1133?
It was supposed to be one of the backup testing hosts (it currently lives in test-s4, as I was using it to test stuff).

Do you think we have to replace it with db1233 and give it the same role? Or can we use db1233 as production? Either solution is fine by me, just wanted to make sure we are on the same page.

It can be used for production for now, temporarily, but I will eventually need it for testing backups. Sadly, testing is not at the top of the priorities, but the idea was to finally use it (or a replacement) for that this fiscal year.

db1126 is no longer a candidate master, so its replacement with db1226 can be done anytime now.

This comment was removed by Marostegui.

Change 972507 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: add db1238 and prepare db1138 retirement

https://gerrit.wikimedia.org/r/972507

Change 972511 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: re-enable notifications for db1236

https://gerrit.wikimedia.org/r/972511

Change 972511 merged by Arnaudb:

[operations/puppet@production] mariadb: re-enable notifications for db1236

https://gerrit.wikimedia.org/r/972511

Change 972512 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: removing master candidacy info for db1136

https://gerrit.wikimedia.org/r/972512

Change 972512 merged by Arnaudb:

[operations/puppet@production] mariadb: removing master candidacy info for db1136

https://gerrit.wikimedia.org/r/972512

Change 972507 merged by Arnaudb:

[operations/puppet@production] mariadb: add db1238 and prepare db1138 retirement

https://gerrit.wikimedia.org/r/972507

Mentioned in SAL (#wikimedia-operations) [2023-11-14T09:53:23Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: provisionning db1238.eqiad.wmnet - T344036

Icinga downtime and Alertmanager silence (ID=534a8dbf-cdae-4315-9aea-b8042e037796) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db1238.eqiad.wmnet - T344036

db1138.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-14T09:53:37Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: provisionning db1238.eqiad.wmnet - T344036

Mentioned in SAL (#wikimedia-operations) [2023-11-14T09:53:40Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: provisionning db1238.eqiad.wmnet - T344036

Icinga downtime and Alertmanager silence (ID=858abb8b-0596-4c57-bb2a-8b5fd897cbfc) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db1238.eqiad.wmnet - T344036

db1238.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-14T09:53:54Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: provisionning db1238.eqiad.wmnet - T344036

I have to switch masters before cloning the host as it will be depooled

done via T351184

Mentioned in SAL (#wikimedia-operations) [2023-11-14T10:46:04Z] <arnaudb@cumin1001> dbctl commit (dc=all): 'migrate db1138 to db1238 - T344036', diff saved to https://phabricator.wikimedia.org/P53392 and previous config saved to /var/cache/conftool/dbconfig/20231114-104603-arnaudb.json

Change 973419 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: add config to db1238

https://gerrit.wikimedia.org/r/973419

Change 973419 merged by Arnaudb:

[operations/puppet@production] mariadb: add config to db1238

https://gerrit.wikimedia.org/r/973419

Change 974632 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: re-enable notifications for db1238

https://gerrit.wikimedia.org/r/974632

Change 974632 merged by Arnaudb:

[operations/puppet@production] mariadb: re-enable notifications for db1238

https://gerrit.wikimedia.org/r/974632

Will proceed with db1241, as its source host (db1141) has already applied the schema change from T348183:

$ sudo python3 upload_sizes_T348183_check.py --run --check 
Starting schema change on db1141
SQL of schema change: 
ALTER TABLE  /*_*/filearchive
CHANGE  fa_size fa_size BIGINT UNSIGNED DEFAULT 0;
ALTER TABLE  /*_*/image
CHANGE  img_size img_size BIGINT UNSIGNED DEFAULT 0 NOT NULL;
ALTER TABLE  /*_*/oldimage
CHANGE  oi_size oi_size BIGINT UNSIGNED DEFAULT 0 NOT NULL;
ALTER TABLE  /*_*/uploadstash
CHANGE  us_size us_size BIGINT UNSIGNED NOT NULL;

Start of schema change sql on db1141
2023-11-16 09:21:36.945370 db-mysql db1141 -N -e "show slave hosts;"
STDOUT
Already applied on commonswiki in db1141, skipping
Already applied on testcommonswiki in db1141, skipping
End of schema change sql on db1141
End of schema change on db1141
Result: {"already done in all dbs": ["db1141"]}
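
The final `Result:` line of the output above is JSON, so the "already applied" check can be read by a script before cloning. This is a minimal sketch that assumes the output format is exactly as pasted above; `hosts_already_done` is a hypothetical helper, not part of the schema-change script.

```python
import json

RESULT_PREFIX = "Result: "
DONE_KEY = "already done in all dbs"  # key copied from the output above


def hosts_already_done(output: str) -> list[str]:
    """Return the hosts the script reports as already having the change."""
    for line in output.splitlines():
        if line.startswith(RESULT_PREFIX):
            result = json.loads(line[len(RESULT_PREFIX):])
            return result.get(DONE_KEY, [])
    return []  # no Result line found


sample_output = (
    "End of schema change on db1141\n"
    'Result: {"already done in all dbs": ["db1141"]}\n'
)
done = hosts_already_done(sample_output)
```

If the source host appears in `done`, the clone will inherit the new column definitions and no further schema change should be needed on the new host.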

Change 974633 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: add db1241 and prepare db1141 retirement

https://gerrit.wikimedia.org/r/974633

Change 974633 merged by Arnaudb:

[operations/puppet@production] mariadb: add db1241 and prepare db1141 retirement

https://gerrit.wikimedia.org/r/974633

Mentioned in SAL (#wikimedia-operations) [2023-11-16T12:54:44Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: provisionning db1241.eqiad.wmnet - T344036

Icinga downtime and Alertmanager silence (ID=b0093dad-2860-4607-9345-170a960dd1c4) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db1241.eqiad.wmnet - T344036

db1141.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-16T12:54:49Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: provisionning db1241.eqiad.wmnet - T344036

Mentioned in SAL (#wikimedia-operations) [2023-11-16T12:54:53Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: provisionning db1241.eqiad.wmnet - T344036

Icinga downtime and Alertmanager silence (ID=314fe381-63ce-4366-8b2f-45a1a22d8476) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db1241.eqiad.wmnet - T344036

db1241.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-16T12:55:04Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: provisionning db1241.eqiad.wmnet - T344036

Change 974642 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: prepare copy of db1142 to db1242

https://gerrit.wikimedia.org/r/974642

Change 974642 merged by Arnaudb:

[operations/puppet@production] mariadb: prepare copy of db1142 to db1242

https://gerrit.wikimedia.org/r/974642

Mentioned in SAL (#wikimedia-operations) [2023-11-17T14:39:32Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: provisionning db1242.eqiad.wmnet - T344036

Icinga downtime and Alertmanager silence (ID=c44de66d-49e0-4fb1-9ed8-8d7ec767b500) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db1242.eqiad.wmnet - T344036

db1142.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-17T14:39:46Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: provisionning db1242.eqiad.wmnet - T344036

Mentioned in SAL (#wikimedia-operations) [2023-11-17T14:39:50Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: provisionning db1242.eqiad.wmnet - T344036

Icinga downtime and Alertmanager silence (ID=1afae4ba-dba0-457c-8ffd-7fa6c6884e81) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db1242.eqiad.wmnet - T344036

db1242.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-17T14:40:04Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: provisionning db1242.eqiad.wmnet - T344036

Mentioned in SAL (#wikimedia-operations) [2023-11-17T14:42:35Z] <arnaudb@cumin1001> dbctl commit (dc=all): 'Cloning db1142 in db1242 for T344036', diff saved to https://phabricator.wikimedia.org/P53547 and previous config saved to /var/cache/conftool/dbconfig/20231117-144234-arnaudb.json

Change 975746 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] bashrc: add a function to quick show info

https://gerrit.wikimedia.org/r/975746

db1241 and db1242 are repooling.

Change 975746 merged by Arnaudb:

[operations/puppet@production] bashrc: add a function to quick show info

https://gerrit.wikimedia.org/r/975746

Change 975747 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: replace db1143 with 1243

https://gerrit.wikimedia.org/r/975747

Change 975747 merged by Arnaudb:

[operations/puppet@production] mariadb: replace db1143 with 1243

https://gerrit.wikimedia.org/r/975747

Mentioned in SAL (#wikimedia-operations) [2023-11-20T10:00:18Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036

Icinga downtime and Alertmanager silence (ID=dff02286-9e95-42d5-86bf-549a52035dd0) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db1243.eqiad.wmnet - T344036

db1143.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-20T10:00:32Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036

Mentioned in SAL (#wikimedia-operations) [2023-11-20T10:00:43Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036

Icinga downtime and Alertmanager silence (ID=9ba161ea-75af-4ffd-9e08-390551e9c0b8) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db1243.eqiad.wmnet - T344036

db1243.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-20T10:00:59Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: provisionning db1243.eqiad.wmnet - T344036

Mentioned in SAL (#wikimedia-operations) [2023-11-20T10:02:12Z] <arnaudb@cumin1001> dbctl commit (dc=all): 'T344036 add db1243', diff saved to https://phabricator.wikimedia.org/P53616 and previous config saved to /var/cache/conftool/dbconfig/20231120-100212-arnaudb.json

Change 975748 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: repooled servers should alert

https://gerrit.wikimedia.org/r/975748

Change 975748 merged by Arnaudb:

[operations/puppet@production] mariadb: repooled servers should alert

https://gerrit.wikimedia.org/r/975748

Change 976956 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: replace db1147 by db1247 on s4

https://gerrit.wikimedia.org/r/976956

Change 976956 merged by Arnaudb:

[operations/puppet@production] mariadb: replace db1147 by db1247 on s4

https://gerrit.wikimedia.org/r/976956