Move wikireplicas under the new sanitarium hosts (db1154, db1155)
Open, Stalled, Medium, Public

Description

Once the new sanitarium hosts running Buster and MariaDB 10.4 + Stretch are ready, we need to start moving the wikireplicas under them.

All of these run MariaDB 10.4, so they can be moved without any blockers:

  • clouddb1013:3311
  • clouddb1013:3313
  • clouddb1014:3312
  • clouddb1014:3317
  • clouddb1015:3314
  • clouddb1015:3316
  • clouddb1016:3315
  • clouddb1016:3318
  • clouddb1017:3311
  • clouddb1017:3313
  • clouddb1018:3312
  • clouddb1018:3317
  • clouddb1019:3314
  • clouddb1019:3316
  • clouddb1020:3315
  • clouddb1020:3318
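
As a quick sanity check on the list above, the instances can be grouped by port to confirm that every port is served by exactly two clouddb hosts. This is only a sketch: the `instances` list is copied from this task, while the port-to-section mapping (3311 is s1 through 3318 is s8) is an assumption based on the usual multi-instance port convention and is not stated in this task. The function names are illustrative.

```python
from collections import defaultdict

# Instances copied from the task description.
instances = [
    "clouddb1013:3311", "clouddb1013:3313",
    "clouddb1014:3312", "clouddb1014:3317",
    "clouddb1015:3314", "clouddb1015:3316",
    "clouddb1016:3315", "clouddb1016:3318",
    "clouddb1017:3311", "clouddb1017:3313",
    "clouddb1018:3312", "clouddb1018:3317",
    "clouddb1019:3314", "clouddb1019:3316",
    "clouddb1020:3315", "clouddb1020:3318",
]


def group_by_port(items):
    """Map each port to the sorted list of hosts that serve it."""
    groups = defaultdict(list)
    for item in items:
        host, port = item.split(":")
        groups[int(port)].append(host)
    return {port: sorted(hosts) for port, hosts in sorted(groups.items())}


def section_for_port(port):
    # ASSUMPTION: port 331N corresponds to section sN; not stated in this task.
    return f"s{port - 3310}"


by_port = group_by_port(instances)
for port, hosts in by_port.items():
    print(port, section_for_port(port), hosts)
```

Grouping this way makes it easy to move both hosts of a port together, which matches how the moves are reported in the timeline.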

These replicas run 10.1 and cannot be moved until we get the green light from cloud-services-team, as replication might break at any time:

  • labsdb1009
  • labsdb1010
  • labsdb1011

This replica belongs to Analytics, and it is probably better to rebuild it as multi-instance+10.4+stretch rather than move it under the new hosts (T269211: Convert labsdb1012 from multi-source to multi-instance):

labsdb1012

Update: labsdb1012 no longer exists; it has been converted to clouddb1021.

Related Objects

Status    Subtype  Assigned    Task
Resolved           Marostegui
Resolved           RobH
Open               None
Open               Bstorm
Resolved           Bstorm
Resolved           Marostegui
Resolved           Marostegui
Open               None
Open               None
Open               None
Open               Marostegui
Resolved           RobH
Open               None
Open               None
Resolved           Marostegui
Stalled            Marostegui
Resolved           Cmjohnson
Resolved           dcaro

Event Timeline

Marostegui triaged this task as Medium priority. (Jan 14 2021, 9:05 AM)
Marostegui moved this task from Triage to Ready on the DBA board.
Marostegui moved this task from Ready to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2021-01-18T08:17:40Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1106 to stop replication, place db1105:3311 temporarily in vslow T272008', diff saved to https://phabricator.wikimedia.org/P13787 and previous config saved to /var/cache/conftool/dbconfig/20210118-081740-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-01-18T09:25:46Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1074 to stop replication T272008', diff saved to https://phabricator.wikimedia.org/P13795 and previous config saved to /var/cache/conftool/dbconfig/20210118-092546-marostegui.json

I have moved 4 of the 16 new instances under the new hosts. I won't move more today, to make sure everything runs fine.

Mentioned in SAL (#wikimedia-operations) [2021-01-19T06:57:49Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1082 to stop replication T272008', diff saved to https://phabricator.wikimedia.org/P13821 and previous config saved to /var/cache/conftool/dbconfig/20210119-065748-marostegui.json

clouddb1016:3315 and clouddb1020:3315 moved

Mentioned in SAL (#wikimedia-operations) [2021-01-19T08:58:57Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1078 to stop replication T272008', diff saved to https://phabricator.wikimedia.org/P13826 and previous config saved to /var/cache/conftool/dbconfig/20210119-085856-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-01-19T09:01:00Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1112 to stop replication T272008', diff saved to https://phabricator.wikimedia.org/P13828 and previous config saved to /var/cache/conftool/dbconfig/20210119-090100-marostegui.json

clouddb1013:3313 and clouddb1017:3313 moved

Mentioned in SAL (#wikimedia-operations) [2021-01-20T10:34:50Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1079 to stop replication T272008', diff saved to https://phabricator.wikimedia.org/P13842 and previous config saved to /var/cache/conftool/dbconfig/20210120-103449-marostegui.json

clouddb1014:3317 and clouddb1018:3317 moved.

clouddb1016:3318 and clouddb1020:3318 moved.

clouddb1015:3316 moved - clouddb1019:3316 is down due to HW issues: T272125

Marostegui changed the task status from Open to Stalled. (Jan 22 2021, 1:33 PM)
Marostegui updated the task description.

clouddb1015:3314 moved.
The only pending host is clouddb1019, which is waiting for on-site maintenance as it is inaccessible (T272125).

Mentioned in SAL (#wikimedia-operations) [2021-01-27T07:05:03Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1085 T272008', diff saved to https://phabricator.wikimedia.org/P13968 and previous config saved to /var/cache/conftool/dbconfig/20210127-070502-marostegui.json

clouddb1019:3316 moved under db1155:3316

Marostegui moved this task from In progress to Blocked on the DBA board.
Marostegui added subscribers: nskaggs, Bstorm.

clouddb1019:3314 moved under db1155:3314
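
The first four instances moved are not named in this timeline. A minimal sketch of how they can be inferred by elimination: subtract every instance explicitly reported as moved from the full list in the description. The result is an inference from the comments above, not something stated in the task, and the variable names are illustrative.

```python
# Full list of 10.4 instances from the task description.
hosts_and_ports = {
    "clouddb1013": (3311, 3313),
    "clouddb1014": (3312, 3317),
    "clouddb1015": (3314, 3316),
    "clouddb1016": (3315, 3318),
    "clouddb1017": (3311, 3313),
    "clouddb1018": (3312, 3317),
    "clouddb1019": (3314, 3316),
    "clouddb1020": (3315, 3318),
}
all_instances = {
    f"{host}:{port}"
    for host, ports in hosts_and_ports.items()
    for port in ports
}

# Instances explicitly named as moved in the timeline comments.
named_moved = {
    "clouddb1016:3315", "clouddb1020:3315",
    "clouddb1013:3313", "clouddb1017:3313",
    "clouddb1014:3317", "clouddb1018:3317",
    "clouddb1016:3318", "clouddb1020:3318",
    "clouddb1015:3316", "clouddb1015:3314",
    "clouddb1019:3316", "clouddb1019:3314",
}

# Whatever is left must be the four instances moved first (inferred, not stated).
unnamed = sorted(all_instances - named_moved)
print(unnamed)
```

If the inference holds, the remaining entries are the two 3311 instances and the two 3312 instances, which together with the twelve named ones account for all 16.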

All the new clouddb hosts have been moved under the new 10.4 sanitarium hosts. This task is now stalled, waiting on the green light to move the labsdb* hosts under the new sanitarium hosts, once we can accept that replication from 10.4 to 10.1 may break at any time.

This can happen after 15th April.