
Relocate "old" s4 hosts
Closed, Resolved, Public

Description

The following hosts have been replaced in s4 with hosts with larger disks:

eqiad:

  • db1081 decommissioned
  • db1084 (moved to s1)
  • db1091 (moved to s1)
  • db1097 (moved to m1) T254556
  • db1102 (currently backup source > moved to x1)
  • db1103 (moved to x1, to replace db1127 so db1127 can go to s7)
  • db1121
  • db1138

codfw:

  • db2090
  • db2073
  • db2091 (moved to s8)
  • db2099
  • db2084 (moved to s8)
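The host list above can be kept as a small inventory, which makes the hosts still without a destination easy to spot. This is a minimal Python sketch that assumes nothing beyond the list itself. The `OLD_S4_HOSTS` structure and the `unassigned` helper are hypothetical and are not part of any Wikimedia tool:

```python
from typing import Optional

# Destination of each "old" s4 host, as listed in the task description.
# None means no destination has been noted yet; "decommissioned" is kept
# as a pseudo-destination.
OLD_S4_HOSTS: dict[str, dict[str, Optional[str]]] = {
    "eqiad": {
        "db1081": "decommissioned",
        "db1084": "s1",
        "db1091": "s1",
        "db1097": "m1",
        "db1102": "x1",
        "db1103": "x1",
        "db1121": None,
        "db1138": None,
    },
    "codfw": {
        "db2090": None,
        "db2073": None,
        "db2091": "s8",
        "db2099": None,
        "db2084": "s8",
    },
}


def unassigned(inventory: dict[str, dict[str, Optional[str]]]) -> list[str]:
    """Return the hosts that have no recorded destination, in listing order."""
    return [
        host
        for hosts in inventory.values()
        for host, destination in hosts.items()
        if destination is None
    ]


print(unassigned(OLD_S4_HOSTS))
```

With the list as written, this prints the five hosts that have no destination noted: db1121, db1138, db2090, db2073 and db2099.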

They should be relocated to other places, including at least:

  • extra hosts in s1 DONE: db1091
  • extra hosts in s8 (including extra vslow): db2084, db2091
  • extra host in s7 DONE: db1127
  • Maybe replace x1 hosts with these ones (smaller disks) DONE: db1103
  • db1133: for backup testing to replace the one that was taken months ago for core

Other movements

  • db1127 from x1 to s7
  • db1135 from m1 to s1 and db1080 to m2 (finally db1080 will be moved to m1 because of T256717)
  • db1132 to m3
  • db1128 to m5
  • db1133 to backup testing (@jcrespo to decide its final destination)
  • db1107 from s1 to m2
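Some hosts appear in more than one movement (db1080 goes to m2 and then to m1), so each host's final placement is simply the last move recorded for it. Here is a hypothetical Python sketch of that bookkeeping, using only the moves listed above:

```python
# Ordered moves from the "Other movements" list. Each entry is
# (host, destination); a later entry for the same host overrides earlier ones.
MOVES: list[tuple[str, str]] = [
    ("db1127", "s7"),
    ("db1135", "s1"),
    ("db1080", "m2"),
    ("db1080", "m1"),  # final destination because of T256717
    ("db1132", "m3"),
    ("db1128", "m5"),
    ("db1133", "backup testing"),
    ("db1107", "m2"),
]


def final_placement(moves: list[tuple[str, str]]) -> dict[str, str]:
    """Apply the moves in order and return each host's last destination."""
    placement: dict[str, str] = {}
    for host, destination in moves:
        placement[host] = destination
    return placement


print(final_placement(MOVES)["db1080"])  # m1
```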

Related Objects

Status | Subtype | Assigned | Task
Resolved | | Cmjohnson |
Resolved | | Marostegui |
Resolved | | Marostegui |
Resolved | | Marostegui |

Event Timeline


Mentioned in SAL (#wikimedia-operations) [2020-06-17T12:40:35Z] <marostegui@cumin2001> dbctl commit (dc=all): 'Add db2091 to s8 T253217', diff saved to https://phabricator.wikimedia.org/P11566 and previous config saved to /var/cache/conftool/dbconfig/20200617-124034-marostegui.json
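Each `dbctl commit` entry records where the previous configuration was saved, under a `YYYYMMDD-HHMMSS-<user>.json` name. The sketch below assumes that naming pattern, which is inferred only from these log lines. It also treats the timestamp as UTC because it matches the SAL time to within a second. Under those assumptions, it locates the snapshot written by the most recent commit at or before a given moment. That file holds the configuration as it was just before that commit. The helpers are hypothetical, and the file names come from the db1135 commits later in this task:

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional

# Directory taken from the log lines; the naming pattern is an assumption.
SNAPSHOT_DIR = PurePosixPath("/var/cache/conftool/dbconfig")


def snapshot_time(name: str) -> datetime:
    """Parse 'YYYYMMDD-HHMMSS-user.json' into a naive datetime (assumed UTC)."""
    stamp = "-".join(name.split("-")[:2])  # e.g. '20200617-124034'
    return datetime.strptime(stamp, "%Y%m%d-%H%M%S")


def latest_before(names: list[str], when: datetime) -> Optional[PurePosixPath]:
    """Return the newest snapshot saved at or before `when`, if there is one."""
    candidates = [name for name in names if snapshot_time(name) <= when]
    if not candidates:
        return None
    return SNAPSHOT_DIR / max(candidates, key=snapshot_time)


names = [
    "20200629-074611-marostegui.json",
    "20200629-080253-marostegui.json",
    "20200629-082635-marostegui.json",
]
print(latest_before(names, datetime(2020, 6, 29, 8, 30)))
```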

Script wmf-auto-reimage was launched by marostegui on cumin2001.codfw.wmnet for hosts:

['db2091.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006171246_marostegui_25785.log.
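The wmf-auto-reimage log paths follow a visible pattern: a `YYYYMMDDHHMM` start time, then the user, then a trailing number. Below is a minimal Python sketch for parsing them, assuming that layout holds in general. What the trailing number represents (for example, a process ID) is also an assumption, and `parse_reimage_log` is a hypothetical helper:

```python
import re
from datetime import datetime
from typing import NamedTuple

# Assumed layout, inferred from the paths in this task:
#   /var/log/wmf-auto-reimage/<YYYYMMDDHHMM>_<user>_<number>.log
# The meaning of the trailing number is not documented here.
LOG_NAME = re.compile(r"^(\d{12})_(.+)_(\d+)\.log$")


class ReimageLog(NamedTuple):
    started: datetime  # naive datetime; the time zone is not stated in the path
    user: str
    number: int


def parse_reimage_log(path: str) -> ReimageLog:
    """Split a reimage log path into start time, user and trailing number."""
    name = path.rsplit("/", 1)[-1]
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected log name: {name}")
    stamp, user, number = match.groups()
    return ReimageLog(datetime.strptime(stamp, "%Y%m%d%H%M"), user, int(number))


print(parse_reimage_log("/var/log/wmf-auto-reimage/202006171246_marostegui_25785.log"))
```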

Completed auto-reimage of hosts:

['db2091.codfw.wmnet']

and were ALL successful.

Change 606311 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2091: Enable notifications

https://gerrit.wikimedia.org/r/606311

Change 606311 merged by Marostegui:
[operations/puppet@production] db2091: Enable notifications

https://gerrit.wikimedia.org/r/606311

I intend to "take" db1102, delete its data and set up x1 with buster on it to generate 10.4 backups.


Remember that you can also take db1084 any time now (it needs to be depooled first).

Change 607728 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1135: Disable notifications

https://gerrit.wikimedia.org/r/607728

Change 607728 merged by Marostegui:
[operations/puppet@production] db1135: Disable notifications

https://gerrit.wikimedia.org/r/607728

Change 608256 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db1135 to s1

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608256

Change 608256 merged by Marostegui:
[operations/puppet@production] mariadb: Move db1135 to s1

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608256

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1135.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006290440_marostegui_27864.log.

Mentioned in SAL (#wikimedia-operations) [2020-06-29T04:57:08Z] <marostegui> Stop MySQL on db1080 to clone db1135 T253217

Completed auto-reimage of hosts:

['db1135.eqiad.wmnet']

and were ALL successful.

Change 608259 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] instances.yaml: Remove db1080, add db1135

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608259

Change 608259 merged by Marostegui:
[operations/puppet@production] instances.yaml: Remove db1080, add db1135

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608259

Mentioned in SAL (#wikimedia-operations) [2020-06-29T07:46:12Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Add db1135 (depooled) to s1 T253217', diff saved to https://phabricator.wikimedia.org/P11684 and previous config saved to /var/cache/conftool/dbconfig/20200629-074611-marostegui.json

Change 608265 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1135: Enable notifications

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608265

Change 608265 merged by Marostegui:
[operations/puppet@production] db1135: Enable notifications

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608265
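The db1135 entries above show the sequence used to move a replica into s1. Notifications are disabled, its section is changed in puppet, and the host is reimaged. Data is then cloned from another host (db1080 here), instances.yaml is updated, and the host is added to dbctl depooled. Finally, notifications are re-enabled and the host is pooled gradually. Below is a hypothetical Python checklist that encodes that order as reconstructed from this task. The step labels are illustrative, not commands of any real tool:

```python
from typing import Optional

# Order reconstructed from the db1135 entries in this task; labels only.
STEPS = [
    "disable notifications",
    "move host to new section in puppet",
    "reimage",
    "clone data from a source host",
    "update instances.yaml",
    "add to dbctl (depooled)",
    "enable notifications",
    "pool gradually",
]


def next_step(done: list[str]) -> Optional[str]:
    """Return the next pending step, checking that `done` follows STEPS in order."""
    if done != STEPS[: len(done)]:
        raise ValueError(f"steps out of order: {done}")
    return STEPS[len(done)] if len(done) < len(STEPS) else None


print(next_step(STEPS[:7]))  # pool gradually
```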

Mentioned in SAL (#wikimedia-operations) [2020-06-29T08:02:54Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly pool db1135 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11685 and previous config saved to /var/cache/conftool/dbconfig/20200629-080253-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-06-29T08:26:35Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly pool db1135 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11686 and previous config saved to /var/cache/conftool/dbconfig/20200629-082635-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-06-29T08:36:32Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly pool db1135 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11687 and previous config saved to /var/cache/conftool/dbconfig/20200629-083631-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-06-29T08:48:28Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Fully pool db1135 into s1 T253217', diff saved to https://phabricator.wikimedia.org/P11688 and previous config saved to /var/cache/conftool/dbconfig/20200629-084827-marostegui.json
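db1135 was pooled in four dbctl commits between 08:02 and 08:48: three "Slowly pool" steps and one "Fully pool". The percentages used are not recorded here, so the sketch below assumes equal increments. `set_percentage` and `healthy` are hypothetical callbacks standing in for the real tooling and monitoring, not the dbctl API:

```python
from typing import Callable


def pooling_schedule(steps: int) -> list[int]:
    """Pooling percentages in `steps` roughly equal increments, ending at 100."""
    if steps < 1:
        raise ValueError("need at least one step")
    return [round(100 * i / steps) for i in range(1, steps + 1)]


def pool_gradually(
    host: str,
    steps: int,
    set_percentage: Callable[[str, int], None],
    healthy: Callable[[str], bool],
) -> bool:
    """Raise the pooled percentage step by step; depool and stop on a failed check."""
    for percentage in pooling_schedule(steps):
        set_percentage(host, percentage)
        if not healthy(host):
            set_percentage(host, 0)  # back off by depooling the host
            return False
    return True


print(pooling_schedule(4))  # [25, 50, 75, 100]
```

In practice, each step here was followed by roughly 10 to 25 minutes of observation, which the sketch leaves to the caller.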

Change 608508 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db1080 from s1 to m2

https://gerrit.wikimedia.org/r/c/operations/puppet/ /608508

Change 608508 merged by Marostegui:
[operations/puppet@production] mariadb: Move db1080 from s1 to m2

https://gerrit.wikimedia.org/r/c/operations/puppet/ /608508

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1080.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006300451_marostegui_8504.log.

Completed auto-reimage of hosts:

['db1080.eqiad.wmnet']

Of which those FAILED:

['db1080.eqiad.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1080.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006300504_marostegui_10430.log.

Completed auto-reimage of hosts:

['db1080.eqiad.wmnet']

Of which those FAILED:

['db1080.eqiad.wmnet']

As agreed on IRC, I am going to take db1084 and use it to upgrade m2. The old m2 master will then be used to upgrade m3, the old m3 master to upgrade m5, and finally the old m5 master will be used for backup testing.
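The plan above is a cascade: each displaced master becomes the upgrade for the next section. The comment does not name the hosts after db1084, so this minimal Python sketch uses placeholder labels such as "old m2 master". `upgrade_cascade` is a hypothetical helper:

```python
def upgrade_cascade(
    new_host: str, sections: list[str], final_use: str
) -> list[tuple[str, str]]:
    """Return (host, destination) pairs for a chain of master replacements.

    The new host replaces the master of sections[0]; each displaced master
    replaces the master of the next section, and the last displaced master
    goes to `final_use`. Displaced masters are placeholder labels.
    """
    moves: list[tuple[str, str]] = []
    incoming = new_host
    for section in sections:
        moves.append((incoming, section))
        incoming = f"old {section} master"
    moves.append((incoming, final_use))
    return moves


for host, destination in upgrade_cascade("db1084", ["m2", "m3", "m5"], "backup testing"):
    print(f"{host} -> {destination}")
```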

Change 615331 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db1084 to s1

https://gerrit.wikimedia.org/r/615331

Change 615331 merged by Marostegui:
[operations/puppet@production] mariadb: Move db1084 to s1

https://gerrit.wikimedia.org/r/615331

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1084.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202007220557_marostegui_17008.log.

Completed auto-reimage of hosts:

['db1084.eqiad.wmnet']

and were ALL successful.

Change 615406 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] instances.yaml: Add db1084

https://gerrit.wikimedia.org/r/615406

Change 615406 merged by Marostegui:
[operations/puppet@production] instances.yaml: Add db1084

https://gerrit.wikimedia.org/r/615406

Mentioned in SAL (#wikimedia-operations) [2020-07-22T07:50:40Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Add db1084 to s1, depooled T253217', diff saved to https://phabricator.wikimedia.org/P12005 and previous config saved to /var/cache/conftool/dbconfig/20200722-075040-marostegui.json

Change 615409 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1084: Enable notifications

https://gerrit.wikimedia.org/r/615409

Change 615409 merged by Marostegui:
[operations/puppet@production] db1084: Enable notifications

https://gerrit.wikimedia.org/r/615409

@jcrespo db1133 is ready for you. It was originally intended for backup testing, but feel free to place it wherever you prefer within the backup infra.
It has notifications disabled.

Thanks, @Marostegui. It would really be helpful to test backup recoveries in eqiad too, without affecting database testing. I believe db1077 will "return" to you (MW db testing) soon: T187984#6433997

Change 625648 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] instances.yaml: Remove db1133

https://gerrit.wikimedia.org/r/625648

Change 625648 merged by Marostegui:
[operations/puppet@production] instances.yaml: Remove db1133

https://gerrit.wikimedia.org/r/625648

Mentioned in SAL (#wikimedia-operations) [2020-09-07T14:35:08Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Remove db1133 from dbctl T253217', diff saved to https://phabricator.wikimedia.org/P12504 and previous config saved to /var/cache/conftool/dbconfig/20200907-143507-marostegui.json

Change 625860 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move db1133 from m5 to core-test (backup testing db)

https://gerrit.wikimedia.org/r/625860

Change 625860 merged by Jcrespo:
[operations/puppet@production] mariadb: Move db1133 from m5 to core-test (backup testing db)

https://gerrit.wikimedia.org/r/625860

Change 625863 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] install_server: Reimage db1133 into buster

https://gerrit.wikimedia.org/r/625863

Change 625863 merged by Jcrespo:
[operations/puppet@production] install_server: Reimage db1133 into buster

https://gerrit.wikimedia.org/r/625863

Script wmf-auto-reimage was launched by jynus on cumin1001.eqiad.wmnet for hosts:

['db1133.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202009081101_jynus_9028.log.

Completed auto-reimage of hosts:

['db1133.eqiad.wmnet']

and were ALL successful.

As far as the service setup goes, db1133 was correctly reimaged into buster, populated with a backup of enwiki, and started replicating, which also served as a test of that backup. Tendril and zarcillo were updated accordingly. I think there are no more tasks related to this.

You may want to use one of these hosts for MediaWiki db testing, although I am not sure whether more hosts have been ordered and are incoming. I am leaving any pending tasks/decisions to Manuel or Stevie, unless you ask me to help with something else.

Marostegui changed the task status from Open to Stalled. Nov 6 2020, 9:35 AM

Stalling, as we are blocked on the migration to 10.4: the pending hosts are either masters or candidate masters.

This is mostly finished, the last move should be: T282535