Setup newer machines and replace all old misc (m*) and x1 eqiad machines
Closed, ResolvedPublic

Description

db1059 (m3), which replaced db1048, is already running stretch with the newer configuration, but we still need to replace these older machines:

misc

  • db1016 (replaced and ready to be decommissioned by DC Ops T190179)
  • db1001 (replaced and ready to be decommissioned by DC Ops T190262)
  • db1020 (replaced and ready to be decommissioned by DC Ops T189773)
  • db1043
  • db1009 (replaced and ready to be decommissioned by DC Ops T189216)

x1

  • db1029
  • db1031
  • tendril: db1011 (T184704 / T184703)
  • m4: db1047 and db1046 (already replaced)

Event Timeline

maybe es2001? It has an older directory.

I wanted to avoid cross-DC transfers, but it is not that big, so yes: I will transfer it to es2001.

I had another idea (not very different) that takes into account the new hosts that are coming: dumps will greatly benefit from larger servers (Ariel will be happier), but otherwise those servers will be under-utilized. I wanted to set up db1113 (and other upcoming hosts) as multisource slow hosts to increase their utilization while providing more resources. In some cases, that would mean we exchange 1 new server for 2 older hosts. That would immediately give us 2 hosts for m2/m1.

So we can probably set up db1113 in exchange for s5 vslow (db1051) and s6 (db1063).

OK with me. With "candidate hosts" and "statement hosts", the puzzle gets more and more difficult.

Yeah, I chose those two because:

  1. they are not candidate hosts
  2. they are not sanitarium masters
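The two criteria above can be sketched as a simple filter over a host inventory. This is only an illustration: the `Host` record, its flag names, and the two `example-*` entries are hypothetical and do not reflect how the real configuration stores this information; only db1051 (s5) and db1063 (s6) come from this task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical inventory record; field names are assumptions."""
    name: str
    section: str
    is_candidate: bool = False          # candidate to become a master
    is_sanitarium_master: bool = False  # sanitarium replicates from it


def eligible_for_repurpose(hosts):
    """Return names of hosts that pass both criteria from the list above."""
    return [
        h.name
        for h in hosts
        if not h.is_candidate and not h.is_sanitarium_master
    ]


# Illustrative inventory: the example-* entries are placeholders.
INVENTORY = [
    Host("db1051", "s5"),
    Host("db1063", "s6"),
    Host("example-candidate", "s5", is_candidate=True),
    Host("example-sanitarium-master", "s6", is_sanitarium_master=True),
]

print(eligible_for_repurpose(INVENTORY))
```

Keeping the criteria as explicit flags makes it easy to add further exclusions later if the puzzle gets more constraints.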

db1113:3315 and db1113:3316 are now compressing tables. I will pool this host on Monday and, if all goes fine for 24h, depool db1051 and db1063, start moving them to m1 and m2, and begin working on the replacement.
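The "pool, wait 24h, then depool the old hosts" gate can be expressed as a small check. This is a rough sketch, not an existing tool: the function name, the `healthy` flag, and the timestamp below are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Observation window mentioned in the comment above.
SOAK_PERIOD = timedelta(hours=24)


def ready_to_depool_old_hosts(pooled_at, now, healthy, soak=SOAK_PERIOD):
    """True once the new host has been pooled and healthy for the full window.

    `healthy` is a hypothetical flag standing in for whatever monitoring
    says "it all went fine"; it is not a real API.
    """
    return healthy and (now - pooled_at) >= soak


# Illustrative timestamp only; not taken from the task.
pooled_at = datetime(2018, 1, 1, 9, 0, tzinfo=timezone.utc)

print(ready_to_depool_old_hosts(pooled_at, pooled_at + timedelta(hours=23), True))
print(ready_to_depool_old_hosts(pooled_at, pooled_at + timedelta(hours=24), True))
```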

Change 418867 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1051,db1063: Disable notifications

https://gerrit.wikimedia.org/r/418867

Change 418867 merged by Marostegui:
[operations/puppet@production] db1051,db1063: Disable notifications

https://gerrit.wikimedia.org/r/418867

Change 418898 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Proposal for moving hosts

https://gerrit.wikimedia.org/r/418898

Change 419114 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1063

https://gerrit.wikimedia.org/r/419114

Change 419114 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1063

https://gerrit.wikimedia.org/r/419114

Change 419119 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Remove db1051 and db1063 from mediawiki

https://gerrit.wikimedia.org/r/419119

Change 419119 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Remove db1051 and db1063 from mediawiki

https://gerrit.wikimedia.org/r/419119

Change 419136 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move db1063 and db1051 to m1 and m2 respectively

https://gerrit.wikimedia.org/r/419136

Change 419136 merged by Jcrespo:
[operations/puppet@production] mariadb: Move db1063 and db1051 to m1 and m2 respectively

https://gerrit.wikimedia.org/r/419136

Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db1063.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201803131025_jynus_25551.log.

Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db1051.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201803131026_jynus_25736.log.

Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db1063.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201803131126_jynus_6245.log.

Completed auto-reimage of hosts:

['db1051.eqiad.wmnet']

Of which those FAILED:

['db1051.eqiad.wmnet']

Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db1051.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201803131131_jynus_7137.log.

Installing...

Completed auto-reimage of hosts:

['db1063.eqiad.wmnet']

and were ALL successful.

Change 419216 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] dbproxy: switchover m1 and m2 master reference

https://gerrit.wikimedia.org/r/419216

Change 419216 merged by Jcrespo:
[operations/dns@master] dbproxy: switchover m1 and m2 master reference

https://gerrit.wikimedia.org/r/419216

jcrespo claimed this task. Mar 15 2018, 12:39 PM
Marostegui updated the task description. Mar 15 2018, 12:40 PM

Change 419990 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1106

https://gerrit.wikimedia.org/r/419990

Change 419991 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1106.yaml: Disable notifications for db1106

https://gerrit.wikimedia.org/r/419991

Change 419990 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1106

https://gerrit.wikimedia.org/r/419990

Mentioned in SAL (#wikimedia-operations) [2018-03-16T09:15:34Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1106 - T183469 (duration: 00m 57s)

Change 419991 merged by Marostegui:
[operations/puppet@production] db1106.yaml: Disable notifications for db1106

https://gerrit.wikimedia.org/r/419991

Change 420277 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Move db1106 from s5 to s1

https://gerrit.wikimedia.org/r/420277

Change 420278 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db1106 from s5 to s1

https://gerrit.wikimedia.org/r/420278

Change 420277 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Move db1106 from s5 to s1

https://gerrit.wikimedia.org/r/420277

Mentioned in SAL (#wikimedia-operations) [2018-03-19T07:44:11Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Move db1106 from s5 to s1 - T183469 (duration: 01m 00s)

Change 420281 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1065

https://gerrit.wikimedia.org/r/420281

Change 420281 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1065

https://gerrit.wikimedia.org/r/420281

Change 420278 merged by Marostegui:
[operations/puppet@production] mariadb: Move db1106 from s5 to s1

https://gerrit.wikimedia.org/r/420278

Mentioned in SAL (#wikimedia-operations) [2018-03-19T07:55:48Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1065 - T183469 (duration: 00m 57s)

Change 420282 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s1,s5.hosts: Move db1106 to s1

https://gerrit.wikimedia.org/r/420282

Change 420282 merged by jenkins-bot:
[operations/software@master] s1,s5.hosts: Move db1106 to s1

https://gerrit.wikimedia.org/r/420282

Mentioned in SAL (#wikimedia-operations) [2018-03-19T08:19:24Z] <marostegui> Reset slave on db1106 to get it ready for s1 - https://phabricator.wikimedia.org/T183469

Mentioned in SAL (#wikimedia-operations) [2018-03-19T10:37:31Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1065 - T183469 (duration: 00m 58s)

db1106 is now catching up after being recloned from db1065.
Once it has been replicating for another 24h, I would say we can change db1095 to replicate from db1106 and free up db1065.

Change 420643 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Pool db1106

https://gerrit.wikimedia.org/r/420643

Change 420644 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1106.yaml: Enable notifications

https://gerrit.wikimedia.org/r/420644

Change 420644 merged by Marostegui:
[operations/puppet@production] db1106.yaml: Enable notifications

https://gerrit.wikimedia.org/r/420644

Change 420643 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Pool db1106

https://gerrit.wikimedia.org/r/420643

Mentioned in SAL (#wikimedia-operations) [2018-03-20T07:49:09Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Pool db1106 in s1 - T183469 (duration: 00m 58s)

Change 420702 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1065

https://gerrit.wikimedia.org/r/420702

Change 420702 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1065

https://gerrit.wikimedia.org/r/420702

Mentioned in SAL (#wikimedia-operations) [2018-03-20T13:52:26Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1065, give main traffic to db1106 - T183469 (duration: 00m 58s)

Marostegui updated the task description. Mar 21 2018, 8:23 AM

Change 420964 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1106

https://gerrit.wikimedia.org/r/420964

Change 420964 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1106

https://gerrit.wikimedia.org/r/420964

Mentioned in SAL (#wikimedia-operations) [2018-03-21T08:36:29Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1106 - T183469 (duration: 01m 14s)

Mentioned in SAL (#wikimedia-operations) [2018-03-21T09:56:55Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1106 - T183469 (duration: 01m 15s)

Change 420975 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1065.yaml: Disable notifications

https://gerrit.wikimedia.org/r/420975

Change 420975 merged by Marostegui:
[operations/puppet@production] db1065.yaml: Disable notifications

https://gerrit.wikimedia.org/r/420975

Change 420976 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1065 from MW

https://gerrit.wikimedia.org/r/420976

Change 420976 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1065 from MW

https://gerrit.wikimedia.org/r/420976

Change 420979 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db1065 to misc

https://gerrit.wikimedia.org/r/420979

Change 420981 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] m1,s1.hosts: Move db1065 to m1

https://gerrit.wikimedia.org/r/420981

Change 420981 merged by jenkins-bot:
[operations/software@master] m1,s1.hosts: Move db1065 to m1

https://gerrit.wikimedia.org/r/420981

Change 420979 merged by Marostegui:
[operations/puppet@production] mariadb: Move db1065 to misc

https://gerrit.wikimedia.org/r/420979

Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

db1065.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201803211030_marostegui_29984_db1065_eqiad_wmnet.log.

Mentioned in SAL (#wikimedia-operations) [2018-03-21T10:39:17Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Remove db1065 from config - T183469 (duration: 01m 15s)

Mentioned in SAL (#wikimedia-operations) [2018-03-21T10:40:43Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Remove db1065 from config - T183469 (duration: 01m 15s)

Mentioned in SAL (#wikimedia-operations) [2018-03-21T10:51:20Z] <marostegui> Stop MySQL on db1016 to clone db1065 - T183469

Completed auto-reimage of hosts:

['db1065.eqiad.wmnet']

and were ALL successful.

db1065 is now replicating in m1.
I will leave MySQL on db1016 stopped.

Change 420991 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbproxy100{1,6}: Change standby host

https://gerrit.wikimedia.org/r/420991

Marostegui updated the task description. Mar 21 2018, 12:20 PM
Marostegui updated the task description.

Change 420991 merged by Marostegui:
[operations/puppet@production] dbproxy100{1,6}: Change standby host

https://gerrit.wikimedia.org/r/420991

Mentioned in SAL (#wikimedia-operations) [2018-03-22T06:15:59Z] <marostegui> Reload dbproxy1001 to pick up the new standby host - T183469

Mentioned in SAL (#wikimedia-operations) [2018-03-22T06:16:31Z] <marostegui> Reload dbproxy1006 to pick up the new standby host - T183469

Marostegui updated the task description. Mar 22 2018, 6:22 AM
Marostegui updated the task description. Mar 22 2018, 6:39 AM
Marostegui closed this task as Resolved. Mar 22 2018, 6:41 AM

All the hosts have been replaced.
The old hosts are now ready for DC Ops to finish decommissioning, and they are being tracked in their own individual tasks.
The scope of this task is therefore resolved.

We would like to thank everyone who helped get this done and get all the misc and x1 servers upgraded and renewed!

Change 418898 abandoned by Marostegui:
db-eqiad,db-codfw.php: Proposal for moving hosts

Reason:
This has already been done

https://gerrit.wikimedia.org/r/418898

Change 424268 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: db1106 is sanitarium

https://gerrit.wikimedia.org/r/424268

Change 424268 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: db1106 is sanitarium's master

https://gerrit.wikimedia.org/r/424268