Productionize 11 new eqiad database servers
Closed, Resolved, Public

Description

Like T170662, this is more a tracking list than a task: it keeps track of servers that are not yet in production, so they do not get "lost", and coordinates how to set them up.

These servers (T162233) are to be used to:

  • Decommission servers numbered below db1051
  • Set up misc servers appropriately
  • Setup the eventual s8

State:

  • db1096: provisioned on s5 rc service (and will later serve s8)
  • db1097: provisioned and serving s4
  • db1098: provisioned and serving s6
  • db1099: provisioned on s5 rc service (and will later serve s8)
  • db1100: provisioned on s5 (cloned from old master db1049 - and will later serve s8)
  • db1101: provisioned on s2
  • db1102: temporarily used as sanitarium3 - T169510
  • db1103: temporarily used in s3 to replace db1035 (cloned from db1035 to preserve its data)
  • db1104: provisioned on s5 (and will later serve s8)
  • db1105: provisioned on s5 (and will later serve s8)
  • db1106: provisioned on s5 (and will later serve s8)

Mentioned in SAL (#wikimedia-operations) [2017-09-07T07:54:51Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Add db1100 - T172679 (duration: 00m 48s)

Change 376484 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Add db1100 to s5

https://gerrit.wikimedia.org/r/376484

Change 376484 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Add db1100 to s5 depooled

https://gerrit.wikimedia.org/r/376484

Mentioned in SAL (#wikimedia-operations) [2017-09-07T09:15:57Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Add db1100 depooled to s5 array - T172679 (duration: 00m 49s)

Marostegui updated the task description (Sep 7 2017, 9:16 AM).

Change 376499 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Pool db1100 with weight 0

https://gerrit.wikimedia.org/r/376499

Change 376499 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Pool db1100 with weight 0

https://gerrit.wikimedia.org/r/376499

Mentioned in SAL (#wikimedia-operations) [2017-09-07T11:24:38Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Pool db1100 with 0 weight - T172679 (duration: 00m 49s)
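Pooling a host "with weight 0" (and later "with low weight") relies on the load balancer spreading reads across replicas in proportion to their configured weights. The following Python sketch illustrates that idea under the assumption of plain weighted random choice; it is a simplified model, not MediaWiki's actual load balancer code, and the host names other than db1100 and all weights are hypothetical.

```python
import random
from typing import Mapping, Optional


def pick_replica(loads: Mapping[str, int], rng: random.Random) -> Optional[str]:
    """Pick a host with probability proportional to its weight.

    In this simplified model a host with weight 0 is pooled (present in the
    config) but never receives general read traffic.
    """
    candidates = {host: w for host, w in loads.items() if w > 0}
    if not candidates:
        return None
    hosts = list(candidates)
    weights = [candidates[h] for h in hosts]
    return rng.choices(hosts, weights=weights, k=1)[0]


# Hypothetical s5 weights: db1100 is pooled with weight 0.
s5_loads = {"existing_replica_1": 200, "existing_replica_2": 200, "db1100": 0}
rng = random.Random(42)
counts = {h: 0 for h in s5_loads}
for _ in range(10_000):
    counts[pick_replica(s5_loads, rng)] += 1
print(counts)
```

Pooling at weight 0 first lets the host be checked in place before any real traffic reaches it; raising the weight afterwards moves a proportional share of reads onto it.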

Change 378859 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Setup db1101 on s2 to replace db1018 and db1036

https://gerrit.wikimedia.org/r/378859

jcrespo updated the task description (Sep 19 2017, 9:51 AM).

Change 378859 merged by Jcrespo:
[operations/puppet@production] mariadb: Setup db1101 on s2 to replace db1018 and db1036

https://gerrit.wikimedia.org/r/378859

Change 378914 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Pool db1101 as new s2 host

https://gerrit.wikimedia.org/r/378914

Change 378916 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dbtools: Add db1101 to s2

https://gerrit.wikimedia.org/r/378916

Change 378916 merged by Jcrespo:
[operations/software@master] dbtools: Add db1101 to s2

https://gerrit.wikimedia.org/r/378916

Change 378914 merged by Jcrespo:
[operations/mediawiki-config@master] mariadb: Pool db1101 as new s2 host

https://gerrit.wikimedia.org/r/378914

Change 378962 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] icinga: Disable notifications on db2078, enable them on db1101

https://gerrit.wikimedia.org/r/378962

Change 378962 merged by Jcrespo:
[operations/puppet@production] icinga: Disable notifications on db2078, enable them on db1101

https://gerrit.wikimedia.org/r/378962

I would like to decommission db1035 from s3; it is low on disk space.
According to https://gerrit.wikimedia.org/r/#/c/338996/1/wmf-config/db-eqiad.php, we could simply decommission it and have both db1077 and db1078 serve main traffic plus the recentchanges service.
If not, we can temporarily take one of the unused hosts from this task, clone it from db1035, decommission db1035, and decide later what to do with the rc service for s3.

@jcrespo any thoughts on this?

Yes, although we may still need an extra host for vslow/dumps, separate from the other services and smaller in size. Any ideas about which one we can move that is relatively old but not about to be removed?

According to: https://gerrit.wikimedia.org/r/#/c/338996/1/wmf-config/db-eqiad.php

In s1 we would have 2 of the most powerful servers serving API.
Currently we have three 160G hosts serving API; we could take one of those and have one of the big ones serve API instead.
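The API, vslow/dumps and recentchanges ("rc") services discussed here are dedicated subsets of a section's replicas that certain queries are routed to, with the general pool used otherwise. A simplified Python model of that routing follows; the group names mirror the services named in this task, but the data layout, the fallback rule, the weights and the vslow/dump host name are assumptions for illustration, not the real db-eqiad.php structure.

```python
import random
from typing import Mapping, Optional

Loads = Mapping[str, int]


def pick_for_group(
    general: Loads,
    groups: Mapping[str, Loads],
    group: Optional[str],
    rng: random.Random,
) -> str:
    """Route a query to a host of its group, falling back to the general pool.

    Assumes the general pool has at least one host with a positive weight.
    """
    pool = groups.get(group, {}) if group else {}
    # An undefined or fully zero-weighted group falls back to general traffic.
    if not any(w > 0 for w in pool.values()):
        pool = general
    hosts = [h for h, w in pool.items() if w > 0]
    weights = [pool[h] for h in hosts]
    return rng.choices(hosts, weights=weights, k=1)[0]


# Illustrative s3 layout; weights and the vslow/dump host are hypothetical.
general = {"db1077": 300, "db1078": 300}
groups = {
    "vslow": {"hypothetical_160g_host": 1},
    "dump": {"hypothetical_160g_host": 1},
    "recentchanges": {"db1077": 1, "db1078": 1},
}
rng = random.Random(7)
for g in ("vslow", "recentchanges", "api", None):
    print(g, "->", pick_for_group(general, groups, g, rng))
```

Keeping vslow/dumps on a separate, smaller host means long-running queries do not compete with main traffic, which is why a spare older host is enough for that role.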

Going to grab one of the servers and clone it from db1035, so we can preserve db1035's data and checksum it later without rushing, as db1035 is quickly filling up:

Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   1.6T  1.5T  171G  90% /srv

Marostegui updated the task description (Sep 27 2017, 9:22 AM).
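As an illustration of reading the df output above, the following Python sketch recomputes the usage figure from the human-readable sizes and flags the filesystem. It assumes GNU df's convention of Use% = used / (used + available), rounded up, with -h sizes in powers of 1024; because the sizes are already rounded, the result is only an approximation. The helper names and the 90% threshold are hypothetical.

```python
import math

# df -h suffixes expressed in KiB (powers of 1024).
UNITS = {"K": 1, "M": 1024, "G": 1024**2, "T": 1024**3}


def to_kib(size: str) -> float:
    """Convert a df -h style size such as '1.5T' or '171G' to KiB."""
    return float(size[:-1]) * UNITS[size[-1]]


def use_percent(used: str, avail: str) -> int:
    """Approximate df's Use% as used / (used + avail), rounded up."""
    u, a = to_kib(used), to_kib(avail)
    return math.ceil(100 * u / (u + a))


line = "/dev/mapper/tank-data xfs   1.6T  1.5T  171G  90% /srv"
_, _, size, used, avail, pct, mount = line.split()
estimate = use_percent(used, avail)
print(mount, estimate, pct)
if estimate >= 90:  # hypothetical alerting threshold
    print(f"{mount} is {estimate}% full; plan a replacement host")
```

Here 1.5T against 171G available works out to about 89.98%, which rounds up to the 90% that df reports.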

Change 380937 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Add db1103 to s3

https://gerrit.wikimedia.org/r/380937

Change 380938 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1035 to clone it to db1103

https://gerrit.wikimedia.org/r/380938

Change 380937 merged by Marostegui:
[operations/puppet@production] mariadb: Add db1103 to s3

https://gerrit.wikimedia.org/r/380937

Change 380938 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1035 to clone it to db1103

https://gerrit.wikimedia.org/r/380938

Change 380944 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s3.hosts: Add db1103

https://gerrit.wikimedia.org/r/380944

Mentioned in SAL (#wikimedia-operations) [2017-09-27T09:42:37Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1035 to transfer its data to db1103 - T172679 (duration: 00m 48s)

Mentioned in SAL (#wikimedia-operations) [2017-09-27T09:45:13Z] <marostegui> Stop mysql on db1035 to copy its data to db1103 - T172679

Change 380944 merged by jenkins-bot:
[operations/software@master] s3.hosts: Add db1103

https://gerrit.wikimedia.org/r/380944

Change 380954 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db1103 to the array

https://gerrit.wikimedia.org/r/380954

Change 380954 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db1103 to the array

https://gerrit.wikimedia.org/r/380954

Marostegui updated the task description (Sep 27 2017, 3:53 PM).

Mentioned in SAL (#wikimedia-operations) [2017-09-27T15:54:04Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Pool db1103 with low weight - T172679 (duration: 00m 47s)

I am going to change the config (without taking any of the hosts away yet) and see how it performs.
If all goes well, we can move one of the 160G hosts to the s3 vslow/dumps service (T172679#3635468) and decommission db1038.

Change 381929 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1066

https://gerrit.wikimedia.org/r/381929

Change 381929 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1066

https://gerrit.wikimedia.org/r/381929

Mentioned in SAL (#wikimedia-operations) [2017-10-03T06:28:11Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1066 - T172679 (duration: 00m 47s)

Marostegui updated the task description (Oct 4 2017, 6:31 AM).

Change 382118 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1087 to clone db1104

https://gerrit.wikimedia.org/r/382118

Change 382118 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1087 to clone db1104

https://gerrit.wikimedia.org/r/382118

Mentioned in SAL (#wikimedia-operations) [2017-10-04T06:39:08Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1087 - T172679 (duration: 00m 50s)

Change 382119 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Add db1104 to s5 cloned from db1087

https://gerrit.wikimedia.org/r/382119

Change 382119 merged by Marostegui:
[operations/puppet@production] mariadb: Add db1104 to s5 cloned from db1087

https://gerrit.wikimedia.org/r/382119

Change 382121 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s5.hosts: Add db1104 to s5

https://gerrit.wikimedia.org/r/382121

Mentioned in SAL (#wikimedia-operations) [2017-10-04T06:53:46Z] <marostegui> Stop MySQL on db1087 to clone db1104 from it - T172679

Change 382121 merged by jenkins-bot:
[operations/software@master] s5.hosts: Add db1104 to s5

https://gerrit.wikimedia.org/r/382121

Change 382141 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db1104 to the config

https://gerrit.wikimedia.org/r/382141

Change 382141 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db1104 to the config

https://gerrit.wikimedia.org/r/382141

Mentioned in SAL (#wikimedia-operations) [2017-10-04T09:32:17Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Add db1104 to the config - T172679 (duration: 00m 51s)

Mentioned in SAL (#wikimedia-operations) [2017-10-04T09:33:13Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Add db1104 to the config - T172679 (duration: 00m 50s)

Marostegui updated the task description (Oct 4 2017, 9:33 AM).
Marostegui updated the task description (Oct 5 2017, 6:28 AM).

Change 382372 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Add db1105 to s5

https://gerrit.wikimedia.org/r/382372

Change 382373 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s5.hosts: Add db1105

https://gerrit.wikimedia.org/r/382373

Change 382372 merged by Marostegui:
[operations/puppet@production] mariadb: Add db1105 to s5

https://gerrit.wikimedia.org/r/382372

Change 382373 merged by jenkins-bot:
[operations/software@master] s5.hosts: Add db1105

https://gerrit.wikimedia.org/r/382373

Mentioned in SAL (#wikimedia-operations) [2017-10-05T06:40:38Z] <marostegui> Stop MySQL on db1104 to clone db1105 from it - T172679

Change 382388 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db1105 to the config

https://gerrit.wikimedia.org/r/382388

Change 382388 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db1105 to the config

https://gerrit.wikimedia.org/r/382388

Mentioned in SAL (#wikimedia-operations) [2017-10-05T08:39:49Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Add db1105 to the config - T172679 (duration: 00m 52s)

Marostegui updated the task description (Oct 5 2017, 8:40 AM).

Mentioned in SAL (#wikimedia-operations) [2017-10-05T08:40:45Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Add db1105 to the config - T172679 (duration: 00m 50s)

Change 382391 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s5.hosts: Add db1106 to s5

https://gerrit.wikimedia.org/r/382391

Change 382392 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Add db1106 to s5

https://gerrit.wikimedia.org/r/382392

Change 382391 merged by jenkins-bot:
[operations/software@master] s5.hosts: Add db1106 to s5

https://gerrit.wikimedia.org/r/382391

Change 382392 merged by Marostegui:
[operations/puppet@production] mariadb: Add db1106 to s5

https://gerrit.wikimedia.org/r/382392

Mentioned in SAL (#wikimedia-operations) [2017-10-05T08:54:09Z] <marostegui> Stop MySQL on db1104 to clone db1106 - T172679

Change 382648 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db1106 to config

https://gerrit.wikimedia.org/r/382648

Change 382648 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Add db1106 to config

https://gerrit.wikimedia.org/r/382648

Mentioned in SAL (#wikimedia-operations) [2017-10-06T06:21:59Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Add db1106 to the config - T172679 (duration: 00m 47s)

Marostegui updated the task description (Oct 6 2017, 6:22 AM).

Mentioned in SAL (#wikimedia-operations) [2017-10-06T06:23:32Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Add db1106 to the config - T172679 (duration: 00m 47s)

Marostegui closed this task as Resolved (Oct 6 2017, 6:23 AM).
Marostegui claimed this task.

All these hosts have been productionized, so I am closing this task as resolved.
They might not be in their definitive place yet, but they are all serving traffic.

Change 383076 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Move db1066 from s1 api to s3 vslow

https://gerrit.wikimedia.org/r/383076

I am going to move db1066 from s1 api to s3 vslow service.
Once done, we can decommission db1038 which is becoming short on disk space.

Change 383077 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db1066 from s1 to s3

https://gerrit.wikimedia.org/r/383077

Change 383076 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Move db1066 from s1 api to s3 vslow

https://gerrit.wikimedia.org/r/383076

Mentioned in SAL (#wikimedia-operations) [2017-10-09T08:12:23Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Move db1066 from s1 to s3 - T172679 (duration: 01m 25s)

I am changing my plans and will move db1072 instead of db1066.
The reason is that db1072 was cloned from db1052, so we are sure its data still exists on the master.
I tried to trace where db1066 was cloned from and could not find it. Rather than wiping it for good without knowing the state of its data, I would rather use db1072, whose data we know still exists elsewhere.

Change 383083 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Restore db1066 on s1, db1072 to s3

https://gerrit.wikimedia.org/r/383083

Change 383077 abandoned by Marostegui:
mariadb: Move db1066 from s1 to s3

Reason:
Going to move db1072 instead

https://gerrit.wikimedia.org/r/383077

Change 383083 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Restore db1066 on s1, db1072 to s3

https://gerrit.wikimedia.org/r/383083

Mentioned in SAL (#wikimedia-operations) [2017-10-09T08:26:22Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Restore db1066 on s1 to and move db1072 to s3 instead - T172679 (duration: 00m 47s)

Change 383085 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db1072 to s3

https://gerrit.wikimedia.org/r/383085

Mentioned in SAL (#wikimedia-operations) [2017-10-10T06:05:55Z] <marostegui> Stop MySQL on db1072 to move it to s3 - T172679

Change 383085 merged by Marostegui:
[operations/puppet@production] mariadb: Move db1072 to s3

https://gerrit.wikimedia.org/r/383085

Change 383309 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1038

https://gerrit.wikimedia.org/r/383309

Change 383309 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1038

https://gerrit.wikimedia.org/r/383309

Mentioned in SAL (#wikimedia-operations) [2017-10-10T06:19:58Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1038 - T172679 (duration: 00m 47s)

Mentioned in SAL (#wikimedia-operations) [2017-10-10T06:22:07Z] <marostegui> Stop MySQL on db1038 to transfer its data to db1072 - T172679

Change 383310 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s3.hosts,s1.hosts: Move db1072 from s1 to s3

https://gerrit.wikimedia.org/r/383310

Change 383310 merged by jenkins-bot:
[operations/software@master] s3.hosts,s1.hosts: Move db1072 from s1 to s3

https://gerrit.wikimedia.org/r/383310

Mentioned in SAL (#wikimedia-operations) [2017-10-10T14:29:16Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1072 with low weight - T172679 (duration: 00m 47s)

Mentioned in SAL (#wikimedia-operations) [2017-10-11T07:10:47Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Pool db1072 into the vslow and dump service for s3 - T172679 (duration: 00m 46s)

Change 383516 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1103

https://gerrit.wikimedia.org/r/383516

Change 383516 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1103

https://gerrit.wikimedia.org/r/383516