
Support multi-instance on core hosts
Closed, Resolved (Public)

Description

We currently have dbstore servers supporting multi-instance with systemd, running on Debian Stretch.

We now need multi-instance support on the core servers as well.
There are several paths we could follow:

Jessie + 10.0 + support for init.d with mysqld_multi
Jessie + 10.1
Stretch + 10.1 (only on those hosts).

Some work has already been done and db2084 is being used as a testing host: https://gerrit.wikimedia.org/r/#/c/384452/

The idea is to use 8 hosts to serve all the recentchanges services across all the core shards,
splitting the shards across those hosts so we have redundancy:

codfw:

[] db2092 (B8): will not be done, because we don't need special replicas on s3: https://gerrit.wikimedia.org/r/394573

eqiad:

Pending things:

  • Fully pool db1098:3316 and db1098:3317 (to be finished today 7th)
  • Finish compressing tables on db1099:3311 after reimporting a few of them
  • Pool db1099:3311
  • Compress db1099:3318
  • Compress db1096:3316
    • To be done once db1098:3316 is fully pooled
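
The host:port pairs used throughout this task (db1096:3315 for s5, db1098:3316 for s6, db1098:3317 for s7, db1099:3311 for s1, db1099:3318 for s8) suggest that section sN listens on port 3310 + N. A minimal sketch of that mapping, assuming the convention holds for s1 through s8 only; `instance_port` and `instance_address` are hypothetical helper names, not part of any existing tool:

```python
import re

# Assumption inferred from the host:port pairs in this task: sN -> 3310 + N.
BASE_PORT = 3310


def instance_port(section: str) -> int:
    """Return the assumed port for a core section name such as 's6'."""
    match = re.fullmatch(r"s([1-8])", section)
    if match is None:
        raise ValueError(f"not a core section: {section!r}")
    return BASE_PORT + int(match.group(1))


def instance_address(host: str, section: str) -> str:
    """Build the host:port label used for a multi-instance replica."""
    return f"{host}:{instance_port(section)}"
```

For example, `instance_address("db1098", "s7")` gives the same label that appears in the pending list above. Misc sections such as m1 are deliberately rejected, because the task gives no port for them.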


Event Timeline

There are a very large number of changes, so older changes are hidden.

So the summary of changes: db2085 (s3) and db2092 (s3) have disappeared (data has not been physically deleted, but MySQL is no longer running). db2092 (s1) has been moved to db2085 (again, data has not been deleted; only the server is shut down). That means no special s3 multi-instance hosts, and db2078 (m1), db2090 (empty) and db2092 (with garbage) can be dedicated to misc hosts?

Change 394622 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Reenable notifications on db2085 after s1 reimport

https://gerrit.wikimedia.org/r/394622

Mentioned in SAL (#wikimedia-operations) [2017-12-02T17:55:52Z] <marostegui> Reboot db1096.s5 to pick up the correct innodb_buffer_pool size after finishing compressing s5 - T178359

Change 394916 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1098

https://gerrit.wikimedia.org/r/394916

Change 394916 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1098

https://gerrit.wikimedia.org/r/394916

Mentioned in SAL (#wikimedia-operations) [2017-12-04T06:34:03Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1098 - T178359 (duration: 00m 46s)

Mentioned in SAL (#wikimedia-operations) [2017-12-04T06:40:24Z] <marostegui> Stop MySQL on db1098 to clone db1096.s6 - T178359

Mentioned in SAL (#wikimedia-operations) [2017-12-04T07:17:24Z] <marostegui> Compress s1 on db1099 - T178359

Change 394925 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s6.hosts: Add db1096:3316

https://gerrit.wikimedia.org/r/394925

Change 394925 merged by jenkins-bot:
[operations/software@master] s6.hosts: Add db1096:3316

https://gerrit.wikimedia.org/r/394925

Change 394926 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Pool db1096:3315

https://gerrit.wikimedia.org/r/394926

Change 394926 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Pool db1096:3315

https://gerrit.wikimedia.org/r/394926

Mentioned in SAL (#wikimedia-operations) [2017-12-04T08:11:31Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Pool db1096:3315 - T178359 (duration: 00m 45s)

Change 394928 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Allow reimage of db1098

https://gerrit.wikimedia.org/r/394928

Mentioned in SAL (#wikimedia-operations) [2017-12-04T08:12:22Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Pool db1096:3315 - T178359 (duration: 00m 44s)

Change 394928 merged by Marostegui:
[operations/puppet@production] install_server: Allow reimage of db1098

https://gerrit.wikimedia.org/r/394928

Change 394929 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase traffic for db1096:331{5,6}

https://gerrit.wikimedia.org/r/394929

Change 394929 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase traffic for db1096:331{5,6}

https://gerrit.wikimedia.org/r/394929

Mentioned in SAL (#wikimedia-operations) [2017-12-04T08:30:31Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase traffic for db1096:3315 and pool db1096:3316 - T178359 (duration: 00m 45s)

Mentioned in SAL (#wikimedia-operations) [2017-12-04T08:45:54Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase traffic for db1096:3316 - T178359 (duration: 00m 45s)

Mentioned in SAL (#wikimedia-operations) [2017-12-04T08:57:50Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase traffic for db1096:3315 and 3316 - T178359 (duration: 00m 45s)

Change 394944 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Remove db1099 from s5

https://gerrit.wikimedia.org/r/394944

Change 394622 merged by Jcrespo:
[operations/puppet@production] mariadb: Reenable notifications on db2085 after s1 reimport

https://gerrit.wikimedia.org/r/394622

Change 394944 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Remove db1099 from s5

https://gerrit.wikimedia.org/r/394944

Change 394954 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Fully pool db1096:3316

https://gerrit.wikimedia.org/r/394954

Change 394955 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Conver db1098 to multiinstance

https://gerrit.wikimedia.org/r/394955

Change 394954 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Fully pool db1096:3316

https://gerrit.wikimedia.org/r/394954

Change 394955 merged by Marostegui:
[operations/puppet@production] mariadb: Convert db1098 to multiinstance

https://gerrit.wikimedia.org/r/394955

Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts:

db1098.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201712041022_marostegui_22618_db1098_eqiad_wmnet.log.

Change 394615 merged by Jcrespo:
[operations/mediawiki-config@master] mariadb: Pool db2085:3311 (s1) after being moved from db2092

https://gerrit.wikimedia.org/r/394615

Completed auto-reimage of hosts:

['db1098.eqiad.wmnet']

and were ALL successful.

Change 394965 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s6.hosts: db1098 will replicate on 3316

https://gerrit.wikimedia.org/r/394965

Change 394965 merged by jenkins-bot:
[operations/software@master] s6.hosts: db1098 will replicate on 3316

https://gerrit.wikimedia.org/r/394965

Mentioned in SAL (#wikimedia-operations) [2017-12-04T13:05:52Z] <marostegui> Compress s6 on db1098 - T178359

Change 395183 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1034

https://gerrit.wikimedia.org/r/395183

Change 395183 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1034

https://gerrit.wikimedia.org/r/395183

Mentioned in SAL (#wikimedia-operations) [2017-12-05T06:55:35Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1034 - T178359! (duration: 00m 43s)

Mentioned in SAL (#wikimedia-operations) [2017-12-05T06:58:47Z] <marostegui> Stop MySQL on db1034 to clone db1098:3317 - T178359

Mentioned in SAL (#wikimedia-operations) [2017-12-05T09:20:59Z] <marostegui> Optimize s7 on db1098 - T178359

s/optimize/compress

Mentioned in SAL (#wikimedia-operations) [2017-12-05T09:49:02Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1034 with low weight - T178359! (duration: 00m 43s)

Mentioned in SAL (#wikimedia-operations) [2017-12-05T10:05:55Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase traffic for db1034 - T178359! (duration: 00m 43s)

Mentioned in SAL (#wikimedia-operations) [2017-12-05T11:53:52Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase traffic for db1034 - T178359! (duration: 00m 43s)

Change 395693 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s7.hosts: Add db1098:3317

https://gerrit.wikimedia.org/r/395693

Change 395693 merged by jenkins-bot:
[operations/software@master] s7.hosts: Add db1098:3317

https://gerrit.wikimedia.org/r/395693

Change 395928 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Pool db1099:3311

https://gerrit.wikimedia.org/r/395928

Mentioned in SAL (#wikimedia-operations) [2017-12-07T06:45:48Z] <marostegui> Stop replication on db1099:3311 to reimport: change_tag, tag_summary, user and watchlist tables and recompress again - T178359
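
The task does not record the exact statements used to recompress the reimported tables. One common way to compress InnoDB tables is `ALTER TABLE ... ROW_FORMAT=COMPRESSED`, so the following is only a sketch of how such statements could be generated for the four tables named above. The helper name `compress_statements` and the `KEY_BLOCK_SIZE=8` default are assumptions, not values taken from this task:

```python
import re
from typing import Iterable, List

# Only plain identifiers are accepted, so nothing unexpected ends up in the SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def compress_statements(tables: Iterable[str], key_block_size: int = 8) -> List[str]:
    """Return one ALTER TABLE statement per table (assumed compression settings)."""
    statements = []
    for table in tables:
        if _IDENTIFIER.fullmatch(table) is None:
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(
            f"ALTER TABLE `{table}` ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE={key_block_size};"
        )
    return statements


# Tables listed in the SAL entry above.
for statement in compress_statements(["change_tag", "tag_summary", "user", "watchlist"]):
    print(statement)
```

Table names are backtick-quoted because `user` can clash with SQL keywords.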

Change 395929 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Pool db1098:331{6,7}

https://gerrit.wikimedia.org/r/395929

Change 395930 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1098.yaml: Enable notifications

https://gerrit.wikimedia.org/r/395930

Change 395930 merged by Marostegui:
[operations/puppet@production] db1098.yaml: Enable notifications

https://gerrit.wikimedia.org/r/395930

Change 395929 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Pool db1098:331{6,7}

https://gerrit.wikimedia.org/r/395929

Mentioned in SAL (#wikimedia-operations) [2017-12-07T07:18:55Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Slowly pool db1098:3316 db1098:3317 - T178359 (duration: 00m 48s)

Mentioned in SAL (#wikimedia-operations) [2017-12-07T07:19:49Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Slowly pool db1098:3316 db1098:3317 - T178359 (duration: 00m 47s)

db1098:3316 and 3317 starting to get pooled.

Pending things:

  • Fully pool db1098:3316 and db1098:3317 (to be finished today 7th)
  • Finish compressing tables on db1099:3311 after reimporting a few of them
    • Pool db1099:3311
  • Compress db1096:3316
    • To be done once db1098:3316 is fully pooled

Mentioned in SAL (#wikimedia-operations) [2017-12-07T08:32:26Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase traffic for db1098:3316 db1098:3317 - T178359 (duration: 00m 48s)

Mentioned in SAL (#wikimedia-operations) [2017-12-07T08:51:25Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase traffic for db1098:3316 db1098:3317 - T178359 (duration: 00m 48s)

Mentioned in SAL (#wikimedia-operations) [2017-12-07T09:36:14Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase traffic for db1098:3316 db1098:3317 - T178359 (duration: 00m 52s)

Change 395963 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Fully pool db1098:331{6,7}

https://gerrit.wikimedia.org/r/395963

Change 395963 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Fully pool db1098:331{6,7}

https://gerrit.wikimedia.org/r/395963

Mentioned in SAL (#wikimedia-operations) [2017-12-07T10:14:34Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Fully pool db1098:3316 db1098:3317 - T178359 (duration: 00m 51s)
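
The db1098 instances were brought into service in five syncs of db-eqiad.php: "Slowly pool", three "Increase traffic" steps, then "Fully pool". The actual weights are not recorded in this task, so the following is only a sketch of how such a ramp could be planned. `ramp_weights`, the starting weight of 1 and the full weight of 100 are hypothetical:

```python
from typing import List


def ramp_weights(full_weight: int, steps: int, start: int = 1) -> List[int]:
    """Return `steps` increasing load weights, from `start` up to `full_weight`."""
    if steps < 2:
        raise ValueError("need at least two steps (initial pool and full pool)")
    if not 0 < start < full_weight:
        raise ValueError("start must be positive and below full_weight")
    span = full_weight - start
    # Integer arithmetic keeps the last step exactly at full_weight.
    return [start + span * i // (steps - 1) for i in range(steps)]


# Hypothetical plan: five syncs ending at a full weight of 100.
for step, weight in enumerate(ramp_weights(100, 5), start=1):
    print(f"sync {step}: weight {weight}")
```

Each weight would be deployed in its own config sync, leaving time between steps to watch the new replica before sending it more traffic.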

Mentioned in SAL (#wikimedia-operations) [2017-12-07T11:35:12Z] <marostegui> Compress s8 on db1099 - T178359

Change 395928 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Pool db1099:3311

https://gerrit.wikimedia.org/r/395928

Mentioned in SAL (#wikimedia-operations) [2017-12-07T14:55:19Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Pool db1099:3311 with low weight - T178359 (duration: 00m 47s)

Change 396022 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase traffic for db1099:3311

https://gerrit.wikimedia.org/r/396022

Change 396022 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase traffic for db1099:3311

https://gerrit.wikimedia.org/r/396022

Mentioned in SAL (#wikimedia-operations) [2017-12-07T15:24:39Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1099:3311 weight - T178359 (duration: 00m 48s)

Change 396026 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase traffic for db1099:3311

https://gerrit.wikimedia.org/r/396026

Change 396026 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase traffic for db1099:3311

https://gerrit.wikimedia.org/r/396026

Mentioned in SAL (#wikimedia-operations) [2017-12-07T15:44:39Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1099:3311 weight - T178359 (duration: 00m 48s)

Change 396030 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Repool db1093 as main traffic

https://gerrit.wikimedia.org/r/396030

Change 396030 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Repool db1093 as main traffic

https://gerrit.wikimedia.org/r/396030

Mentioned in SAL (#wikimedia-operations) [2017-12-07T16:01:51Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1093 back as main traffic in s6 - T178359 (duration: 00m 48s)

Mentioned in SAL (#wikimedia-operations) [2017-12-07T16:14:00Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase db1099:3311 weight - T178359 (duration: 00m 48s)

Change 396304 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Fully pool db1099:3311

https://gerrit.wikimedia.org/r/396304

Change 396304 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Fully pool db1099:3311

https://gerrit.wikimedia.org/r/396304

Mentioned in SAL (#wikimedia-operations) [2017-12-08T06:43:33Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Fully pool db1099:3311 - T178359 (duration: 00m 55s)

Change 397232 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1096:3316

https://gerrit.wikimedia.org/r/397232

Change 397232 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1096:3316

https://gerrit.wikimedia.org/r/397232

Mentioned in SAL (#wikimedia-operations) [2017-12-11T06:21:08Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1096:3316 to compress InnoDB there - T178359 (duration: 00m 45s)

Mentioned in SAL (#wikimedia-operations) [2017-12-11T06:22:12Z] <marostegui> Compress s6 on db1096 - T178359

Change 397739 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1055

https://gerrit.wikimedia.org/r/397739

Change 397739 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1055

https://gerrit.wikimedia.org/r/397739

Mentioned in SAL (#wikimedia-operations) [2017-12-12T06:48:56Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1055 - T178359 T182653 (duration: 00m 56s)

Mentioned in SAL (#wikimedia-operations) [2017-12-12T09:23:10Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1096:3316 after InnoDB compression there - T178359 (duration: 00m 56s)

Marostegui updated the task description.