
Migrate dbstore2001 to multi instance
Closed, Resolved, Public

Description

dbstore2001 is acting very weirdly and clearly has some issues
See: T165033#3257985
See: T168354#3363734

Ideally we should migrate it to multi-instance so we can start testing multi-instance with it, including its monitoring.

Shards that have been imported into dbstore2001:

  • s1
  • s2
  • s3
  • s4
  • s5
  • s6
  • s7
  • x1

Event Timeline

Marostegui moved this task from Triage to Pending comment on the DBA board.

Now that we are considering dbstore2002 done with its 5 shards (T171321), the idea would be:

  • Delete all the content from dbstore2001
  • Reimage as stretch (it currently runs jessie)
  • Move its puppet role from dbstore2 to dbstore_multiinstance
  • Copy the content from dbstore2002 to dbstore2001
  • Start adding the pending new shards (s5, s6 and s7)
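The plan splits the shards into those copied from dbstore2002 and those added afterwards. A minimal sketch of that split, assuming the shard list from the task description and assuming every shard not marked pending already lives on dbstore2002 (the function name is hypothetical):

```python
# Shards listed in the task description as imported into dbstore2001.
ALL_SHARDS = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "x1"]

# Shards the plan marks as "pending new shards", to be added after the copy.
PENDING_SHARDS = {"s5", "s6", "s7"}


def split_shards(all_shards, pending):
    """Return (copied_from_dbstore2002, added_afterwards), preserving order.

    Assumption: every shard that is not pending already exists on dbstore2002.
    """
    copied = [s for s in all_shards if s not in pending]
    added_afterwards = [s for s in all_shards if s in pending]
    return copied, added_afterwards


copied, added_afterwards = split_shards(ALL_SHARDS, PENDING_SHARDS)
print("copy from dbstore2002:", copied)      # ['s1', 's2', 's3', 's4', 'x1']
print("add afterwards:", added_afterwards)   # ['s5', 's6', 's7']
```

The copy side comes out at five shards, which matches the "5 shards" quoted for dbstore2002 and is a useful sanity check on the assumption.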

Change 370788 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] install_server: Allow reimage of db1069, dbstore2001

https://gerrit.wikimedia.org/r/370788

jcrespo moved this task from Pending comment to In progress on the DBA board.

Change 370788 merged by Jcrespo:
[operations/puppet@production] install_server: Allow reimage of db1069, dbstore2001

https://gerrit.wikimedia.org/r/370788

Change 370789 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbstore: Migrate dbstore2001 to dbstore_multiinstance role

https://gerrit.wikimedia.org/r/370789

Change 370789 merged by Jcrespo:
[operations/puppet@production] dbstore: Migrate dbstore2001 to dbstore_multiinstance role

https://gerrit.wikimedia.org/r/370789

Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['dbstore2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201708090847_jynus_29328.log.

Completed auto-reimage of hosts:

['dbstore2001.codfw.wmnet']

and were ALL successful.

Change 370796 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] prometheus_mysqld_exporter: Adding new mysql instances at dbstore2001

https://gerrit.wikimedia.org/r/370796

Change 370797 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dblists: Add new 5 instances from dbstore2001 to the list of mysqls

https://gerrit.wikimedia.org/r/370797

Change 370796 merged by Jcrespo:
[operations/puppet@production] prometheus_mysqld_exporter: Adding new mysql instances at dbstore2001

https://gerrit.wikimedia.org/r/370796

Change 370797 merged by Jcrespo:
[operations/software@master] dblists: Add new 5 instances from dbstore2001 to the list of mysqls

https://gerrit.wikimedia.org/r/370797
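Both the Prometheus exporter change and the dblists change amount to registering one endpoint per mysqld instance on the host. A minimal sketch, assuming a hypothetical one-port-per-shard mapping; the port numbers below are illustrative and may differ from the real configuration:

```python
# Hypothetical shard -> port mapping for a multi-instance host; the real
# numbers may differ from these illustrative values.
PORTS = {"s1": 3311, "s2": 3312, "s3": 3313, "s4": 3314,
         "s5": 3315, "s6": 3316, "s7": 3317, "x1": 3320}


def instance_endpoints(host, shards):
    """Return one 'host:port' entry per mysqld instance on the host."""
    return [f"{host}:{PORTS[shard]}" for shard in shards]


# The first five instances on dbstore2001 (assumed to be s1-s4 and x1).
endpoints = instance_endpoints("dbstore2001.codfw.wmnet", ["s1", "s2", "s3", "s4", "x1"])
for endpoint in endpoints:
    print(endpoint)
```

Keeping the mapping in one table means the monitoring targets and the instance lists cannot drift apart silently.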

I am finishing adding s1 and s2 today and will continue adding the other 3 tomorrow.

s5 and s6 will soon be ready to be added too. How would you like to tackle T168409#3511504?

I will deploy some conditional code, but I want to finish the first 5 first.

Change 371073 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbstore_multiinstance: All hosts other than dbstore2002 will have 8 instances

https://gerrit.wikimedia.org/r/371073

Change 371447 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dblists: Add extra instances to dbstore2001

https://gerrit.wikimedia.org/r/371447

Change 371447 merged by Jcrespo:
[operations/software@master] dblists: Add extra instances to dbstore2001

https://gerrit.wikimedia.org/r/371447

Change 371073 merged by Jcrespo:
[operations/puppet@production] dbstore_multiinstance: All hosts other than dbstore2002 will have 8 instances

https://gerrit.wikimedia.org/r/371073
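Change 371073 is the "conditional code" mentioned earlier: dbstore2002 keeps its 5 instances while every other dbstore_multiinstance host gets 8. The real change is Puppet; this is only a Python sketch of the same condition, with a hypothetical function name:

```python
# Per change 371073's title: dbstore2002 keeps its 5-instance layout, while
# every other dbstore_multiinstance host runs 8 instances.
FIVE_INSTANCE_HOSTS = {"dbstore2002"}


def instance_count(fqdn):
    """Return how many mysqld instances a dbstore host should run."""
    short_name = fqdn.split(".")[0]
    return 5 if short_name in FIVE_INSTANCE_HOSTS else 8


print(instance_count("dbstore2002.codfw.wmnet"))  # 5
print(instance_count("dbstore2001.codfw.wmnet"))  # 8
```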

You said:

s5 and s6 will soon be ready to be added too

Can you comment here on which ones are ready? I know you are doing lots of things, and I do not want to touch them unless you give me the go-ahead. I am not in a hurry; this is just to have it tracked here.

Absolutely!

s5 is now ready to be copied from db2075 to dbstore2001.

Change 371491 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db2075 for cloning to dbstore2001

https://gerrit.wikimedia.org/r/371491

Change 371491 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db2075 for cloning to dbstore2001

https://gerrit.wikimedia.org/r/371491

s5 is being loaded right now. I may have configured the memory usage too high for 7 or 8 instances (15 GB of buffer pool each); we may want to increase it on dbstore2002 and decrease it a bit on the other hosts, or we will run out of memory.
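The concern is plain multiplication: the buffer pool is allocated per instance, so the host total grows with the instance count. A rough sketch, assuming 15 GB per instance; `ram_gb` and the `overhead` factor are hypothetical inputs, since the host's RAM and mysqld's memory use beyond the buffer pool are not stated here:

```python
PER_INSTANCE_POOL_GB = 15  # buffer pool size quoted in the comment


def total_pool_gb(instances, per_instance_gb=PER_INSTANCE_POOL_GB):
    """Aggregate InnoDB buffer pool across all instances on one host."""
    return instances * per_instance_gb


def fits(instances, ram_gb, per_instance_gb=PER_INSTANCE_POOL_GB, overhead=1.1):
    """Rough check; `overhead` is a hypothetical allowance for memory that
    each mysqld uses beyond the buffer pool (per-connection buffers, caches)."""
    return total_pool_gb(instances, per_instance_gb) * overhead <= ram_gb


for n in (7, 8):
    print(f"{n} x {PER_INSTANCE_POOL_GB} GB = {total_pool_gb(n)} GB")
# 7 x 15 GB = 105 GB
# 8 x 15 GB = 120 GB
```

Going from 5 to 8 instances at the same per-instance size adds 45 GB, which is why a host that was comfortable with 5 instances can run short with 8.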

Change 371494 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] prometheus-mysqld-exporter: Add s5 to the dbstore2001 monitored hosts

https://gerrit.wikimedia.org/r/371494

Change 371494 merged by Jcrespo:
[operations/puppet@production] prometheus-mysqld-exporter: Add s5 to the dbstore2001 monitored hosts

https://gerrit.wikimedia.org/r/371494

Mentioned in SAL (#wikimedia-operations) [2017-08-16T07:25:01Z] <marostegui> Stop MySQL on db2076 to copy its content to dbstore2001 - T168409

Mentioned in SAL (#wikimedia-operations) [2017-08-16T12:24:41Z] <marostegui> Compressing InnoDB on db2077 - T168409
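"Compressing InnoDB" presumably means rebuilding tables with compressed row format so the copy takes less space. A minimal sketch that generates such statements, assuming `KEY_BLOCK_SIZE=8` and a hypothetical table list; on older MariaDB/MySQL versions this also requires `innodb_file_per_table=1` and the Barracuda file format:

```python
VALID_KEY_BLOCK_SIZES = (1, 2, 4, 8, 16)


def compression_statements(tables, key_block_size=8):
    """Build ALTER TABLE statements converting tables to InnoDB compressed rows."""
    if key_block_size not in VALID_KEY_BLOCK_SIZES:
        raise ValueError(f"KEY_BLOCK_SIZE must be one of {VALID_KEY_BLOCK_SIZES}")
    return [
        f"ALTER TABLE `{table}` ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE={key_block_size};"
        for table in tables
    ]


# Hypothetical table list; a real run would cover every table of every wiki.
for stmt in compression_statements(["revision", "text"]):
    print(stmt)
```

Each ALTER rebuilds the whole table, so this is best done on a depooled host such as db2077 before it is copied.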

Change 372138 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mysql-dbstore_codfw: Add s6

https://gerrit.wikimedia.org/r/372138

Kizule subscribed.

The Patch-For-Review project does not need to be on this task.

Change 372138 merged by Marostegui:
[operations/puppet@production] mysql-dbstore_codfw: Add s6

https://gerrit.wikimedia.org/r/372138

Change 372366 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbstore3.my.cnf: Reduce pool size

https://gerrit.wikimedia.org/r/372366

Change 372366 abandoned by Marostegui:
dbstore3.my.cnf: Reduce pool size

Reason:
Better to be done via: https://gerrit.wikimedia.org/r/#/c/372400/

https://gerrit.wikimedia.org/r/372366

Change 372502 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool db2077

https://gerrit.wikimedia.org/r/372502

Change 372502 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool db2077

https://gerrit.wikimedia.org/r/372502

Mentioned in SAL (#wikimedia-operations) [2017-08-18T07:26:20Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Deoool db2077 - T168409 (duration: 00m 44s)

Mentioned in SAL (#wikimedia-operations) [2017-08-18T07:28:32Z] <marostegui> Stop MySQL on db2077 to copy it to dbstore2001 - T168409

Change 372508 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mysql-dbstore_codfw: Add dbstore2001 - s7

https://gerrit.wikimedia.org/r/372508

Change 372508 merged by Marostegui:
[operations/puppet@production] mysql-dbstore_codfw: Add dbstore2001 - s7

https://gerrit.wikimedia.org/r/372508

s7 is now replicating on dbstore2001 using GTID.
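With MariaDB, starting GTID replication on a freshly copied instance is one `CHANGE MASTER TO` with `MASTER_USE_GTID=slave_pos`. A minimal sketch that builds the statements, assuming the source (db2077) was itself a GTID replica so its copied `mysql.gtid_slave_pos` table holds the right position; host, user and password are placeholders:

```python
def gtid_replication_sql(master_host, user, port=3306, password="<password>"):
    """Return MariaDB statements that (re)start replication using GTID.

    MASTER_USE_GTID=slave_pos resumes from @@gtid_slave_pos, which a binary
    copy of a GTID replica's datadir carries along in mysql.gtid_slave_pos.
    """
    change_master = (
        "CHANGE MASTER TO "
        f"MASTER_HOST='{master_host}', MASTER_PORT={port}, "
        f"MASTER_USER='{user}', MASTER_PASSWORD='{password}', "
        "MASTER_USE_GTID=slave_pos;"
    )
    return ["STOP SLAVE;", change_master, "START SLAVE;"]


# Host and user are placeholders, not the real s7 codfw master or account.
for stmt in gtid_replication_sql("s7-master.example", "repl_user"):
    print(stmt)
```

On a multi-instance host these would be run against the s7 instance's own socket or port, not the default one.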

Marostegui updated the task description.

Mentioned in SAL (#wikimedia-operations) [2017-08-18T09:16:45Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Repool db2077 - T168409 (duration: 00m 44s)

Change 372519 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbstore2001: Lower memory pressure

https://gerrit.wikimedia.org/r/372519

Change 372519 merged by Jcrespo:
[operations/puppet@production] dbstore2001: Lower memory pressure

https://gerrit.wikimedia.org/r/372519

Change 372819 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbstore2001: Increase buffer pool of s7

https://gerrit.wikimedia.org/r/372819

Change 372819 merged by Jcrespo:
[operations/puppet@production] dbstore2001: Increase buffer pool of s7

https://gerrit.wikimedia.org/r/372819

Change 373256 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: reduce shards replicated to dbstore2001

https://gerrit.wikimedia.org/r/373256

Change 373256 merged by Jcrespo:
[operations/puppet@production] mariadb: reduce shards replicated to dbstore2001

https://gerrit.wikimedia.org/r/373256

Change 373267 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Renable buffer pool dump on dbstores and prometheus fix

https://gerrit.wikimedia.org/r/373267

Change 373267 merged by Jcrespo:
[operations/puppet@production] mariadb: Renable buffer pool dump on dbstores and prometheus fix

https://gerrit.wikimedia.org/r/373267
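The memory changes above (lower memory pressure, a larger s7 buffer pool) and the re-enabled buffer pool dump all end up as per-instance MariaDB options. A minimal sketch that renders such a fragment with `configparser`, assuming a hypothetical one-file-per-instance layout and hypothetical sizes; the option names themselves are standard InnoDB variables:

```python
import configparser
import io


def instance_cnf(buffer_pool_gb, dump_buffer_pool=True):
    """Render a per-instance my.cnf fragment (hypothetical layout)."""
    flag = "1" if dump_buffer_pool else "0"
    cfg = configparser.ConfigParser()
    cfg["mysqld"] = {
        "innodb_buffer_pool_size": f"{buffer_pool_gb}G",
        # Save the list of cached pages on shutdown and reload it on start-up,
        # so a restarted instance warms up faster.
        "innodb_buffer_pool_dump_at_shutdown": flag,
        "innodb_buffer_pool_load_at_startup": flag,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Hypothetical size: a larger pool for s7 than for the other instances.
print(instance_cnf(20), end="")
```

Generating each instance's file from one function keeps the sizes in one place, which makes rebalancing memory between instances a single-value change.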