
Move db1111 from test-s4 to s8
Closed, Resolved (Public)

Description

db1111 will be moved to s8 to help with the current load issues there.

Event Timeline

Marostegui renamed this task from "Move db1112 from test-s4 to s8" to "Move db1111 from test-s4 to s8". (Feb 28 2020, 3:23 PM)
Marostegui triaged this task as High priority.
Marostegui moved this task from Triage to In progress on the DBA board.
Marostegui updated the task description.

Mentioned in SAL (#wikimedia-operations) [2020-02-28T15:24:39Z] <marostegui> Stop replication on db1077 from db1111 (its master) - T246447

root@db1077.eqiad.wmnet[(none)]> stop slave; show slave status\G
Query OK, 0 rows affected, 1 warning (0.00 sec)

*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db1111.eqiad.wmnet
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: db1111-bin.000309
          Read_Master_Log_Pos: 513383829
               Relay_Log_File: db1077-relay-bin.000131
                Relay_Log_Pos: 23362108
        Relay_Master_Log_File: db1111-bin.000309
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 513383829
              Relay_Log_Space: 46894573
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 171966592
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 171978775-171978775-2640579788,0-180359175-3368394787,171970589-171970589-201132050,171970567-171970567-14,180359175-180359175-43143523,171966592-171966592-54612387
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
1 row in set (0.00 sec)

root@db1077.eqiad.wmnet[(none)]> reset slave all;
Query OK, 0 rows affected (0.02 sec)
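Before `reset slave all` discards the replication configuration, the status output is worth reading as a checklist: both replication threads are stopped, and `Exec_Master_Log_Pos` equals `Read_Master_Log_Pos` (513383829) on the same binlog file, so every event db1077 had fetched from db1111 had also been applied. A minimal Python sketch of that check, assuming the `\G` output has been captured as text; `parse_vertical` and `replica_fully_applied` are hypothetical helpers, not Wikimedia tooling.

```python
from __future__ import annotations

# Excerpt of the `show slave status\G` output from db1077 above.
SAMPLE = """
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: db1111.eqiad.wmnet
              Master_Log_File: db1111-bin.000309
          Read_Master_Log_Pos: 513383829
        Relay_Master_Log_File: db1111-bin.000309
             Slave_IO_Running: No
            Slave_SQL_Running: No
          Exec_Master_Log_Pos: 513383829
"""


def parse_vertical(output: str) -> dict[str, str]:
    """Parse one row of vertical (\\G-style) client output into a dict."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only; row separators have no colon and are skipped.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replica_fully_applied(status: dict[str, str]) -> bool:
    """True if both threads are stopped and everything fetched was executed."""
    return (
        status["Slave_IO_Running"] == "No"
        and status["Slave_SQL_Running"] == "No"
        and status["Relay_Master_Log_File"] == status["Master_Log_File"]
        and status["Exec_Master_Log_Pos"] == status["Read_Master_Log_Pos"]
    )


print(replica_fully_applied(parse_vertical(SAMPLE)))  # True
```

Comparing both the file name and the position matters: equal positions in different binlog files would not mean the replica was caught up.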

Change 575550 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Move db1111 into s8

https://gerrit.wikimedia.org/r/575550

Change 575550 merged by Marostegui:
[operations/puppet@production] mariadb: Move db1111 into s8

https://gerrit.wikimedia.org/r/575550

Confirming db1111 has no client activity or writes before reimaging:

root@db1111:~# mysql -e "show processlist ; show slave status\G"
+---------+-----------------+-------------------+--------------------+---------+---------+-----------------------------+------------------+----------+
| Id      | User            | Host              | db                 | Command | Time    | State                       | Info             | Progress |
+---------+-----------------+-------------------+--------------------+---------+---------+-----------------------------+------------------+----------+
|       2 | event_scheduler | localhost         | NULL               | Daemon  | 6855434 | Waiting for next activation | NULL             |    0.000 |
|     602 | root            | localhost         | heartbeat          | Sleep   |       0 |                             | NULL             |    0.000 |
| 3200543 | watchdog        | 10.64.0.122:58948 | information_schema | Sleep   |       4 |                             | NULL             |    0.000 |
| 3200545 | watchdog        | 10.64.0.122:59624 | information_schema | Sleep   |       4 |                             | NULL             |    0.000 |
| 3200546 | watchdog        | 10.64.0.122:59864 | mysql              | Sleep   |       3 |                             | NULL             |    0.000 |
| 3200548 | watchdog        | 10.64.0.122:60776 | information_schema | Sleep   |      15 |                             | NULL             |    0.000 |
| 3200551 | watchdog        | 10.64.0.122:33522 | information_schema | Sleep   |       5 |                             | NULL             |    0.000 |
| 3271962 | watchdog        | 10.64.0.122:54270 | information_schema | Sleep   |       4 |                             | NULL             |    0.000 |
| 3280364 | watchdog        | 10.64.0.122:51456 | mysql              | Sleep   |       3 |                             | NULL             |    0.000 |
| 3344673 | watchdog        | 10.64.0.122:55830 | information_schema | Sleep   |     876 |                             | NULL             |    0.000 |
| 3344674 | watchdog        | 10.64.0.122:55842 | information_schema | Sleep   |     876 |                             | NULL             |    0.000 |
| 3344675 | watchdog        | 10.64.0.122:55924 | information_schema | Sleep   |     876 |                             | NULL             |    0.000 |
| 3344676 | watchdog        | 10.64.0.122:55958 | information_schema | Sleep   |     876 |                             | NULL             |    0.000 |
| 3348510 | root            | localhost         | NULL               | Query   |       0 | init                        | show processlist |    0.000 |
+---------+-----------------+-------------------+--------------------+---------+---------+-----------------------------+------------------+----------+
root@db1111:~#
root@db1111:/srv/sqldata/commonswiki# mysql -e "show master status\G"
*************************** 1. row ***************************
            File: db1111-bin.000309
        Position: 513714918
    Binlog_Do_DB:
Binlog_Ignore_DB:
root@db1111:/srv/sqldata# mysqlbinlog -vvv db1111-bin.000309 | grep -v heartbeat | egrep -i "INSERT|UPDATE|DELETE"
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
root@db1111:/srv/sqldata#
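The grep above returned only mysqlbinlog's `max_insert_delayed_threads` preamble line, which matched because a case-insensitive `INSERT` also hits the substring `insert` inside that variable name; no other INSERT, UPDATE or DELETE outside heartbeat writes appeared in the current binlog. A minimal Python sketch of the same check with whole-word matching, which avoids that false positive; it assumes the decoded binlog text is passed in as a string, and `non_heartbeat_writes` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Whole-word, case-insensitive match. "_" is a word character, so
# "max_insert_delayed_threads" has no word boundary around "insert" and is not flagged.
DML = re.compile(r"\b(?:INSERT|UPDATE|DELETE)\b", re.IGNORECASE)


def non_heartbeat_writes(binlog_text: str) -> list[str]:
    """Return decoded binlog lines that look like DML and do not mention heartbeat."""
    return [
        line
        for line in binlog_text.splitlines()
        if "heartbeat" not in line and DML.search(line)
    ]


# Hypothetical excerpt: the preamble line seen above plus an illustrative
# decoded heartbeat row event of the kind `-vvv` prints.
sample = "\n".join([
    "/*!40019 SET @@session.max_insert_delayed_threads=0*/;",
    "### UPDATE `heartbeat`.`heartbeat`",
])
print(non_heartbeat_writes(sample))  # []
```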

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1111.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202002281540_marostegui_72603.log.

Completed auto-reimage of hosts:

['db1111.eqiad.wmnet']

and were ALL successful.

Change 575561 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1111: Reimage db1111 as buster

https://gerrit.wikimedia.org/r/575561

Change 575561 merged by Marostegui:
[operations/puppet@production] db1111: Reimage db1111 as buster

https://gerrit.wikimedia.org/r/575561

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['db1111.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202002281611_marostegui_79481.log.

Completed auto-reimage of hosts:

['db1111.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2020-03-01T17:54:21Z] <marostegui> Start replication on db1111 new host on s8 - T246447

The transfer has finished; I have started replication and it is now catching up.

Mentioned in SAL (#wikimedia-operations) [2020-03-02T06:04:38Z] <marostegui> Re-add db1111 to s8 in tendril and zarcillo - T246447

Users look good:

mysql.py -hdb1111 mysql -e "select user,host from user where user like 'wik%';"
+-----------+-----------+
| User      | Host      |
+-----------+-----------+
| wikiadmin | 10.192.%  |
| wikiuser  | 10.192.%  |
| wikiadmin | 10.64.%   |
| wikiuser  | 10.64.%   |
| wikiadmin | localhost |
| wikiuser  | localhost |
+-----------+-----------+
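The query above confirms that the `wikiadmin` and `wikiuser` accounts exist for the `10.64.%` and `10.192.%` host patterns and for `localhost`. A minimal Python sketch of the same comparison, parsing the mysql client's table output; treating these six rows as the expected set is an assumption made for illustration, and `parse_ascii_table` is a hypothetical helper. It checks account presence only, not privileges (that would need `SHOW GRANTS`).

```python
from __future__ import annotations

# The `select user,host ...` output from db1111 above.
OUTPUT = """
+-----------+-----------+
| User      | Host      |
+-----------+-----------+
| wikiadmin | 10.192.%  |
| wikiuser  | 10.192.%  |
| wikiadmin | 10.64.%   |
| wikiuser  | 10.64.%   |
| wikiadmin | localhost |
| wikiuser  | localhost |
+-----------+-----------+
"""


def parse_ascii_table(text: str) -> list[dict[str, str]]:
    """Turn mysql client table output into a list of row dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip +---+ border lines
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Assumption for illustration: these pairs are what an s8 replica should have.
EXPECTED = {
    (user, host)
    for user in ("wikiadmin", "wikiuser")
    for host in ("10.192.%", "10.64.%", "localhost")
}

found = {(row["User"], row["Host"]) for row in parse_ascii_table(OUTPUT)}
missing = EXPECTED - found
print("missing accounts:", sorted(missing) or "none")  # missing accounts: none
```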

Change 575825 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1111: Enable notifications

https://gerrit.wikimedia.org/r/575825

Change 575825 merged by Marostegui:
[operations/puppet@production] db1111: Enable notifications

https://gerrit.wikimedia.org/r/575825

Mentioned in SAL (#wikimedia-operations) [2020-03-02T06:24:36Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Add db1111 to s8 with minimal weight to check grants and any other issues T246447', diff saved to https://phabricator.wikimedia.org/P10564 and previous config saved to /var/cache/conftool/dbconfig/20200302-062435-marostegui.json

db1111 has been placed in s8 with weight 1, just to check grants and catch any other possible errors.
db1111 runs Buster + MariaDB 10.4, whereas the other s8 hosts run Stretch + MariaDB 10.1 (T246604).

Mentioned in SAL (#wikimedia-operations) [2020-03-02T06:42:03Z] <marostegui> Enable events on db1111 T246447

Events enabled

root@db1111.eqiad.wmnet[ops]> show events;
+-----+--------------------------+------------------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+
| Db  | Name                     | Definer          | Time zone | Type      | Execute at | Interval value | Interval field | Starts              | Ends | Status  | Originator | character_set_client | collation_connection | Database Collation |
+-----+--------------------------+------------------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+
| ops | wmf_slave_overload       | root@10.64.32.25 | SYSTEM    | RECURRING | NULL       | 10             | SECOND         | 2018-09-04 00:00:01 | NULL | ENABLED |  171966592 | utf8                 | utf8_general_ci      | binary             |
| ops | wmf_slave_purge          | root@10.64.32.25 | SYSTEM    | RECURRING | NULL       | 15             | MINUTE         | 2018-09-04 00:00:00 | NULL | ENABLED |  171966592 | utf8                 | utf8_general_ci      | binary             |
| ops | wmf_slave_wikiuser_sleep | root@10.64.32.25 | SYSTEM    | RECURRING | NULL       | 30             | SECOND         | 2018-09-04 00:00:05 | NULL | ENABLED |  171966592 | utf8                 | utf8_general_ci      | binary             |
| ops | wmf_slave_wikiuser_slow  | root@10.64.32.25 | SYSTEM    | RECURRING | NULL       | 30             | SECOND         | 2018-09-04 00:00:03 | NULL | ENABLED |  171966592 | utf8                 | utf8_general_ci      | binary             |
+-----+--------------------------+------------------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+
4 rows in set (0.00 sec)
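In the `show events` output above, each event's schedule is given by the `Interval value` and `Interval field` columns, and `Status` must read `ENABLED` for the event scheduler to run it. A small Python sketch that turns those columns into periods and flags anything not enabled; the rows are copied from the output above, while `EVENTS` and `period` are illustrative names.

```python
from datetime import timedelta

# (name, interval value, interval field, status) from the ops events on db1111 above.
EVENTS = [
    ("wmf_slave_overload", 10, "SECOND", "ENABLED"),
    ("wmf_slave_purge", 15, "MINUTE", "ENABLED"),
    ("wmf_slave_wikiuser_sleep", 30, "SECOND", "ENABLED"),
    ("wmf_slave_wikiuser_slow", 30, "SECOND", "ENABLED"),
]

# Only the interval fields that appear here (plus HOUR) are mapped.
UNIT = {
    "SECOND": timedelta(seconds=1),
    "MINUTE": timedelta(minutes=1),
    "HOUR": timedelta(hours=1),
}


def period(value: int, field: str) -> timedelta:
    """Convert an event's interval columns into a timedelta."""
    return value * UNIT[field]


disabled = [name for name, _, _, status in EVENTS if status != "ENABLED"]

for name, value, field, _ in EVENTS:
    p = period(value, field)
    runs_per_hour = timedelta(hours=1) / p
    print(f"{name}: every {p.total_seconds():.0f}s, {runs_per_hour:.0f} runs/hour")
print("disabled:", disabled or "none")
```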

Mentioned in SAL (#wikimedia-operations) [2020-03-02T06:45:22Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Increase weight from 1 to 10 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10565 and previous config saved to /var/cache/conftool/dbconfig/20200302-064522-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-03-02T07:21:19Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Increase weight from 10 to 30 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10566 and previous config saved to /var/cache/conftool/dbconfig/20200302-072118-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-03-02T08:07:21Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Increase weight from 30 to 50 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10567 and previous config saved to /var/cache/conftool/dbconfig/20200302-080721-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-03-02T08:54:21Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Increase weight from 50 to 80 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10568 and previous config saved to /var/cache/conftool/dbconfig/20200302-085420-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-03-02T09:12:53Z] <addshore> warm cache for db1111 for Q0-6 million T219123 T246447 (pass 2)

Mentioned in SAL (#wikimedia-operations) [2020-03-02T09:34:49Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Increase weight from 80 to 100 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10571 and previous config saved to /var/cache/conftool/dbconfig/20200302-093449-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-03-02T09:59:21Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Increase weight from 100 to 150 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10575 and previous config saved to /var/cache/conftool/dbconfig/20200302-095921-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-03-02T10:34:45Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Increase weight from 150 to 200 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10576 and previous config saved to /var/cache/conftool/dbconfig/20200302-103445-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-03-02T14:20:18Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Increase weight from 200 to 250 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10577 and previous config saved to /var/cache/conftool/dbconfig/20200302-142017-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-03-02T14:51:31Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Increase weight from 250 to 300 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10581 and previous config saved to /var/cache/conftool/dbconfig/20200302-145130-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-03-02T15:11:50Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Increase weight from 300 to 350 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10582 and previous config saved to /var/cache/conftool/dbconfig/20200302-151149-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2020-03-02T15:34:16Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Increase weight from 350 to 400 on db1111 T246447', diff saved to https://phabricator.wikimedia.org/P10583 and previous config saved to /var/cache/conftool/dbconfig/20200302-153416-marostegui.json

db1111 is now serving in s8 with the same weight as db1126 (400).
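The pooling above raised db1111's weight in twelve dbctl commits on 2020-03-02, from 1 up to db1126's 400. A short Python sketch that replays that schedule from the SAL timestamps, checks that every step only increases the weight, and reports the total ramp time; the data is copied from the log, while `RAMP` and `PEER_WEIGHT` are illustrative names.

```python
from datetime import datetime

# (UTC timestamp of the dbctl commit, weight after the commit), from SAL above.
RAMP = [
    ("2020-03-02T06:24:36", 1),
    ("2020-03-02T06:45:22", 10),
    ("2020-03-02T07:21:19", 30),
    ("2020-03-02T08:07:21", 50),
    ("2020-03-02T08:54:21", 80),
    ("2020-03-02T09:34:49", 100),
    ("2020-03-02T09:59:21", 150),
    ("2020-03-02T10:34:45", 200),
    ("2020-03-02T14:20:18", 250),
    ("2020-03-02T14:51:31", 300),
    ("2020-03-02T15:11:50", 350),
    ("2020-03-02T15:34:16", 400),
]
PEER_WEIGHT = 400  # db1126's weight, per the final comment

steps = [(datetime.fromisoformat(ts), weight) for ts, weight in RAMP]
weights = [weight for _, weight in steps]

# A ramp should never step back down.
increasing = all(a < b for a, b in zip(weights, weights[1:]))
duration = steps[-1][0] - steps[0][0]

print(f"{len(steps)} steps, {weights[0]} -> {weights[-1]} over {duration}")
print("matches peer:", weights[-1] == PEER_WEIGHT, "| monotonic:", increasing)
```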