Migration from es2001-es2010 to es2011-es2019
Closed, Resolved (Public)

Description

Racking and setup of es2011-es2019 was completed in T126006.

We now need to configure them and import the data according to the following schema:

  • es2001 -> es2011 (es1)
  • es2002 -> es2012 (es1)
  • es2004 -> es2013 (es1)
  • es2005 -> es2014 (es2)
  • es2006 -> es2015 (es2)
  • es2007 -> es2016 (es2)
  • es2008 -> es2017 (es3)
  • es2009 -> es2018 (es3)
  • (was es2010) es2008 -> es2019 (es3)
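
For reference, the schema above can be written down as data and sanity-checked before any copy starts. This is only a minimal sketch: the names `MIGRATION_PLAN`, `targets_by_shard` and `duplicate_targets` are made up for illustration and are not part of any existing tooling.

```python
from collections import Counter, defaultdict

# (source host, target host, shard), copied from the schema above.
MIGRATION_PLAN = [
    ("es2001", "es2011", "es1"),
    ("es2002", "es2012", "es1"),
    ("es2004", "es2013", "es1"),
    ("es2005", "es2014", "es2"),
    ("es2006", "es2015", "es2"),
    ("es2007", "es2016", "es2"),
    ("es2008", "es2017", "es3"),
    ("es2009", "es2018", "es3"),
    ("es2008", "es2019", "es3"),  # was planned from es2010
]


def targets_by_shard(plan):
    """Group the new hosts by the shard they will serve."""
    grouped = defaultdict(list)
    for _source, target, shard in plan:
        grouped[shard].append(target)
    return dict(grouped)


def duplicate_targets(plan):
    """Return new hosts that appear more than once as a copy target."""
    counts = Counter(target for _source, target, _shard in plan)
    return sorted(host for host, n in counts.items() if n > 1)


if __name__ == "__main__":
    for shard, hosts in sorted(targets_by_shard(MIGRATION_PLAN).items()):
        print(shard, ", ".join(hosts))
    print("duplicate targets:", duplicate_targets(MIGRATION_PLAN))
```

A source host may appear twice (es2008 feeds both es2017 and es2019), so only the targets are checked for uniqueness.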

Event Timeline

Change 271560 had a related patch set uploaded (by Volans):
mariadb: Add new es2011-2019 servers

https://gerrit.wikimedia.org/r/271560

Change 271560 merged by Volans:
mariadb: Add new es2011-2019 servers

https://gerrit.wikimedia.org/r/271560

Change 271577 had a related patch set uploaded (by Volans):
Depool es2001 to copy the data to es2011

https://gerrit.wikimedia.org/r/271577

Change 271577 merged by jenkins-bot:
Depool es2001 to copy the data to es2011

https://gerrit.wikimedia.org/r/271577

Mentioned in SAL [2016-02-18T20:52:46Z] <volans> Depool es2001 to copy the data to es2011 (T127330)

Mentioned in SAL [2016-02-18T21:59:27Z] <volans> Shutting down MySQL on es2001 (depooled) and starting data transfer to es2011 [ T127330 ]

Mentioned in SAL [2016-02-19T14:27:53Z] <volans> Restarting MariaDB on es2001 (still depooled) [T127330]

16:02 logmsgbot: volans@tin Synchronized wmf-config/db-codfw.php: Repool of es2001 (duration: 01m 39s)

All the new hosts have the hardware RAID configured with the default 64 KB stripe size. I'm running some sysbench tests to compare against our standard of 256 KB, to see whether we are losing performance and need to rebuild them.

Depooling es1016 for the comparison, as it has the same hardware and low load.

Change 272727 had a related patch set uploaded (by Volans):
Depool of es1016 for RAID perf comparison

https://gerrit.wikimedia.org/r/272727

Change 272727 merged by jenkins-bot:
Depool of es1016 for RAID perf comparison

https://gerrit.wikimedia.org/r/272727

First "quick" sysbench run on es2011; the same test is already running on es1016 for comparison.

Based on the results we will decide whether to run a longer/larger one.

sysbench --test=fileio --file-total-size=256G --file-test-mode=rndrw --file-num=16 --file-fsync-all --num-threads=32 --max-requests=0 --max-time=3600 --batch --batch-delay=30 run
...
[1456239461] Operations performed:  3289174 Read, 2192785 Write, 2192785 Other = 7674744 Total
[1456239461] Read 50.189Gb  Written 33.459Gb  Total transferred 83.648Gb  (23.793Mb/sec)
[1456239461]  1522.75 Requests/sec executed
[1456239461]
[1456239461] Test execution summary:
[1456239461]     total time:                          3600.0411s
[1456239461]     total number of events:              5481959
[1456239461]     total time taken by event execution: 115193.3890
[1456239461]     per-request statistics:
[1456239461]          min:                                  0.01ms
[1456239461]          avg:                                 21.01ms
[1456239461]          max:                               1120.07ms
[1456239461]          approx.  95 percentile:              75.98ms
[1456239461]
[1456239461] Threads fairness:
[1456239461]     events (avg/stddev):           171311.2188/479.84
[1456239461]     execution time (avg/stddev):   3599.7934/0.01
[1456239461]

Results on es1016:

sysbench --test=fileio --file-total-size=256G --file-test-mode=rndrw --file-num=16 --file-fsync-all --num-threads=32 --max-requests=0 --max-time=3600 --batch --batch-delay=30 run
...
[1456242804] Operations performed:  4126455 Read, 2750973 Write, 2750973 Other = 9628401 Total
[1456242804] Read 62.965Gb  Written 41.977Gb  Total transferred 104.94Gb  (29.85Mb/sec)
[1456242804]  1910.38 Requests/sec executed
[1456242804]
[1456242804] Test execution summary:
[1456242804]     total time:                          3600.0362s
[1456242804]     total number of events:              6877428
[1456242804]     total time taken by event execution: 115191.7780
[1456242804]     per-request statistics:
[1456242804]          min:                                  0.01ms
[1456242804]          avg:                                 16.75ms
[1456242804]          max:                                509.00ms
[1456242804]          approx.  95 percentile:              50.31ms
[1456242804]
[1456242804] Threads fairness:
[1456242804]     events (avg/stddev):           214919.6250/561.86
[1456242804]     execution time (avg/stddev):   3599.7431/0.01
[1456242804]

@jcrespo for me this is already enough to justify rebuilding with a 256 KB stripe, which should be the only difference in hardware/configuration. What do you think?
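
To put numbers on the gap between the two fileio runs above, here is a small sketch that computes the relative change for the headline metrics. The figures are copied from the sysbench summaries; the dictionary and function names are illustrative only, and the stripe size per host is taken from the discussion in this task.

```python
# Headline metrics copied from the two sysbench fileio summaries above.
ES2011 = {"requests_per_sec": 1522.75, "avg_ms": 21.01, "p95_ms": 75.98}  # 64 KB stripe
ES1016 = {"requests_per_sec": 1910.38, "avg_ms": 16.75, "p95_ms": 50.31}  # 256 KB stripe


def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    for metric in ES2011:
        change = percent_change(ES2011[metric], ES1016[metric])
        print(f"{metric}: es2011={ES2011[metric]} es1016={ES1016[metric]} ({change:+.1f}%)")
```

With these inputs es1016 does roughly a quarter more requests per second, with about a fifth lower average latency and about a third lower 95th percentile latency. This is a single run per host, so it says nothing about run-to-run variance.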

An hdparm read test goes more in favor of the 64 KB stripe, but I think it is less reliable as a test, also because of its small size.

root@es1016:/srv# hdparm -t /dev/mapper/tank-data
 Timing buffered disk reads: 2682 MB in  3.00 seconds = 893.76 MB/sec
 Timing buffered disk reads: 2696 MB in  3.03 seconds = 890.97 MB/sec
 Timing buffered disk reads: 2694 MB in  3.00 seconds = 897.90 MB/sec
root@es2011:~# hdparm -t /dev/mapper/tank-data
 Timing buffered disk reads: 2698 MB in  3.00 seconds = 899.10 MB/sec
 Timing buffered disk reads: 2802 MB in  3.00 seconds = 933.72 MB/sec
 Timing buffered disk reads: 2756 MB in  3.00 seconds = 918.13 MB/sec
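
A quick way to average the three hdparm runs per host, as a sketch only: the output is pasted from above, and the regular expression assumes the `= <rate> MB/sec` format shown there.

```python
import re
from statistics import mean

# hdparm -t output pasted from the runs above.
HDPARM_OUTPUT = {
    "es1016": """
 Timing buffered disk reads: 2682 MB in  3.00 seconds = 893.76 MB/sec
 Timing buffered disk reads: 2696 MB in  3.03 seconds = 890.97 MB/sec
 Timing buffered disk reads: 2694 MB in  3.00 seconds = 897.90 MB/sec
""",
    "es2011": """
 Timing buffered disk reads: 2698 MB in  3.00 seconds = 899.10 MB/sec
 Timing buffered disk reads: 2802 MB in  3.00 seconds = 933.72 MB/sec
 Timing buffered disk reads: 2756 MB in  3.00 seconds = 918.13 MB/sec
""",
}

RATE = re.compile(r"=\s*([\d.]+)\s*MB/sec")


def mean_rate(output):
    """Average the MB/sec figures found in a block of hdparm -t output."""
    return mean(float(rate) for rate in RATE.findall(output))


if __name__ == "__main__":
    for host, output in HDPARM_OUTPUT.items():
        print(f"{host}: {mean_rate(output):.2f} MB/sec")
```

The averages differ by only a few percent, and each run reads under 3 GB in 3 seconds, which is why this test carries less weight than the hour-long sysbench run.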

Please do more than one (to check it is repeatable). I do not think the results are surprising, as it was expected for sequential writes.

But is it worth it for random writes (mysql-like load)?

The test already used random read/write; I skipped the sequential mode because it doesn't apply to our workload:

--file-test-mode=rndrw               combined random read/write

I left the default r/w ratio:

--file-rw-ratio          reads/writes ration for combined random read/write test (default: 1.5)
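
As a cross-check that both runs really used the default ratio, the read and write operation counts from the two summaries above can be divided. A minimal sketch, with illustrative names:

```python
# (reads, writes) from the "Operations performed" lines of the two fileio runs.
OPERATIONS = {
    "es2011": (3289174, 2192785),
    "es1016": (4126455, 2750973),
}

DEFAULT_RW_RATIO = 1.5  # default of --file-rw-ratio, per the help text above


def rw_ratio(reads, writes):
    """Observed reads-to-writes ratio of a run."""
    return reads / writes


if __name__ == "__main__":
    for host, (reads, writes) in OPERATIONS.items():
        print(f"{host}: observed ratio {rw_ratio(reads, writes):.4f} (default {DEFAULT_RW_RATIO})")
```

Both hosts land on 1.5 to within rounding, so the two runs had the same mix of reads and writes.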

Then I agree with your conclusions.

Change 272761 had a related patch set uploaded (by Volans):
Repool es1016 after RAID perf test

https://gerrit.wikimedia.org/r/272761

Change 272761 merged by jenkins-bot:
Repool es1016 after RAID perf test

https://gerrit.wikimedia.org/r/272761

Disabled notifications on Icinga for es2011 while testing different mount options.

@jcrespo I cannot see any substantial difference between these two configurations:

  1. default scheduler (deadline), XFS mount to defaults (relatime)
  2. scheduler=noop, XFS mount with nobarrier,noatime,nodiratime

I've run multiple OLTP benchmarks with:
sysbench --test=oltp --oltp-table-size=1000000 --mysql-user=__sysbench__ --mysql-password=Absh7dl9s --mysql-db=__sysbench__ --mysql-socket=/tmp/mysql.sock --num-threads=32 --max-requests=0 --max-time=1800 --batch --batch-delay=30

And the results are the same within the fluctuations between runs:

Configuration 1:

OLTP test statistics:
    queries performed:
        read:                            302205624
        write:                           107930392
        other:                           43172163
        total:                           453308179
    transactions:                        21586047 (11992.23 per sec.)
    deadlocks:                           69     (0.04 per sec.)
    read/write requests:                 410136016 (227853.07 per sec.)
    other operations:                    43172163 (23984.51 per sec.)

Test execution summary:
    total time:                          1800.0021s
    total number of events:              21586047
    total time taken by event execution: 57471.6983
    per-request statistics:
        min:                                  1.44ms
        avg:                                  2.66ms
        max:                                 49.18ms
        approx.  95 percentile:               3.52ms

Threads fairness:
    events (avg/stddev):           674563.9688/3872.81
    execution time (avg/stddev):   1795.9906/0.02

Configuration 2:

OLTP test statistics:
    queries performed:
        read:                            302146278
        write:                           107909187
        other:                           43163686
        total:                           453219151
    transactions:                        21581809 (11989.88 per sec.)
    deadlocks:                           68     (0.04 per sec.)
    read/write requests:                 410055465 (227808.27 per sec.)
    other operations:                    43163686 (23979.79 per sec.)

Test execution summary:
    total time:                          1800.0026s
    total number of events:              21581809
    total time taken by event execution: 57470.5864
    per-request statistics:
        min:                                  1.45ms
        avg:                                  2.66ms
        max:                                 49.57ms
        approx.  95 percentile:               3.54ms

Threads fairness:
    events (avg/stddev):           674431.5312/2976.01
    execution time (avg/stddev):   1795.9558/0.02

As far as I'm concerned, we can go ahead with T127938, fixing only the RAID stripe for now.
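
A rough way to quantify "the same within the fluctuations" for the two OLTP runs above. This sketch uses the per-thread standard deviation from the "Threads fairness" section as a proxy for noise, which is an assumption: it measures the spread across threads within one run, not the spread between runs. All names are illustrative.

```python
# Figures copied from the two OLTP summaries above.
CONFIG_1 = {"tps": 11992.23, "events_avg": 674563.9688, "events_stddev": 3872.81}
CONFIG_2 = {"tps": 11989.88, "events_avg": 674431.5312, "events_stddev": 2976.01}


def relative_diff(a, b):
    """Symmetric relative difference between two positive values."""
    return abs(a - b) / ((a + b) / 2)


def within_thread_noise(run_a, run_b):
    """True if the gap in per-thread events is smaller than either run's stddev."""
    gap = abs(run_a["events_avg"] - run_b["events_avg"])
    return gap < min(run_a["events_stddev"], run_b["events_stddev"])


if __name__ == "__main__":
    diff = relative_diff(CONFIG_1["tps"], CONFIG_2["tps"])
    print(f"transactions/sec differ by {diff:.4%}")
    print("within per-thread noise:", within_thread_noise(CONFIG_1, CONFIG_2))
```

The throughput difference is a small fraction of a percent, well below the thread-to-thread spread inside a single run.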

Change 274351 had a related patch set uploaded (by Volans):
Depooled codfw external storage for migration

https://gerrit.wikimedia.org/r/274351

Starting migration with latest slave by topology: es2005, es2007, es2009 (es2010 is broken)

Change 274351 merged by jenkins-bot:
Depooled codfw external storage for migration

https://gerrit.wikimedia.org/r/274351

Mentioned in SAL [2016-03-02T09:16:50Z] <volans@tin> Synchronized wmf-config/db-codfw.php: Depooling external storage DBs in codfw for migration: T127330 (duration: 01m 24s)

Mentioned in SAL [2016-03-02T09:43:38Z] <volans> Cloning es2005->es2014, es2007->es2016, es2009->es2018, see T127330

Change 274447 had a related patch set uploaded (by Volans):
Repool es2005, es2007, es2009

https://gerrit.wikimedia.org/r/274447

Change 274447 merged by jenkins-bot:
Repool es2005, es2007, es2009

https://gerrit.wikimedia.org/r/274447

Mentioned in SAL [2016-03-02T17:55:11Z] <volans@tin> Synchronized wmf-config/db-codfw.php: Repooling external storage DBs in codfw after data was copied: T127330 (duration: 01m 06s)

Mentioned in SAL [2016-03-02T19:07:09Z] <volans> Data transfer completed, started MySQL and replica on es2014,es2016,es2018 [ T127330 ]

Volans updated the task description.

Mentioned in SAL [2016-03-03T09:57:01Z] <volans> Added es2014,es2016,es2018 to tendril [ T127330 ]

Mentioned in SAL [2016-03-03T10:36:59Z] <volans> Changing local replica topology for shard es2 in codfw for T127330

Mentioned in SAL [2016-03-03T10:48:57Z] <volans> Changing local replica topology for shard es3 in codfw for T127330

Change 274671 had a related patch set uploaded (by Volans):
Depooled es200{6,8} to migrate data to es201{5,6}

https://gerrit.wikimedia.org/r/274671

Change 274671 merged by jenkins-bot:
Depool es200{6,8} to migrate data to es201{5,7}

https://gerrit.wikimedia.org/r/274671

Mentioned in SAL [2016-03-03T11:21:11Z] <volans@tin> Synchronized wmf-config/db-codfw.php: Depool es2005,es2008 to migrate data to es2015,es2017 T127330 (duration: 00m 53s)

Mentioned in SAL [2016-03-03T11:52:59Z] <volans> Migrating data es2006->es2015 and es2008->es2017->es2019 T127330

Mentioned in SAL [2016-03-03T19:27:29Z] <volans> Completed migration of data from es200[68] to es201[579], added es201[579] to tendril. T127330

Change 274914 had a related patch set uploaded (by Volans):
Update codfw external storage server topology

https://gerrit.wikimedia.org/r/274914

Change 274914 merged by jenkins-bot:
Update codfw external storage server topology

https://gerrit.wikimedia.org/r/274914

Mentioned in SAL [2016-03-04T10:07:33Z] <volans@tin> Synchronized wmf-config/db-codfw.php: Update codfw external storage servers topology T127330 (duration: 00m 39s)

Mentioned in SAL [2016-03-04T10:30:22Z] <volans> Start copying data from es200[124] to es201[123] (ETA ~16-17h) T127330

Mentioned in SAL [2016-03-05T11:14:15Z] <volans> Data transfer completed during the night, (re)starting MySQL on es200[124] and es201[123] T127330

Pretty much all set: es201[1-9] are ready, added to tendril, and have all checks active on Icinga.
The scheduled downtime for es200[124] will expire a few hours from now.

I'll send a CR to add the remaining new es1 hosts to mediawiki-config, to be merged on Monday.

Change 275166 had a related patch set uploaded (by Volans):
Repool es200[124] after data migration

https://gerrit.wikimedia.org/r/275166

I've dumped all schemas with:

mysqldump --all-databases --no-data --routines --triggers  > /tmp/db.schema

and found that the only difference is between es2005 and the other es2 hosts (es2006 and es2007), because of a test that was done to compress the data. That change has now been replicated to es2014 but will be reverted in the future, so for now we'll give es2014 a bit less load in the cluster. I've opened T129350.
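
One way such schema dumps could be compared across hosts, as a sketch only and not necessarily how the check was actually done. It assumes the per-host dumps have been read into strings, and that `--` comment lines and `AUTO_INCREMENT=` counters may differ between hosts without being real schema differences. The function names and the sample table are hypothetical.

```python
import re
from collections import defaultdict


def normalize(schema_text):
    """Drop mysqldump comment lines, blank lines and AUTO_INCREMENT counters."""
    kept = []
    for line in schema_text.splitlines():
        if line.startswith("--") or not line.strip():
            continue
        kept.append(re.sub(r"\s*AUTO_INCREMENT=\d+", "", line))
    return "\n".join(kept)


def group_hosts_by_schema(dumps):
    """Group host names whose normalized schema dumps are identical."""
    groups = defaultdict(list)
    for host, text in sorted(dumps.items()):
        groups[normalize(text)].append(host)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical sample data, only to show the shape of the result.
    plain = "CREATE TABLE `t` (\n  `id` int NOT NULL\n) ENGINE=InnoDB;\n"
    changed = "CREATE TABLE `t` (\n  `id` int NOT NULL\n) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;\n"
    dumps = {
        "es2005": "-- Dump completed on host A\n" + changed,
        "es2006": "-- Dump completed on host B\n" + plain,
        "es2007": "-- Dump completed on host C\n" + plain,
    }
    print(group_hosts_by_schema(dumps))
```

Any host that ends up alone in its group is the one to diff by hand against the rest of its shard.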

I've also run some quick sanity checks, see P2729, all looks good.

Change 275166 merged by jenkins-bot:
Rebalance external storage in codfw

https://gerrit.wikimedia.org/r/275166

Mentioned in SAL [2016-03-09T14:54:33Z] <volans@tin> Synchronized wmf-config/db-codfw.php: Rebalance external storage servers in codfw T127330 (duration: 00m 41s)

Change 276229 had a related patch set uploaded (by Volans):
Change codfw external storage topology

https://gerrit.wikimedia.org/r/276229

Mentioned in SAL [2016-03-09T18:44:23Z] <volans> Changing topology of local codfw masters for es2 and es3 before merging https://gerrit.wikimedia.org/r/#/c/276229/1 T127330

Change 276229 merged by jenkins-bot:
Change codfw external storage topology

https://gerrit.wikimedia.org/r/276229

Mentioned in SAL [2016-03-09T19:01:03Z] <volans@tin> Synchronized wmf-config/db-codfw.php: Change codfw external storage topology T127330 (duration: 00m 27s)

Change 276416 had a related patch set uploaded (by Volans):
Rebalance external storage topology in codfw

https://gerrit.wikimedia.org/r/276416

Disabled notifications and scheduled downtime for es2001-es2010 on Icinga.

Change 276416 merged by jenkins-bot:
Rebalance external storage topology in codfw

https://gerrit.wikimedia.org/r/276416

Mentioned in SAL [2016-03-10T09:50:52Z] <volans@tin> Synchronized wmf-config/db-codfw.php: Rebalance external storage servers in codfw T127330 (duration: 00m 34s)

es2001-es2010 are out of the MediaWiki config. Scheduling them for decommissioning in T129452.
I've already scheduled downtime until 2016-12-31 and disabled notifications for all of them on Icinga to avoid unnecessary pages.