This task will track the setup and deployment of tempdb2001 (WMF6407). This is a temporary allocation of 1 month or less, so it can likely be reclaimed in May 2017 with DBA approval.
|operations/mediawiki-config : master||db-codfw.php: Add tempdb2001 to x1|
|operations/mediawiki-config : master||db-codfw,db-eqiad.php: Add tempdb2001|
|operations/software : master||x1.hosts: Add tempdb2001.codfw.wmnet to x1|
|operations/puppet : production||site.pp: Add tempdb2001 new host|
|operations/puppet : production||update tempdb2001 partitioning|
|operations/puppet : production||tempdb2001 install parameters|
|operations/dns : master||setting dns for tempdb2001|
|Resolved||RobH||T161712 codfw: (1) spare pool system for temp allocation as database failover|
|Resolved||Marostegui||T162290 setup tempdb2001(WMF6407)|
To clarify the state of this: we still need this ASAP for service implementation ahead of the switchover (that can take quite some time; it is more than just running puppet). It will be returned on May 3rd.
tempdb2001 is now replicating from db2033.
It is running 10.0.30 (mysql_upgrade has been run), SSL is enabled.
Right now it is quite delayed:
We will see how long it takes to catch up. Will enable GTID too.
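For reference, a rough sketch of how the lag can be checked and GTID enabled on the replica. This assumes the host runs MariaDB 10.0 and that the `slave_pos` mode is used; the thread does not say which mode was picked, and these are the standard statements rather than a record of what was actually run.

```sql
-- Check replication lag: look at Seconds_Behind_Master in the output.
SHOW SLAVE STATUS;

-- Switch the replica to GTID-based replication (MariaDB syntax).
-- Assumption: slave_pos is the chosen mode; this thread does not confirm it.
STOP SLAVE;
CHANGE MASTER TO MASTER_USE_GTID = slave_pos;
START SLAVE;

-- Re-check: Using_Gtid should now report the selected mode.
SHOW SLAVE STATUS;
```

Replication has to be stopped for the `CHANGE MASTER` statement, so the lag will grow briefly while this runs.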
Too early to say, but we might have IO problems on this host:
If I set global innodb_flush_log_at_trx_commit = 0; the server starts to catch up slowly but steadily. As soon as it is set back to 1, it starts to suffer.
As I said, it is too early to say; we'll see in the next few hours and overnight.
There is no hardware RAID here, unlike the other servers, so we need to set innodb_flush_log_at_trx_commit = 0; sync_binlog=0; innodb_flush_method=(default), which needs a restart, plus other changes to reduce synchronous IO. If it goes down, we just reclone it. This is not perfect hardware, so do not expect "proper" performance :-)
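A minimal sketch of applying the two dynamic settings at runtime, assuming a session with the SUPER privilege. These are generic statements, not the exact commands or the puppet change used on this host.

```sql
-- Flush the InnoDB redo log roughly once per second instead of at every commit.
-- Trade-off: up to about a second of transactions can be lost on a crash.
SET GLOBAL innodb_flush_log_at_trx_commit = 0;

-- Let the operating system decide when to sync the binary log to disk.
SET GLOBAL sync_binlog = 0;

-- Confirm the running values. innodb_flush_method is not dynamic, so it
-- only changes after editing the server configuration and restarting.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('innodb_flush_log_at_trx_commit',
                        'sync_binlog',
                        'innodb_flush_method');
```

Values set with `SET GLOBAL` are lost on restart, so they also need to be in the server configuration to persist.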
After setting sync_binlog and innodb_flush_log_at_trx_commit to 0 yesterday, the server caught up.
I have enabled GTID as well. I have sent the patch to pool it, but I think we should leave it running over the weekend and deploy it on Monday if all goes fine.
I have merged the patch to add it as a slave in codfw, and also the temporary puppet change to run it with sync_binlog=0 and innodb_flush_log_at_trx_commit=0. As Jaime said, this should be properly done with hiera in the future.
As soon as we have the new servers in place we will provision one for this and we will schedule this tempdb2001 to be removed.
Thanks everyone for the help to get this server up and running so fast and on time for the DC switchover!