setup tempdb2001 (WMF6407)
Closed, Resolved, Public

Description

This task will track the setup and deployment of tempdb2001 (WMF6407). This is a temporary allocation of 1 month or less, so it can likely be reclaimed in May 2017 with DBA approval.

Event Timeline

Change 346576 had a related patch set uploaded (by RobH):
[operations/dns@master] setting dns for tempdb2001

https://gerrit.wikimedia.org/r/346576

Change 346576 merged by RobH:
[operations/dns@master] setting dns for tempdb2001

https://gerrit.wikimedia.org/r/346576

To clarify the state of this: we still need this ASAP for service implementation ahead of the switchover (that can take quite some time; it is more than just running puppet). It will be returned on May 3rd.

Change 346577 had a related patch set uploaded (by RobH):
[operations/puppet@production] tempdb2001 install parameters

https://gerrit.wikimedia.org/r/346577

I'm getting the OS installed today and handed off.

Change 346577 merged by RobH:
[operations/puppet@production] tempdb2001 install parameters

https://gerrit.wikimedia.org/r/346577

Change 346587 had a related patch set uploaded (by RobH):
[operations/puppet@production] update tempdb2001 partitioning

https://gerrit.wikimedia.org/r/346587

Change 346587 merged by RobH:
[operations/puppet@production] update tempdb2001 partitioning

https://gerrit.wikimedia.org/r/346587

RobH updated the task description.
RobH removed jcrespo as the assignee of this task (Apr 5 2017, 7:54 PM).

So this is now ready for puppet key/salt key and service implementation by the DBA team.

The task already carries the DBA tag. I had assigned it to @jcrespo, but recalled that they prefer we not do direct assignments, so I'm listing both him and @Marostegui in this comment as an update.

Change 346691 had a related patch set uploaded (by Marostegui):
[operations/puppet@production] site.pp: Add tempdb2001 new host

https://gerrit.wikimedia.org/r/346691

Thanks Rob! We will take it from here!

Change 346693 had a related patch set uploaded (by Marostegui):
[operations/software@master] x1.hosts: Add tempdb2001.codfw.wmnet

https://gerrit.wikimedia.org/r/346693

Change 346691 merged by Marostegui:
[operations/puppet@production] site.pp: Add tempdb2001 new host

https://gerrit.wikimedia.org/r/346691

Change 346693 merged by Marostegui:
[operations/software@master] x1.hosts: Add tempdb2001.codfw.wmnet to x1

https://gerrit.wikimedia.org/r/346693

tempdb2001 is now replicating from db2033.
It is running 10.0.30 (mysql_upgrade has been run), SSL is enabled.

Right now it is quite delayed:

Seconds_Behind_Master: 23069

We will see how long it takes to catch up. Will enable GTID too.
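For reference, a lag figure like the one above can be pulled out of captured SHOW SLAVE STATUS output and turned into something easier to read. This is only a minimal Python sketch, assuming the status text is available as a string; the function names are illustrative and not part of any existing tooling mentioned in this task.

```python
import re
from typing import Optional


def seconds_behind_master(status_text: str) -> Optional[int]:
    """Extract Seconds_Behind_Master from SHOW SLAVE STATUS text.

    Returns None when the field is absent or not a number (for example
    NULL, which the server reports when replication is not running).
    """
    match = re.search(
        r"^\s*Seconds_Behind_Master:\s*(\S+)", status_text, re.MULTILINE
    )
    if match is None or not match.group(1).isdigit():
        return None
    return int(match.group(1))


def format_lag(seconds: int) -> str:
    """Render a lag in seconds as hours, minutes and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"
```

Applied to the value quoted above, 23069 seconds works out to roughly six and a half hours of delay (6h 24m 29s).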

Cool! We need to let it replicate for a while before pooling it to confirm it is OK.

Yes! I will leave the ticket open until then (as a reminder, mostly for myself).

Change 346764 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-codfw,db-eqiad.php: Add tempdb2001

https://gerrit.wikimedia.org/r/346764

Too early to say, but we might have IO problems on this host:

[Screenshot attachment: Captura de pantalla 2017-04-06 a las 16.43.07.png]

If I set global innodb_flush_log_at_trx_commit = 0, the server starts to catch up, slowly but steadily; as soon as it goes back to 1, it starts to suffer.

As I said, it is too early to say. We'll see in the next few hours and overnight.
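One rough way to judge whether the host is really catching up is to compare two lag readings taken some time apart. The Python sketch below is a back-of-the-envelope estimate only; the function name is made up for illustration, and it assumes the replication rate stays constant, which it usually does not.

```python
from typing import Optional


def catch_up_eta(lag_start: int, lag_end: int, interval: int) -> Optional[float]:
    """Estimate the seconds left until replication lag reaches zero.

    lag_start and lag_end are two Seconds_Behind_Master readings taken
    `interval` wall-clock seconds apart. Returns None when the lag is
    not shrinking, since no finite estimate exists in that case.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    # Seconds of lag recovered per wall-clock second.
    rate = (lag_start - lag_end) / interval
    if rate <= 0:
        return None
    return lag_end / rate
```

A None result corresponds to the "starts to suffer" case above, where the lag holds steady or grows instead of shrinking.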

This host has no hardware RAID, unlike the other servers. We need to set innodb_flush_log_at_trx_commit = 0, sync_binlog = 0 and innodb_flush_method = (default), which needs a restart, plus other changes to reduce synchronous IO. If it goes down, we just reclone it. This is not perfect hardware, so do not expect "proper" performance :-)
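As an illustration of the settings named in the comment above, the Python sketch below builds the SET GLOBAL statements for the two variables that can be changed at runtime, and lists innodb_flush_method separately because it only takes effect after a restart. The constant and function names are hypothetical. Which variables are dynamic is general MariaDB/MySQL behaviour rather than something stated in this task, and how the change was actually applied on tempdb2001 is not shown here.

```python
# Variables taken from the comment above; the values trade durability
# for less synchronous IO, which is acceptable only because the host
# can simply be recloned if it crashes.
DYNAMIC_SETTINGS = {
    "innodb_flush_log_at_trx_commit": 0,
    "sync_binlog": 0,
}

# Not changeable at runtime: must go in the server config, then restart.
RESTART_ONLY = ("innodb_flush_method",)


def relaxed_durability_statements(settings=None):
    """Build SET GLOBAL statements for the runtime-changeable variables."""
    if settings is None:
        settings = DYNAMIC_SETTINGS
    return [f"SET GLOBAL {name} = {value};" for name, value in settings.items()]
```

Values set this way are lost on the next restart, so they also need to be persisted in the server configuration.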

Change 346764 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw,db-eqiad.php: Add tempdb2001

https://gerrit.wikimedia.org/r/346764

Mentioned in SAL (#wikimedia-operations) [2017-04-06T15:26:46Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Add tempdb2001 to config files - T162290 (duration: 00m 39s)

Mentioned in SAL (#wikimedia-operations) [2017-04-06T15:27:32Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Add tempdb2001 to config files - T162290 (duration: 00m 40s)

Change 346948 had a related patch set uploaded (by Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Add tempdb2001 to x1

https://gerrit.wikimedia.org/r/346948

After setting sync_binlog and innodb_flush_log_at_trx_commit to 0 yesterday, the server caught up.
I have enabled GTID as well. I have sent the patch to pool it, but I think we should leave it running over the weekend and deploy it on Monday if all goes fine.

Change 346948 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Add tempdb2001 to x1

https://gerrit.wikimedia.org/r/346948

Mentioned in SAL (#wikimedia-operations) [2017-04-10T06:03:48Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Add tempdb2001 to x1 as a slave - T162290 (duration: 00m 38s)

Hi,

I have merged the patch to add it as a slave in codfw, as well as the temporary puppet change that sets sync_binlog=0 and innodb_flush_log_at_trx_commit=0. As Jaime said, this should be done properly with hiera in the future.
As soon as we have the new servers in place we will provision one for this and we will schedule this tempdb2001 to be removed.

Thanks everyone for the help to get this server up and running so fast and on time for the DC switchover!