
rack and setup db1107 and db1108
Closed, ResolvedPublic

Description

This task will track the racking/setup/installation of 2 new databases (db1107 and db1108) for eqiad. These were ordered on T168650. @Marostegui please update task with desired racking locations.

db1107

  • receive in system on procurement task T168650
  • bios/drac/serial setup/testing
  • raid set up to raid 10 (standard db raid)
  • mgmt dns entries added for both asset tag and hostname
  • production dns entries added
  • network port setup
  • operations/puppet update
  • OS installation
  • puppet/salt accept/initial run
  • handoff for service implementation

db1108

  • receive in system on procurement task T168650
  • bios/drac/serial setup/testing
  • raid set up to raid 10 (standard db raid)
  • mgmt dns entries added for both asset tag and hostname
  • production dns entries added
  • network port setup
  • operations/puppet update
  • OS installation
  • puppet/salt accept/initial run
  • handoff for service implementation


Event Timeline

Cmjohnson created this task. Oct 4 2017, 3:13 PM
Restricted Application added a subscriber: Aklapper. Oct 4 2017, 3:13 PM
Cmjohnson renamed this task from rack and setup db1107 and db1008 to rack and setup db1107 and db1108. Oct 4 2017, 3:14 PM

Wait, should those be named db1*? CC @Cmjohnson @elukey @Ottomata. I personally do not have a problem with that (a host with that name picks up some regex-based configuration because it is a database), but maybe you didn't want that?

I have no opinions on the naming of these boxes :)

elukey added a comment. Oct 4 2017, 3:24 PM

I have no preference; the alternative would be to explicitly mention eventlogging in their names. I am going to ask my team and report back asap.

I don't really have any preference on where to rack them; I would just suggest they are placed in different racks. But as these two hosts will be mostly for Analytics, I would leave it to @elukey or @Ottomata to decide (I can help of course!)

elukey added a comment. Edited Oct 4 2017, 4:01 PM

We are super fine with db1* names, no real preference; I just asked my team. If it is fine for the DBA team, we can proceed!

For the racking, I'd prefer different rows if possible.

db1* is ok, then, @Cmjohnson. I only asked because I thought you might have proposed renaming them. I am more than ok with keeping them as part of the db* family as the m4 replica set. Sorry for the noise. Having them on separate rows, if possible, would indeed be ideal (we have decom'ed lots of servers, so space should not be a problem).

@jcrespo I only recommended that name based on the procurement ticket subject: "eqiad: replacements for db1046 and db1047". I have zero preference on naming.

As for the racking, I would suggest:

db1107: Take the place of db1036 (T176311) on B2
db1108: Take the place of db1015 (T173570) on A2

So we would have different racks and different rows.

Note that in theory, these new servers should be half the size of the original ones, so they should fit there and even leave space for extra 1U servers.
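
As a quick sanity check of the proposal above, here is a minimal sketch that verifies two rack labels differ in both row and rack. It assumes labels follow the `<row letter><rack number>` form used in this task (B2, A2); the helper names are hypothetical and not part of any existing tooling.

```python
import re
from typing import NamedTuple


class RackLocation(NamedTuple):
    row: str   # row letter, e.g. "B"
    rack: int  # rack number within the row, e.g. 2


def parse_location(label: str) -> RackLocation:
    """Parse a label such as 'B2' into row 'B' and rack number 2."""
    match = re.fullmatch(r"([A-Za-z])(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognised rack label: {label!r}")
    return RackLocation(row=match.group(1).upper(), rack=int(match.group(2)))


def placement_check(a: str, b: str) -> dict:
    """Report whether two labels are in different rows and different racks."""
    loc_a, loc_b = parse_location(a), parse_location(b)
    return {
        "different_rows": loc_a.row != loc_b.row,
        # A rack is identified by row and number together.
        "different_racks": loc_a != loc_b,
    }


# Locations proposed in this task.
proposed = {"db1107": "B2", "db1108": "A2"}
print(placement_check(proposed["db1107"], proposed["db1108"]))
```

Two hosts in the same row but different racks (for example B2 and B3) would report `different_rows` as false, which is the case the comments above ask to avoid.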

Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. Oct 8 2017, 2:44 PM

Change 383385 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns entries for db1107/1108 T177405

https://gerrit.wikimedia.org/r/383385

Change 383385 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns entries for db1107/1108 T177405

https://gerrit.wikimedia.org/r/383385

Cmjohnson updated the task description. Oct 10 2017, 6:03 PM

Change 383583 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding production dns for db1107/1108 T177405

https://gerrit.wikimedia.org/r/383583

Change 383583 merged by Cmjohnson:
[operations/dns@master] Adding production dns for db1107/1108 T177405

https://gerrit.wikimedia.org/r/383583

Change 383609 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding mac address to dhcpd file and updating netboot.cfg file for db1107 and db1108 T177405

https://gerrit.wikimedia.org/r/383609

Change 383609 merged by Cmjohnson:
[operations/puppet@production] Adding macs to dhcpd file & updated netboot.cfg db1107/1108 T177405

https://gerrit.wikimedia.org/r/383609

Cmjohnson updated the task description. Oct 12 2017, 6:46 PM
Cmjohnson updated the task description.

@elukey these are ready for you or @Ottomata. Please reassign.

One of you can go ahead and apply the same roles as the hosts they replace (IMPORTANT: with notifications disabled in hiera). Once they are ready, we DBAs can transfer the data for you. Debian Stretch would be the suggested OS.

elukey triaged this task as Medium priority. Oct 18 2017, 10:26 AM
elukey edited projects, added User-Elukey, Analytics-Kanban; removed Patch-For-Review.

Change 384963 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set PXE boot options and notification disabled for db110[78]

https://gerrit.wikimedia.org/r/384963

Change 384963 merged by Elukey:
[operations/puppet@production] Set PXE boot options and notification disabled for db110[78]

https://gerrit.wikimedia.org/r/384963

Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts:

['db1107.eqiad.wmnet', 'db1108.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201710181254_elukey_13070.log.

Completed auto-reimage of hosts:

['db1108.eqiad.wmnet', 'db1107.eqiad.wmnet']

and were ALL successful.
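
The `db110[78]` shorthand in the patch titles above refers to the same two hosts listed in the reimage output. A minimal sketch of that expansion follows; it assumes a single bracket group listing individual characters (no ranges such as `[0-9]`), and `expand_hosts` is a hypothetical helper, not part of wmf-auto-reimage.

```python
import re
from typing import List


def expand_hosts(pattern: str) -> List[str]:
    """Expand one bracket group, e.g. 'db110[78]' -> ['db1107', 'db1108'].

    Only a plain list of characters inside the brackets is supported;
    ranges and multiple groups are out of scope for this sketch.
    """
    match = re.search(r"\[([^\]]+)\]", pattern)
    if match is None:
        # No bracket group: the pattern is already a single host name.
        return [pattern]
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    return [f"{prefix}{char}{suffix}" for char in match.group(1)]


hosts = expand_hosts("db110[78].eqiad.wmnet")
print(hosts)
```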

Change 384979 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] netboot: prevent db110[78] to be reimaged

https://gerrit.wikimedia.org/r/384979

Cmjohnson moved this task from Up next to Blocked on the ops-eqiad board. Oct 18 2017, 5:01 PM

Change 385173 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Introduce mariadb eventlogging profiles for master/replica

https://gerrit.wikimedia.org/r/385173

Change 384979 merged by Elukey:
[operations/puppet@production] netboot: prevent db110[78] to be reimaged

https://gerrit.wikimedia.org/r/384979

elukey claimed this task. Oct 19 2017, 3:03 PM
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.
elukey moved this task from Backlog to In Progress on the User-Elukey board. Oct 23 2017, 3:48 PM

Change 385173 merged by Elukey:
[operations/puppet@production] role::mariadb: Introduce mariadb eventlogging profiles for master/replica

https://gerrit.wikimedia.org/r/385173

Next steps:

  1. Create unit files and systemd config for eventlogging_sync.sh and add the guards in puppet to allow trusty/stretch to co-exist (dbstore1002 will keep running with trusty).
  2. Assign the new eventlogging replica role to db1108 and make sure that everything is good.
  3. Work with Manuel to move data from db1047 to db1108.
  4. Start eventlogging_sync on db1108 and let it run alongside db1047 for some days to spot anomalies.
  5. Move the analytics-slave CNAME to db1108.
  6. Decom db1047.
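
Step 4 does not say how anomalies would be spotted. One possible approach, sketched here purely as an assumption, is to compare per-table row counts collected from the old and new replicas and flag tables that are missing or that diverge beyond a tolerance. The table names, counts, and the `find_anomalies` helper are all hypothetical; collecting the counts from the hosts is left out.

```python
from typing import Dict


def find_anomalies(
    old_counts: Dict[str, int],
    new_counts: Dict[str, int],
    tolerance: float = 0.01,
) -> Dict[str, str]:
    """Return a table -> reason mapping for tables that look inconsistent.

    A table is flagged when it exists on only one host, or when the row
    counts differ by more than `tolerance` (a fraction of the larger count).
    """
    anomalies: Dict[str, str] = {}
    for table in sorted(set(old_counts) | set(new_counts)):
        old = old_counts.get(table)
        new = new_counts.get(table)
        if old is None or new is None:
            anomalies[table] = "missing on one host"
            continue
        allowed = max(old, new) * tolerance
        if abs(old - new) > allowed:
            anomalies[table] = f"row counts differ: {old} vs {new}"
    return anomalies


# Hypothetical counts, for illustration only.
db1047_counts = {"TableA": 1000, "TableB": 500, "TableC": 10}
db1108_counts = {"TableA": 1005, "TableB": 700}
print(find_anomalies(db1047_counts, db1108_counts))
```

A small tolerance is used because two replicas fed by a periodic sync are unlikely to hold identical row counts at any given moment.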

Change 386346 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::mariadb::misc::eventlogging: add support for systemd

https://gerrit.wikimedia.org/r/386346

Change 386359 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] site.pp: set db1108 as analyics db replica

https://gerrit.wikimedia.org/r/386359

Change 386346 merged by Elukey:
[operations/puppet@production] profile::mariadb::misc::eventlogging: add support for systemd

https://gerrit.wikimedia.org/r/386346

Change 386359 merged by Elukey:
[operations/puppet@production] site.pp: set db1108 as analyics db replica

https://gerrit.wikimedia.org/r/386359

Change 386370 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::mariadb::misc::eventlogging:repl: fix systemd template

https://gerrit.wikimedia.org/r/386370

Change 386370 merged by Elukey:
[operations/puppet@production] profile::mariadb::misc::eventlogging:repl: fix systemd template

https://gerrit.wikimedia.org/r/386370

Change 386371 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::mariadb::misc::eventlogging:repl: fix typo in rsyslog config

https://gerrit.wikimedia.org/r/386371

Change 386371 merged by Elukey:
[operations/puppet@production] profile::mariadb::misc::eventlogging:repl: fix typo in rsyslog config

https://gerrit.wikimedia.org/r/386371

Change 386376 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::mariadb::misc::eventlogging::database: support mariadb 10.1

https://gerrit.wikimedia.org/r/386376

Change 386376 merged by Elukey:
[operations/puppet@production] profile::mariadb::misc::eventlogging::database: support mariadb 10.1

https://gerrit.wikimedia.org/r/386376

Change 386379 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::mariadb::misc::eventlogging: fix systemd unit template

https://gerrit.wikimedia.org/r/386379

Change 386379 merged by Elukey:
[operations/puppet@production] profile::mariadb::misc::eventlogging: fix systemd unit template

https://gerrit.wikimedia.org/r/386379

Mentioned in SAL (#wikimedia-operations) [2017-10-26T07:43:37Z] <marostegui> Stop MySQL on db1047 to copy data over db1108 - T177405 T156844

Mentioned in SAL (#wikimedia-operations) [2017-10-26T07:57:41Z] <marostegui> Drop databases in s1 and s2 from db1047 and unconfigure replication - T177405

Change 386586 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::mariadb::misc::eventlogging::database: set correct mysql params

https://gerrit.wikimedia.org/r/386586

Change 386586 merged by Elukey:
[operations/puppet@production] profile::mariadb::misc::eventlogging::database: set correct mysql params

https://gerrit.wikimedia.org/r/386586

db1047's data has been migrated and imported to db1108

Change 388021 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Enable icinga notifications for db1108

https://gerrit.wikimedia.org/r/388021

Change 388021 merged by Elukey:
[operations/puppet@production] Enable icinga notifications for db1108

https://gerrit.wikimedia.org/r/388021

Mentioned in SAL (#wikimedia-operations) [2017-11-07T10:24:48Z] <elukey> create staging database on db1108 (researchers scratch pad) - T177405

Change 389922 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s1,s2.hosts: Remove db1047

https://gerrit.wikimedia.org/r/389922

Change 389922 merged by jenkins-bot:
[operations/software@master] s1,s2.hosts: Remove db1047

https://gerrit.wikimedia.org/r/389922

Change 391182 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Refactor evenlogging database hosts definition

https://gerrit.wikimedia.org/r/391182

Change 391182 merged by Jcrespo:
[operations/puppet@production] Refactor evenlogging database hosts definition

https://gerrit.wikimedia.org/r/391182

Mentioned in SAL (#wikimedia-operations) [2017-11-15T09:15:19Z] <marostegui> Stop mysql on db1046 to transfer its content to db1107 - T177405

Change 391519 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] site.pp: add role to db1107 (new eventlogging master db)

https://gerrit.wikimedia.org/r/391519

Change 391519 merged by Elukey:
[operations/puppet@production] site.pp: add role to db1107 (new eventlogging master db)

https://gerrit.wikimedia.org/r/391519

Change 391537 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::mariadb::misc::eventlogging::database: fix hearthbeat settings

https://gerrit.wikimedia.org/r/391537

Change 391537 merged by Elukey:
[operations/puppet@production] profile::mariadb::misc::eventlogging::database: fix hearthbeat settings

https://gerrit.wikimedia.org/r/391537

elukey moved this task from In Progress to Done on the Analytics-Kanban board. Nov 15 2017, 4:03 PM
elukey closed this task as Resolved. Nov 15 2017, 5:56 PM

All the work has been completed, closing the task!

Change 393220 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Fix prometheus target for the Eventlogging mysql master db

https://gerrit.wikimedia.org/r/393220

Change 393220 merged by Elukey:
[operations/puppet@production] Fix prometheus target for the Eventlogging mysql master db

https://gerrit.wikimedia.org/r/393220