
Rack and setup db1115 (tendril replacement database)
Closed, Resolved (Public)

Description

This task will track the racking/setup/installation of the database server, ordered in T184808, that will replace the current tendril database.

Suggested rack (@Cmjohnson to confirm if possible): A6

db1115

  • receive in system on procurement task T184808
  • bios/drac/serial setup/testing
  • configure software raid1 recipe by @jcrespo or @Marostegui
  • mgmt dns entries added for both asset tag and hostname
  • production dns entries added
  • network port setup
  • operations/puppet update
  • OS installation
  • puppet/salt accept/initial run
  • handoff for service implementation

Event Timeline

Marostegui triaged this task as Normal priority. Jan 26 2018, 11:25 PM
Marostegui created this task.
Marostegui created this object in space Restricted Space.
Restricted Application added a project: procurement. Jan 26 2018, 11:25 PM
Restricted Application added a subscriber: Aklapper.
Marostegui renamed this task from Rack and setup db1115 to Rack and setup db1115 (tendril replacement database). Jan 26 2018, 11:25 PM
Marostegui shifted this object from the Restricted Space space to the S1 Public space.
Marostegui moved this task from Triage to Blocked external/Not db team on the DBA board.

Change 406375 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Allow install db1115

https://gerrit.wikimedia.org/r/406375

Change 406375 merged by Marostegui:
[operations/puppet@production] install_server: Allow install db1115

https://gerrit.wikimedia.org/r/406375

@Marostegui: this is bad. The codfw machine was created as tendril2001 (T186123), while this one was called db1115. db1115 is not a terrible name for it (I would give it a separate name, as with misc hosts), but the host should have the same name in both datacenters.

We can change it now, since nothing has been installed yet, but we cannot use db1001 (which would be the equivalent). Suggestions?

Sorry, never mind. We can use tendril1001.

Marostegui renamed this task from Rack and setup db1115 (tendril replacement database) to Rack and setup tendril1001 (tendril replacement database). Feb 2 2018, 9:19 AM
Marostegui updated the task description.

The easy option would be to call it tendril1001 (which I do not 100% like, as this will be a monitoring database; dbmonitor already exists, but it points to the web frontend). The more difficult option would be to call it something else and rename tendril2001 to that same name.

Change 407605 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Replace db1115 with tendril1001

https://gerrit.wikimedia.org/r/407605

To be honest, I would give it a normal database name, to avoid making exceptions and having dedicated hostnames (that is why I used db1115), as we never know whether we might decide to add more databases to it in the future.

That would be OK with me, as long as the other host is renamed.

Sure, we can do that and call it db2092.
I will rename this task back to db1115, then.

Change 407605 abandoned by Marostegui:
install_server: Replace db1115 with tendril1001

https://gerrit.wikimedia.org/r/407605

Marostegui renamed this task from Rack and setup tendril1001 (tendril replacement database) to Rack and setup db1115 (tendril replacement database). Feb 2 2018, 9:37 AM
Marostegui updated the task description.

Note that renaming that one means involving papaul and renaming everything, from the physical label to RackTables to DNS, etc.

Change 407610 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Change db.cfg with raid1-gpt.cfg

https://gerrit.wikimedia.org/r/407610

Yep, I was writing on the other task at this moment

All db hosts will have hardware RAID except these; it will be confusing.

OK, I am going to stop making patches and renaming tasks until a name is decided, because this is turning into a back and forth and is going to create more confusion.

Marostegui changed the task status from Open to Stalled. Feb 2 2018, 10:00 AM

Change 407610 merged by Marostegui:
[operations/puppet@production] install_server: Change db.cfg with raid1-gpt.cfg

https://gerrit.wikimedia.org/r/407610

Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. Feb 2 2018, 4:39 PM
RobH added a subscriber: RobH. Feb 5 2018, 5:28 PM

Why did we call this tendril2001 in codfw, but db1115 in eqiad?

See the discussion at T186123, starting at T185788#3940445.
(Basically, when I created this ticket I didn't know it had been decided to call it tendril2001 in codfw), but that discussion is still ongoing :-)

RobH added a comment. Feb 5 2018, 5:39 PM

Sorry about that, I missed that reply in my quick grep, my bad! So yeah, let's just keep naming standardized between sites. I don't have a strong preference on which name ends up being used (though I prefer some kind of distinction like tendril, I understand why it may not be preferred).

Racked in A6, asset tag wmf7316.

Cmjohnson updated the task description. Feb 13 2018, 9:41 PM

Change 410341 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt and production dns for db1115

https://gerrit.wikimedia.org/r/410341

Change 410341 merged by Cmjohnson:
[operations/dns@master] Adding mgmt and production dns for db1115

https://gerrit.wikimedia.org/r/410341

Cmjohnson updated the task description. Feb 13 2018, 9:55 PM

Chris, can you put this task on hold?
There are ongoing discussions about the hostname. I marked the task as stalled, but I should have said it was blocked; sorry about that.

@jcrespo, what would your proposals be? I already stated mine (either tendril1001 or a normal dbXXXX name is fine with me), but they were not too satisfying. I would like to hear yours so we can discuss them, reach an agreement and unblock our DC-Ops :-)

RobH reassigned this task from Cmjohnson to Marostegui. Feb 14 2018, 3:55 PM

@Marostegui:

I'm assigning this to you while it is stalled, until you provide feedback as requested. Please assign it back to Chris once this is ready to proceed.

Marostegui changed the task status from Stalled to Open. Feb 15 2018, 2:23 PM
Marostegui reassigned this task from Marostegui to Cmjohnson.

Hi, this can now proceed with the current hostname (db1115)
Thanks!

Cmjohnson updated the task description. Feb 20 2018, 4:41 PM

@Marostegui This server only has two 4 TB disks and no RAID card, so it will need software RAID. Let me know if you want to reconsider the naming.

^That was my main reason for objecting to calling it db*: all db* hosts have hardware RAID and are, to some extent, interchangeable.

It's not too late to rename it to tendril1001; it's an easy DNS change right now.

I am seriously tired of having this discussion again. I already stated my preference, and the fact that I don't really care about this hostname or any other one we choose. I was fine either way.
I believe you said you were convinced by neither db* nor tendril*. But last Monday we agreed on going with this model.

So if you are still not convinced, I would suggest you make a proposal, and I will most likely be fine with it. Having this blocked on a hostname decision for many days is not very productive.

No, tendril is definitely not OK.

jcrespo added a comment (edited). Feb 20 2018, 5:06 PM

I did not reopen this discussion; Chris did. And if our decision confuses people, maybe it wasn't that good? dbmon-be could be an alternative. We could later rename dbmonitor to dbmon-fe.

@Cmjohnson, please proceed with db1115, as @jcrespo and I agreed on that hostname yesterday in our weekly meeting.

@Marostegui Okay, since I cannot do the standard DB RAID, any suggestions?

Do nothing; we (the recipe) will set up the RAID1 in software.

```
$ git grep db1115
modules/install_server/files/autoinstall/netboot.cfg:        db1115|db2093) echo partman/raid1-gpt.cfg ;; \
```
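The `git grep` output shows why "do nothing" works here: netboot.cfg matches the hostname in a shell `case` statement and echoes the partman recipe the installer should use, so db1115 gets RAID1 built in software. Below is a minimal sketch of that pattern, assuming POSIX sh. The function name `recipe_for_host` and the `db[12]*` fallback to `partman/db.cfg` are hypothetical illustrations; only the `db1115|db2093` line comes from the real file.

```shell
#!/bin/sh
# Hypothetical sketch of the hostname -> partman recipe mapping used in
# netboot.cfg. Only the db1115|db2093 branch is taken from the grep output;
# the db[12]* fallback and the function name are assumptions.
recipe_for_host() {
    case "$1" in
        # No hardware RAID card: build RAID1 in software on a GPT layout.
        db1115|db2093) echo partman/raid1-gpt.cfg ;;
        # Assumed default for hardware-RAID database hosts.
        db[12]*) echo partman/db.cfg ;;
        # Unknown host: print nothing and signal failure.
        *) return 1 ;;
    esac
}
```

For example, `recipe_for_host db1115` prints `partman/raid1-gpt.cfg`. Because `case` uses the first matching pattern, the software-RAID exceptions must be listed before any broader `db*` pattern.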

Cmjohnson updated the task description. Feb 20 2018, 6:14 PM

Change 412973 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding dhcpd entry db1115

https://gerrit.wikimedia.org/r/412973

Change 412973 merged by Cmjohnson:
[operations/puppet@production] Adding dhcpd entry db1115

https://gerrit.wikimedia.org/r/412973

jcrespo updated the task description. Feb 20 2018, 6:21 PM
Cmjohnson updated the task description. Feb 21 2018, 3:11 PM

@jcrespo and @Marostegui This is all yours. Please resolve once verified. Thanks!

Marostegui closed this task as Resolved. Feb 21 2018, 3:31 PM

Thanks @Cmjohnson the host looks good.
We can continue the service setup at T184704

Marostegui updated the task description. Feb 21 2018, 3:31 PM