eqiad: rack/setup/install (4) dbproxy systems.
Closed, Resolved · Public · 0 Estimated Story Points

Description

This task will track the racking and setup of 4 new dbproxy systems.

Racking Proposal:
dbproxy1018 - C5, this host will replace dbproxy1010 so please make sure it is in the same VLAN as that one
dbproxy1019 - C5, this host will replace dbproxy1011 so please make sure it is in the same VLAN as that one
dbproxy1020 - Anywhere in row D
dbproxy1021 - Anywhere in row D

Hostname Proposal: dbproxy1018, dbproxy1019, dbproxy1020, dbproxy1021

dbproxy1018:

  • - receive in system on procurement task T213765 (make sure this task T213765 is listed as the procurement task in netbox, not this racking task.)
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

dbproxy1019:

  • - receive in system on procurement task T213765 (make sure this task T213765 is listed as the procurement task in netbox, not this racking task.)
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

dbproxy1020:

  • - receive in system on procurement task T213765 (make sure this task T213765 is listed as the procurement task in netbox, not this racking task.)
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox

dbproxy1021:

  • - receive in system on procurement task T213765 (make sure this task T213765 is listed as the procurement task in netbox, not this racking task.)
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox
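
The checklists above call for an initial puppet run with role:spare. As a rough illustration only (the node regex and role name below are assumptions, not taken from the actual operations/puppet tree), the corresponding site.pp stanza would look something like:

    # illustrative sketch; verify the real node regex and role name in site.pp
    node /^dbproxy10(1[89]|2[01])\.eqiad\.wmnet$/ {
        role(spare::system)
    }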

Event Timeline

I don't know whether it will be 2 or 3, depending on the needs of the others. There was a discussion with Cloud about whether to also put a proxy in front of toolsdb.

Assigning this to myself to let Chris know that this is still blocked on the DBAs' decision.
So for now, 2 of them will go to replace 1010 and 1011 for sure.

@Bstorm @bd808 any comments on T225704#5261972?

More specifically: do we need 1 for m5, or something else?

m5 doesn't use the proxies at the moment (I know it should, but they are not being used right now) (T202367#5252689)

Thinking more, as toolsdb was cannibalized by OpenStack, maybe its potential proxies should be too. I guess 2/2 is the safe option right now. Sorry, I didn't think too much about this in advance. Cloud input would be nice on future service expansion and general load balancing/failover needs.

In addition, we may even want to call the dbproxies labsdbproxy or cloudproxy or something else (wikireplica service).

2/2 meaning 2 for cloud (to replace 1010 and 1011) and 2 for other usages (misc, core..)?

Yes.

Thanks! I will update the task description at the top to reflect this discussion, so it is easier for Chris.

Marostegui updated the task description. (Show Details)
Marostegui moved this task from In progress to Blocked external/Not db team on the DBA board.

@Cmjohnson I have updated the task with the racking proposal at the beginning.
Thanks!

No name change? I do not mind, just want to make sure it is a conscious decision.

I would prefer not to change them for now as I would like them to be as generic as possible.

@Bstorm @bd808 any comments on T225704#5261972?

I think that if we need a proxy in front of ToolsDB we should probably do that with an instance (or pair of instances) inside Cloud VPS now that the db is there.

For the wiki replica pool it makes sense to keep using bare metal servers in the dbproxy pool.

Change 517464 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns for dbprox1018-22

https://gerrit.wikimedia.org/r/517464

Change 517464 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns for dbprox1018-22

https://gerrit.wikimedia.org/r/517464
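
To illustrate the "mgmt dns entries added for both asset tag and hostname" step from the checklists, the mgmt zone ends up with two names pointing at the same management IP; the asset tag and address below are placeholders, not the real allocations:

    ; placeholder asset tag and mgmt IP, shown only to illustrate the pairing
    dbproxy1018.mgmt    1H  IN A    10.65.2.18
    wmf1234.mgmt        1H  IN A    10.65.2.18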

Cmjohnson added a subscriber: ayounsi.

Assigning to @ayounsi to add cloud-support1-d-eqiad. Once that is done, the vlan for dbproxy1020 and 1021 will need to be set up. Switch port descriptions are done.

@Cmjohnson which ones will finally go in the cloud vlan?
1018 and 1019, or 1020 and 1021?
I'm fine either way, but I'm confused by your last comment :)

@Marostegui: do they all go to the cloud vlan? If they do, then 1020 and 1021 are in row D... that cloud-support vlan is not available on row D yet. I need Arzhel to copy the vlan over.

The cloud support vlan/network is legacy, so I'd rather not create a new one (in a new row).
As we already have cloud-support1-a-eqiad and cloud-support1-c-eqiad (row A and C) could dbproxy1020/1021 be in row A instead?

Yep! Not a problem. I don't mind which hosts, as long as we have two on that VLAN; whichever ones work best for you.

1018 and 1019 are OK to go to the cloud VLAN from my side (as they are in row C).
We just need two hosts on that vlan.

@ayounsi I'd rather not move the servers... I racked them based on the instructions and they're already in racks and set up.

@Cmjohnson I'm pretty sure they will have to move :( Creating a new vlan and all the supporting config (DHCP, routing, IP allocation, etc.) is a non-trivial task for a vlan that is not expected to grow past those 2 hosts.

@Marostegui do you confirm those need to be in a cloud-support vlan and not in the regular public/private?

So, to be clear from my side:

Out of those 4 hosts (2 in row C and 2 in row D), we need 2 of them (I don't mind which ones) to be in the same VLAN as dbproxy1010 and dbproxy1011, as those will be replaced.

Ok good, so that's 1018/1019.
In which vlan do the other 2 (dbproxy1020/1021) need to go, then?

dbproxy1020/1021 can go in the same vlans as dbproxy1001-1008, as they will be replacing some of those hosts.

dbproxy1001-1008 are in the private vlans across rows A-B-C, none in D. Is row D private fine for dbproxy1020/1021, or should they be in private-A/B/C?

What's the difference between those VLANs?
dbproxy1020/1021 should be able to reach anything dbproxy1001-1008 can access.

No differences other than the physical row they're in. They will be able to reach the same resources.

@Cmjohnson so dbproxy1020/1021 will go in private1-d-eqiad.

Great! Thanks :-)

I updated the switch config to private1-d. Both servers are currently off and ready for installs. Assigning to @RobH to install.
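
For context, a switch port move like the one described above is roughly the following Junos configuration; the interface name is a placeholder and the exact statement names (e.g. port-mode vs interface-mode) depend on the switch model:

    interfaces {
        /* placeholder port; the real one depends on where dbproxy1020 was cabled */
        ge-4/0/10 {
            description dbproxy1020;
            unit 0 {
                family ethernet-switching {
                    port-mode access;
                    vlan {
                        members private1-d-eqiad;
                    }
                }
            }
        }
    }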

@RobH if you add the production DNS entries, I can take care of the installations myself
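
As a sketch of what those production DNS entries look like in the operations/dns zone files (the addresses below are placeholders, not the real allocations):

    ; forward record in the eqiad.wmnet zone (placeholder IP)
    dbproxy1020     1H  IN A    10.64.48.10
    ; matching entry in the corresponding reverse zone (placeholder)
    10              1H  IN PTR  dbproxy1020.eqiad.wmnet.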

Change 518059 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] setting production dns for new dbproxy systems

https://gerrit.wikimedia.org/r/518059

Change 518059 merged by RobH:
[operations/dns@master] setting production dns for new dbproxy systems

https://gerrit.wikimedia.org/r/518059

RobH removed a project: Patch-For-Review.
RobH updated the task description. (Show Details)

Assigned to @Marostegui per IRC sync-up (DNS records are live).

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['dbproxy1018.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906210500_marostegui_9787.log.
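
These log entries come from the reimage helper run on the cumin host; a minimal invocation is roughly of the following shape (run from cumin1001; any options for Phabricator task linking or conftool handling are omitted here and should be checked against the script's help output):

    # exact options may differ between versions of the script
    sudo -i wmf-auto-reimage dbproxy1018.eqiad.wmnet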

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['dbproxy1020.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906210526_marostegui_15425.log.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['dbproxy1019.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906210534_marostegui_16679.log.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['dbproxy1021.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906210534_marostegui_16981.log.

Change 518197 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Add MAC address for dbproxy1018

https://gerrit.wikimedia.org/r/518197

Change 518197 merged by Marostegui:
[operations/puppet@production] install_server: Add MAC address for dbproxy1018

https://gerrit.wikimedia.org/r/518197

Completed auto-reimage of hosts:

['dbproxy1018.eqiad.wmnet']

Of which those FAILED:

['dbproxy1018.eqiad.wmnet']

Change 518203 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Add MAC for dbproxies

https://gerrit.wikimedia.org/r/518203

Change 518203 merged by Marostegui:
[operations/puppet@production] install_server: Add MAC for dbproxies

https://gerrit.wikimedia.org/r/518203

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['dbproxy1018.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906210623_marostegui_27386.log.

Completed auto-reimage of hosts:

['dbproxy1020.eqiad.wmnet']

Of which those FAILED:

['dbproxy1020.eqiad.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['dbproxy1020.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906210630_marostegui_28949.log.

Completed auto-reimage of hosts:

['dbproxy1019.eqiad.wmnet']

Of which those FAILED:

['dbproxy1019.eqiad.wmnet']

Completed auto-reimage of hosts:

['dbproxy1021.eqiad.wmnet']

Of which those FAILED:

['dbproxy1021.eqiad.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['dbproxy1021.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906210639_marostegui_31413.log.

Completed auto-reimage of hosts:

['dbproxy1020.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['dbproxy1021.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['dbproxy1018.eqiad.wmnet']

Of which those FAILED:

['dbproxy1018.eqiad.wmnet']
Marostegui updated the task description. (Show Details)
Marostegui added a subscriber: ayounsi.

@Cmjohnson @ayounsi is there anything special about the dbproxy1018 and dbproxy1019 VLANs and PXE? Neither of them seems to be booting from PXE, even though the MACs I added to tftpboot are the same ones the iDRAC shows it is trying to boot from:
dbproxy1018 4C:D9:8F:6C:A5:9E https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518197/1/modules/install_server/files/dhcpd/linux-host-entries.ttyS1-115200
dbproxy1019 4C:D9:8F:6C:9F:2F https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518203/2/modules/install_server/files/dhcpd/linux-host-entries.ttyS1-115200
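
For reference, the linked file holds ISC dhcpd host stanzas keyed on exactly those MACs; a minimal sketch (the production entries may carry additional PXE-related options):

    # minimal sketch; the real entries may set extra filename/PXE options
    host dbproxy1018 {
        hardware ethernet 4C:D9:8F:6C:A5:9E;
        fixed-address dbproxy1018.eqiad.wmnet;
    }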

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['dbproxy1018.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906210750_marostegui_45255.log.

Change 518216 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] wmnet: Change dbproxy1018,dbproxy1019 IPs to be in cloud

https://gerrit.wikimedia.org/r/518216

Change 518216 merged by Marostegui:
[operations/dns@master] wmnet: Change dbproxy1018,dbproxy1019 IPs to be in cloud

https://gerrit.wikimedia.org/r/518216

While debugging with Arzhel, we noticed that the DNS entries for dbproxy1018 and dbproxy1019 didn't belong to the cloud network. I have changed them and will try to install again.

Completed auto-reimage of hosts:

['dbproxy1018.eqiad.wmnet']

Of which those FAILED:

['dbproxy1018.eqiad.wmnet']

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['dbproxy1018.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906210855_marostegui_58624.log.

Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts:

['dbproxy1019.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201906210911_marostegui_63492.log.

Completed auto-reimage of hosts:

['dbproxy1018.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['dbproxy1019.eqiad.wmnet']

and were ALL successful.

Marostegui updated the task description. (Show Details)

All hosts installed.