
install/setup/deploy ms-be2016-2021
Closed, Resolved (Public)

Description

This is the tracking task for the overall implementation of ms-be2016-2021, following handoff from the onsite (@Papaul) for installation.

Some of the steps below have already been completed during the onsite racking in T114712.

  • setup/test mgmt
  • mgmt dns entries
  • production dns entries
  • switch port description and vlan updates
  • install_server module updates
  • OS installation
  • puppet/salt key sign/accept
  • service implementation

Event Timeline

RobH raised the priority of this task from to Needs Triage.
RobH updated the task description.
RobH added projects: SRE, SRE-swift-storage.
RobH added subscribers: RobH, fgiunchedi, Papaul.
chasemp set Security to None.

Change 250687 had a related patch set uploaded (by Filippo Giunchedi):
codfw-prod: add ms-be2016 / ms-be2018 / ms-be2020 at weight 1000

https://gerrit.wikimedia.org/r/250687

allocation plans swift-wise for codfw:

  • add 3x machines in different zones, raising weight in increments of 1000 until weight 4000
  • add the other 3x machines the same way, in increments of 1000 until weight 4000

ditto for eqiad; since it has live traffic, though, we should bring the weight up to 3000 and then evaluate, in smaller increments, whether going all the way to 4000 makes sense
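
The codfw plan above can be sketched as a simple schedule generator. This is only an illustration of the stepping logic, not the tooling actually used for the rings: the function name `ramp_schedule` is hypothetical, and the second batch of hosts is an assumption (the first batch is taken from the patch set title, the rest are the remaining machines in this task).

```python
def ramp_schedule(batches, step=1000, target=4000):
    """Return (hosts, weight) steps, ramping each batch to target in turn."""
    schedule = []
    for hosts in batches:
        # range() stops before target + 1, so the target weight itself is included
        for weight in range(step, target + 1, step):
            schedule.append((tuple(hosts), weight))
    return schedule


# First batch as named in the patch set title.
first_batch = ["ms-be2016", "ms-be2018", "ms-be2020"]
# Assumption: the "other 3x machines" are the remaining hosts in this task.
second_batch = ["ms-be2017", "ms-be2019", "ms-be2021"]

plan = ramp_schedule([first_batch, second_batch])
for hosts, weight in plan:
    print(", ".join(hosts), "-> weight", weight)
```

For eqiad, the same helper could be called with `target=3000` and a smaller `step` afterwards, matching the more cautious approach described for the site with live traffic.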

after @Papaul installed the OS:

ms-be2016,2017 - added to puppet/salt/icinga yesterday
ms-be2018,2019 - added to puppet/salt/icinga today

because the role swift::storage is already assigned to node /^ms-be20(1[6-9]|2[0-1])\.codfw\.wmnet$/

they picked up all the swift configuration and monitoring as soon as I signed the certs:

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=ms-be2016&nostatusheader
https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=ms-be2017&nostatusheader
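
As a quick sanity check of which hosts the node regex quoted above covers, here is a minimal sketch in Python. Puppet evaluates node regexes with Ruby's engine, so this is an approximation, though this pattern uses only syntax common to both. The candidate host list is made up for the test.

```python
import re

# Pattern copied from the node definition above, with the delimiting slashes dropped.
NODE_RE = re.compile(r"^ms-be20(1[6-9]|2[0-1])\.codfw\.wmnet$")

# Hypothetical candidates: the six new hosts plus one on either side of the range.
candidates = [f"ms-be{n}.codfw.wmnet" for n in range(2015, 2023)]
matched = [host for host in candidates if NODE_RE.match(host)]

for host in matched:
    print(host)
```

Only ms-be2016 through ms-be2021 match, and only by their fully qualified names, since the pattern is anchored on the `.codfw.wmnet` suffix.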

ms-be2020/2021 were reinstalled and their names were switched around, but they don't have signed puppet certs yet

ms-be2020/2021 reinstalled by @Papaul.

fixed puppet certs and added to puppetmaster. you can see them appearing in Icinga now.

Dzahn updated the task description.

@filippo fyi, the last 2 are also in puppet now and in monitoring and already have all the swift checks from the role.

this is for you to check the last checkbox: confirm the service is implemented, and then this is done.

thanks Daniel! I'll track the swift expansion here

Change 250687 merged by Filippo Giunchedi:
codfw-prod: add ms-be2016 / ms-be2018 / ms-be2020 at weight 1000

https://gerrit.wikimedia.org/r/250687

all machines in service at weight 3000, pending full provisioning and testing of eqiad at weight 4000

given the tests at weight 4000 on ms-be1019 in T118183 (add ms-be1019 / 1020 / 1021 to swift), I've bumped ms-be2016 and ms-be2017 to weight 3500 to lessen the load on older nodes and get more disk utilization. more to follow, since codfw has 6 newer machines with 4TB disks

ms-be2016 through ms-be2021 are now at weight 3500, resolving