(Need By: TBD) upgrade ram in an-master100[12]
Closed, Resolved · Public

Description

This task will track the scheduling and upgrade of the RAM in an-master100[12] servers. This memory upgrade was ordered via T257403.

an-master1001:

  • - schedule downtime window with @elukey on Analytics team for this work.
  • - analytics team depools server from services for maint
  • - put host into maint mode on icinga for duration of the downtime window
  • - check memory output on host to ensure all currently installed memory is detected properly (command: lshw -class memory)
  • - power down host, install half of the RAM ordered on T257403, doubling the memory in the host.
  • - power back up host, monitor RAM self test, ensure new memory is detected via lshw -class memory
  • - hand server back to Analytics to return to service, remove maint mode on icinga.

an-master1002:

  • [x] - schedule downtime window with @elukey on Analytics team for this work.
  • - analytics team depools server from services for maint
  • - put host into maint mode on icinga for duration of the downtime window
  • - check memory output on host to ensure all currently installed memory is detected properly (command: lshw -class memory)
  • - power down host, install half of the RAM ordered on T257403, doubling the memory in the host.
  • - power back up host, monitor RAM self test, ensure new memory is detected via lshw -class memory
  • - hand server back to Analytics to return to service, remove maint mode on icinga.
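
The before/after memory check in the steps above could be scripted roughly as follows. This is only a sketch, not part of the DC-Ops procedure: the helper name sum_dimm_gib is hypothetical, and it assumes that populated DIMMs appear in the lshw output as "*-bank" nodes carrying a "size: <N>GiB" or "size: <N>MiB" line, while empty slots have no size line.

```shell
#!/bin/sh
# Sum the sizes of populated DIMM banks from `lshw -class memory` output.
# Reads the lshw text on stdin and prints the total in whole GiB.
# Assumption: only "*-bank..." nodes describe DIMMs; cache and firmware
# nodes also carry "size:" lines and are deliberately skipped.
sum_dimm_gib() {
    awk '
        # A line starting with "*-" opens a new node; note whether it is a bank.
        /^[ \t]*\*-/ { in_bank = ($1 ~ /^\*-bank/) }
        in_bank && $1 == "size:" {
            if ($2 ~ /GiB$/)      { sub(/GiB$/, "", $2); total += $2 }
            else if ($2 ~ /MiB$/) { sub(/MiB$/, "", $2); total += $2 / 1024 }
        }
        END { printf "%d\n", total }
    '
}

# Hypothetical usage around the maintenance window:
#   before=$(sudo lshw -class memory | sum_dimm_gib)   # before power-down
#   after=$(sudo lshw -class memory | sum_dimm_gib)    # after power-up
#   [ "$after" -eq $((before * 2)) ] && echo "memory doubled: ${after} GiB"
```

Summing per-bank sizes rather than reading the top-level "System Memory" size makes it easier to spot a single DIMM that was not detected after the upgrade.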

Please note when all DC-Ops steps are completed on this task, this task will be resolved. If there need to be sub-team followups due to this task, they should have their own tasks created.

Event Timeline

RobH triaged this task as Medium priority. Jul 29 2020, 5:14 PM
RobH created this task.
RobH added a parent task: Unknown Object (Task). Jul 29 2020, 5:14 PM

Please note that while I just created this task, the order for the memory has NOT yet been placed. It was escalated for approval and placement today.

10:15 < robh> : So we have a number (at least 3) tasks for upgrading memory in existing hosts
10:15 < robh> : ive just been pushing the actual upgrade task into 'racking tasks' as its the closest fit
10:16 < robh> : but if you eqiad folks want them somewhere else lemme know
10:16 < robh> : cmjohnson1: jclark-ctr ^
10:17 < cmjohnson1> : They will get lost in racking tasks. Put in hardware repair

@elukey, the RAM has arrived. How long will it take to drain each host, and how long can they be down?

@Jclark-ctr I'd say 5 to 10 minutes for each host to do a proper failover, and a host can stay down even for half an hour, but less is better of course :)

@elukey Same thing with these...can we do them all Monday or will you need multiple days?

Cmjohnson updated the task description.

Task complete