Page MenuHomePhabricator

hw troubleshooting: <type of hardware failre> for <fqhn of server>
Closed, InvalidPublicRequest

Description

This task will track the hardware troubleshooting and repair of server <enter FQDN of server here>.

The first half of the steps should be completed by the person filing the hardware repair task. Some of these steps require access to Icinga to put a host into maintenance mode.

General Checklist

Steps while filling out template:

  • - Pull up hardware troubleshooting runbook for step by step directions.
  • - Update all fields with <> on this task with the required info.
  • - Determine location and warranty information of device in netbox, and provide in following:
  • - System warranty expires on: <enter warranty expiry info from netbox>
  • - Provide urgency of request, along with justification (redundancy, details of service cluster, number of hosts in cluster, etc) if attention is needed immediately. : <enter in urgency level/justification>
  • - Attach detailed hardware failure log from above hw troubleshooting runbook. <enter logs here>
  • - Append correct project tag for server location & assign to proper user (see above) for site.

Failure Specific Checklists

See hardware troubleshooting runbook, for applicable type of hw failure. This template is for ALL hardware failures. You can DELETE other sections unrelated to your hardware failure.

Example: If you have a memory failure, delete the sections for Powersupply & Disk failures, leaving only the section for 'All Other Failures'

HDD/SSD Failure
Power Supply Failure
All Other Failures (memory, battery, controller, cpu, etc.)
  • - Follow directions on https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook#All_Other_Failures
  • - Place system in fully offline state for hardware troubleshooting by the onsite. If it cannot be placed offline, coordinate with on-site engineer to schedule a maintenance window.
  • - Set system and mgmt interface to maint mode (no checks/alarms on services) for 5 business days (excluding weekends).
  • - attach detailed hardware failure log to this task via comment, see hardware troubleshooting runbook on how to accomplish this.
  • - Assign task to proper assignee & onsite project, and place in 'hardware troubleshooting' column on workboard.

Event Timeline

Restricted Application added subscribers: Liuxinyu970226, Aklapper. · View Herald TranscriptAug 6 2019, 5:43 PM
wiki_willy closed this task as Invalid.Aug 6 2019, 5:45 PM

created task as a test. resolving.

Restricted Application removed a subscriber: Liuxinyu970226. · View Herald TranscriptAug 6 2019, 5:45 PM