
add SSDs to wdqs200[12]
Closed, Resolved, Public

Description

This task will be used to coordinate a time between @Gehel and @Papaul to add the new disks, purchased in T198657, to wdqs200[12].

Adding the disks (rather than replacing the existing ones) shouldn't cause any downtime, but we should watch it carefully to be certain. @Gehel will then need to take over and manually extend the software RAID/LVM onto the new disks so the space can be used.
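The manual extension step is not spelled out in this task. One common way to do it, assuming the existing disks already carry an LVM volume group and that the new SSDs are already partitioned and are to be mirrored as a separate RAID1 pair, is to build a new md array, add it to the volume group as a physical volume and grow the logical volume. The Python sketch below only builds and prints the commands (a dry run); the partitions, the md device, the volume group vg0 and the logical volume data are hypothetical placeholders, not the actual wdqs layout.

```python
from typing import List


def extension_plan(new_partitions: List[str], md_device: str,
                   volume_group: str, logical_volume: str) -> List[List[str]]:
    """Return, without running them, the commands that mirror two new
    partitions with mdadm, add the array to LVM and grow one logical volume."""
    if len(new_partitions) != 2:
        raise ValueError("this sketch assumes exactly two new partitions (one RAID1 pair)")
    return [
        # Create a RAID1 array from the two new partitions.
        ["mdadm", "--create", md_device, "--level=1", "--raid-devices=2", *new_partitions],
        # Initialise the array as an LVM physical volume.
        ["pvcreate", md_device],
        # Add that physical volume to the existing volume group.
        ["vgextend", volume_group, md_device],
        # Grow the logical volume over all free extents; -r also resizes its filesystem.
        ["lvextend", "-r", "-l", "+100%FREE", f"/dev/{volume_group}/{logical_volume}"],
    ]


if __name__ == "__main__":
    # All names are hypothetical placeholders, not the real wdqs layout.
    for argv in extension_plan(["/dev/sdc1", "/dev/sdd1"], "/dev/md2", "vg0", "data"):
        print(" ".join(argv))
```

Returning argv lists instead of executing them keeps the sketch safe to run anywhere and easy to review before anything touches a disk; lvextend -r grows the filesystem in the same step as the logical volume.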

Related Objects

Status | Subtype | Assigned | Task
Resolved | | Gehel |
Resolved | | Smalyshev |

Event Timeline

RobH triaged this task as Medium priority. Aug 24 2018, 9:46 PM
RobH created this task.
Restricted Application added a subscriber: Aklapper. Aug 24 2018, 9:46 PM

@Papaul: I'm ready to reimage wdqs2002 today. Ping me when you're around and I'll shut it down.

Mentioned in SAL (#wikimedia-operations) [2018-08-29T13:39:55Z] <gehel> shutting down wdqs2001 for new SSD and reimaging - T202777

Papaul removed Papaul as the assignee of this task. Aug 29 2018, 2:39 PM

@Gehel: disks added to wdqs2001.

Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['wdqs2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201808291511_gehel_4726.log.

Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['wdqs2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201808291558_gehel_13621.log.

Gehel added a comment. Aug 29 2018, 4:09 PM

Error during reimage of wdqs2001:

┌────────────────────┤ [!!] Partition disks ├─────────────────────┐       
│                                                                 │       
│                   Error while setting up RAID                   │       
│ An unexpected error occurred while setting up a preseeded RAID  │       
│ configuration.                                                  │       
│                                                                 │       
│ Check /var/log/syslog or see virtual console 4 for the details. │       
│                                                                 │       
│     <Go Back>                                    <Continue>     │       
│                                                                 │       
└─────────────────────────────────────────────────────────────────┘

Extract from /var/log/syslog:

Aug 29 16:03:37 debconf: --> RESET partman-md/confirm_nooverwrite
Aug 29 16:03:37 debconf: <-- 0
Aug 29 16:03:37 apt-install: Queueing package mdadm for later installation
Aug 29 16:03:37 debconf: --> GET partman-auto-raid/recipe
/sdb2#/dev/sdc2#/dev/sdd2 . 10  4       0       lvm    - /dev/sda3#/dev/sdb3#/d.
Aug 29 16:03:37 debconf: --> GET partman-auto-raid/raidnum
Aug 29 16:03:37 debconf: <-- 0 
Aug 29 16:03:37 debconf: --> SET partman-auto-raid/raidnum 0
Aug 29 16:03:37 debconf: <-- 0 value set
Aug 29 16:03:37 partman-auto-raid: Selected spare count: 0
Aug 29 16:03:37 partman-auto-raid: Spare devices count: 0
Aug 29 16:03:37 partman-auto-raid: mdadm: cannot open /dev/sdc2: No such file oy
Aug 29 16:03:37 partman-auto-raid: Error creating array /dev/md0
Aug 29 16:03:37 debconf: --> SET partman-auto-raid/error false
Aug 29 16:03:37 debconf: <-- 0 value set
Aug 29 16:03:37 debconf: --> INPUT critical partman-auto-raid/error
Aug 29 16:03:37 debconf: <-- 0 question will be asked
Aug 29 16:03:37 debconf: --> GO

fdisk -l shows only sda and sdb, so the new disks are not visible to the installer and mdadm cannot open /dev/sdc2 when creating /dev/md0.
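The failure comes down to the preseeded RAID recipe naming partitions on sdc and sdd while only sda and sdb are present. A minimal Python sketch can make that mismatch explicit before a reimage. It assumes the recipe syntax of Debian's partman-auto-raid (<raidtype> <devcount> <sparecount> <fstype> <mountpoint> <devices> <sparedevices>, member devices joined by #, each array terminated by a lone .) and sdX-style device names. The example recipe is a hypothetical stand-in, since the recipe line in the log is truncated.

```python
import re
from typing import Dict, List, Set


def parse_recipe(recipe: str) -> List[List[str]]:
    """Split a partman-auto-raid recipe into per-array field lists ('.' ends an array)."""
    arrays: List[List[str]] = []
    current: List[str] = []
    for token in recipe.split():
        if token == ".":
            if current:
                arrays.append(current)
            current = []
        else:
            current.append(token)
    if current:
        arrays.append(current)
    return arrays


def parent_disk(partition: str) -> str:
    """Map '/dev/sdc2' to 'sdc' (assumes sdX-style names, not NVMe)."""
    return re.sub(r"\d+$", "", partition.rsplit("/", 1)[-1])


def missing_members(recipe: str, present_disks: Set[str]) -> Dict[int, List[str]]:
    """Return {array index: member partitions whose parent disk is absent}.

    Arrays are indexed in recipe order; fully satisfiable arrays are omitted.
    """
    report: Dict[int, List[str]] = {}
    for index, fields in enumerate(parse_recipe(recipe)):
        if len(fields) < 6:
            continue  # malformed entry: no <devices> field
        members = fields[5].split("#")
        missing = [dev for dev in members if parent_disk(dev) not in present_disks]
        if missing:
            report[index] = missing
    return report


if __name__ == "__main__":
    # Hypothetical recipe: two 4-disk RAID10 arrays on partitions 2 and 3.
    recipe = (
        "10 4 0 ext4 / /dev/sda2#/dev/sdb2#/dev/sdc2#/dev/sdd2 . "
        "10 4 0 lvm - /dev/sda3#/dev/sdb3#/dev/sdc3#/dev/sdd3 ."
    )
    # Only sda and sdb are visible, matching the fdisk -l observation.
    print(missing_members(recipe, {"sda", "sdb"}))
```

Tokenizing on whitespace and treating a lone . as the array terminator avoids relying on exact spacing inside the preseed string.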

Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['wdqs2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201808291703_gehel_27095.log.

Completed auto-reimage of hosts:

['wdqs2001.codfw.wmnet']

and were ALL successful.

Addshore moved this task from incoming to monitoring on the Wikidata board. Aug 30 2018, 9:06 AM

Mentioned in SAL (#wikimedia-operations) [2018-09-05T14:10:41Z] <gehel> shutting down wdqs2002 for new SSD and reimage - T202777

Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['wdqs2002.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201809051602_gehel_6775.log.

Completed auto-reimage of hosts:

['wdqs2002.codfw.wmnet']

and were ALL successful.

New SSD in place, server reimaged and data reimported. We're all good!

Smalyshev closed this task as Resolved. Sep 12 2018, 5:01 AM
Smalyshev claimed this task.