
rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups
Closed, Resolved · Public

Description

This task will track the installation of <hostname tbd> in codfw as a testing host for codfw backups.

This was ordered off of T214069, but is 1 of 6 systems on that task/order.

Racking Proposal: Any 1G rack is fine

db2102.codfw.wmnet: Row C Rack C5

  • - receive in system on procurement task T214069
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation T220572

Event Timeline

RobH triaged this task as Medium priority. Mar 27 2019, 10:00 PM
RobH created this task.
RobH created this object in space Restricted Space.
Restricted Application added a subscriber: Aklapper.

Ok, this needs some updates from the DBA team: either @jcrespo or @Marostegui can provide feedback on racking restrictions, hostname use, vlan, and other setup details. Once provided, please assign to @Papaul.

RobH shifted this object from the Restricted Space space to the S1 Public space. Mar 27 2019, 10:06 PM
Marostegui updated the task description.

Done.
Racking: Any 1G rack is fine.
Hostname: db2102.codfw.wmnet

Change 499706 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Allow image of the new codfw DBs.

https://gerrit.wikimedia.org/r/499706

Change 499706 merged by Marostegui:
[operations/puppet@production] install_server: Allow image of the new codfw DBs.

https://gerrit.wikimedia.org/r/499706

vlan: private
This will be set up like any other production db host, as documented here: https://wikitech.wikimedia.org/wiki/Raid_setup

switch port information
asw-c5-codfw ge-5/0/12

RobH renamed this task from rack/setup/install (1) testing host for codfw backups to rack/setup/install db2102.codfw.wmnet as a testing host for codfw backups. Mar 28 2019, 3:15 PM

Change 502651 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add mgmt and prodcution DNS for db209[7-9] db210[0-2]

https://gerrit.wikimedia.org/r/502651

Note that although this host is tracked separately from T219463 for setup, because it will have a different puppet role, it is identical hardware-wise.

Change 502824 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address entries for db209[7-9] and db210[0-2]

https://gerrit.wikimedia.org/r/502824

Change 502651 merged by Marostegui:
[operations/dns@master] DNS: Add mgmt and prodcution DNS for db209[7-9] db210[0-2]

https://gerrit.wikimedia.org/r/502651

Change 502824 merged by Marostegui:
[operations/puppet@production] DHCP: Add MAC address entries for db209[7-9] and db210[0-2]

https://gerrit.wikimedia.org/r/502824

papaul@asw-c-codfw> show interfaces ge-5/0/12 descriptions    
Interface       Admin Link Description
ge-5/0/12       up    up   db2102

@Marostegui while installing db2102 I am getting

┌────────────┤ [!!] Partition disks ├─────────────┐
│                                                 │
│               No root file system               │
│ No root file system is defined.                 │
│                                                 │
│ Please correct this from the partitioning menu. │
│                                                 │
│                   <Continue>                    │
│                                                 │
└─────────────────────────────────────────────────┘

I will check if the raid is on sda, because the host is correctly set to be allowed to be re-imaged:

db1114|db112[6-9]|db113[0-9]|db1140|dbprov200[1-2]|db209[7-9]|db210[0-2]) echo partman/db.cfg ;; \
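
To double-check that a given host really falls under that pattern, the alternation can be tested outside the installer. This is only a minimal Python sketch: it assumes the case statement matches on the short hostname and that these simple shell globs behave the same under Python's `fnmatch`. The helper name `uses_db_cfg` is hypothetical and not part of the puppet repository.

```python
from fnmatch import fnmatchcase

# Pattern copied from the case branch quoted above.
DB_CFG_PATTERN = (
    "db1114|db112[6-9]|db113[0-9]|db1140|"
    "dbprov200[1-2]|db209[7-9]|db210[0-2]"
)


def uses_db_cfg(hostname: str, pattern: str = DB_CFG_PATTERN) -> bool:
    """Return True if the short hostname matches any '|'-separated glob."""
    return any(fnmatchcase(hostname, alt) for alt in pattern.split("|"))
```

For example, `uses_db_cfg("db2102")` is true, so the host selection is not the cause of the partitioning error.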

The raid is sdb and we need it to be sda for db.cfg to work:

Disk /dev/sdb: 3.5 TiB, 3840699359232 bytes, 7501365936 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 524288 bytes
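
As a sanity check that /dev/sdb really is the full RAID logical drive, the size figures in the fdisk output can be cross-checked. A minimal Python sketch, using only the numbers printed above (the function names are illustrative):

```python
SECTOR_BYTES = 512  # logical sector size reported by fdisk


def sectors_to_bytes(sectors: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Convert a sector count into bytes."""
    return sectors * sector_bytes


def bytes_to_tib(size_bytes: int) -> float:
    """Convert bytes into tebibytes (2**40 bytes)."""
    return size_bytes / 2**40


# Sector count taken from the "Disk /dev/sdb" line above.
raid_bytes = sectors_to_bytes(7501365936)
raid_tib = bytes_to_tib(raid_bytes)
```

The byte count matches the one fdisk prints, and the result is just under 3.5 TiB, which fdisk rounds to "3.5 TiB".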

I am going to try to destroy it and recreate it from the BIOS, but that doesn't always work for me: the menu is usually unusable due to latency, so you might need to do that on-site. I will let you know.

We could of course make sdb work, but that would make these servers special compared to the rest. Maybe a disk was not added properly to the RAID, or something else strange happened? We should ask @Papaul to review the RAID setup.

So, I have been checking the RAID menu on the controller, but unfortunately it doesn't show most of the options over vsp.
I can see that a RAID has been created, and I can see the disks, but that is pretty much all: not even the RAID size or any of its options (level, stripe size, etc.).
Sometimes, if there is an SD card on the server, it takes (or is assumed to be) sda, although I cannot see it in fdisk, and the logical drive is then created as sdb. @Papaul, can you check whether there is another storage device there?
Example of what I "see":

Captura de pantalla 2019-04-11 a las 7.39.25.png (187×772 px, 39 KB)

Captura de pantalla 2019-04-11 a las 7.37.34.png (345×948 px, 17 KB)

Captura de pantalla 2019-04-11 a las 7.37.51.png (347×946 px, 19 KB)

I have been trying to check if there is something else defined on a storage level but it is impossible to see anything with vsp :(

Captura de pantalla 2019-04-11 a las 9.19.01.png (343×940 px, 12 KB)
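
When a shell is available on the host, the kernel's view of the disks can be listed without going through the controller menu. The following is a hedged Python sketch, not a tested procedure for this hardware. It assumes the standard Linux sysfs layout (`/sys/block/<dev>/size` in 512-byte units and `/sys/block/<dev>/removable`). The function name `list_block_devices` is hypothetical. Note that an internal SD card reader may not set the removable flag, and an empty reader may not report a usable size, so the output is only a hint.

```python
from pathlib import Path


def list_block_devices(sysfs_root: str = "/sys/block"):
    """Return (name, size_bytes, removable) for each sd* device, sorted by name."""
    devices = []
    for dev in sorted(Path(sysfs_root).glob("sd*")):
        # sysfs reports the size in 512-byte units, whatever the sector size is.
        size_bytes = int((dev / "size").read_text()) * 512
        removable = (dev / "removable").read_text().strip() == "1"
        devices.append((dev.name, size_bytes, removable))
    return devices
```

If the first entry is small or removable and the 3.5 TiB logical drive comes second, that would be consistent with a card reader claiming sda.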

@jcrespo @Marostegui I disabled the SD card and it is working now.

Please check whether the configuration looks right so I can do the same on the other 5 servers.

root@db2102:~# fdisk -l
Disk /dev/sda: 3.5 TiB, 3840699359232 bytes, 7501365936 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 524288 bytes
Disklabel type: gpt
Disk identifier: 308170B4-330A-43CA-850C-7B6F344BA9DC

Device        Start        End    Sectors  Size Type
/dev/sda1      2048   78125055   78123008 37.3G Linux filesystem
/dev/sda2  78125056   93749247   15624192  7.5G Linux swap
/dev/sda3  93749248 7501365247 7407616000  3.5T Linux LVM


Disk /dev/mapper/tank-data: 3.5 TiB, 3792695197696 bytes, 7407607808 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 524288 bytes
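
The partition table above can be checked for internal consistency before repeating the same setup on the other servers. This is a minimal Python sketch that only uses the numbers from the fdisk output. The helper `check_layout` is illustrative, and reading the partition/volume difference as LVM overhead is an assumption.

```python
SECTOR_BYTES = 512

# (device, start, end, sectors) copied from the fdisk output above.
PARTITIONS = [
    ("/dev/sda1", 2048, 78125055, 78123008),
    ("/dev/sda2", 78125056, 93749247, 15624192),
    ("/dev/sda3", 93749248, 7501365247, 7407616000),
]


def check_layout(partitions):
    """Verify sector counts and contiguity; return partition sizes in GiB."""
    sizes = {}
    prev_end = None
    for dev, start, end, sectors in partitions:
        assert end - start + 1 == sectors, f"{dev}: sector count mismatch"
        if prev_end is not None:
            assert start == prev_end + 1, f"{dev}: gap or overlap before it"
        sizes[dev] = sectors * SECTOR_BYTES / 2**30
        prev_end = end
    return sizes


sizes_gib = check_layout(PARTITIONS)

# Difference between the sda3 partition and the tank-data volume, in MiB.
# Assumption: this is space LVM keeps for its own metadata and alignment.
lvm_overhead_mib = (7407616000 - 7407607808) * SECTOR_BYTES / 2**20
```

The three partitions are contiguous, and the tank-data volume is 4 MiB smaller than sda3.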

So disabling the SD card did the trick! :-)
The server looks good now:

root@db2102:~# df -hT
Filesystem            Type      Size  Used Avail Use% Mounted on
udev                  devtmpfs  252G     0  252G   0% /dev
tmpfs                 tmpfs      51G  9.6M   51G   1% /run
/dev/sda1             ext4       37G  899M   34G   3% /
tmpfs                 tmpfs     252G     0  252G   0% /dev/shm
tmpfs                 tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs                 tmpfs     252G     0  252G   0% /sys/fs/cgroup
/dev/mapper/tank-data xfs       3.5T  3.6G  3.5T   1% /srv
root@db2102:~# free -g
              total        used        free      shared  buff/cache   available
Mem:            503           0         502           0           0         500
Swap:             7           0           7
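
The same check could be scripted for the remaining hosts. Below is a rough Python sketch that parses `df -hT` output and confirms the two mounts that matter here. The function names and the sample text are illustrative (the sample is a subset of the output above), and the parser assumes mount points contain no spaces.

```python
# Header and the two relevant rows from the df -hT output above.
DF_OUTPUT = """\
Filesystem            Type      Size  Used Avail Use% Mounted on
/dev/sda1             ext4       37G  899M   34G   3% /
/dev/mapper/tank-data xfs       3.5T  3.6G  3.5T   1% /srv
"""


def parse_df(text: str) -> dict:
    """Map mount point -> (filesystem, type) from `df -hT` output."""
    mounts = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 7:
            mounts[fields[6]] = (fields[0], fields[1])
    return mounts


def layout_ok(mounts: dict) -> bool:
    """True if / is ext4 on sda1 and /srv is xfs on the tank-data volume."""
    return (
        mounts.get("/") == ("/dev/sda1", "ext4")
        and mounts.get("/srv") == ("/dev/mapper/tank-data", "xfs")
    )
```

Calling `layout_ok(parse_df(DF_OUTPUT))` returns true for the layout shown above.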

Change 503009 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2102: Disable notifications

https://gerrit.wikimedia.org/r/503009

Change 503009 merged by Marostegui:
[operations/puppet@production] db2102: Disable notifications

https://gerrit.wikimedia.org/r/503009

Marostegui updated the task description.

Thanks @Papaul!
This server is ready to be productionized at: T220572: Productionize eqiad and codfw source backup hosts & codfw backup test host

root@db2102:~# lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 9.8 (stretch)
Release:	9.8
Codename:	stretch

root@db2102:~# free -g
              total        used        free      shared  buff/cache   available
Mem:            503           0         502           0           0         500
Swap:             7           0           7

root@db2102:~# df -hT
Filesystem            Type      Size  Used Avail Use% Mounted on
udev                  devtmpfs  252G     0  252G   0% /dev
tmpfs                 tmpfs      51G  9.6M   51G   1% /run
/dev/sda1             ext4       37G  1.4G   34G   4% /
tmpfs                 tmpfs     252G     0  252G   0% /dev/shm
tmpfs                 tmpfs     5.0M     0  5.0M   0% /run/lock
tmpfs                 tmpfs     252G     0  252G   0% /sys/fs/cgroup
/dev/mapper/tank-data xfs       3.5T  3.6G  3.5T   1% /srv

@RobH @faidon Re: T219461#5103942 I wonder if we should document this step as one to do for these models. The sda/sdb renaming is probably not a huge issue, but it adds unnecessary variability, and I cannot see how an SD card reader would be useful to us (while it poses a theoretical IO threat).

@RobH @faidon Re: T219461#5103942 I wonder if we should document this step as one to do for these models.

Or rather buy these without an SD card reader if that's a viable option?

@MoritzMuehlenhoff, just guessing, but I assume it is an SD card reader "bundled" with the chassis, not something we bought on purpose. Also, I just read in another ticket (I cannot find it now) that this may have been reported as a Debian bug.

@RobH @faidon Re: T219461#5103942 I wonder if we should document this step as one to do for these models.

Or rather buy these without an SD card reader if that's a viable option?

In my experience that wasn't possible years ago. Maybe that has changed now.
The SD card taking over sda is a problem that has bitten me quite a lot in the past. Funnily enough, that was with Dell, not with HP, hehe :-) That's why I suggested it.