(Need By: 2020-09-15) rack/setup/install db2141 (or next in sequence)
Closed, ResolvedPublic

Description

This task will track the racking, setup, and OS installation of db2141 (or next in sequence).

Need By: Tied to Q1 OKR T257551, but other procurement tickets T257550 and T257547 are a higher priority. Needed by end of quarter, even if only racking/basic setup. Rob has put this in the subject with a due date of 2020-09-15 to give the DBA team two weeks of implementation time in Q1.

Hostname / Racking / Installation Details

Hostnames: db2141 (assuming no other dbs are purchased meanwhile)
Racking Proposal: anywhere, production (mw) network
Networking/Subnet/VLAN/IP: 10G cards are requested, but 1G is enough for now for these servers
Partitioning/Raid: RAID10 with 256 stripe size and writeback as documented at https://wikitech.wikimedia.org/wiki/Raid_and_MegaCli#Raid_setup_at_Wikimedia
OS Distro: Buster

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

db2141: Row C rack C3 ge-3/0/10

  • receive in system on procurement task T257981 & in coupa
  • rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • network port setup (description, enable, vlan)
    • end on-site specific steps
  • production dns entries added
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet accept/initial run (with role:insetup)
  • host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

Restricted Application added a subscriber: Aklapper.
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
RobH added a parent task: Unknown Object (Task).
RobH unsubscribed.
[edit interfaces interface-range vlan-private1-c-codfw]
     member ge-3/0/9 { ... }
+    member ge-3/0/10;
[edit interfaces interface-range disabled]
-    member ge-3/0/10;
[edit interfaces]
+   ge-3/0/10 {
+       description db2141;
+   }
papaul@asw-c-codfw# run show interfaces ge-3/0/10 descriptions
Interface       Admin Link Description
ge-3/0/10       up    up   db2141
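
The diff above moves the port out of the `disabled` interface-range, adds it to the VLAN range, and sets a description. As a rough sketch only, the equivalent Junos set-style commands could be generated as below. The helper name `port_enable_commands` is hypothetical and is not part of any tooling mentioned in this task; the port, range, and description values are taken from the diff.

```python
from typing import List


def port_enable_commands(port: str, vlan_range: str, description: str) -> List[str]:
    """Build Junos set-style commands for enabling a switch port.

    Hypothetical helper: it only formats strings and does not talk to a switch.
    """
    return [
        # Take the port out of the range that keeps unused ports disabled.
        f"delete interfaces interface-range disabled member {port}",
        # Add the port to the interface-range that carries the target VLAN.
        f"set interfaces interface-range {vlan_range} member {port}",
        # Label the port with the hostname it connects to.
        f"set interfaces {port} description {description}",
    ]


if __name__ == "__main__":
    for command in port_enable_commands("ge-3/0/10", "vlan-private1-c-codfw", "db2141"):
        print(command)
```

The commands would still need to be reviewed and committed on the switch by hand.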

Change 626198 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add production DNS for db2141

https://gerrit.wikimedia.org/r/626198

Change 626198 merged by Papaul:
[operations/dns@master] DNS: Add production DNS for db2141

https://gerrit.wikimedia.org/r/626198

Change 626202 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Add MAC address for db2141

https://gerrit.wikimedia.org/r/626202

Change 626202 merged by Papaul:
[operations/puppet@production] DHCP: Add MAC address for db2141

https://gerrit.wikimedia.org/r/626202

Change 626206 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Add db2141 to site.pp

https://gerrit.wikimedia.org/r/626206

Change 626206 merged by Papaul:
[operations/puppet@production] Add db2141 to site.pp

https://gerrit.wikimedia.org/r/626206

Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts:

db2141.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202009091924_pt1979_17781_db2141_codfw_wmnet.log.

│                                            │
│ reuse-parts: Recipe device matching failed │
│ ERROR: =dev=mapper=* matches zero devices  │
│                                            │
│ All devices:                               │
│ =dev=sda                                   │
│                                            │
│     <Go Back>               <Continue>     │
│                                            │
└────────────────────────────────────────────┘
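
The dialog above reports that the pattern `=dev=mapper=*` matched none of the devices present (only `=dev=sda`). A minimal sketch of that check follows, assuming shell-style glob matching and the `/` to `=` encoding seen in the dialog. The actual reuse-parts matching code is not shown in this task, so the function names here are illustrative only.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def encode_device(path: str) -> str:
    """Mirror the dialog's notation, where '/' in a device path appears as '='."""
    return path.replace("/", "=")


def matching_devices(pattern: str, devices: Iterable[str]) -> List[str]:
    """Return the devices whose encoded path matches the encoded glob pattern."""
    encoded_pattern = encode_device(pattern)
    return [d for d in devices if fnmatchcase(encode_device(d), encoded_pattern)]


if __name__ == "__main__":
    present = ["/dev/sda"]  # the only device listed in the dialog
    # A freshly racked host has no device-mapper volumes yet, so nothing matches.
    print(matching_devices("/dev/mapper/*", present))
    # A pattern aimed at the raw disk does match.
    print(matching_devices("/dev/sd*", present))
```

Under these assumptions, a recipe that expects existing `/dev/mapper/*` volumes cannot match on a host whose only device is a blank `/dev/sda`.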

@Marostegui problem with partman recipe. Can you please check. Thanks

From what I can see, this host isn't assigned to a partman recipe, but I am going to leave this to @jcrespo, as this host is going to the backup infra and I am not fully sure what the plans for it are.
Thank you Papaul!

@Papaul, as a general rule, all db* hosts with the same spec should get the custom/db.cfg recipe for their first install. I believe it may have failed because it has the catch-all db-reuse recipe, which is only intended for safe reimages.

I will apply the db recipe now.

Change 626275 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Remove db1133 from full reimage, add db2141 & db1150

https://gerrit.wikimedia.org/r/626275

Change 626275 merged by Jcrespo:
[operations/puppet@production] mariadb: Remove db1133 from full reimage, add db2141 & db1150

https://gerrit.wikimedia.org/r/626275
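
The change above lists hosts for a full reimage explicitly, while every other db* host falls through to the catch-all db-reuse recipe. A rough sketch of that "specific entries first, catch-all last" lookup is below. The `RULES` table and `recipe_for` function are hypothetical illustrations and do not reproduce the actual puppet configuration; only the recipe names and hostnames come from this task.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# Hypothetical ordered rules: explicit hosts first, catch-all last.
RULES: List[Tuple[str, str]] = [
    ("db2141", "custom/db.cfg"),  # first install: full partitioning
    ("db1150", "custom/db.cfg"),  # first install: full partitioning
    ("db*", "db-reuse"),          # catch-all: safe reimage of existing hosts
]


def recipe_for(hostname: str, rules: List[Tuple[str, str]] = RULES) -> Optional[str]:
    """Return the recipe of the first rule whose pattern matches the hostname."""
    for pattern, recipe in rules:
        if fnmatchcase(hostname, pattern):
            return recipe
    return None


if __name__ == "__main__":
    for host in ("db2141", "db1133", "mw2001"):
        print(host, recipe_for(host))
```

Because the first match wins, removing a host's explicit entry (as done for db1133 in this change) returns it to the catch-all recipe.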

Script wmf-auto-reimage was launched by jynus on cumin2001.codfw.wmnet for hosts:

['db2141.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202009100831_jynus_19743.log.

Completed auto-reimage of hosts:

['db2141.codfw.wmnet']

and were ALL successful.

@Papaul, this is all completed after my patch. I am only leaving it open so you can see it (e.g. in case you need to do something else not on the checklist), and to suggest updating the checklist template from role(staging) to role(insetup). Otherwise, this can be closed on my side. Software setup will be done in T257551.