Information missing from racktables
Closed, Resolved, Public

Description

I've been going through the racktables to servermon migration process again. I've managed to substantially reduce the number of non-migratable equipment entries (those with some info missing) through a number of tricks:

  • For equipment in the "decommissioned" rack, autogenerating the Serial Number when it is not known.
  • For equipment in the "decommissioned" rack, assigning a dummy vendor + equipment model when they are not known.
  • Using SNMP to fetch Power Strip vendor, model and strip, and using PHP to update racktables.

However, there are still quite a few left. The 3 data items missing are Vendor (e.g. "Dell"), EquipmentModel (e.g. "PowerEdge R730xd") and Serial Number (e.g. "SDF312").

The list is attached in this task. We should aim to take it down to 0 so we can finally migrate away from racktables.
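
For illustration, the check for the three required data items could be sketched as below. This is only a rough sketch: the record layout and the field names (`name`, `vendor`, `model`, `serial`) and the example hostnames are assumptions made for the example, not the actual racktables or servermon schema.

```python
# Hypothetical field names; the real racktables attributes may be named differently.
REQUIRED_FIELDS = ("vendor", "model", "serial")


def missing_fields(device):
    """Return the required fields that are absent or blank for one device record."""
    return [f for f in REQUIRED_FIELDS if not (device.get(f) or "").strip()]


def non_migratable(devices):
    """Map device name -> list of missing fields, skipping complete records."""
    report = {}
    for device in devices:
        gaps = missing_fields(device)
        if gaps:
            report[device["name"]] = gaps
    return report


# Made-up sample records, using the example values from the description.
devices = [
    {"name": "example-host1", "vendor": "Dell", "model": "PowerEdge R730xd", "serial": "SDF312"},
    {"name": "example-host2", "vendor": "Dell", "model": "", "serial": None},
]
print(non_migratable(devices))
```

The goal stated above corresponds to this report coming back empty.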

Missing model:
backup-array1, cp3011, db1027, mobile1004, mobile1005, ms6, msw-c1-pmtpa, msw-d1-eqiad, msw-d1-pmtpa, msw-d2-eqiad, msw-d2-pmtpa, msw-d3-eqiad, msw-d3-pmtpa, msw-d4-eqiad, msw-d5-eqiad, msw-d6-eqiad, msw-d7-eqiad, msw-d8-eqiad, msw-oe11-esams, mw1002, mw1016, mw1018, mw1021, mw1097, ps1-a3-sdtpa, ps1-a4-sdtpa, ps1-oe10-esams, ps1-oe11-esams, ps1-oe12-esams, ps1-oe13-esams, scs-a8-eqiad, scs-c1-codfw, scs-c1-eqiad, scs-d1-pmtpa, tridge-array, wmf6407

Event Timeline

I am still working through the list, namely the ones I can in some way get information about, but there are many (for example spares) that I can obviously do nothing about.

I think I've minimized the list as much as possible on my part. The leftovers are hosts that I have no access to, due to them being offline (e.g. the mw10* hosts) or non-managed (e.g. the msw* hosts). For the ones left, I see the following possible actions:

  • Acquire the model and serial and enter them in racktables. That's the best option all around, especially for the equipment currently present in the racks.
  • Move equipment into the decommissioned rack. That's probably a viable option for hosts that are going away soon. Equipment model and serials will be autogenerated for these hosts upon the migration. Examples that possibly can be in that category are "tridge-array" and nas100* stuff.
  • Delete stuff that is wrong in the first place. I see no such example, just noting this for completeness.

@RobH, @Cmjohnson, @Papaul could you please help? Most of these are for EQIAD/ESAMS. There is 1 for CODFW and a couple for ULSFO from what I see.

Creating an Excel sheet and sending it to me is just fine. I've got some sample PHP code that will insert the data in bulk into racktables.

akosiaris triaged this task as Medium priority. Nov 15 2016, 12:32 PM

This came up in the ops meeting as well. The proposal there was that, for any esams mgmt switches that are missing the serial, we just randomly generate one that is known to be fake (maybe include wmf in the serial?).

The items in the US all have asset tags, so the asset tags can be substituted for the mgmt switch serial in the US-based sites. These are stupid switches; they have no mgmt capability or remote access.
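
A minimal sketch of that rule, assuming a made-up `WMF-FAKE-` prefix and a hypothetical `dummy_serial` helper (neither is what was actually entered in racktables): use the asset tag when there is one, otherwise generate a random value that is obviously fake.

```python
import secrets


def dummy_serial(asset_tag=None, prefix="WMF-FAKE-"):
    """Return a placeholder serial for a device whose real serial is unknown.

    If an asset tag is known it is reused as the serial; otherwise a random
    value carrying a recognisable (hypothetical) prefix is generated.
    """
    if asset_tag and asset_tag.strip():
        return asset_tag.strip().upper()
    # 4 random bytes -> 8 hex characters, enough to avoid collisions in practice.
    return prefix + secrets.token_hex(4).upper()


print(dummy_serial("wmf1234"))  # asset tag reused as the serial
print(dummy_serial())           # random, clearly marked as fake
```

Keeping a fixed prefix makes it easy to find and replace the fake values later if the real serial is ever read off the hardware.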

I cleaned up a lot of the items that were missing, but the mgmt switches are indeed missing all serial info in a lot of them.

Added dummy serials to

atlas-codfw, atlas-eqiad, atlas-ulsfo, br1-knams, cp3001, cp3002, dataset1001-array1, db1027, frdb1001, indium, msw-c1-eqiad, msw1-ulsfo, msw2-ulsfo, mw1010, mw1011, mw1016, mw1018, mw1019, mw1021, mw1097, mw1108, mw1111, mw1112, mw1120, mw1122, mw1123, mw1124, nas-array2, nas-array3, nas1, nas1-array1, nas1001, nas1001a, nas1001b, nas1001c, ps1-a3-sdtpa, ps1-a4-sdtpa, scs-c1-eqiad
faidon added a subscriber: faidon.

The updated list of devices missing model/number can be found below.

A bunch of them are the new cp40xx. A few more are also online, which means that we can locate them from servermon's inventory page. Others are not, but we can guess the information from neighboring systems of the same order (e.g. cp3011 was the same config as cp3010, etc.). Finally, a bunch of them need further investigation and, really, either someone remembering what they are, or actually going to the rack and looking it up.
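
The "guess from neighboring systems" idea could look roughly like the sketch below. The `inventory` dictionary, its field names and the vendor/model values are placeholders for the example, and any guess produced this way would still need confirming against the original order.

```python
import re


def neighbors(hostname):
    """Return the hostnames numbered one below and one above, e.g. cp3011 -> cp3010, cp3012."""
    match = re.fullmatch(r"(.*?)(\d+)", hostname)
    if not match:
        return []  # hostname does not end in a number, nothing to guess from
    stem, digits = match.groups()
    number, width = int(digits), len(digits)
    return [f"{stem}{n:0{width}d}" for n in (number - 1, number + 1) if n >= 0]


def guess_model(hostname, inventory):
    """Return (neighbor, vendor, model) from the first neighbor with known data, else None."""
    for name in neighbors(hostname):
        record = inventory.get(name)
        if record and record.get("vendor") and record.get("model"):
            return name, record["vendor"], record["model"]
    return None


# Placeholder inventory; the vendor and model strings are not real data.
inventory = {"cp3010": {"vendor": "ExampleVendor", "model": "ExampleModel"}}
print(guess_model("cp3011", inventory))
```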

We should really fix this and add the missing information in Racktables -- looking at the list, I think it's entirely feasible, especially for someone with the necessary institutional knowledge :) @RobH, perhaps you can have a look and update Racktables where needed?

backup-array1
cp3011
cp4022
cp4023
cp4024
cp4025
cp4026
cp4027
cp4028
cp4029
cp4030
cp4031
cp4032
db1027
lvs3001
mobile1004
mobile1005
ms6
msw-c1-pmtpa
msw-d1-eqiad
msw-d1-pmtpa
msw-d2-eqiad
msw-d2-pmtpa
msw-d3-eqiad
msw-d3-pmtpa
msw-d4-eqiad
msw-d5-eqiad
msw-d6-eqiad
msw-d7-eqiad
msw-d8-eqiad
msw-oe11-esams
ores1008
ps1-a3-sdtpa
ps1-a4-sdtpa
ps1-oe10-esams
ps1-oe11-esams
ps1-oe12-esams
ps1-oe13-esams
restbase-dev1005
scs-a8-eqiad
scs-c1-codfw
scs-c1-eqiad
scs-d1-pmtpa
tridge-array
wmf6407

All of the cp40XX are fixed.

ms6: this is a Sun system in esams, purchased by wm-de before we used racktables. As such, I've just populated its purchase date with the racktables entry date, so it has all the required info for the migration.

Modified kvm to be kvm-ulsfo. I'm unsure of the model, so I'll look it up next time I'm there. I've created a ticket for this particular followup.

RobH removed RobH as the assignee of this task. Jul 19 2017, 5:36 PM

I'm pretty sure we (Faidon, Chris, Papaul, and myself) fixed these in the last week. Faidon has a more recent access list than I do though, @faidon?

We've fixed so many issues over the past few months that I can't even count them :) Thanks all. I did another sweep today and found these that need fixing:

Missing purchase date and "support until":

  • WMF6583 - WMF6586 (also miss Phabricator task)
  • WMF6980 - WMF6987
  • WMF7011 - WMF7015
  • WMF7043
  • WMF7068 - WMF7071 (also probably missing Farnam lease, given two out of four are marked as such?)
  • WMF7091, WMF7093 - WMF7097
  • WMF7113 - WMF7114
  • WMF7127 - WMF7139

Also, WMF7145 is probably also a Farnam lease, given that the rest of the hosts in its group are?
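
When working through ranges written like the ones above, it can help to expand them into individual asset tags first. A small sketch, assuming the "PREFIX + number" tag format seen in this task (the `expand_asset_tags` helper is hypothetical, not part of racktables):

```python
import re

TAG_RE = re.compile(r"([A-Za-z]+)(\d+)")


def expand_asset_tags(spec):
    """Expand a spec such as 'WMF7091, WMF7093 - WMF7097' into individual tags."""
    tags = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        ends = [p.strip() for p in part.split("-")]
        first, last = TAG_RE.fullmatch(ends[0]), TAG_RE.fullmatch(ends[-1])
        if len(ends) > 2 or not first or not last:
            raise ValueError(f"cannot parse {part!r}")
        prefix = first.group(1).upper()
        if last.group(1).upper() != prefix:
            raise ValueError(f"mismatched prefixes in {part!r}")
        low, high = int(first.group(2)), int(last.group(2))
        if high < low:
            raise ValueError(f"descending range in {part!r}")
        width = len(first.group(2))  # keep any zero padding
        tags.extend(f"{prefix}{n:0{width}d}" for n in range(low, high + 1))
    return tags


print(expand_asset_tags("WMF7091, WMF7093 - WMF7097"))
```

Lower-case input such as "wmf7135 - wmf7139" is normalised to upper case, so both spellings expand to the same tags.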

Missing in Racktables:

  • bast4002
  • db1018
  • db1022

@RobH can you perhaps triage and/or address those, looping in Chris/Papaul when appropriate?

@faidon
db1018 and db1022 are confirmed in racktables but are both decommissioned and removed from the rack.

I updated the following asset tags with the purchase and warranty expiration dates.

WMF7043
WMF7068 - WMF7071
WMF7091, WMF7093 - WMF7097
WMF7113 - WMF7114
WMF7127 - WMF7134

I did not update these. @RobH, can you help with them?
WMF6980 - WMF6987 PDU's
wmf7135 - wmf7139 Network gear

Thanks @Cmjohnson, that was fast! So remaining are:

  • WMF7011 - WMF7015 elastic10NN (@Cmjohnson?)
  • WMF6980 - WMF6987 asw2-d-eqiad (@RobH)
  • WMF7135 - WMF7139 frack network eqiad (@RobH)
  • WMF6583 - WMF6586 frack network codfw (@RobH)

Updated phab (had the wrong sub-task listed) and added the purchase and hw warranty expiry dates.

  • WMF6980 - WMF6987 asw2-d-eqiad (@RobH)

These are all updated with purchase date, but there is no hw support expiry on network gear, only support contract expiry (which has passed; I put in the past date, but we're still waiting on approval for renewal under a new support contract).

  • WMF7135 - WMF7139 frack network eqiad (@RobH)

All updated with purchase date; some were missing support contract info, now updated. This is all network gear, so there is no hw warranty expiry (since it's all support contracts).

  • WMF6583 - WMF6586 frack network codfw (@RobH)

These were missing purchase date, support contract expiry, and phab task number. All updated now.

Good enough for now. Thanks everyone!