Wed, Nov 25
done and off the rack
done and off the rack
done and off the rack
noticed this today
Tue, Nov 24
A case has been opened with HPE 5351787485
Looking at the server it's not abundantly clear which disk or disks are bad. I do know this server is out of warranty and a disk or 2 will need to be purchased. Looping in @wiki_willy to facilitate a disk purchase.
xe-2/0/15 and 16 did not have cables attached.
I replaced the sfp+ at cr1-eqiad xe-3/2/1 and cleared the interface statistics on that port. Let's leave this open a few days to see if anything changes.
Thu, Nov 19
The backplane and raid controller were both replaced, all disks are showing online.
@RobH These are ready for you, the raid still needs to be set up but everything else is done.
It looks like all the disks are working from my end. I am resolving this task.
Wed, Nov 18
cleaned up the restbase cables and disabled the ports. Also, verified that the other 2 restbases on that ticket didn't have multiple production cables connected.
ge-3/0/22 is in eth1 and ge-3/0/23 is in eth3 on restbase1018. (this is reflected on the switch)
Tue, Nov 17
added the new power supplies (will keep the older ones for spares). Added all the new memory sticks. Resolving this task, if something comes up related to the upgrade please ping me and re-open.
I swapped the bbu with one from a decom'd ms-be host. The server shut down during the boot process. I put the old bbu back in and the server booted okay. If @fgiunchedi needs this server then we need to purchase a new battery from HP. Assigning to @wiki_willy for the next steps.
Dell is sending a new backplane and a couple of disks with a technician. I am not sure when they will arrive. I received an email from Dell this morning that they are delayed. @elukey I will give you as much notice as I can to take this server down for maintenance.
Mon, Nov 16
@wiki_willy I do not know what the Q number would be, all of the HP servers start with MXQ and confirmed MXQ91300JF is correct.
@fgiunchedi The server is out of warranty, I have some decom'd HP servers and most likely can steal a bbu from one of them. I also have decom'd host w/3TB disks that we can take from. This server will require downtime, also worth noting the 4 new ms-be hosts are here and in the rack and will be ready for you by the end of the week (at the latest). In case you want to decom ms-be1022.
I am not sure how it was missed but port 19 is an-worker1114 and 18 is an-worker1113. I updated the switch ports
Tue, Nov 10
network switch updated with asset tag, removed from public vlan and added to disabled
@elukey Let's schedule this for next Tuesday please 1500UTC (10EST)
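For anyone double-checking the maintenance window, here is a minimal sketch of the UTC to US Eastern conversion. It assumes "next Tuesday" is 17 November 2020 (inferred from the dates in this feed) and uses a hypothetical helper name, `utc_to_eastern`; it needs Python 3.9+ with time zone data available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def utc_to_eastern(year, month, day, hour, minute=0):
    """Convert a UTC wall-clock time to US Eastern time (hypothetical helper)."""
    utc_time = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return utc_time.astimezone(ZoneInfo("America/New_York"))


if __name__ == "__main__":
    # Assumed date: Tuesday 17 November 2020, 15:00 UTC
    local = utc_to_eastern(2020, 11, 17, 15)
    print(local.strftime("%a %d %b %Y %H:%M %Z"))
```

Using a named zone rather than a fixed offset means the same helper stays correct when daylight saving time is in effect.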
Thanks, @dcausse. Still no h/w error in idrac. A ticket with Dell will need to be created, the server is under warranty.
reseated all of the DIMMs, the error remained the same
After more investigating and trying to swap it with a known good 4TB disk, I see an amber light blinking on the backplane. I reached back out to Dell to let them know that they should also send me a backplane and 2 new disks.
Both servers are stuck at the same spot during post. I tried rebooting an-1046 but it still sticks. One of the power supplies is bad and I replaced it with one from a spare, but there seems to be more of a problem. I am trying to update bios and idrac now to see if that helps. The h/w log doesn't show anything wrong. These are both well out of warranty and if this doesn't fix the issue we need to have them decommissioned.
Mon, Nov 9
@Jclark-ctr if you can give me the network ports you intend to use I will have them pre-configured as well.
Thu, Nov 5
this has been completed
Had a conversation with Jeff about this and we're going to just hold on to the controller for now. There isn't any immediate need to replace it. I am resolving this task. The controller will have this phab task written on it for reference.
I am assigning this to @Jclark-ctr. John, the new scs is in the flexspace, all of the cable ends may need to be snipped and re-done with a standard tia568a. This may not be necessary for all of row A and B, some of these have dongles that are on the system side of the connection. I recommend swapping the scs, plugging in and removing the dongles first.
mw1267 issues have been fixed
@Marostegui yes, db1091 is already gone from the racks. I did a more detailed count and right now, not removing any 1G servers from 10G racks I can fit 10 1U DB hosts. Your servers are not here yet so we have some time to make space but it's going to require moving 1G servers out of 10G U space.
When these arrive they will be sitting on the floor until we have space to rack them. At this time I may be able to get 4 or 5 racked in 10G racks.
@elukey to answer some of the earlier questions. @wiki_willy and I identified all the 1G servers in 10G racks that we could potentially move to create more space (T267065). Will it happen in a month? Probably not. Based on the rack availability row B has 8 openings in 2 racks. I now have 0 openings in row A, rack C2 has the 2 and all of row D availability is in one rack. I am also in a predicament because we have several more ms-be servers arriving that are 2U and need 10G and new database servers that are 1U and need 10G space. Let me know what you want to do. I can rack a few of your servers for now and wait for space to open up, or I could just fill every hole I have and then rack the remainder when/if more space opens.
@elukey There are 2 480GB SSDs and 12 4TB disks in each of the servers. They are all unpacked and I can rack some but not all of them.
@wiki_willy the crossover cable needs to be made. We have cat5 cable on-site and it can be cut to the length needed. If you would rather purchase a crossover cable then we need a blue 10M cable.
Dell reached out and needed more information and the raid log. I have sent both over to them.
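For reference when terminating the cable by hand, a minimal sketch of the pin mapping, assuming a standard 10/100 crossover with one end wired T568A and the other T568B (a gigabit crossover also swaps the blue and brown pairs, which is not shown here). The function name `crossover_pin_map` is only illustrative.

```python
# Standard wire colors by pin (1-8) for the two TIA-568 terminations.
T568A = ["white/green", "green", "white/orange", "blue",
         "white/blue", "orange", "white/brown", "brown"]
T568B = ["white/orange", "orange", "white/green", "blue",
         "white/blue", "green", "white/brown", "brown"]


def crossover_pin_map(end_a=T568A, end_b=T568B):
    """Return {pin on end A: pin on end B} by following each wire color."""
    return {pin: end_b.index(color) + 1
            for pin, color in enumerate(end_a, start=1)}


if __name__ == "__main__":
    for pin_a, pin_b in crossover_pin_map().items():
        print(f"pin {pin_a} -> pin {pin_b}")
```

The transmit and receive pairs swap (pins 1 and 2 land on pins 3 and 6, and vice versa) while pins 4, 5, 7 and 8 run straight through.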
Wed, Nov 4
Sent the TSR report to Dell for a new disk
John, you can use the db1139 swap to assist with the documentation.
John, on Thursday can you swap the motherboard out please? The new one is in the flex space.
Tue, Nov 3
@elukey the an-presto1004 motherboard and backplane have been replaced. Everything came back up as normal except I am not able to ssh into the server, and a fresh install may be needed. While it was down I updated the idrac and bios. I am resolving this as the on-site work has been completed. Please reopen if there is still a problem.
@wiki_willy I had time to do this today while the Dell tech worked on an-presto1004. I am going to be utilizing a 2U space in A2 and B2 for the kafka-jumbo 10G updates leaving only 15 2U spaces. We will have less than I previously reported. I am also pasting what I put in the an-worker ticket here for better tracking.
@Jclark-ctr Please make sure all of these switches have been restored to factory defaults, then unplug them and remove them from the racks. Please be very careful not to unplug anything else. There is a lot going on back there. Once they are off the racks please pull all the old stacking cables.
@RobH This server is ready to go back to you for spares. Where are you tracking that?
These are all 1G servers in 10G racks for row B
These are all 1G servers in 10G racks for row A
I reseated all the DIMMs and there were several. I am not getting any Dell h/w errors. Hopefully, the reseat and flea power drain will correct the issue. I am resolving this task. If the problem persists, please re-open and tag me.
The mainboard arrived
Mon, Nov 2
@wiki_willy I do not have any spare SSDs that would match what is in that server now.
Fri, Oct 30
@Jgreen all the on-site work has been completed. idrac password is a temporary password
New PSU arrived and swapped. System reports healthy.
updated the cable number to 5226 20M.
@Dzahn I moved mw1267 and 1268 to rack A8 and confirmed they're up. Updated netbox
Oct 29 2020
Called to open a ticket with Dell, they received the information and the TSR and are sending a new part
Spoke with Dell tech, Chris Bennet today. The ball was dropped by Dell, nobody ordered the new part and our case was left open and not owned by anyone. Today a new case for the backplane was opened and it's being elevated to L3 because it could be a safety issue since we did have smoke inside the server. This includes anything from a part replacement to a system exchange. Enterprise Service Request 84193619
@elukey great, I usually get to the data center around 1500UTC
@elukey I have the 2 DIMMs on-site. Does this need to be scheduled? If so can we schedule this for Tuesday 3 November? If not, let me know if I can take it down anytime.