User Details
- User Since
- Dec 5 2022, 4:37 PM (75 w, 1 d)
- Availability
- Available
- LDAP User
- Jhancock.wm
- MediaWiki User
- Jhancock.wm
Yesterday
the cable or the 1G SFP might need to be replaced. can we downtime the server for a small window to test the cabling?
rebooted. all in C6 up now.
reseated. pings on mgmt.
Mon, May 13
@Papaul, this was the last screen I got. The servers all have the OS installed, and it failed at the certificate stage. I think it's because I used python 7 instead of 5. When I attempt to retry with 5, it fails.
Initiated: 2024-05-13 13:35 Duration: 0 minutes, 1.60 seconds Completed
Thu, May 9
Wed, May 8
uplink for msw2 was degraded and flapping. repaired. staying up now.
port 47 on the msw was going up and down on its own. replaced the rj-45 terminator. remained steady.
Tue, May 7
Mon, May 6
Forgot I left it there. All yours now!
Thu, May 2
@JMeybohm papaul helped me identify the missing disk. I replaced it with a compatible drive. please let me know if that fixed the issue. Thanks.
reseated psu2 and cable. alert cleared on machine.
Wed, May 1
see T362938
see T362938
removed the error by rebooting the idrac
fixed the main source of the alert (PSU and power cable reseated) but still getting the following error.
Tue, Apr 30
idrac upgraded to 7.0.0. won't go any higher. BIOS is already at 2.9.3. Reset to factory defaults and tried rebooting the idrac. Reseated the backplane. None of these have fixed the issue. Going to look into getting a replacement part. Might need to be salvaged from decommissioned servers. Will update when we have a solution.
known issue with no impact
draining didn't fix it. I'm gonna update the firmware and bios and then see where it is.
Mon, Apr 29
Apologies for the wait on this one. I checked out the server and the drives look to be working physically. But when I logged into the idrac it sees zero disks. Checked the warranty and it expired in February. I do have a pair of decommed 960GB drives that could replace it. However, I cannot tell which drive needs to be replaced. Please let me know if this still needs attention and how I can help.
Tue, Apr 23
known issue with no impact
Mon, Apr 22
Thu, Apr 18
All tests passed on the diagnostic test, including the pci bus. It's pinging on the idrac and the network ips.
@RKemper give it another go. @ me if you run into an issue again.
Tried to run a diagnostic from the Lifecycle controller. It halted because of a DIMM error on B4. The DIMM has been replaced. Re-running the diagnostic to check for any more issues.
Wed, Apr 17
@RKemper I am going to check it out and get back in touch with dell. These are the same errors we were getting before the card was replaced.
Tue, Apr 16
I updated the sheet with the needed information but forgot to submit that to this task. Please let me know if there's anything else I can do to help out with the tasks. Thanks!
ty!
alert cleared. being decommed in T362438
known issue, no impact
Mon, Apr 15
@cmooney what is the vlan for this server?
reseated blue cable
Apr 12 2024
@bking I got the HBA card replaced and it booted without any issues that I can find in the iDRAC. Can you check from the CLI to see if the RAID is still degraded?
Apr 11 2024
Update: Dell finally agreed to replace the HBA card. I sent the shipping address confirmation just now. Hopefully it'll be here tomorrow, Monday morning at the latest.
Apr 9 2024
Apr 5 2024
Here are some more logs.
follow up: still going back and forth with Dell.
Apr 4 2024
refresh task: https://phabricator.wikimedia.org/T325215
Apr 3 2024
Apr 2 2024
replaced the SFP, server is pingable again.
this error reoccurred.
Apr 1 2024
elastic2049 was already decommissioned under https://phabricator.wikimedia.org/T313842