User Details
- User Since
- Jul 24 2019, 8:11 PM (250 w, 6 d)
- Availability
- Available
- LDAP User
- Jclark-ctr
- MediaWiki User
- Jclark-ctr
Yesterday
@akosiaris could you please update the preseed.yaml file? I took care of the site.pp file for codfw and eqiad.
kafka-main1010
Rack: E 5
U 26
Cableid : 2013339101771
Port : 6
@Volans I also see this as a learning opportunity; most of these are just logs. Some dcops members are very light on Linux, and we could be expanding their knowledge so they become more valuable members of the team. I do love cookbooks, but sometimes they fail, and it would be nice if we could continue to teach and train coworkers.
Mon, May 13
@Volans The main purpose is gathering debug information. I would prefer to grep dmesg/log files instead of searching through the entire output. mdadm commands would allow us to one day rebuild failed software RAIDs.
Thu, May 9
@Marostegui you can put the server back in rotation, even though I uploaded multiple photos to Dell yesterday. They replied this morning requesting the part number so they can send the correct part.
I attached the photo that was sent to Dell. I do not expect it to arrive until next week. We can work on this at a later date.
Wed, May 8
I believe we are good to reimage the server; the OS looks corrupt. Could you wait until tomorrow to put it back in production, while I wait for Dell to respond on whether they will send out a new cable?
I am powering it up now and will check idrac.
Replaced backplane: cable that connects RAID card <-> backplane / power control board. I did find a cable with a loose pin on the power control board (not replaced) and will be reaching out to Dell regarding it. It has been reseated in the connector and should be fine for the time being.
Tue, May 7
On Friday Dell agreed to replace the backplane and cables. They shipped out Monday; expected arrival is Tuesday.
Thu, May 2
@akosiaris idrac has stayed up for 4 days now; possibly my relocating it to a different port helped. We won't know until it is put in use again. This server is out of warranty; if it fails again, could we look at swapping it with another decom server?
@andrea.denisse We have been having a few issues with software RAIDs, and we are trying to pinpoint what slot these drives are in. Idrac is not listing the drives. I will message you for assistance.
Tue, Apr 30
@Marostegui "At the creation of the ticket I requested not to repeat any troubleshooting steps that were not effective"
Followed up with Dell again; they should be sending out parts shortly.
Idrac is still up after almost 24 hours. I did move the idrac port on the switch to a different group of ports and will monitor it.
Mon, Apr 29
@Volans We have replaced this drive 4 times now and it continues to fail. We no longer suspect a drive issue; it may be a process issue with recreating the mdadm RAID 10. We are also having the same issue with aqs1014. Do you have any input, are you able to assist, or do you know who might be the best person to assist with this issue?
@Clement_Goubert @akosiaris since this failed again, I reset idrac again and it is back up right now. Idrac is not showing anything, and the server is out of warranty. With my limited access, can you check and see if there are any errors in dmesg or log files?
T363086 duplicate
duplicate T362033
Wed, Apr 24
Corrected typo
cloudcephosd1017: looks like the drive was listed as foreign. I cleared the foreign status; can you verify it now?
Looking at lshw.log and the inventory on idrac, it looks like all the drives are in order except sdf and sdh, which are swapped in slots. After sdf rebuilds I can swap sdh.
@Eevans Replaced drive
server is out of warranty
Opened request with Dell
You have successfully submitted request SR189381173.
@ABran-WMF Received replacement DIMM. Please reach out to me or @VRiley-WMF to schedule the replacement. I am available today but will be off the next two days.
duplicate ticket T362033
Installed GPU into stat1010
Tue, Apr 23
@Eevans this one is out of warranty also. Let me know if I am able to swap the drive; I can take care of it in the morning.
@Eevans hey, sorry about missing the update on being available. I did just swap the drive now. When you are recreating md2, what commands are you running?
Server is out of warranty. Performed reboot; came up with no issues. Swapped idrac cable and updated idrac firmware. Seems to be up and running now. @akosiaris
@akosiaris @hashar reset idrac with no change. I will need to reboot the server and hook a crash cart up to it. Please advise if I am able to reboot.
You have successfully submitted request SR189292045.
installed loop facing Telxius
Opened ticket with Dell SR189290647
Wed, Apr 17
Mon, Apr 15
Replaced DAC cable and reimaged. @jcrespo looks like it resolved the issue.
Apr 11 2024
Corrected netbox errors
Apr 10 2024
Apr 9 2024
Apr 8 2024
Warranty expired 19 Nov 2023. Will look to see what drives we have available at the data center.
Apr 1 2024
Mar 28 2024
Replaced failed SSD with an extra from on-hand stock at eqiad
Mar 20 2024
drives installed
Mar 19 2024
@dcaro server is out of warranty. I did replace the disk with an extra one we had on hand in eqiad. Please confirm this fixed the issue and close the ticket.
No disk issues; it is rebuilding.
Replaced disk 7 with an on-hand disk; will put the replacement into extra storage when it arrives.
Replaced disk 5.
Mar 18 2024
Mar 13 2024
Followed Dell troubleshooting steps. Updated firmware for BIOS; idrac was already on the most recent version. Multiple BIOS firmware versions have come out flagged as urgent.
Mar 12 2024
Ticket SR186677718 canceled by Dell with no reason given, 14 hours after creating the request.
Mar 11 2024
ticket submitted
Mar 8 2024
@dcaro thanks for the notes; much more productive meeting. Although nothing popped out for the engineer, he also admitted he did not know what to look for in the logs for Debian Linux. I will spend some time researching today/tonight while traveling to see what I can find as well.
@RobH I have plenty of 1.92 TB SSDs. I have pulled 10x SSDs and will put a few away as spares if needed later.
Mar 6 2024
@bking were puppet and site.pp updated? Unfortunately Valerie and I do not have access to push updates, and it has become a process for the SRE owner to do with the procurement ticket.
Mar 4 2024
Removed GPU from stat1005
Found that the power plug has changed between the 730xd and the 740xd.
Mar 1 2024
@jcrespo are the RAID instructions backwards? The OS is usually on the SSDs in RAID 0?
disconnected and removed from netbox
Duplicate ticket for T358787
@BTullis I will be available Monday at 10am (EST) if that works for you
Feb 29 2024
@BTullis would you like to do this before or after the SRE summit?