Jhancock.wm (Jenn Hancock)
User

User Details

User Since
Dec 5 2022, 4:37 PM (175 w, 2 d)
Availability
Available
LDAP User
Jhancock.wm
MediaWiki User
Jhancock.wm [ Global Accounts ]

Recent Activity

Today

Jhancock.wm moved T423584: decommission moss-be200[1-2].codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Thu, Apr 16, 2:49 PM · SRE, Ceph, SRE-swift-storage, ops-codfw, DC-Ops, decommission-hardware

Mon, Apr 13

Jhancock.wm created T423195: move es2036.
Mon, Apr 13, 8:02 PM · SRE, Data-Persistence-Misc, DC-Ops, ops-codfw
Jhancock.wm created T423184: db2201 broken DIMM.
Mon, Apr 13, 6:22 PM · SRE, DC-Ops, Data-Persistence-Misc, ops-codfw
Jhancock.wm updated the task description for T423179: sretest2001 has broken psu.
Mon, Apr 13, 6:11 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm created T423179: sretest2001 has broken psu.
Mon, Apr 13, 6:03 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm created T423177: wikikube-worker2188 bus errors.
Mon, Apr 13, 5:44 PM · SRE, ServiceOps new, ops-codfw, DC-Ops
Jhancock.wm added a project to T423175: wikikube-worker2190 System Configuration Check error: ServiceOps new.
Mon, Apr 13, 5:27 PM · SRE, ServiceOps new, DC-Ops, ops-codfw
Jhancock.wm created T423175: wikikube-worker2190 System Configuration Check error.
Mon, Apr 13, 5:26 PM · SRE, ServiceOps new, DC-Ops, ops-codfw
Jhancock.wm placed T423159: lists2001 has multiple bus errors up for grabs.

np!

Mon, Apr 13, 5:07 PM · SRE, collaboration-services, DC-Ops, ops-codfw
Jhancock.wm created T423159: lists2001 has multiple bus errors.
Mon, Apr 13, 4:30 PM · SRE, collaboration-services, DC-Ops, ops-codfw

Fri, Apr 10

Jhancock.wm moved T412078: Alert in need of triage: SmartNotHealthy (instance sretest2006:9100) from Hardware Failure / Troubleshoot to Blocked on the ops-codfw board.
Fri, Apr 10, 4:29 PM · SRE, ops-codfw, DC-Ops, Infrastructure-Foundations, sre-alert-triage
Jhancock.wm closed T422437: decommission cloudcephmon2004-dev, a subtask of T420282: cloudcephmon2007-dev service implementation, as Resolved.
Fri, Apr 10, 4:27 PM · cloud-services-team, SRE, DC-Ops, ops-codfw
Jhancock.wm closed T422437: decommission cloudcephmon2004-dev as Resolved.
Fri, Apr 10, 4:27 PM · SRE, DC-Ops, ops-codfw, decommission-hardware
Jhancock.wm closed T418902: Q3:rack/setup/install apus-be200[56] as Resolved.

@MatthewVernon all yours

Fri, Apr 10, 12:48 AM · SRE-swift-storage, SRE, Data-Persistence, ops-codfw, DC-Ops
Jhancock.wm updated the task description for T418902: Q3:rack/setup/install apus-be200[56].
Fri, Apr 10, 12:48 AM · SRE-swift-storage, SRE, Data-Persistence, ops-codfw, DC-Ops

Thu, Apr 9

Jhancock.wm updated the task description for T418899: Q3:rack/setup/install phab2003.
Thu, Apr 9, 10:04 PM · collaboration-services, SRE, ops-codfw, DC-Ops
Jhancock.wm updated subscribers of T418899: Q3:rack/setup/install phab2003.

okay two things with this server so far.
@Dzahn we won't be able to do legacy bios on these R470 servers. We'll need an efi boot

Thu, Apr 9, 9:58 PM · collaboration-services, SRE, ops-codfw, DC-Ops
Jhancock.wm added a comment to T422043: Create public vlans in eqiad and codfw.

also papaul is on vacation and i'd like to have him weigh in as well

Thu, Apr 9, 3:00 PM · Infrastructure-Foundations, netops
Jhancock.wm added a comment to T422043: Create public vlans in eqiad and codfw.

imho, i'd prefer a rack not in A row cause of the two CR racks already taking up real estate.
D row has no specialty rack at all so we can easily work around that for future private vlan installs.
codfw's E row is 5 racks long but the F row is 4 racks + 1 Frack, so E would be the better choice. and not E-3 cause it already has less room cause of all the patch panels.

Thu, Apr 9, 2:02 PM · Infrastructure-Foundations, netops

Wed, Apr 8

Jhancock.wm closed T420708: Unresponsive management for backup2005.mgmt:22 as Resolved.

finally replaced all the parts that got fried in a power surge. powered up and back in the rack.

Wed, Apr 8, 5:41 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm claimed T419970: backup2005 power supplies fried or overvoltage.

good news everybody!

Wed, Apr 8, 5:40 PM · SRE, DC-Ops, Data-Persistence-Backup, media-backups, ops-codfw
Jhancock.wm moved T422437: decommission cloudcephmon2004-dev from Backlog to Decommission on the ops-codfw board.
Wed, Apr 8, 4:36 PM · SRE, DC-Ops, ops-codfw, decommission-hardware

Tue, Apr 7

Jhancock.wm added a comment to T419970: backup2005 power supplies fried or overvoltage.

@jcrespo would loading the disks from a foreign config be acceptable for you? or will that cause issues with recovery?

Tue, Apr 7, 5:23 PM · SRE, DC-Ops, Data-Persistence-Backup, media-backups, ops-codfw

Mon, Apr 6

Jhancock.wm claimed T418899: Q3:rack/setup/install phab2003.
Mon, Apr 6, 4:49 PM · collaboration-services, SRE, ops-codfw, DC-Ops
Jhancock.wm updated the task description for T418902: Q3:rack/setup/install apus-be200[56].
Mon, Apr 6, 4:48 PM · SRE-swift-storage, SRE, Data-Persistence, ops-codfw, DC-Ops
Jhancock.wm added a comment to T418902: Q3:rack/setup/install apus-be200[56].

@MatthewVernon these came in. any objections to me racking them in the new cage, rows E and F?

Mon, Apr 6, 4:36 PM · SRE-swift-storage, SRE, Data-Persistence, ops-codfw, DC-Ops
Jhancock.wm closed T422309: Power Supply - Status - issue on cirrussearch2080:9290 as Resolved.

there was a bad psu in T422310 that was causing power fluctuations in the cabinet. it was replaced in that ticket. then this power supply was reseated. alerts have cleared.

Mon, Apr 6, 2:48 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm closed T422310: Power Supply - Status - issue on logstash2036:9290 as Resolved.

psu2 is rapid blinking even after reseating power, trying a new port, and replacing the cable. out of warranty, checking decomms for compatible psu swap. found one and replaced. alert cleared.

Mon, Apr 6, 2:47 PM · SRE, DC-Ops, ops-codfw

Fri, Apr 3

Jhancock.wm closed T422061: Alert for device lsw1-b4-codfw.mgmt.codfw.wmnet - Port with no description on access switch as Resolved.

had the wrong cable connected for a server that's still in provisioning stage. corrected.

Fri, Apr 3, 3:44 PM · SRE, ops-codfw, DC-Ops
Jhancock.wm closed T422058: Alert for device lsw1-c7-codfw.mgmt.codfw.wmnet - Port with no description on access switch as Resolved.

provisioned ports on new server. cleared.

Fri, Apr 3, 3:39 PM · SRE, DC-Ops, ops-codfw

Thu, Apr 2

Jhancock.wm updated the task description for T416396: Q3:rack/setup/install cloudcephmon2007-dev.
Thu, Apr 2, 3:07 PM · SRE, DC-Ops, ops-codfw

Wed, Apr 1

Jhancock.wm moved T416538: FY2526 Q3:rack/setup/install restbase2039 from Racking Tasks to Blocked on the ops-codfw board.
Wed, Apr 1, 4:23 PM · Data-Persistence, SRE, ops-codfw, DC-Ops
Jhancock.wm moved T418914: Q3:rack/setup/install conf200[7-9] from Racking Tasks to Blocked on the ops-codfw board.
Wed, Apr 1, 4:23 PM · SRE, ServiceOps new, ServiceOps-Upgrades-Hardware, ops-codfw, DC-Ops
Jhancock.wm moved T418931: Q3:rack/setup/install kafka-logging200[6-8] from Racking Tasks to Blocked on the ops-codfw board.
Wed, Apr 1, 4:23 PM · observability, SRE, ops-codfw, DC-Ops
Jhancock.wm added a comment to T418914: Q3:rack/setup/install conf200[7-9].

we're having the issue that was documented in https://phabricator.wikimedia.org/T418929 with these servers. still working on a solution.

Wed, Apr 1, 4:19 PM · SRE, ServiceOps new, ServiceOps-Upgrades-Hardware, ops-codfw, DC-Ops
Jhancock.wm updated the task description for T418914: Q3:rack/setup/install conf200[7-9].
Wed, Apr 1, 4:18 PM · SRE, ServiceOps new, ServiceOps-Upgrades-Hardware, ops-codfw, DC-Ops
Jhancock.wm added a comment to T418931: Q3:rack/setup/install kafka-logging200[6-8].

turns out these servers are also having the same issue as these servers https://phabricator.wikimedia.org/T418929
so got a little to figure out if you want to rename them.

Wed, Apr 1, 4:08 PM · observability, SRE, ops-codfw, DC-Ops
Jhancock.wm updated the task description for T418931: Q3:rack/setup/install kafka-logging200[6-8].
Wed, Apr 1, 4:07 PM · observability, SRE, ops-codfw, DC-Ops
Jhancock.wm added a comment to T416538: FY2526 Q3:rack/setup/install restbase2039.

this server is having the issue found in T418929 where we can't add the root user because of hardware changes

Wed, Apr 1, 4:05 PM · Data-Persistence, SRE, ops-codfw, DC-Ops
Jhancock.wm updated the task description for T416538: FY2526 Q3:rack/setup/install restbase2039.
Wed, Apr 1, 4:03 PM · Data-Persistence, SRE, ops-codfw, DC-Ops

Tue, Mar 31

Jhancock.wm closed T420948: Power Supply - Status - issue on cloudbackup2003:9290 as Resolved.
Tue, Mar 31, 6:26 PM · SRE, ops-codfw, cloud-services-team, DC-Ops

Mon, Mar 30

Jhancock.wm updated the task description for T418931: Q3:rack/setup/install kafka-logging200[6-8].
Mon, Mar 30, 8:22 PM · observability, SRE, ops-codfw, DC-Ops
Jhancock.wm closed T419753: Decommission codfw cp hosts cp2027-cp2040 as Resolved.
Mon, Mar 30, 3:17 PM · SRE, DC-Ops, ops-codfw, decommission-hardware, Traffic

Thu, Mar 26

Jhancock.wm updated the task description for T416538: FY2526 Q3:rack/setup/install restbase2039.
Thu, Mar 26, 7:58 PM · Data-Persistence, SRE, ops-codfw, DC-Ops
Jhancock.wm added a comment to T419970: backup2005 power supplies fried or overvoltage.

Hey. The issue turned out to be more involved than i thought. I need to replace the power distribution board in the server. but that involves replacing the arm that gets power to the sliding drive bays as well. Might be next week before i can finish that out. I'll let you know.

Thu, Mar 26, 2:50 PM · SRE, DC-Ops, Data-Persistence-Backup, media-backups, ops-codfw

Wed, Mar 25

Jhancock.wm added a comment to T420228: phab2002: SEL System Event:, System Board Front LED Panel, Critical, management controller unavailable.

@Dzahn
looks like that worked. it rebooted after the update and there wasn't a repeat of these error codes from two weeks ago.

2026-03-14 21:20:20  HWC5003  Front LED Panel is operating correctly.
2026-03-14 21:20:05  SWC5008  Unable to access Front LED Panel because of a hardware error condition.
2026-03-11 07:33:05  HWC5003  Front LED Panel is operating correctly.
2026-03-11 07:32:45  SWC5008  Unable to access Front LED Panel because of a hardware error condition.
Wed, Mar 25, 2:55 PM · collaboration-services, ops-codfw, Phabricator, DC-Ops
Jhancock.wm added a comment to T418931: Q3:rack/setup/install kafka-logging200[6-8].

@herron we've received these and will try to get them installed by EoW. please update site.pp to include the newest nodes. ty!

Wed, Mar 25, 2:47 PM · observability, SRE, ops-codfw, DC-Ops

Tue, Mar 24

Jhancock.wm added a comment to T420228: phab2002: SEL System Event:, System Board Front LED Panel, Critical, management controller unavailable.

I would definitely start with that and see if it clears the issue. the least invasive. I am onsite most workdays from 1400 UTC to 1800 UTC but can go later if needed. pick your day and i'll get that done for you.

Tue, Mar 24, 4:34 PM · collaboration-services, ops-codfw, Phabricator, DC-Ops
Jhancock.wm closed T421043: Power Supply - PS Redundancy - issue on wikikube-ctrl2001:9290 as Resolved.

multiple servers found on different breakers. no correlation other than rack.

Tue, Mar 24, 4:31 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm closed T421042: Power Supply - PS Redundancy - issue on cirrussearch2079:9290 as Resolved.
Tue, Mar 24, 4:30 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm added a comment to T421042: Power Supply - PS Redundancy - issue on cirrussearch2079:9290.

multiple servers found on different breakers. no correlation other than rack.

Tue, Mar 24, 4:30 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm added a comment to T420948: Power Supply - Status - issue on cloudbackup2003:9290.

there were a few power supplies that went down in the same rack. it wasn't a breaker trip. all on different channels on the PDUs. I see no correlation or cause other than a possible power surge on the PDU. but it didn't affect the whole rack, which i would have expected from a surge.
I might have accidentally rebooted this one while getting the psu issues to clear for the whole rack.

Tue, Mar 24, 4:30 PM · SRE, ops-codfw, cloud-services-team, DC-Ops
Jhancock.wm closed T420905: Power Supply - Status - issue on wikikube-ctrl2001:9290 as Resolved.

multiple servers found on different breakers. no correlation other than rack.

Tue, Mar 24, 4:28 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm closed T420762: Power Supply - Status - issue on cirrussearch2079:9290 as Resolved.

multiple servers found on different breakers. no correlation other than rack.

Tue, Mar 24, 4:27 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm closed T420761: Power Supply - Status - issue on logstash2036:9290 as Resolved.

multiple servers found on different breakers. no correlation other than rack.

Tue, Mar 24, 4:26 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm closed T420760: Power Supply - Status - issue on cirrussearch2080:9290 as Resolved.

multiple servers found on different breakers. no correlation other than rack.

Tue, Mar 24, 4:26 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm moved T420708: Unresponsive management for backup2005.mgmt:22 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, Mar 24, 3:02 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm moved T420760: Power Supply - Status - issue on cirrussearch2080:9290 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, Mar 24, 3:02 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm moved T420761: Power Supply - Status - issue on logstash2036:9290 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, Mar 24, 3:02 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm moved T420762: Power Supply - Status - issue on cirrussearch2079:9290 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, Mar 24, 3:02 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm moved T420905: Power Supply - Status - issue on wikikube-ctrl2001:9290 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, Mar 24, 3:02 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm moved T420948: Power Supply - Status - issue on cloudbackup2003:9290 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, Mar 24, 3:02 PM · SRE, ops-codfw, cloud-services-team, DC-Ops
Jhancock.wm moved T421042: Power Supply - PS Redundancy - issue on cirrussearch2079:9290 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, Mar 24, 3:02 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm moved T421043: Power Supply - PS Redundancy - issue on wikikube-ctrl2001:9290 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, Mar 24, 3:02 PM · SRE, DC-Ops, ops-codfw

Thu, Mar 19

Jhancock.wm moved T420613: Unresponsive management for backup2005.mgmt:22 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Thu, Mar 19, 6:09 PM · SRE, DC-Ops, ops-codfw

Wed, Mar 18

Jhancock.wm closed T419817: Disk (sdm) failed in thanos-be2008 as Resolved.

you're welcome!

Wed, Mar 18, 4:51 PM · ops-codfw, DC-Ops, SRE, SRE-swift-storage
Jhancock.wm moved T419753: Decommission codfw cp hosts cp2027-cp2040 from Backlog to Decommission on the ops-codfw board.
Wed, Mar 18, 4:36 PM · SRE, DC-Ops, ops-codfw, decommission-hardware, Traffic
Jhancock.wm added a comment to T419817: Disk (sdm) failed in thanos-be2008.

it's a good thing we didn't wait for dell to send us a new drive. their portal says shipped but the drive still hasn't been delivered to codfw.

Wed, Mar 18, 3:24 PM · ops-codfw, DC-Ops, SRE, SRE-swift-storage
Jhancock.wm closed T420320: bast2003 boot failure as Resolved.
Wed, Mar 18, 3:12 PM · ops-codfw, DC-Ops, SRE

Tue, Mar 17

Jhancock.wm added a comment to T420320: bast2003 boot failure.

@MoritzMuehlenhoff redid it in trixie.

Tue, Mar 17, 7:43 PM · ops-codfw, DC-Ops, SRE
Jhancock.wm added a comment to T420228: phab2002: SEL System Event:, System Board Front LED Panel, Critical, management controller unavailable.

yes. that matches the time. this error can be from a firmware issue.

Tue, Mar 17, 4:58 PM · collaboration-services, ops-codfw, Phabricator, DC-Ops
Jhancock.wm updated subscribers of T420320: bast2003 boot failure.

got into the idrac/console and found the server as this:

Booting from Hard Drive C:
GRUB

rebooted and went to the same screen.
contacted @Papaul for consult. corrupted or missing config file.
drive 0 was replaced back on December 1st. T410195 might have happened then.
also found Critical: CPU 1 machine check error, but the error resolved on its own. no persistent errors in the idrac logs. likely what caused a reboot and finding the corrupted config file.
got greenlight to reimage the host. going to update the bios and idrac firmware as well

Tue, Mar 17, 4:41 PM · ops-codfw, DC-Ops, SRE

Mar 17 2026

Jhancock.wm added a comment to T420308: Unresponsive management for backup2005.mgmt:22.

related to T419970. will clear soon

Mar 17 2026, 3:17 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm moved T420308: Unresponsive management for backup2005.mgmt:22 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mar 17 2026, 3:06 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm moved T420282: cloudcephmon2007-dev service implementation from Backlog to Non-Urgent on the ops-codfw board.
Mar 17 2026, 3:06 PM · cloud-services-team, SRE, DC-Ops, ops-codfw
Jhancock.wm moved T420320: bast2003 boot failure from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mar 17 2026, 3:05 PM · ops-codfw, DC-Ops, SRE
Jhancock.wm added a comment to T420228: phab2002: SEL System Event:, System Board Front LED Panel, Critical, management controller unavailable.

soft rebooted the idrac

Mar 17 2026, 2:54 PM · collaboration-services, ops-codfw, Phabricator, DC-Ops

Mar 16 2026

Jhancock.wm closed T416396: Q3:rack/setup/install cloudcephmon2007-dev as Resolved.

@Andrew i swear i didn't forget about you. This is complete.

Mar 16 2026, 8:45 PM · SRE, DC-Ops, ops-codfw
Jhancock.wm moved T420228: phab2002: SEL System Event:, System Board Front LED Panel, Critical, management controller unavailable from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mar 16 2026, 8:12 PM · collaboration-services, ops-codfw, Phabricator, DC-Ops
Jhancock.wm added a comment to T420228: phab2002: SEL System Event:, System Board Front LED Panel, Critical, management controller unavailable.

@Aklapper this notifies on the physical server if something goes wrong. like if a power supply goes bad, it will flash a light.

Mar 16 2026, 7:11 PM · collaboration-services, ops-codfw, Phabricator, DC-Ops
Jhancock.wm added a comment to T419970: backup2005 power supplies fried or overvoltage.

It's not posting at the moment. I have some tricks to try today and if not, i have some decommed servers i can pull parts from. It's my intention to have it back up by the end of my day. TY for letting me know that errors are okay. just need to get it to boot.

Mar 16 2026, 2:17 PM · SRE, DC-Ops, Data-Persistence-Backup, media-backups, ops-codfw
Jhancock.wm closed T415292: Netbox Reporting Triage - Week of 2026-01-22 as Resolved.
Mar 16 2026, 2:13 PM · DC-Ops

Mar 13 2026

Jhancock.wm added a comment to T419970: backup2005 power supplies fried or overvoltage.

it's definitely having some issues.
-power cables reseated with no results.
-replaced the BP1 and that error went away. system board error persists.
-updated idrac to see if i can get better error messages. update successful, but didn't help.
-reseating CPU1 cause that's the most likely cause for persistent error.
-error persists. looking up more stuff. coming back to this.

Mar 13 2026, 5:48 PM · SRE, DC-Ops, Data-Persistence-Backup, media-backups, ops-codfw
Jhancock.wm added a comment to T419738: decommission cloudgw2002-dev.

that did it. ty

Mar 13 2026, 3:32 PM · SRE, DC-Ops, ops-codfw, cloud-services-team, decommission-hardware
Jhancock.wm closed T419738: decommission cloudgw2002-dev as Resolved.
Mar 13 2026, 3:32 PM · SRE, DC-Ops, ops-codfw, cloud-services-team, decommission-hardware
Jhancock.wm moved T419970: backup2005 power supplies fried or overvoltage from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mar 13 2026, 2:27 PM · SRE, DC-Ops, Data-Persistence-Backup, media-backups, ops-codfw

Mar 12 2026

Jhancock.wm updated subscribers of T419738: decommission cloudgw2002-dev.

@Papaul i ran into an issue running the offline script for this one. is this a thing i can fix?

An exception occurred: IntegrityError: update or delete on table "ipam_ipaddress" violates foreign key constraint "dcim_device_oob_ip_id_5e7219c1_fk_ipam_ipaddress_id" on table "dcim_device" DETAIL: Key (id)=(22652) is still referenced from table "dcim_device".
Mar 12 2026, 7:35 PM · SRE, DC-Ops, ops-codfw, cloud-services-team, decommission-hardware
Jhancock.wm closed T418393: decommission frqueue2002.frack.codfw.wmnet as Resolved.
Mar 12 2026, 7:32 PM · SRE, ops-codfw, DC-Ops, Patch-For-Review, fundraising-tech-ops, decommission-hardware
Jhancock.wm closed T418225: decommission moss-fe200[1-2].codfw.wmnet as Resolved.
Mar 12 2026, 7:31 PM · DC-Ops, SRE, Ceph, SRE-swift-storage, ops-codfw, decommission-hardware
Jhancock.wm closed T417735: decommission ms-be20[57-61].codfw.wmnet as Resolved.
Mar 12 2026, 7:30 PM · SRE, DC-Ops, SRE-swift-storage, ops-codfw, decommission-hardware
Jhancock.wm moved T419738: decommission cloudgw2002-dev from Racking Tasks to Decommission on the ops-codfw board.
Mar 12 2026, 4:10 PM · SRE, DC-Ops, ops-codfw, cloud-services-team, decommission-hardware
Jhancock.wm moved T419738: decommission cloudgw2002-dev from Backlog to Racking Tasks on the ops-codfw board.
Mar 12 2026, 4:10 PM · SRE, DC-Ops, ops-codfw, cloud-services-team, decommission-hardware
Jhancock.wm moved T419747: Possible hardware issues on wikikube-worker2332.codfw.wmnet from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mar 12 2026, 4:10 PM · SRE, ops-codfw, DC-Ops, ServiceOps new
Jhancock.wm added a comment to T419747: Possible hardware issues on wikikube-worker2332.codfw.wmnet.

yes, that's the most likely cause.

Mar 12 2026, 3:58 PM · SRE, ops-codfw, DC-Ops, ServiceOps new
Jhancock.wm added a comment to T419747: Possible hardware issues on wikikube-worker2332.codfw.wmnet.

there was a loose power cable earlier this week. it might have powered off during that time. power cables have been secured since. T419462

Mar 12 2026, 3:47 PM · SRE, ops-codfw, DC-Ops, ServiceOps new
Jhancock.wm claimed T419817: Disk (sdm) failed in thanos-be2008.
Mar 12 2026, 3:38 PM · ops-codfw, DC-Ops, SRE, SRE-swift-storage
Jhancock.wm added a comment to T419817: Disk (sdm) failed in thanos-be2008.

replaced it with an official warranty disk we had on hand. replacing that warranty disk with a new one from Dell. SR223804656

Mar 12 2026, 3:38 PM · ops-codfw, DC-Ops, SRE, SRE-swift-storage
Jhancock.wm moved T419817: Disk (sdm) failed in thanos-be2008 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mar 12 2026, 1:53 PM · ops-codfw, DC-Ops, SRE, SRE-swift-storage

Mar 11 2026

Jhancock.wm updated the task description for T413088: FY2526 Q3:rack/setup/install ms-be209[56].
Mar 11 2026, 3:56 PM · SRE, SRE-swift-storage, ops-codfw, DC-Ops
Jhancock.wm closed T413088: FY2526 Q3:rack/setup/install ms-be209[56] as Resolved.

i need to find a way to wipe it without root access. I'd like to be able to fix that issue without pulling someone else in to do that. TY for getting that and explaining it!

Mar 11 2026, 3:55 PM · SRE, SRE-swift-storage, ops-codfw, DC-Ops