Jhancock.wm (Jenn Hancock)
User

User Details

User Since
Dec 5 2022, 4:37 PM (75 w, 1 d)
Availability
Available
LDAP User
Jhancock.wm
MediaWiki User
Jhancock.wm

Recent Activity

Yesterday

Jhancock.wm added a comment to T364863: InterfaceSpeedError - mw2286.

the cable or the 1G SFP might need to be replaced. can we downtime the server for a small window to test the cabling?

Tue, May 14, 2:50 PM · serviceops, SRE, ops-codfw
Jhancock.wm closed T364810: ManagementSSHDown as Resolved.

rebooted. all in C6 up now.

Tue, May 14, 2:46 PM · SRE, ops-codfw
Jhancock.wm moved T364863: InterfaceSpeedError - mw2286 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, May 14, 2:45 PM · serviceops, SRE, ops-codfw
Jhancock.wm closed T364809: ManagementSSHDown as Resolved.

reseated. pings on mgmt.

Tue, May 14, 1:50 PM · SRE, ops-codfw

Mon, May 13

Jhancock.wm updated subscribers of T363209: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010.

@Papaul, this was the last screen I got. The servers all have the OS installed, and the install failed at the certificate stage. I think it's because I used Puppet 7 instead of Puppet 5. When I retry with 5, it fails.

Mon, May 13, 3:13 PM · SRE, ops-codfw, serviceops, DC-Ops
Jhancock.wm updated Other Assignee for T363209: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010, added: Papaul.
Mon, May 13, 1:36 PM · SRE, ops-codfw, serviceops, DC-Ops
Jhancock.wm closed T364633: connected console ports attached to unracked device as Resolved.

Initiated: 2024-05-13 13:35 Duration: 0 minutes, 1.60 seconds Completed

Mon, May 13, 1:36 PM · SRE, ops-codfw
Jhancock.wm moved T364559: Create (or teach Andrew how to create) private connections+dns entries for new cloudcontrols from Backlog to Codfw Switch migration on the ops-codfw board.
Mon, May 13, 1:33 PM · SRE, netops, ops-codfw, Infrastructure-Foundations, cloud-services-team

Thu, May 9

Jhancock.wm updated the task description for T363209: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010.
Thu, May 9, 10:33 PM · SRE, ops-codfw, serviceops, DC-Ops

Wed, May 8

Jhancock.wm updated the task description for T363209: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010.
Wed, May 8, 2:56 PM · SRE, ops-codfw, serviceops, DC-Ops
Jhancock.wm closed T364439: ManagementSSHDown as Resolved.

uplink for msw2 was degraded and flapping. repaired. staying up now.

Wed, May 8, 2:14 PM · SRE, ops-codfw
Jhancock.wm closed T364464: Comms to msw-d2-codfw down as Resolved.

Port 47 on the msw was going up and down on its own. Replaced the RJ-45 terminator; it has remained steady.

Wed, May 8, 2:11 PM · netops, SRE, Infrastructure-Foundations, ops-codfw

Tue, May 7

Jhancock.wm closed T364358: Inbound interface errors as Resolved.
Tue, May 7, 2:17 PM · SRE, ops-codfw

Mon, May 6

Jhancock.wm added a comment to T362938: Degraded RAID on mw2382.

Forgot I left it there. All yours now!

Mon, May 6, 3:18 PM · serviceops, SRE, ops-codfw
Jhancock.wm claimed T363209: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010.
Mon, May 6, 2:51 PM · SRE, ops-codfw, serviceops, DC-Ops

Thu, May 2

Jhancock.wm added a comment to T362938: Degraded RAID on mw2382.

@JMeybohm Papaul helped me identify the missing disk. I replaced it with a compatible drive. Please let me know if that fixed the issue. Thanks.

Thu, May 2, 4:27 PM · serviceops, SRE, ops-codfw
Jhancock.wm closed T363926: PowerSupplyFailure as Resolved.

reseated psu2 and cable. alert cleared on machine.

Thu, May 2, 1:19 PM · SRE, ops-codfw

Wed, May 1

Jhancock.wm closed T363838: Degraded RAID on mw2382 as Declined.

see T362938

Wed, May 1, 2:53 PM · SRE, ops-codfw
Jhancock.wm closed T363847: Degraded RAID on mw2382 as Declined.

see T362938

Wed, May 1, 2:53 PM · SRE, ops-codfw
Jhancock.wm moved T363838: Degraded RAID on mw2382 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Wed, May 1, 2:51 PM · SRE, ops-codfw
Jhancock.wm moved T363847: Degraded RAID on mw2382 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Wed, May 1, 2:51 PM · SRE, ops-codfw
Jhancock.wm closed T363756: PowerSupplyFailure as Resolved.

Removed the error by rebooting the iDRAC.

Wed, May 1, 2:49 PM · SRE, ops-codfw
Jhancock.wm claimed T363756: PowerSupplyFailure.

fixed the main source of the alert (PSU and power cable reseated) but still getting the following error.

Wed, May 1, 2:41 PM · SRE, ops-codfw
Jhancock.wm moved T363756: PowerSupplyFailure from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Wed, May 1, 2:24 PM · SRE, ops-codfw

Tue, Apr 30

Jhancock.wm added a comment to T362938: Degraded RAID on mw2382.

iDRAC upgraded to 7.0.0; it won't go any higher. BIOS is already at 2.9.3. Reset to factory defaults and tried rebooting the iDRAC. Reseated the backplane. None of these fixed the issue. Going to look into getting a replacement part, which might need to be salvaged from decommissioned servers. Will update when we have a solution.

Tue, Apr 30, 4:36 PM · serviceops, SRE, ops-codfw
Jhancock.wm closed T363783: Inbound interface errors as Resolved.

known issue with no impact

Tue, Apr 30, 3:10 PM · SRE, ops-codfw
Jhancock.wm added a comment to T362938: Degraded RAID on mw2382.

Draining didn't fix it. I'm going to update the firmware and BIOS and then see where it stands.

Tue, Apr 30, 2:07 PM · serviceops, SRE, ops-codfw

Mon, Apr 29

Jhancock.wm closed T362801: decommission db2103.codfw.wmnet as Resolved.
Mon, Apr 29, 6:36 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T362799: decommission db2106.codfw.wmnet as Resolved.
Mon, Apr 29, 6:35 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T362800: decommission db2105.codfw.wmnet as Resolved.
Mon, Apr 29, 6:35 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T362798: decommission db2107.codfw.wmnet as Resolved.
Mon, Apr 29, 6:35 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T362797: decommission db2108.codfw.wmnet as Resolved.
Mon, Apr 29, 6:34 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T362796: decommission db2109.codfw.wmnet as Resolved.
Mon, Apr 29, 6:34 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T362795: decommission db2110.codfw.wmnet as Resolved.
Mon, Apr 29, 6:33 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T362794: decommission db2111.codfw.wmnet as Resolved.
Mon, Apr 29, 6:32 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T362793: decommission db2112.codfw.wmnet as Resolved.
Mon, Apr 29, 6:32 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T362792: decommission db2113.codfw.wmnet as Resolved.
Mon, Apr 29, 6:32 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T362790: decommission db2119.codfw.wmnet as Resolved.
Mon, Apr 29, 6:31 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T362787: decommission db2120.codfw.wmnet as Resolved.
Mon, Apr 29, 6:30 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm added a comment to T362938: Degraded RAID on mw2382.

Apologies for the wait on this one. I checked out the server, and the drives appear to be working physically, but when I logged into the iDRAC it showed zero disks. I checked the warranty, and it expired in February. I do have a pair of decommissioned 960GB drives that could replace it; however, I cannot tell which drive needs to be replaced. Please let me know if this still needs attention and how I can help.

Mon, Apr 29, 5:12 PM · serviceops, SRE, ops-codfw

Tue, Apr 23

Jhancock.wm moved T362938: Degraded RAID on mw2382 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, Apr 23, 12:37 PM · serviceops, SRE, ops-codfw
Jhancock.wm closed T363120: Inbound interface errors as Resolved.

known issue with no impact

Tue, Apr 23, 12:37 PM · SRE, ops-codfw

Mon, Apr 22

Jhancock.wm updated the task description for T362729: Q4:rack/setup/install cp70[01-16].
Mon, Apr 22, 5:23 PM · Traffic, ops-magru, DC-Ops
Jhancock.wm updated the task description for T362730: Q4:rack/setup/install magru misc servers.
Mon, Apr 22, 5:22 PM · Traffic, netops, ops-magru, DC-Ops, Infrastructure-Foundations
Jhancock.wm updated the task description for T362730: Q4:rack/setup/install magru misc servers.
Mon, Apr 22, 4:07 PM · Traffic, netops, ops-magru, DC-Ops, Infrastructure-Foundations
Jhancock.wm updated the task description for T362729: Q4:rack/setup/install cp70[01-16].
Mon, Apr 22, 4:06 PM · Traffic, ops-magru, DC-Ops
Jhancock.wm closed Unknown Object (Task), a subtask of T346722: Sao Paulo, Brazil, South America POP tracking task, as Resolved.
Mon, Apr 22, 4:01 PM · ops-magru, Patch-For-Review

Thu, Apr 18

Jhancock.wm added a comment to T361525: Degraded RAID on elastic2088.

All diagnostic tests passed, including the PCI bus. It responds to ping on both the iDRAC and the network IPs.
@RKemper give it another go. @ me if you run into an issue again.

Thu, Apr 18, 6:26 PM · ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05)
Jhancock.wm added a comment to T361525: Degraded RAID on elastic2088.

Tried to run a diagnostic from the Lifecycle Controller. It halted because of a DIMM error on B4. The DIMM has been replaced; re-running the diagnostic to check for any more issues.

Thu, Apr 18, 4:01 PM · ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05)

Wed, Apr 17

Jhancock.wm moved T362787: decommission db2120.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Wed, Apr 17, 3:58 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm added a comment to T361525: Degraded RAID on elastic2088.

@RKemper I am going to check it out and get back in touch with Dell. These are the same errors we were getting before the card was replaced.

Wed, Apr 17, 1:14 PM · ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05)
Jhancock.wm added a project to T361525: Degraded RAID on elastic2088: ops-codfw.
Wed, Apr 17, 1:12 PM · ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05)

Tue, Apr 16

Jhancock.wm added a comment to T358542: Netbox errors caused by system board replacement .

I updated the sheet with the needed information but forgot to post it to this task. Please let me know if there's anything else I can do to help out with the tasks. Thanks!

Tue, Apr 16, 4:53 PM · SRE, ops-codfw
Jhancock.wm closed T361229: titan200[12] RAM/SSD upgrade coordination as Resolved.
Tue, Apr 16, 4:47 PM · SRE Observability (FY2023/2024-Q4), SRE, observability, ops-codfw
Jhancock.wm added a comment to T362438: decommission cloudbackup200[12].codfw.wmnet.

ty!

Tue, Apr 16, 2:30 PM · SRE, ops-codfw, cloud-services-team, decommission-hardware
Jhancock.wm closed T362438: decommission cloudbackup200[12].codfw.wmnet as Resolved.
Tue, Apr 16, 2:29 PM · SRE, ops-codfw, cloud-services-team, decommission-hardware
Jhancock.wm updated subscribers of T362438: decommission cloudbackup200[12].codfw.wmnet.

@Papaul @Andrew
what are we doing with cloudbackup2001-array1 and cloudbackup2002-array1?

Tue, Apr 16, 1:51 PM · SRE, ops-codfw, cloud-services-team, decommission-hardware
Jhancock.wm closed T362465: ManagementSSHDown as Resolved.

alert cleared. being decommed in T362438

Tue, Apr 16, 1:28 PM · SRE, ops-codfw
Jhancock.wm moved T361305: decommission elastic20[37-54].codfw.wmnet from Decommission to Blocked on the ops-codfw board.
Tue, Apr 16, 1:25 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm moved T346661: cloud: prepare codfw for expansion (racks, switches, ceph) from Racking Tasks to Blocked on the ops-codfw board.
Tue, Apr 16, 1:25 PM · User-dcaro, SRE, cloud-services-team (Hardware), ops-codfw, User-aborrero
Jhancock.wm moved T356216: Q#:rack/setup/install (2) cloudbackup hosts from Racking Tasks to Blocked on the ops-codfw board.
Tue, Apr 16, 1:25 PM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Jhancock.wm moved T361229: titan200[12] RAM/SSD upgrade coordination from Racking Tasks to Blocked on the ops-codfw board.
Tue, Apr 16, 1:24 PM · SRE Observability (FY2023/2024-Q4), SRE, observability, ops-codfw
Jhancock.wm closed T362311: Decommission db2101 (was: db2101 crashed) as Resolved.
Tue, Apr 16, 1:24 PM · SRE, ops-codfw, decommission-hardware, DC-Ops, database-backups, Data-Persistence-Backup, DBA
Jhancock.wm closed T362311: Decommission db2101 (was: db2101 crashed), a subtask of T358741: Decommission db2096-db2120, as Resolved.
Tue, Apr 16, 1:22 PM · DBA
Jhancock.wm closed T362596: Inbound interface errors as Resolved.

known issue, no impact

Tue, Apr 16, 1:36 AM · SRE, ops-codfw
Jhancock.wm moved T362311: Decommission db2101 (was: db2101 crashed) from Backlog to Decommission on the ops-codfw board.
Tue, Apr 16, 1:35 AM · SRE, ops-codfw, decommission-hardware, DC-Ops, database-backups, Data-Persistence-Backup, DBA
Jhancock.wm moved T362596: Inbound interface errors from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, Apr 16, 1:35 AM · SRE, ops-codfw

Mon, Apr 15

Jhancock.wm renamed T354896: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet from Q3:rack/setup/install cloudcontrol2006-dev.codfw.wmnet to Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet.
Mon, Apr 15, 4:51 PM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Jhancock.wm claimed T354896: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet.

@cmooney what is the VLAN for this server?

Mon, Apr 15, 4:41 PM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Jhancock.wm updated the task description for T354896: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet.
Mon, Apr 15, 4:35 PM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Jhancock.wm closed T362550: PowerSupplyFailure as Resolved.

reseated blue cable

Mon, Apr 15, 4:31 PM · SRE, ops-codfw
Jhancock.wm moved T362550: PowerSupplyFailure from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mon, Apr 15, 4:29 PM · SRE, ops-codfw
Jhancock.wm moved T362465: ManagementSSHDown from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mon, Apr 15, 1:52 PM · SRE, ops-codfw

Apr 12 2024

Jhancock.wm updated the task description for T354896: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet.
Apr 12 2024, 5:17 PM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Jhancock.wm added a comment to T361525: Degraded RAID on elastic2088.

@bking I got the HBA card replaced, and it booted without any issues that I can find in the iDRAC. Can you check from the CLI whether the RAID is still degraded?

Apr 12 2024, 4:52 PM · ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05)

Apr 11 2024

Jhancock.wm added a comment to T361525: Degraded RAID on elastic2088.

Update: Dell finally agreed to replace the HBA card. I just sent the shipping address confirmation. Hopefully it'll be here tomorrow, or Monday morning at the latest.

Apr 11 2024, 1:40 PM · ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05)

Apr 9 2024

Jhancock.wm moved T362122: decommission wdqs1025.eqiad.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 9 2024, 1:07 PM · ops-eqiad, SRE, decommission-hardware
Jhancock.wm closed T362126: Inbound interface errors as Resolved.
Apr 9 2024, 1:01 PM · SRE, ops-codfw
Jhancock.wm moved T362126: Inbound interface errors from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Apr 9 2024, 12:59 PM · SRE, ops-codfw

Apr 5 2024

Jhancock.wm claimed T361851: db2214 crashed.

Here are some more logs.

Apr 5 2024, 3:11 PM · SRE, ops-codfw, Patch-For-Review, DBA
Jhancock.wm moved T361851: db2214 crashed from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Apr 5 2024, 2:42 PM · SRE, ops-codfw, Patch-For-Review, DBA
Jhancock.wm added a comment to T361525: Degraded RAID on elastic2088.

follow up: still going back and forth with Dell.

Apr 5 2024, 2:34 PM · ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05)

Apr 4 2024

Jhancock.wm closed T361779: decommission db2104.codfw.wmnet as Resolved.
Apr 4 2024, 4:28 PM · SRE, ops-codfw, DC-Ops, DBA, decommission-hardware
Jhancock.wm closed T361779: decommission db2104.codfw.wmnet, a subtask of T361543: Upgrade s2 to MariaDB 10.6, as Resolved.
Apr 4 2024, 4:27 PM · Patch-For-Review, DBA
Jhancock.wm closed T361779: decommission db2104.codfw.wmnet, a subtask of T358741: Decommission db2096-db2120, as Resolved.
Apr 4 2024, 4:27 PM · DBA
Jhancock.wm added a comment to T361856: Moving 1G servers out of rack D4 in prep of switch migration.

refresh task: https://phabricator.wikimedia.org/T325215

Apr 4 2024, 4:22 PM · serviceops, SRE, ops-codfw
Jhancock.wm created T361856: Moving 1G servers out of rack D4 in prep of switch migration.
Apr 4 2024, 4:21 PM · serviceops, SRE, ops-codfw
Jhancock.wm updated the task description for T361229: titan200[12] RAM/SSD upgrade coordination.
Apr 4 2024, 3:53 PM · SRE Observability (FY2023/2024-Q4), SRE, observability, ops-codfw
Jhancock.wm updated the task description for T361229: titan200[12] RAM/SSD upgrade coordination.
Apr 4 2024, 3:24 PM · SRE Observability (FY2023/2024-Q4), SRE, observability, ops-codfw
Jhancock.wm moved T361779: decommission db2104.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 4 2024, 1:31 PM · SRE, ops-codfw, DC-Ops, DBA, decommission-hardware

Apr 3 2024

Jhancock.wm closed T361584: decommission db2100.codfw.wmnet as Resolved.
Apr 3 2024, 3:08 PM · SRE, ops-codfw, DBA, decommission-hardware
Jhancock.wm closed T361584: decommission db2100.codfw.wmnet, a subtask of T358741: Decommission db2096-db2120, as Resolved.
Apr 3 2024, 3:07 PM · DBA
Jhancock.wm updated the task description for T361229: titan200[12] RAM/SSD upgrade coordination.
Apr 3 2024, 1:29 PM · SRE Observability (FY2023/2024-Q4), SRE, observability, ops-codfw
Jhancock.wm moved T361584: decommission db2100.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 3 2024, 1:28 PM · SRE, ops-codfw, DBA, decommission-hardware

Apr 2 2024

Jhancock.wm closed T361603: aqs2001.codfw.wmnet down as Resolved.

replaced the SFP, server is pingable again.

Apr 2 2024, 3:58 PM · SRE, Cassandra, ops-codfw
Jhancock.wm claimed T361525: Degraded RAID on elastic2088.

This error recurred.

Apr 2 2024, 2:29 PM · ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05)
Jhancock.wm moved T361525: Degraded RAID on elastic2088 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Apr 2 2024, 2:14 PM · ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05)

Apr 1 2024

Jhancock.wm closed T361305: decommission elastic20[37-54].codfw.wmnet as Resolved.
Apr 1 2024, 4:57 PM · SRE, ops-codfw, decommission-hardware
Jhancock.wm closed T361305: decommission elastic20[37-54].codfw.wmnet, a subtask of T358882: Decommission elastic2037-2054, as Resolved.
Apr 1 2024, 4:56 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14)
Jhancock.wm added a comment to T361305: decommission elastic20[37-54].codfw.wmnet.

elastic2049 was already decommissioned under https://phabricator.wikimedia.org/T313842

Apr 1 2024, 3:43 PM · SRE, ops-codfw, decommission-hardware