Papaul (Papaul)
User

User Details

User Since
Dec 18 2014, 3:39 PM (491 w, 5 d)
Availability
Available
LDAP User
Papaul
MediaWiki User
Unknown

Recent Activity

Today

Papaul closed T365455: msw1-codfw links are connected to wrong ports as Resolved.

complete

Tue, May 21, 4:56 PM · SRE, DC-Ops, ops-codfw
Papaul moved T365204: Problem re-imaging hosts on row-wide vlan on EVPN switches from Hardware Failure / Troubleshoot to Codfw Switch migration on the ops-codfw board.
Tue, May 21, 1:47 PM · DC-Ops, Patch-For-Review, ops-codfw, netops, Infrastructure-Foundations, SRE

Yesterday

Papaul added a comment to T365204: Problem re-imaging hosts on row-wide vlan on EVPN switches.

@Jhancock.wm it looks like we have another sretest2002 set up in b7. The switch already has that configuration, so I went ahead and deleted the one in b7 since you have another in a8.

Mon, May 20, 11:39 PM · DC-Ops, Patch-For-Review, ops-codfw, netops, Infrastructure-Foundations, SRE
Restricted Application added a project to T362033: Degraded RAID on aqs1013: DC-Ops.

@Eevans like you mentioned on IRC, "it's the same slot(s) that are having issues". I think we need to replace the main board and see. We have 4 decommissioned PowerEdge R440s. I will ping @Jclark-ctr or @VRiley-WMF to see if they can coordinate with you and try to pull the main board from one of those servers and replace the one in aqs1013. After that, you can try to re-image the server.
@Jclark-ctr @VRiley-WMF please see above if you have time to work with @Eevans on this.
Thanks

Mon, May 20, 10:14 PM · DC-Ops, Cassandra, SRE, ops-eqiad

Sat, May 18

Papaul moved T365204: Problem re-imaging hosts on row-wide vlan on EVPN switches from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Sat, May 18, 12:05 AM · DC-Ops, Patch-For-Review, ops-codfw, netops, Infrastructure-Foundations, SRE
Papaul moved T365291: ml-serve2002 memory errors on DIMM_B1 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Sat, May 18, 12:05 AM · SRE, Machine-Learning-Team, ops-codfw, DC-Ops
Papaul closed T363209: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010 as Resolved.
Sat, May 18, 12:05 AM · SRE, ops-codfw, serviceops, DC-Ops
Papaul updated the task description for T363209: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010.
Sat, May 18, 12:04 AM · SRE, ops-codfw, serviceops, DC-Ops
Papaul added a comment to T363209: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010.

@Jhancock.wm thank you for working on this. Like I mentioned to you this morning, the reason kafka-main2009 was failing is that it was contacting the wrong puppet server for the cert request (see below). What I did was delete the cert request on the puppetmaster and restart the re-image.

Sat, May 18, 12:04 AM · SRE, ops-codfw, serviceops, DC-Ops

Fri, May 17

Papaul added a comment to T363209: Q4:rack/setup/install kafka-main200[6789] & kafka-main2010.

Thank you will do

Fri, May 17, 7:10 PM · SRE, ops-codfw, serviceops, DC-Ops

Tue, May 14

Papaul updated the task description for T360789: codfw row C/D upgrade racking task.
Tue, May 14, 5:16 PM · SRE, Infrastructure-Foundations, netops, ops-codfw, DC-Ops
Papaul updated the task description for T360789: codfw row C/D upgrade racking task.
Tue, May 14, 4:10 PM · SRE, Infrastructure-Foundations, netops, ops-codfw, DC-Ops
Papaul updated the task description for T360789: codfw row C/D upgrade racking task.
Tue, May 14, 4:09 PM · SRE, Infrastructure-Foundations, netops, ops-codfw, DC-Ops
Papaul updated the task description for T360789: codfw row C/D upgrade racking task.
Tue, May 14, 4:08 PM · SRE, Infrastructure-Foundations, netops, ops-codfw, DC-Ops
Papaul moved T364809: ManagementSSHDown from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, May 14, 1:28 PM · SRE, ops-codfw
Papaul moved T364810: ManagementSSHDown from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Tue, May 14, 1:28 PM · SRE, ops-codfw
Papaul closed T361871: codfw: use old asw switches from row A and B as msw switches in row C and D as Resolved.

All the old mgmt switches are back in place

Tue, May 14, 12:38 AM · SRE, netops, Infrastructure-Foundations, ops-codfw

Wed, May 8

Papaul added a comment to T364464: Comms to msw-d2-codfw down.

@cmooney I think this is just a human error issue. We were racking all the lsw1-d* yesterday and maybe we accidentally bumped into the cable. We will check once on site.

Wed, May 8, 12:52 PM · netops, SRE, Infrastructure-Foundations, ops-codfw
Papaul updated the task description for T360789: codfw row C/D upgrade racking task.
Wed, May 8, 12:42 AM · SRE, Infrastructure-Foundations, netops, ops-codfw, DC-Ops
Papaul closed T364097: Decom lsw1-a1-codfw as Resolved.
Wed, May 8, 12:36 AM · ops-codfw, netops, Infrastructure-Foundations, SRE
Papaul closed T364097: Decom lsw1-a1-codfw, a subtask of T364095: Codfw row C/D switch installation & configuration, as Resolved.
Wed, May 8, 12:36 AM · ops-codfw, netops, Infrastructure-Foundations, SRE
Papaul updated the task description for T364097: Decom lsw1-a1-codfw.
Wed, May 8, 12:35 AM · ops-codfw, netops, Infrastructure-Foundations, SRE

Mon, May 6

Papaul updated the task description for T360789: codfw row C/D upgrade racking task.
Mon, May 6, 7:58 PM · SRE, Infrastructure-Foundations, netops, ops-codfw, DC-Ops
Papaul updated the task description for T360789: codfw row C/D upgrade racking task.
Mon, May 6, 7:05 PM · SRE, Infrastructure-Foundations, netops, ops-codfw, DC-Ops

Fri, May 3

Papaul closed T364061: ManagementSSHDown as Resolved.

Resolved by rebooting both switches

Fri, May 3, 4:23 PM · SRE, ops-codfw
Papaul moved T364095: Codfw row C/D switch installation & configuration from Backlog to Codfw Switch migration on the ops-codfw board.
Fri, May 3, 4:23 PM · ops-codfw, netops, Infrastructure-Foundations, SRE
Papaul moved T364097: Decom lsw1-a1-codfw from Backlog to Decommission on the ops-codfw board.
Fri, May 3, 4:22 PM · ops-codfw, netops, Infrastructure-Foundations, SRE
Papaul added a comment to T363660: Degraded RAID on centrallog1002.

@Jclark-ctr @VRiley-WMF when the task was auto-generated, it showed that disk sdg1 failed; see the line in the task description below (F)

Fri, May 3, 4:44 AM · SRE, ops-eqiad

Sat, Apr 27

Papaul updated the task description for T362729: Q4:rack/setup/install cp70[01-16].
Sat, Apr 27, 8:41 PM · Traffic, ops-magru, DC-Ops

Apr 18 2024

Papaul moved T362793: decommission db2112.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 18 2024, 4:37 PM · SRE, ops-codfw, decommission-hardware
Papaul moved T362792: decommission db2113.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 18 2024, 4:37 PM · SRE, ops-codfw, decommission-hardware
Papaul moved T362790: decommission db2119.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 18 2024, 4:37 PM · SRE, ops-codfw, decommission-hardware
Papaul moved T362794: decommission db2111.codfw.wmnet from Blocked to Decommission on the ops-codfw board.
Apr 18 2024, 4:36 PM · SRE, ops-codfw, decommission-hardware
Papaul moved T362794: decommission db2111.codfw.wmnet from Backlog to Blocked on the ops-codfw board.
Apr 18 2024, 4:36 PM · SRE, ops-codfw, decommission-hardware
Papaul moved T362795: decommission db2110.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 18 2024, 4:36 PM · SRE, ops-codfw, decommission-hardware
Papaul moved T362797: decommission db2108.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 18 2024, 4:36 PM · SRE, ops-codfw, decommission-hardware
Papaul moved T362798: decommission db2107.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 18 2024, 4:35 PM · SRE, ops-codfw, decommission-hardware
Papaul moved T362799: decommission db2106.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 18 2024, 4:35 PM · SRE, ops-codfw, decommission-hardware
Papaul moved T362796: decommission db2109.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 18 2024, 4:35 PM · SRE, ops-codfw, decommission-hardware
Papaul moved T362800: decommission db2105.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 18 2024, 4:34 PM · SRE, ops-codfw, decommission-hardware
Papaul moved T362801: decommission db2103.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 18 2024, 3:29 PM · SRE, ops-codfw, decommission-hardware

Apr 17 2024

Papaul added a comment to T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting.

@ssingh After 2 days working on this issue, I finally got to the bottom of the problem. After many reboots on cp1115, I checked the model of the NIC (Broadcom 57414) and decided to test every single firmware version available on the Dell web site.
With all the 22.xx versions, the server PXE boots but gives the error "Failed to load ldlinux.c32".
With the 21.8x versions, the server boots sometimes and other times gets stuck.
The last version, 21.60.22.11, which was not listed on the Dell product-support web site https://www.dell.com/support/home/en-us/product-support/servicetag/0-bTkxNWhsYWF2OFdQRm04TmF3QjhwZz090/drivers
is the only working version. I installed this version and rebooted cp1115 six times, and all six times it PXE booted without an issue.

Apr 17 2024, 5:34 AM · SRE, Traffic, SRE-swift-storage, ops-codfw, DC-Ops, ops-eqiad
Papaul closed T354896: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet as Resolved.

Complete

Apr 17 2024, 12:23 AM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Papaul added a comment to T356216: Q#:rack/setup/install (2) cloudbackup hosts.

@Jhancock.wm anything else left to be done on this task?

Apr 17 2024, 12:23 AM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Papaul updated the task description for T354896: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet.
Apr 17 2024, 12:21 AM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops

Apr 16 2024

Papaul updated the task description for T354896: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet.
Apr 16 2024, 10:15 PM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Papaul updated the task description for T354896: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet.
Apr 16 2024, 6:15 PM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Papaul updated the task description for T354896: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet.
Apr 16 2024, 5:01 PM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Papaul added a comment to T361305: decommission elastic20[37-54].codfw.wmnet.

@blink is there anything left for DC-ops to do on this task? Thanks

Apr 16 2024, 4:45 PM · SRE, ops-codfw, decommission-hardware
Papaul claimed T354896: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet.
Apr 16 2024, 1:08 PM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Papaul added a comment to T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting.

@ssingh unfortunately using the fs DAC didn't fix the issue. So we are back to zero. I am still working on it

Apr 16 2024, 1:03 PM · SRE, Traffic, SRE-swift-storage, ops-codfw, DC-Ops, ops-eqiad
Papaul closed T361871: codfw: use old asw switches from row A and B as msw switches in row C and D as Resolved.

Since Monday I have had the Juniper switches in racks D1 and D2 set up as management switches, and so far there are no issues. I had to:

  • Set the root password to the same as the server management password
  • Disable the management interface
  • Disable the chassis alert for the management interface
  • Set up the switch as a management switch in Netbox to stop some LibreNMS and network alerts
Apr 16 2024, 1:02 PM · SRE, netops, Infrastructure-Foundations, ops-codfw

Apr 15 2024

Papaul added a comment to T354896: Q3:rack/setup/install cloudcontrol2009-dev.codfw.wmnet.

@Jhancock.wm cloud-hosts1-b1-codfw (2118)

Apr 15 2024, 5:08 PM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops

Apr 13 2024

Papaul moved T362438: decommission cloudbackup200[12].codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Apr 13 2024, 1:12 AM · SRE, ops-codfw, cloud-services-team, decommission-hardware
Papaul added a comment to T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting.

@ssingh I also checked cp2042; there we are using an FS.com DAC.
FYI W2W= Wave2Wave

Apr 13 2024, 12:55 AM · SRE, Traffic, SRE-swift-storage, ops-codfw, DC-Ops, ops-eqiad

Apr 12 2024

Papaul updated subscribers of T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting.

@ssingh one difference that I found between the server NIC and the switch interface is the vendor. In eqiad, I checked 3 nodes (cp1115, cp1113 and cp1100) and all show W2W as the vendor under Transceiver inventory, while in esams the vendor is FS. Since @ayounsi mentioned this morning that the request was not reaching the switch, I focused on the media type used in esams and in eqiad. It looks like both connections are Direct Attach Copper but from different vendors.

Apr 12 2024, 10:27 PM · SRE, Traffic, SRE-swift-storage, ops-codfw, DC-Ops, ops-eqiad

Apr 11 2024

Papaul updated the task description for T360789: codfw row C/D upgrade racking task.
Apr 11 2024, 1:08 PM · SRE, Infrastructure-Foundations, netops, ops-codfw, DC-Ops
Papaul moved T360789: codfw row C/D upgrade racking task from Racking Tasks to Codfw Switch migration on the ops-codfw board.
Apr 11 2024, 12:35 PM · SRE, Infrastructure-Foundations, netops, ops-codfw, DC-Ops
Papaul added a project to T361856: Moving 1G servers out of rack D4 in prep of switch migration: serviceops.
Apr 11 2024, 12:34 PM · serviceops, SRE, ops-codfw

Apr 8 2024

Papaul moved T361871: codfw: use old asw switches from row A and B as msw switches in row C and D from Backlog to Racking Tasks on the ops-codfw board.
Apr 8 2024, 8:51 PM · SRE, netops, Infrastructure-Foundations, ops-codfw
Papaul added a comment to T361871: codfw: use old asw switches from row A and B as msw switches in row C and D.

@ayounsi yes, you are right: since it will have an IP address it will be managed, so I was thinking it over. We could disable the mgmt interface, just set up the root password on the switch, and use it as an L2 switch so we don't have to deal with managing it.

Apr 8 2024, 1:17 PM · SRE, netops, Infrastructure-Foundations, ops-codfw

Apr 5 2024

Papaul added a comment to T361871: codfw: use old asw switches from row A and B as msw switches in row C and D.

@ayounsi @cmooney thanks for all the input. What I am asking is to use the old Juniper switches as dumb switches (L2 config). I need no automation or monitoring on those; I would like to use them just like the existing switches. I just don't want to go into the 15 switches manually to do the initial basic setup, which is why I was asking if it is possible to set up ZTP to also work with those switches. If it is too much work on the ZTP side, I can set them up manually. Please let me know if you have more questions.
Thanks.

Apr 5 2024, 1:10 PM · SRE, netops, Infrastructure-Foundations, ops-codfw

Apr 4 2024

Papaul created T361871: codfw: use old asw switches from row A and B as msw switches in row C and D.
Apr 4 2024, 7:24 PM · SRE, netops, Infrastructure-Foundations, ops-codfw

Apr 2 2024

Papaul closed T360776: Decom asw-b-codfw switch stack as Resolved.
Apr 2 2024, 3:23 AM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE
Papaul updated the task description for T360776: Decom asw-b-codfw switch stack.
Apr 2 2024, 3:22 AM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE
Papaul closed T361533: Inbound interface errors as Resolved.
Apr 2 2024, 3:20 AM · SRE, ops-codfw

Apr 1 2024

Papaul closed T361286: Fatal error detected on elastic2088 as Resolved.

@bking the PXE boot was failing because both the 10G and 1G NICs were set to PXE boot. I disabled PXE boot on the 1G NIC and all is good now.
You can resume the re-image.

Apr 1 2024, 4:07 PM · SRE, ops-codfw, Data-Platform-SRE

Mar 28 2024

Papaul closed T358244: Decom asw-a-codfw switch stack as Resolved.

complete

Mar 28 2024, 1:20 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mar 28 2024, 1:19 PM · netops, Infrastructure-Foundations, SRE, ops-codfw

Mar 27 2024

Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mar 27 2024, 10:15 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul added a comment to T356216: Q#:rack/setup/install (2) cloudbackup hosts.

@Jhancock.wm this is what 2003 is showing on console

┌───────────────────────┤ [!!] Partition disks ├────────────────────────┐
   │                                                                       │
   │                 Failed to partition the selected disk                 │
   │ This probably happened because the selected disk or free space is too │
   │ small to be automatically partitioned.                                │
   │                                                                       │
   │     <Go Back>
Mar 27 2024, 4:49 PM · SRE, ops-codfw, cloud-services-team (Hardware), DC-Ops
Papaul moved T361037: db2100 crashed (memory error) from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mar 27 2024, 1:46 AM · SRE, ops-codfw, DC-Ops, Patch-For-Review, Data-Persistence-Backup, database-backups, DBA
Papaul closed T355355: Q3:rack/setup/install dbprov200[56] as Resolved.

With the 2 SSDs back in the server, I hit the same issue. Doing more troubleshooting, I found out that when the server was first re-imaged, it created 2 LVM volumes, one on the HDDs and the other on the SSDs, so each time I deleted and recreated the HW RAID, the LVM volume was for some reason not deleted. I had to recreate the HW RAID with just the HDDs, install the OS, and then, once the OS install completed, create the RAID on the SSDs.
@jcrespo all yours

Mar 27 2024, 1:46 AM · Patch-For-Review, SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul updated the task description for T355355: Q3:rack/setup/install dbprov200[56].
Mar 27 2024, 1:37 AM · Patch-For-Review, SRE, Data-Persistence, ops-codfw, DC-Ops

Mar 26 2024

Papaul added a comment to T355355: Q3:rack/setup/install dbprov200[56].

dbprov2005 was failing after installing the OS many times.
After troubleshooting, I found that the cookbook fails when the server reboots into the OS after the OS install. On the server console during the OS boot I get:

 [ TIME ] Timed out waiting for device -4977-ac03-d76d6926114e.
 [ TIME ] Timed out waiting for device -6d79-4292-8546-2e76e67a0aa0.
[DEPEND] Dependency failed for /dev…3-849e-4977-ac03-d76d6926114e.
[DEPEND] Dependency failed for Swap.

As a next step, I removed both SSDs and recreated the HW RAID with only the HDDs, and the server didn't have any issues.
I am going to put the 2 SSDs back in and reimage the server to see if it fails.

Mar 26 2024, 7:07 PM · Patch-For-Review, SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul closed T345803: Connect two hosts in codfw row A/B for switch migration testing as Resolved.
Mar 26 2024, 6:48 PM · Infrastructure-Foundations, netops, ops-codfw, SRE

Mar 25 2024

Papaul updated the task description for T355355: Q3:rack/setup/install dbprov200[56].
Mar 25 2024, 6:30 PM · Patch-For-Review, SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul updated the task description for T360776: Decom asw-b-codfw switch stack.
Mar 25 2024, 4:56 PM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE
Papaul updated the task description for T360776: Decom asw-b-codfw switch stack.
Mar 25 2024, 3:06 PM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE
Papaul added a comment to T360446: hw troubleshooting: failed disk for ml-serve2008.codfw.wmnet (not urgent).

@klausman hello please see @Jhancock.wm comment above. Thank you.

Mar 25 2024, 1:06 PM · SRE, ops-codfw, Machine-Learning-Team, DC-Ops
Papaul moved T360776: Decom asw-b-codfw switch stack from Backlog to Codfw Switch migration on the ops-codfw board.
Mar 25 2024, 1:03 PM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE

Mar 22 2024

Papaul updated the task description for T360776: Decom asw-b-codfw switch stack.
Mar 22 2024, 1:35 PM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE
Papaul updated the task description for T360776: Decom asw-b-codfw switch stack.
Mar 22 2024, 1:35 PM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE
Papaul added a comment to T360776: Decom asw-b-codfw switch stack.

@cmooney what works for you works for me as well

Mar 22 2024, 1:34 PM · Patch-For-Review, ops-codfw, Infrastructure-Foundations, netops, SRE

Mar 21 2024

Papaul updated subscribers of T355355: Q3:rack/setup/install dbprov200[56].

@MoritzMuehlenhoff I tried the re-image again. Once the server rebooted after the OS install, the cookbook failed with the error below.

Exception raised while executing cookbook sre.hosts.reimage:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 250, in _run
    raw_ret = runner.run()
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/reimage.py", line 658, in run
    fingerprint = self.puppet_installer.regenerate_certificate()[self.fqdn]
  File "/usr/lib/python3/dist-packages/spicerack/puppet.py", line 294, in regenerate_certificate
    raise PuppetHostsError(
spicerack.puppet.PuppetHostsError: Unable to find CSR fingerprints for all hosts, detected errors are:
Mar 21 2024, 6:36 PM · Patch-For-Review, SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul added a comment to T355355: Q3:rack/setup/install dbprov200[56].

The dbprov2005 re-image is stuck at the puppet run. When I log in to the server and try to run puppet manually, I get the error below.

Error: The CRL issued by 'CN=Wikimedia_Internal_Root_CA,OU=Cloud Services,O=Wikimedia Foundation\, Inc,L=San Francisco,ST=California,C=US' has expired, verify time is synchronized
Error: The CRL issued by 'CN=Wikimedia_Internal_Root_CA,OU=Cloud Services,O=Wikimedia Foundation\, Inc,L=San Francisco,ST=California,C=US' has expired, verify time is synchronized
Mar 21 2024, 3:20 PM · Patch-For-Review, SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul moved T360446: hw troubleshooting: failed disk for ml-serve2008.codfw.wmnet (not urgent) from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mar 21 2024, 11:59 AM · SRE, ops-codfw, Machine-Learning-Team, DC-Ops
Papaul moved T360554: decommission db2096 from Backlog to Decommission on the ops-codfw board.
Mar 21 2024, 11:58 AM · SRE, DC-Ops, ops-codfw, DBA, decommission-hardware

Mar 20 2024

Papaul added a comment to T358244: Decom asw-a-codfw switch stack.

Removed all old cables and unracked 4 switches out of 8

Mar 20 2024, 6:32 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mar 20 2024, 6:31 PM · netops, Infrastructure-Foundations, SRE, ops-codfw

Mar 19 2024

Papaul added a comment to T355355: Q3:rack/setup/install dbprov200[56].

@Jhancock.wm please proceed with this task and let me know if you have any issues.

Mar 19 2024, 9:05 PM · Patch-For-Review, SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul closed T359742: Degraded RAID on elastic2037 as Resolved.

@RKemper Thank you.
This server is in the process of being decommissioned. No action needed. Resolving this task.

Mar 19 2024, 9:03 PM · SRE, ops-codfw
Papaul added a comment to T359631: install (2) 1.92TB SSDs from decom into prometheus200[56].

@fgiunchedi hello I will be working with you tomorrow on this since @Jhancock.wm has some things to take care of @16UTC

Mar 19 2024, 8:53 PM · ops-codfw, SRE
Papaul updated subscribers of T359742: Degraded RAID on elastic2037.

@RKemper hello please see @Jhancock.wm comment above.

Mar 19 2024, 8:50 PM · SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mar 19 2024, 8:47 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mar 19 2024, 4:31 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mar 19 2024, 4:25 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mar 19 2024, 4:08 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mar 19 2024, 3:47 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul added a comment to T358244: Decom asw-a-codfw switch stack.

Zeroize done on all the old switches in row A

Mar 19 2024, 3:46 PM · netops, Infrastructure-Foundations, SRE, ops-codfw