Papaul (Papaul)
User

User Details

User Since
Dec 18 2014, 3:39 PM (572 w, 2 d)
Availability
Available
LDAP User
Papaul
MediaWiki User
Unknown

Recent Activity

Fri, Dec 5

Papaul created T411833: Add FIDO backed production SSH key for Papaul.
Fri, Dec 5, 1:08 AM · SRE, SRE-Access-Requests

Tue, Dec 2

Papaul added a comment to T408892: ULSFO: New switch configuration.

@ssingh yes we have to depool the site, yes 10 AM CT

Tue, Dec 2, 11:24 PM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul updated subscribers of T408892: ULSFO: New switch configuration.

@ssingh We are planning on doing the first phase (loopback IP change on core routers and management router) of the ULSFO refresh next week, Dec 09th at 10:00am. Please let me know if this works for you and your team.

Tue, Dec 2, 5:02 PM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul added a comment to T408892: ULSFO: New switch configuration.

@ayounsi @cmooney thanks for the feedback.

Tue, Dec 2, 4:59 PM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul added a comment to T408892: ULSFO: New switch configuration.

@ayounsi @cmooney please see below the steps to replace the loopback IPs on cr3/4-ulsfo and mr1-ulsfo. If all this looks good, I will set up a maintenance with Traffic for December 9th at 11am CT. Thanks

Tue, Dec 2, 5:34 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo

Wed, Nov 26

Papaul updated subscribers of T408511: ULSFO:Switch refresh diagram.

@RobH I updated the task description with all the connections that we need for phase 1 in December. Please don't forget the cable IDs. Please let me know if you have any questions. Thanks

Wed, Nov 26, 12:09 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul updated the task description for T408511: ULSFO:Switch refresh diagram.
Wed, Nov 26, 12:04 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo

Tue, Nov 25

Papaul updated the task description for T408511: ULSFO:Switch refresh diagram.
Tue, Nov 25, 5:55 PM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo

Sat, Nov 22

Papaul updated the task description for T408892: ULSFO: New switch configuration.
Sat, Nov 22, 2:31 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo

Thu, Nov 20

Papaul added a comment to T250367: Servers exposing incorrect LLDP info.

@ayounsi sretest1005 is the same as sretest2004, see below. What you can maybe check is the Redfish/iDRAC version on sretest2004 and sretest1005.

Thu, Nov 20, 4:22 PM · Patch-For-Review, Infrastructure-Foundations, netops, SRE

Wed, Nov 19

Papaul added a comment to T408511: ULSFO:Switch refresh diagram.

@ayounsi thanks for the feedback, I will work on it.

Wed, Nov 19, 5:24 PM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul added a comment to T408892: ULSFO: New switch configuration.

I think I am wrong on the public VLAN for rack 22. We will not be re-imaging the servers in that rack with the public VLAN, just changing the network mask from /28 to /27.
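
To illustrate what widening a mask from /28 to /27 means for a subnet, here is a minimal sketch using Python's standard `ipaddress` module. The prefix `198.51.100.0/28` is a documentation range chosen for illustration only; it is an assumption and not the real rack 22 subnet.

```python
import ipaddress

# Hypothetical documentation prefix (RFC 5737), NOT the real rack 22 subnet.
old = ipaddress.ip_network("198.51.100.0/28")
# Widen the mask by one bit: the /27 that contains the original /28.
new = old.supernet(new_prefix=27)


def usable_hosts(net):
    """Total addresses minus the network and broadcast addresses."""
    return net.num_addresses - 2


for net in (old, new):
    print(net, net.netmask, usable_hosts(net))
```

Because the /27 contains the original /28, existing host addresses stay valid in this sketch; only the netmask (and the broadcast address) configured on each host would change.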

Wed, Nov 19, 5:22 PM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul reassigned T390813: Upgrade End Of Support Junos from Papaul to cmooney.

Both switches in drmrs are now running Junos: 23.4R2-S5.8. @cmooney I am sending the task to you since you wanted to do the cloud switches.

Wed, Nov 19, 5:03 PM · Traffic, netops, Infrastructure-Foundations
Papaul updated the task description for T408892: ULSFO: New switch configuration.
Wed, Nov 19, 4:46 PM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul added a comment to T250367: Servers exposing incorrect LLDP info.

@ayounsi Please see below the steps to disable LLDP in the BIOS for Dell servers.

Wed, Nov 19, 3:40 PM · Patch-For-Review, Infrastructure-Foundations, netops, SRE
Papaul added a comment to T390813: Upgrade End Of Support Junos.

@ayounsi @cmooney on the other QFX5120-48Y in magru we are running version 22.2R3.S3.18; however, right now the recommended version for that model is 23.4R2-S5. Do you want me to do 23.4R2-S5 or stick to 22.2R3-S7?

Wed, Nov 19, 12:06 AM · Traffic, netops, Infrastructure-Foundations

Tue, Nov 18

Papaul added a comment to T250367: Servers exposing incorrect LLDP info.

I took a look at xe-1/0/8, as you mentioned it was cp5002, and I saw dns5004. I then realized that this task has been open since 2020 (5 years ago), so now on port xe-1/0/8 we have dns5004.

papaul@asw1-eqsin> show lldp neighbors 
Local Interface    Parent Interface    Chassis Id          Port info          System Name
[----]
xe-1/0/8           -                   84:16:0c:5d:9c:70   NIC 1/10/25Gb SFP+ DA Broadcom Adv. Dual 25Gb Ethernet fw_version:AFW_218.0.219.9
[---]
papaul@asw1-eqsin> show lldp neighbors interface xe-1/0/8  
LLDP Neighbor Information:
Local Information:
Index: 734 Time to live: 120 Time mark: Mon Nov 17 21:42:59 2025 Age: 7 secs 
Local Interface    : xe-1/0/8
Parent Interface   : -
Local Port ID      : 559
Ageout Count       : 0
Tue, Nov 18, 4:45 PM · Patch-For-Review, Infrastructure-Foundations, netops, SRE
Papaul added a comment to T408892: ULSFO: New switch configuration.

@cmooney @ayounsi I updated the task with all the IPv4 and IPv6 addresses for the links, IRBs and loopbacks. Please review and let me know if there is anything I need to change or add.

Tue, Nov 18, 6:28 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul updated the task description for T408892: ULSFO: New switch configuration.
Tue, Nov 18, 6:18 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul updated the task description for T408892: ULSFO: New switch configuration.
Tue, Nov 18, 5:32 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul updated the task description for T408892: ULSFO: New switch configuration.
Tue, Nov 18, 2:11 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul updated the task description for T408892: ULSFO: New switch configuration.
Tue, Nov 18, 1:33 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo

Mon, Nov 17

Papaul added a comment to T250367: Servers exposing incorrect LLDP info.

@ayounsi yes I can look into it. Thanks.

Mon, Nov 17, 2:37 PM · Patch-For-Review, Infrastructure-Foundations, netops, SRE

Thu, Nov 13

Papaul added a comment to T401937: codfw:cr* router power not balance on all 4 PEM's.

After swapping both PEM 2 and 3

re0.cr1-codfw> show chassis environment pem    
PEM 0 status:
  State                      Online
  Temperature                             OK                                      
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        58          1             58       2      
PEM 1 status:
  State                      Online
  Temperature                             OK                                      
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        58          32            1856     90     
PEM 2 status:
  State                      Online
  Temperature                             OK                                      
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        58          0             0        0      
PEM 3 status:
  State                      Online
  Temperature                             OK                                      
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        58          2             116      5
re0.cr2-codfw> show chassis environment pem    
PEM 0 status:
  State                      Online
  Temperature                             OK                                      
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        59          0             0        0      
PEM 1 status:
  State                      Online
  Temperature                             OK                                      
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        60          13            780      38     
PEM 2 status:
  State                      Online
  Temperature                             OK                                      
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        57          0             0        0      
PEM 3 status:
  State                      Online
  Temperature                             OK                                      
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        55          0             0        0
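
To quantify how unevenly the load is spread, the output above can be parsed and each PEM's share of the total power computed. This is a minimal sketch, not a tool used in the task: the function names `parse_pem_power` and `power_share` are hypothetical, and the sample text is the cr1-codfw output above with the State and Temperature lines left out.

```python
import re

# cr1-codfw values copied from the output above (State/Temperature lines omitted).
SAMPLE = """\
PEM 0 status:
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        58          1             58       2
PEM 1 status:
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        58          32            1856     90
PEM 2 status:
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        58          0             0        0
PEM 3 status:
  DC Output           Voltage(V) Current(A)  Power(W)  Load(%)
                        58          2             116      5
"""


def parse_pem_power(text):
    """Return {pem_number: watts} from 'show chassis environment pem' text."""
    power = {}
    current_pem = None
    for line in text.splitlines():
        header = re.match(r"\s*PEM (\d+) status:", line)
        if header:
            current_pem = int(header.group(1))
            continue
        # Data row: voltage, current, power, load as four integers.
        row = re.match(r"\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$", line)
        if row and current_pem is not None:
            _volts, _amps, watts, _load = map(int, row.groups())
            power[current_pem] = watts
    return power


def power_share(power):
    """Return each PEM's fraction of the total DC output power."""
    total = sum(power.values())
    if total == 0:
        return {pem: 0.0 for pem in power}
    return {pem: watts / total for pem, watts in power.items()}


if __name__ == "__main__":
    for pem, share in power_share(parse_pem_power(SAMPLE)).items():
        print(f"PEM {pem}: {share:.1%} of total output")
```

In the sample, each Power(W) value equals Voltage(V) times Current(A), and PEM 1 carries most of the total, which matches the imbalance described in the task.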
Thu, Nov 13, 4:01 PM · SRE, Infrastructure-Foundations, DC-Ops, netbox, ops-codfw

Nov 5 2025

Papaul added a comment to T390813: Upgrade End Of Support Junos.

@ssingh @Vgutierrez planning on doing this on Nov 19th at 10:00am CT. Thank you

Nov 5 2025, 4:48 PM · Traffic, netops, Infrastructure-Foundations
Papaul closed T393996: Downgrade pfw1-codfw to Junos 23.4R2-S3, a subtask of T337585: FR-Tech FY2425Q4 maintenance window, as Resolved.
Nov 5 2025, 4:43 PM · Fundraising Sprint: Jollof Rice, Fundraising-Tech-Roadmap, fundraising-tech-ops, Fundraising-Backlog
Papaul closed T393996: Downgrade pfw1-codfw to Junos 23.4R2-S3, a subtask of T390052: Enable gNMI on SRX devices and fasw, as Resolved.
Nov 5 2025, 4:43 PM · Patch-For-Review, netops, Infrastructure-Foundations
Papaul closed T393996: Downgrade pfw1-codfw to Junos 23.4R2-S3 as Resolved.

Both firewalls are now running Junos: 23.4R2-S5.5. Thanks to @Jgreen and @Dwisehaupt.
Closing this task now

Nov 5 2025, 4:43 PM · fundraising-tech-ops, Infrastructure-Foundations, netops

Nov 4 2025

Papaul updated the task description for T408892: ULSFO: New switch configuration.
Nov 4 2025, 12:10 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul updated the task description for T408892: ULSFO: New switch configuration.
Nov 4 2025, 12:07 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul added a comment to T408892: ULSFO: New switch configuration.

@cmooney thanks for the feedback. We can clarify this tomorrow during the meeting, have everything ready, and run it by @ayounsi when he is back.

Nov 4 2025, 12:04 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul updated the task description for T408892: ULSFO: New switch configuration.
Nov 4 2025, 12:02 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo

Nov 3 2025

Papaul added a comment to T408892: ULSFO: New switch configuration.

@cmooney I updated all the IPs to match the other POP sites. I will be re-running the configuration and validation sometime this week in my lab and will post back the result. I also updated the irb interfaces configuration. I will also update the IP addresses of the links to eqsin and codfw later in the description.

Nov 3 2025, 5:04 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul updated the task description for T408892: ULSFO: New switch configuration.
Nov 3 2025, 4:59 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo

Oct 31 2025

Papaul triaged T408892: ULSFO: New switch configuration as Medium priority.
Oct 31 2025, 5:03 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul removed a subtask for T408510: ULSFO: switch refresh: T408892: ULSFO: New switch configuration.
Oct 31 2025, 4:28 AM · Traffic, SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul edited parent tasks for T408892: ULSFO: New switch configuration, added: Unknown Object (Task); removed: T408510: ULSFO: switch refresh.
Oct 31 2025, 4:28 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul created T408892: ULSFO: New switch configuration.
Oct 31 2025, 4:26 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo

Oct 30 2025

Papaul added a comment to T393996: Downgrade pfw1-codfw to Junos 23.4R2-S3.

@Dwisehaupt yes Wednesday 11/5 is ok with me. Let us do 10:00am CT. Thank you.

Oct 30 2025, 1:15 PM · fundraising-tech-ops, Infrastructure-Foundations, netops

Oct 29 2025

Papaul added a comment to T393996: Downgrade pfw1-codfw to Junos 23.4R2-S3.

@Dwisehaupt hello yes we can do this during the maintenance windows in November. Any day you prefer for that week? Thank you

Oct 29 2025, 4:51 PM · fundraising-tech-ops, Infrastructure-Foundations, netops
Papaul claimed T393996: Downgrade pfw1-codfw to Junos 23.4R2-S3.
Oct 29 2025, 4:47 PM · fundraising-tech-ops, Infrastructure-Foundations, netops
Papaul added a comment to T401937: codfw:cr* router power not balance on all 4 PEM's.

We still have an ongoing email thread with Juniper on this to understand why in eqiad the power is balanced on all PEMs and not in codfw. Please see below for the last update we had from Juniper. Thanks.

Oct 29 2025, 4:46 PM · SRE, Infrastructure-Foundations, DC-Ops, netbox, ops-codfw
Papaul added a subtask for T408510: ULSFO: switch refresh: Unknown Object (Task).
Oct 29 2025, 12:06 AM · Traffic, SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo

Oct 28 2025

Papaul updated the task description for T408511: ULSFO:Switch refresh diagram.
Oct 28 2025, 11:48 PM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul added a comment to T408511: ULSFO:Switch refresh diagram.

@cmooney thanks for the feedback, I will update the diagram to match the 100G links between the core routers and the switches and the type of transceivers needed.

Oct 28 2025, 1:39 PM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul triaged T408511: ULSFO:Switch refresh diagram as Medium priority.
Oct 28 2025, 6:43 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul triaged T408510: ULSFO: switch refresh as Medium priority.
Oct 28 2025, 6:43 AM · Traffic, SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul created T408511: ULSFO:Switch refresh diagram.
Oct 28 2025, 6:43 AM · SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo
Papaul created T408510: ULSFO: switch refresh.
Oct 28 2025, 6:16 AM · Traffic, SRE, Infrastructure-Foundations, DC-Ops, netops, ops-ulsfo

Oct 23 2025

Papaul added a comment to T406964: No disk boot option when moving ms-be2078 to UEFI.

@elukey no problem

Oct 23 2025, 1:32 PM · User-Elukey, SRE, ops-codfw, Infrastructure-Foundations, DC-Ops
Papaul added a comment to T390813: Upgrade End Of Support Junos.

@ssingh thanks for the update. I am planning on doing it before Thanksgiving; any day during the week of November 17th works for me. Let me know if that works for you and I can get back to you on the exact day and time.

Oct 23 2025, 1:31 PM · Traffic, netops, Infrastructure-Foundations

Oct 22 2025

Papaul added a comment to T406964: No disk boot option when moving ms-be2078 to UEFI.

While trying to use the firmware upgrade cookbook with "sudo cookbook sre.hardware.upgrade-firmware ms-be2078 --new" I get the error below, so I have to run the cookbook by passing the flag for each component.
"sudo cookbook sre.hardware.upgrade-firmware ms-be2078 -c bios --new" works only for the BIOS, and when doing the same for the iDRAC I get the second error below.
Is it possible to look into the code and see why this is failing? In the meantime I was able to manually upgrade the iDRAC. Thanks

Oct 22 2025, 11:24 PM · User-Elukey, SRE, ops-codfw, Infrastructure-Foundations, DC-Ops
Papaul added a comment to T406964: No disk boot option when moving ms-be2078 to UEFI.

@elukey I think the next step will be to try to install the OS without setting up the boot disk and let the OS take care of it. Maybe this is one of the many cases where it is not possible to set the boot disk before the OS install.
Thanks.

Oct 22 2025, 4:56 PM · User-Elukey, SRE, ops-codfw, Infrastructure-Foundations, DC-Ops
Papaul added a comment to T406964: No disk boot option when moving ms-be2078 to UEFI.

@elukey can you please provide me with one of the nodes that is working, like you said, so I can check what is different between that node and the one that is not working?

Oct 22 2025, 3:18 PM · User-Elukey, SRE, ops-codfw, Infrastructure-Foundations, DC-Ops
Papaul added a comment to T406964: No disk boot option when moving ms-be2078 to UEFI.

@elukey @MatthewVernon thank you, that was very helpful information. Now I can answer your question:
"In UEFI Boot Mode, fixed media (see Hard Disk items in the earlier section) may or may not be added to the
boot sequence. Unlike legacy Boot Mode, in UEFI Boot Mode, the OS has the ability to add to and modify the
boot sequence"

Oct 22 2025, 1:28 PM · User-Elukey, SRE, ops-codfw, Infrastructure-Foundations, DC-Ops
Papaul added a comment to T390813: Upgrade End Of Support Junos.

@ssingh @Vgutierrez hello just checking in to see if you have a day and time for this for drmrs.
Thanks

Oct 22 2025, 3:36 AM · Traffic, netops, Infrastructure-Foundations

Oct 21 2025

Papaul added a comment to T406964: No disk boot option when moving ms-be2078 to UEFI.

Can you please provide me with some context here on what we are trying to do? The only thing I see in the task is that we are testing UEFI mode on the node.
1- Are we moving from Debian 11 to Debian 12?
2- What partman recipe are we using for testing?

Oct 21 2025, 5:27 PM · User-Elukey, SRE, ops-codfw, Infrastructure-Foundations, DC-Ops

Oct 16 2025

Papaul added a comment to T407488: mr1-codfw is single-homed to lsw1-a2-codfw.

I do agree with you that we should have a redundant link to another switch. I have also been thinking about the long-term mgmt network design. If we go to 2 links from the mr* to two different switches, we fix the issue that if we lose 1 switch we still have access to the mgmt network, but we are not addressing what happens if we lose the mgmt router itself. I know some will say the mgmt network is not that critical. But if we are redesigning the mgmt network to have 2 links to 2 different switches, then we are putting in some type of redundancy, but not full redundancy, because if the mgmt router goes down those 2 links are useless.

Oct 16 2025, 4:15 PM · netops, Infrastructure-Foundations, SRE

Oct 14 2025

Papaul moved T406964: No disk boot option when moving ms-be2078 to UEFI from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Oct 14 2025, 6:31 PM · User-Elukey, SRE, ops-codfw, Infrastructure-Foundations, DC-Ops

Oct 7 2025

Papaul updated the task description for T405618: codfw:frack:rack/install/configuration new switches in rack F5.
Oct 7 2025, 3:01 PM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw
Papaul updated the task description for T405618: codfw:frack:rack/install/configuration new switches in rack F5.
Oct 7 2025, 3:01 PM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw

Sep 30 2025

Papaul added a comment to T400412: Q1:rack/setup/install dbprov1007.

The node sent the puppet request to the wrong puppet master. I cleaned it up; you can re-run the cookbook with the --no-pxe flag.

pt1979@puppetmaster1001:~$ sudo puppet cert --list
Warning: `puppet cert` is deprecated and will be removed in a future release.
   (location: /usr/lib/ruby/vendor_ruby/puppet/application.rb:370:in `run')
  "dbprov1007.eqiad.wmnet" (SHA256) 52:F7:CA:C9:25:18:85:D7:1C:C7:6B:DA:77:51:80:41:C2:1F:83:FC:EF:AA:2B:82:FB:A3:C2:48:A6:56:8D:9A
Sep 30 2025, 11:50 PM · SRE, Data-Persistence, ops-eqiad, DC-Ops
Papaul updated the task description for T405618: codfw:frack:rack/install/configuration new switches in rack F5.
Sep 30 2025, 5:23 PM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw
Papaul updated the task description for T405618: codfw:frack:rack/install/configuration new switches in rack F5.
Sep 30 2025, 5:19 PM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw
Papaul added a comment to T399778: Q1:rack/setup/install dse-k8s-worker2003.

@Jhancock.wm see below why the server is failing. You have 2 options: change the role in site.pp to the insetup role to finish the install, or have the server owner fix the puppet error below.

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Operator '[]' is not applicable to an Undef Value. (file: /srv/puppet_code/environments/production/modules/profile/manifests/kubernetes/node.pp, line: 140, column: 15) on node dse-k8s-worker2003.codfw.wmnet
Sep 30 2025, 1:26 AM · Essential-Work, Data-Platform-SRE (2025.09.26 - 2025.10.17), SRE, ops-codfw, DC-Ops

Sep 29 2025

Papaul updated the task description for T405618: codfw:frack:rack/install/configuration new switches in rack F5.
Sep 29 2025, 9:16 PM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw
Papaul triaged T405618: codfw:frack:rack/install/configuration new switches in rack F5 as Medium priority.
Sep 29 2025, 8:43 PM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw
Papaul updated the task description for T405618: codfw:frack:rack/install/configuration new switches in rack F5.
Sep 29 2025, 8:43 PM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw
Papaul closed T401297: Move pfw1b-codfw to rack F5 as Resolved.

We decided that the next time we have an open window for this, @cmooney will drive the test himself. For this test, the move is complete.

Sep 29 2025, 8:41 PM · Infrastructure-Foundations, fundraising-tech-ops, netops
Papaul added a comment to T405618: codfw:frack:rack/install/configuration new switches in rack F5.

I had a meeting today with @Jgreen about the new switch configuration. What we will be doing is moving the frack-fundraising VLAN to the new rack. See below for the process:
  • Create reth1 and add both et-0/1/0 and et-7/1/0 interfaces to it

redundancy-group 2 {
    node 0 priority 100;
    node 1 priority 1;
    interface-monitor {
        et-0/1/0 weight 255;
        et-7/1/0 weight 255;
    }
}
  • Create interface reth1.2135 and add it to the trust security zone after removing the IP address from reth0.2135
  • Set up interface et-0/0/47 on both switches as tagged
  • Have one server in F5 to test before moving all the servers
Sep 29 2025, 4:06 PM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw
Papaul updated the task description for T405618: codfw:frack:rack/install/configuration new switches in rack F5.
Sep 29 2025, 3:52 PM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw

Sep 26 2025

Papaul updated the task description for T405618: codfw:frack:rack/install/configuration new switches in rack F5.
Sep 26 2025, 2:53 AM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw
Papaul updated the task description for T405618: codfw:frack:rack/install/configuration new switches in rack F5.
Sep 26 2025, 2:52 AM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw

Sep 25 2025

Papaul updated the task description for T405618: codfw:frack:rack/install/configuration new switches in rack F5.
Sep 25 2025, 8:21 PM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw
Papaul created T405618: codfw:frack:rack/install/configuration new switches in rack F5.
Sep 25 2025, 4:02 PM · SRE, netops, Infrastructure-Foundations, DC-Ops, ops-codfw

Sep 23 2025

Papaul added a comment to T401937: codfw:cr* router power not balance on all 4 PEM's.

last update from Juniper yesterday

Sep 23 2025, 2:36 PM · SRE, Infrastructure-Foundations, DC-Ops, netbox, ops-codfw

Sep 22 2025

Papaul added a comment to T401297: Move pfw1b-codfw to rack F5.

I added the second fabric link xe-0/2/2

Sep 22 2025, 10:44 PM · Infrastructure-Foundations, fundraising-tech-ops, netops
Papaul added a comment to T401297: Move pfw1b-codfw to rack F5.

We moved pfw1b-codfw today from rack C8 in DH7 to rack F5 in DH5 and all is back up online. Before the move we did some testing:
1- Disconnected the second HA link from both firewalls.
All was good.
2- Disconnected the first and second HA links: we lost connectivity to node 0 (unknown state) and node 1 passed to the ineligible state. After about 1 minute node 1 went into the disabled state, but we still had connectivity from eqiad to codfw.
3- Connected HA link 1 back: node 0 came back online and node 1 automatically rebooted and went into the hold state. After about a minute it went back to its initial state as the secondary node.

Redundancy group: 1 , Failover count: 0
node0  0        lost                 n/a     n/a      n/a
node1  1        disabled             no      no       None\
Sep 22 2025, 10:19 PM · Infrastructure-Foundations, fundraising-tech-ops, netops
Papaul added a comment to T405258: cloudcephosd1025 won't reimage.

@Jclark-ctr the cables are plugged into the wrong switch ports: NIC 1 is connected to port xe-0/0/21 and NIC 2 is connected to port xe-0/0/20. It should be the other way around, see Netbox:
https://netbox.wikimedia.org/dcim/devices/3980/interfaces/
The output on the switch shows that the MAC address ending with 91, which is NIC 2, is connected to xe-0/0/20.

Sep 22 2025, 10:02 PM · cloud-services-team, Ceph, SRE, ops-eqiad, DC-Ops, Cloud-VPS

Sep 18 2025

Papaul added a comment to T401937: codfw:cr* router power not balance on all 4 PEM's.

update from Juniper after our phone call today.

Hello Teams,

Thank you for your time on our call.

During our call we replaced the PEM, rebooted the chassis, and removed the PEMs one by one; we confirmed that we have power load balance. We did not lose the router, and we confirmed that the core issue of "unbalanced" power is not related to the physical cabling. It seems to be a part of the router's design. The router will continue to pull power from only the PEMs it needs, leaving others in a standby state for efficiency and redundancy. The load that was on PEM 1, for example, will not shift to PEM 2 or PEM 3 just because you switched the cables. However, I will ask you for time to continue checking internally; I will do an internal consultation and we will share updates in 24 hours or so. I will also ask you to share the logs and outputs with the session logs.
Sep 18 2025, 6:36 PM · SRE, Infrastructure-Foundations, DC-Ops, netbox, ops-codfw
Papaul added a comment to P83437 (An Untitled Masterwork).

Output of today's troubleshooting

Last login: Tue May 20 13:04:15 on ttyu0
Sep 18 2025, 6:35 PM
Papaul added a comment to T401937: codfw:cr* router power not balance on all 4 PEM's.

Output of today's troubleshooting

Last login: Tue May 20 13:04:15 on ttyu0
Sep 18 2025, 4:04 PM · SRE, Infrastructure-Foundations, DC-Ops, netbox, ops-codfw
Papaul added a comment to T401937: codfw:cr* router power not balance on all 4 PEM's.

PXL_20250915_170321994.jpg (3×4 px, 2 MB)

Sep 18 2025, 3:24 PM · SRE, Infrastructure-Foundations, DC-Ops, netbox, ops-codfw

Sep 16 2025

Papaul closed T387504: codfw expansion infrastructure racking task as Resolved.

The BIO reader is now installed and working, so closing this task.

Sep 16 2025, 2:36 AM · SRE, ops-codfw, DC-Ops
Papaul updated the task description for T387504: codfw expansion infrastructure racking task.
Sep 16 2025, 2:35 AM · SRE, ops-codfw, DC-Ops
Papaul added a comment to T401937: codfw:cr* router power not balance on all 4 PEM's.

@cmooney we have the spare PEM on site. I need to get on a call with Juniper to troubleshoot this. Do you think Thursday will be a good day to put the router in maintenance mode, so I can communicate to Juniper the day and time we can work with them on this? I replaced PEM0 with the spare PEM sent by Juniper and there is a small change: PEM0 is now getting power, but PEM1 is still pulling a lot of power.

Sep 16 2025, 2:32 AM · SRE, Infrastructure-Foundations, DC-Ops, netbox, ops-codfw

Sep 15 2025

Papaul added a comment to T400198: Q1:rack/setup/install es1049-es1057.

@VRiley-WMF es1056 added, you can resume with your install.

Sep 15 2025, 10:01 PM · SRE, Data-Persistence, ops-eqiad, DC-Ops

Sep 12 2025

Papaul added a comment to T400198: Q1:rack/setup/install es1049-es1057.

@VRiley-WMF the issue is that es1056 is missing in this patch:
https://gerrit.wikimedia.org/r/c/operations/puppet/+/1172182/1/modules/profile/data/profile/installserver/preseed.yaml
You can add es1056 and add me to +2 your code.

Sep 12 2025, 11:35 PM · SRE, Data-Persistence, ops-eqiad, DC-Ops
Papaul added a comment to T401297: Move pfw1b-codfw to rack F5.

We tested the last 2 cross cage links for the frack migration and all is working now. We are ready for the move on the 22nd.

Sep 12 2025, 11:20 PM · Infrastructure-Foundations, fundraising-tech-ops, netops
Papaul added a comment to T401937: codfw:cr* router power not balance on all 4 PEM's.

Juniper shipped out a new PEM to replace PEM0 and see if that will fix the issue.

Sep 12 2025, 11:19 PM · SRE, Infrastructure-Foundations, DC-Ops, netbox, ops-codfw
Papaul closed T403634: codfw: document SCS ports in Netbox as Resolved.

Complete

Sep 12 2025, 11:17 PM · SRE, ops-codfw, DC-Ops
Papaul closed T403965: decommission frdata2001.frack.codfw.wmnet, a subtask of T403673: frmx2002.frack.codfw.wmnet final setup , as Resolved.
Sep 12 2025, 11:16 PM · Patch-For-Review, fundraising-tech-ops
Papaul closed T403965: decommission frdata2001.frack.codfw.wmnet as Resolved.
Sep 12 2025, 11:16 PM · SRE, DC-Ops, ops-codfw, decommission-hardware

Sep 10 2025

Papaul added a comment to T403965: decommission frdata2001.frack.codfw.wmnet.

Done on the switch side

Sep 10 2025, 5:37 PM · SRE, DC-Ops, ops-codfw, decommission-hardware
Papaul closed T403970: decommission frmx2001.frack.codfw.wmnet, a subtask of T403673: frmx2002.frack.codfw.wmnet final setup , as Resolved.
Sep 10 2025, 5:36 PM · Patch-For-Review, fundraising-tech-ops
Papaul closed T403970: decommission frmx2001.frack.codfw.wmnet as Resolved.

Done on the switch side

Sep 10 2025, 5:36 PM · SRE, DC-Ops, ops-codfw, decommission-hardware
Papaul closed T294845: Management routers: use BGP instead of OSPF as Resolved.

mr1-eqsin and cr2/3-eqsin are now running BGP for the management network. Resolving this task. Thanks @ayounsi

Sep 10 2025, 4:18 PM · SRE, Infrastructure-Foundations, netops

Sep 9 2025

Papaul added a comment to T401297: Move pfw1b-codfw to rack F5.

Tested all the cross cage links (7); only 2 links are not coming up. I will do more testing tomorrow.

Sep 9 2025, 2:10 AM · Infrastructure-Foundations, fundraising-tech-ops, netops
Papaul updated subscribers of T401937: codfw:cr* router power not balance on all 4 PEM's.

@ayounsi @cmooney can you do the test Juniper asked us to do tomorrow Sept. 9th after the meeting link around 11:15am CT?

Sep 9 2025, 2:08 AM · SRE, Infrastructure-Foundations, DC-Ops, netbox, ops-codfw

Sep 8 2025

Papaul added a comment to T294845: Management routers: use BGP instead of OSPF.

BGP is up on mr1-eqsin and cr2/3-eqsin

mr1-eqsin# run show route protocol ospf
Sep 8 2025, 6:08 PM · SRE, Infrastructure-Foundations, netops