Papaul (Papaul)
User

Projects (7)

User Details

User Since: Dec 18 2014, 3:39 PM (482 w, 4 d)
Availability: Available
LDAP User: Papaul
MediaWiki User: Unknown

Recent Activity

Today

Papaul closed T360395: Inbound interface errors as Resolved.
Tue, Mar 19, 3:05 AM · SRE, ops-codfw

Yesterday

Papaul added a comment to T358244: Decom asw-a-codfw switch stack.

Zeroize done on asw-a1
Steps:

  • Delete the member from the master
  • Disconnect both cables going to asw-a2 and asw-a7
  • While logged in to the console, run the zeroize command
Mon, Mar 18, 5:54 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mon, Mar 18, 4:19 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mon, Mar 18, 4:18 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mon, Mar 18, 4:18 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mon, Mar 18, 4:14 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mon, Mar 18, 3:29 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul updated the task description for T358244: Decom asw-a-codfw switch stack.
Mon, Mar 18, 2:59 PM · netops, Infrastructure-Foundations, SRE, ops-codfw
Papaul added a comment to T358489: mw2420-mw2451 do have unnecessary raid controllers (configured).

@JMeybohm hello, is there anything DC-Ops needs to do on this task?

Mon, Mar 18, 1:20 PM · SRE, serviceops
Papaul closed T358417: Inbound interface errors as Resolved.

We have had this issue with this same server for a long time, so I always close the task when I see the inbound interface error on this server.
The error is not caused by the cable, since the cable was already replaced once, nor by the interface, because we also changed the interface once. See https://T330218.
According to a past discussion with @jcrespo, this server is getting a lot of traffic; maybe the 1G NIC is not capable of handling that amount of traffic.
Since this error has no impact on the server, I am resolving this task. If you have any questions, please feel free to re-open it.

Mon, Mar 18, 1:19 PM · SRE, ops-codfw

Jan 31 2024

Papaul added a comment to T355350: Q#:rack/setup/install db2196-db2220.

@Marostegui yes, we will put some in rows C and D as well. Only the ones in rows A and B will be connected to 10G, if they have a 10G NIC.
Thanks

Jan 31 2024, 1:54 PM · Data-Persistence, SRE, ops-codfw, DC-Ops

Jan 30 2024

Papaul added a comment to T355350: Q#:rack/setup/install db2196-db2220.

@Marostegui if those hosts have a 10G NIC, do you have any problem with us connecting the ones going into rows A and B to a 10G interface?

Jan 30 2024, 10:07 PM · Data-Persistence, SRE, ops-codfw, DC-Ops
Papaul closed T355437: Relocating servers out of A1 in codfw as Resolved.
Jan 30 2024, 4:51 PM · Data-Persistence, SRE, ops-codfw
Papaul added a comment to T355437: Relocating servers out of A1 in codfw.

We did the last server move today. Thanks to all.

Jan 30 2024, 4:50 PM · Data-Persistence, SRE, ops-codfw
Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 30 2024, 4:49 PM · Data-Persistence, SRE, ops-codfw
Papaul moved T355830: Hardware error on elastic2094 - Comm Error: Backplane 0. from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Jan 30 2024, 1:47 PM · SRE, ops-codfw, Data-Platform-SRE (2024.01.22 - 2024.02.11), DC-Ops
Papaul closed T356138: Inbound interface errors as Resolved.
Jan 30 2024, 1:38 PM · SRE, ops-codfw
Papaul moved T356146: ManagementSSHDown from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Jan 30 2024, 1:38 PM · SRE, ops-codfw

Jan 24 2024

Papaul added a comment to T355437: Relocating servers out of A1 in codfw.

Today's work is complete. The only node left to relocate is gitlab2002. Service Ops will get back to us with a day sometime next week. All old ports in Netbox and on asw-a1-codfw have been removed.

Jan 24 2024, 7:24 PM · Data-Persistence, SRE, ops-codfw
Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 24 2024, 5:14 PM · Data-Persistence, SRE, ops-codfw
Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 24 2024, 5:04 PM · Data-Persistence, SRE, ops-codfw
Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 24 2024, 4:58 PM · Data-Persistence, SRE, ops-codfw
Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 24 2024, 4:38 PM · Data-Persistence, SRE, ops-codfw
Papaul added a comment to T355437: Relocating servers out of A1 in codfw.

@klausman thank you

Jan 24 2024, 2:16 PM · Data-Persistence, SRE, ops-codfw
Papaul added a comment to T355437: Relocating servers out of A1 in codfw.

@Marostegui thank you.

Jan 24 2024, 2:08 PM · Data-Persistence, SRE, ops-codfw

Jan 23 2024

Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 23 2024, 2:26 AM · Data-Persistence, SRE, ops-codfw
Papaul closed Unknown Object (Task), a subtask of T329219: Main Tracking Task for ESAMS Migration to KNAMS, as Resolved.
Jan 23 2024, 2:19 AM · Patch-For-Review, SRE, ops-esams, DC-Ops

Jan 22 2024

Papaul added a comment to T344363: Q1:unified decommission task for old esams hosts (knams migration).

@ssingh can we close this task?

Jan 22 2024, 11:54 PM · Traffic
Papaul reopened Unknown Object (Task), a subtask of T329219: Main Tracking Task for ESAMS Migration to KNAMS, as Open.
Jan 22 2024, 11:53 PM · Patch-For-Review, SRE, ops-esams, DC-Ops
Papaul added a comment to T345803: Connect two hosts in codfw row A/B for switch migration testing.

@cmooney can we get those 2 hosts back in decom? Thanks

Jan 22 2024, 11:46 PM · Infrastructure-Foundations, netops, ops-codfw, SRE
Papaul added a comment to T355437: Relocating servers out of A1 in codfw.

@Marostegui thank you. @cmooney I will take another look at it, thanks.

Jan 22 2024, 7:41 PM · Data-Persistence, SRE, ops-codfw
Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 22 2024, 4:21 PM · Data-Persistence, SRE, ops-codfw
Papaul moved T355437: Relocating servers out of A1 in codfw from Backlog to Racking Tasks on the ops-codfw board.
Jan 22 2024, 4:19 PM · Data-Persistence, SRE, ops-codfw
Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 22 2024, 4:17 PM · Data-Persistence, SRE, ops-codfw
Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 22 2024, 4:00 PM · Data-Persistence, SRE, ops-codfw
Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 22 2024, 3:58 PM · Data-Persistence, SRE, ops-codfw
Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 22 2024, 3:56 PM · Data-Persistence, SRE, ops-codfw
Papaul updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 22 2024, 3:50 PM · Data-Persistence, SRE, ops-codfw
Papaul moved T355343: Q3:rack/setup/install es[2035-2040] from Backlog to Racking Tasks on the ops-codfw board.
Jan 22 2024, 3:41 PM · SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul moved T355544: Migrate hosts from codfw row A/B ASW to new LSW devices from Backlog to Codfw Switch migration on the ops-codfw board.
Jan 22 2024, 3:41 PM · ops-codfw, Infrastructure-Foundations, netops, SRE
Papaul moved T355549: Migrate servers in codfw rack B5 from asw-b5-codfw to lsw1-b5-codfw from Backlog to Codfw Switch migration on the ops-codfw board.
Jan 22 2024, 3:40 PM · Data-Persistence, ops-codfw, netops, Infrastructure-Foundations, SRE

Jan 18 2024

Papaul updated the task description for T355343: Q3:rack/setup/install es[2035-2040].
Jan 18 2024, 4:33 PM · SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul added a comment to T355343: Q3:rack/setup/install es[2035-2040].

@Marostegui thank you

Jan 18 2024, 4:32 PM · SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul added a parent task for T355343: Q3:rack/setup/install es[2035-2040]: Unknown Object (Task).
Jan 18 2024, 4:31 PM · SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul updated the task description for T355343: Q3:rack/setup/install es[2035-2040].
Jan 18 2024, 4:30 PM · SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul created T355343: Q3:rack/setup/install es[2035-2040].
Jan 18 2024, 4:29 PM · SRE, Data-Persistence, ops-codfw, DC-Ops
Papaul closed T354732: cr2-codfw:FPC0 failure as Resolved.

linecard removed from cr2 and deleted from netbox

Jan 18 2024, 4:09 PM · SRE, ops-codfw

Jan 17 2024

Papaul updated subscribers of T354732: cr2-codfw:FPC0 failure.

@RobH While I was creating the RMA for the linecard in FPC0 on cr2-codfw, the Juniper team let me know that the linecard has only technical support and no hardware support, so it is impossible to RMA it.

Jan 17 2024, 3:10 AM · SRE, ops-codfw

Jan 16 2024

Papaul added a comment to T354732: cr2-codfw:FPC0 failure.
Hello Papaul
Jan 16 2024, 5:12 PM · SRE, ops-codfw
Papaul added a comment to T354732: cr2-codfw:FPC0 failure.

After moving the linecard to cr1, we are now seeing the error in cr1. I emailed Support to request a replacement again.

Jan 16 2024, 5:01 PM · SRE, ops-codfw
Papaul closed T348164: Migrate mr1-codfw from asw-a1-codfw to lsw1-a2-codfw, a subtask of T348128: Codfw row A-B migration - non-standard device moves, as Resolved.
Jan 16 2024, 4:47 PM · ops-codfw, netops, Infrastructure-Foundations, SRE
Papaul closed T348164: Migrate mr1-codfw from asw-a1-codfw to lsw1-a2-codfw as Resolved.

Link removed

Jan 16 2024, 4:47 PM · ops-codfw, netops, Infrastructure-Foundations, SRE

Jan 12 2024

Papaul added a comment to T354732: cr2-codfw:FPC0 failure.

I will go for option 2 but I will have to do that next week since today is Friday. Thanks

Jan 12 2024, 1:46 PM · SRE, ops-codfw

Jan 11 2024

Papaul created P54627 pip_error.
Jan 11 2024, 4:23 PM
Papaul added a comment to T354732: cr2-codfw:FPC0 failure.

@ayounsi

Hello Papaul
Jan 11 2024, 3:38 PM · SRE, ops-codfw

Jan 10 2024

Papaul added a comment to T354732: cr2-codfw:FPC0 failure.

@ayounsi see below email from Juniper support

Jan 10 2024, 7:54 PM · SRE, ops-codfw
Papaul added a comment to T354732: cr2-codfw:FPC0 failure.
Case Number: 2024-0110-046148
Case Type: Tech
Priority: P2 - High
Platform: MX480
Status: Dispatch
Jan 10 2024, 5:10 PM · SRE, ops-codfw
Papaul added a comment to T352758: Move lvs2014 link to row A and connect to new row A/B vlans.

@cmooney link moved to ssw1-a8

Jan 10 2024, 4:16 PM · Traffic, ops-codfw, netops, Infrastructure-Foundations, SRE
Papaul added a comment to T354732: cr2-codfw:FPC0 failure.

@ayounsi will do

Jan 10 2024, 2:21 PM · SRE, ops-codfw
Papaul moved T354732: cr2-codfw:FPC0 failure from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Jan 10 2024, 2:18 PM · SRE, ops-codfw

Jan 9 2024

Papaul moved T354685: RAM upgrade for prometheus200[56] from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Jan 9 2024, 11:00 PM · SRE, ops-codfw, Observability-Metrics
Papaul closed T354193: Broken CPU on mw2394 as Resolved.

@Jhancock.wm what I did for the provision cookbook to pass was to reset the iDRAC password and re-run the cookbook.
@Dzahn the host is back up.

Jan 9 2024, 1:26 AM · SRE, serviceops, ops-codfw

Jan 8 2024

Papaul added a comment to T354180: Disk (sdh) failed in ms-be2068.

Waiting to receive the replacement disk before closing the task.

Jan 8 2024, 6:01 PM · SRE, SRE-swift-storage, ops-codfw, DC-Ops
Papaul updated subscribers of T354193: Broken CPU on mw2394.

Mainboard replaced by @Jhancock.wm. She is running the provision cookbook now.

Jan 8 2024, 6:00 PM · SRE, serviceops, ops-codfw
Papaul updated the task description for T352883: Test IP-renumbering on kubestage2002.codfw.wmnet.
Jan 8 2024, 5:58 PM · Prod-Kubernetes, serviceops, netops, Infrastructure-Foundations, SRE
Papaul added a comment to T352883: Test IP-renumbering on kubestage2002.codfw.wmnet.

@cmooney xe-0/0/26

Jan 8 2024, 5:57 PM · Prod-Kubernetes, serviceops, netops, Infrastructure-Foundations, SRE
Papaul added a comment to T353935: Move ganeti2033 and ganeti2034 to new codfw rows A/B switches.

ganeti2033 on xe-0/0/8 on lsw1-b7-codfw
ganeti2034 on xe-0/0/12 on lsw1-a4-codfw

Jan 8 2024, 5:11 PM · Ganeti
Papaul added a comment to T352883: Test IP-renumbering on kubestage2002.codfw.wmnet.

@Clement_Goubert thanks will work on it in a minute

Jan 8 2024, 4:23 PM · Prod-Kubernetes, serviceops, netops, Infrastructure-Foundations, SRE

Jan 4 2024

Papaul added a comment to T354180: Disk (sdh) failed in ms-be2068.

Request replacement

Jan 4 2024, 3:18 AM · SRE, SRE-swift-storage, ops-codfw, DC-Ops
Papaul added a comment to T354193: Broken CPU on mw2394.
Your dispatch shipped on 1/3/2024 4:20 PM
Jan 4 2024, 3:06 AM · SRE, serviceops, ops-codfw

Jan 3 2024

Papaul added a comment to T354180: Disk (sdh) failed in ms-be2068.

disk replaced

Jan 3 2024, 6:47 PM · SRE, SRE-swift-storage, ops-codfw, DC-Ops
Papaul closed T354249: Degraded RAID on logstash2033 as Resolved.

@colewhite disk replaced

Jan 3 2024, 6:38 PM · SRE, ops-codfw
Papaul added a comment to T354193: Broken CPU on mw2394.

Create Dispatch: Success
You have successfully submitted request SR182660280.

Jan 3 2024, 6:21 PM · SRE, serviceops, ops-codfw
Papaul added a comment to T354193: Broken CPU on mw2394.

After swapping the CPU and DIMM, I am now getting:

	CPU 2 MEM012 VPP PG voltage is outside of range. 	Wed 03 Jan 2024 17:43:07
	CPU 1 MEM012 VPP PG voltage is outside of range.

and the server is no longer powering up.
I will put in an order for Dell to send us a mainboard.

Jan 3 2024, 6:12 PM · SRE, serviceops, ops-codfw
Papaul moved T354193: Broken CPU on mw2394 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Jan 3 2024, 5:23 PM · SRE, serviceops, ops-codfw
Papaul added a comment to T354193: Broken CPU on mw2394.
Multi-bit memory errors detected on a memory device at location(s) DIMM_B1. 	Sun 31 Dec 2023 19:43:14
	Multi-bit memory errors detected on a memory device at location(s) DIMM_B1. 	Sun 31 Dec 2023 19:43:14
	CPU 1 machine check error detected. 	Sun 31 Dec 2023 19:43:14
	CPU 1 machine check error detected.
Jan 3 2024, 5:23 PM · SRE, serviceops, ops-codfw
Papaul added a comment to T354249: Degraded RAID on logstash2033.

@colewhite unfortunately this server has been out of warranty since 2023-11-18. You have 2 options:
1- See if we have some 1.92TB SSDs from decom nodes that we can use
2- Purchase 1.92TB SSDs

Jan 3 2024, 5:14 PM · SRE, ops-codfw
Papaul closed T354155: Inbound interface errors - ge-6/0/22 - db2099 as Resolved.

We know about this

Jan 3 2024, 4:56 PM · SRE, ops-codfw
Papaul moved T354180: Disk (sdh) failed in ms-be2068 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Jan 3 2024, 4:55 PM · SRE, SRE-swift-storage, ops-codfw, DC-Ops
Papaul moved T354249: Degraded RAID on logstash2033 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Jan 3 2024, 4:55 PM · SRE, ops-codfw

Dec 20 2023

Papaul closed T353681: Inbound interface errors as Resolved.
Dec 20 2023, 3:11 AM · SRE, ops-codfw

Dec 19 2023

Papaul moved T353679: mw2448.codfw.wmnet is down from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Dec 19 2023, 7:00 PM · ops-codfw, serviceops, SRE
Papaul closed T353743: Degraded RAID on testhost2001 as Resolved.

This was a false alert: it is a new server that was only halfway installed. I have just finished the install, so I am resolving this task for now.

Dec 19 2023, 6:59 PM · SRE, ops-codfw

Dec 13 2023

Papaul added a comment to T352876: cp4037 reimage for cookbook getting stuck at PXE boot.

@Vgutierrez I had a meeting with the network and automation teams today. We discussed this issue and we do not seem to know its real cause. We decided to let Traffic take this server back and put it in service; we can still track this issue at T350179.

Dec 13 2023, 11:19 PM · Traffic, DC-Ops
Papaul assigned T351279: PDU sensor over limit to VRiley-WMF.
Dec 13 2023, 5:40 PM · SRE, ops-eqiad
Papaul added a comment to T352876: cp4037 reimage for cookbook getting stuck at PXE boot.

@Vgutierrez please give me until the end of today. Thank you

Dec 13 2023, 2:36 PM · Traffic, DC-Ops

Dec 12 2023

Papaul closed T353215: Inbound interface errors as Resolved.
Dec 12 2023, 5:06 AM · SRE, ops-codfw

Dec 9 2023

Papaul added a comment to T349934: Q2:rack/setup/install ceph200[1-3].codfw.wmnet.

@Jhancock.wm on 2002, try checking the network; possibly re-run the switch config cookbook.

Dec 9 2023, 1:17 AM · SRE, Data-Engineering, ops-codfw, DC-Ops

Dec 8 2023

Papaul added a comment to T349876: Q2:rack/setup/install 3 sessionstore hosts (codfw).

The servers were missing in site.pp and 2006 was missing in the preseed.yaml file. I sent a patch to fix this. You can try the re-image again.
https://gerrit.wikimedia.org/r/c/operations/puppet/+/981544

Dec 8 2023, 2:48 PM · SRE, serviceops, ops-codfw, DC-Ops
Papaul updated the task description for T349876: Q2:rack/setup/install 3 sessionstore hosts (codfw).
Dec 8 2023, 2:47 PM · SRE, serviceops, ops-codfw, DC-Ops
Papaul updated the task description for T349934: Q2:rack/setup/install ceph200[1-3].codfw.wmnet.
Dec 8 2023, 12:41 AM · SRE, Data-Engineering, ops-codfw, DC-Ops
Papaul added a comment to T349934: Q2:rack/setup/install ceph200[1-3].codfw.wmnet.

@Jhancock.wm I sent a patch to fix it. You can resume the install.
https://gerrit.wikimedia.org/r/c/operations/puppet/+/981413

Dec 8 2023, 12:02 AM · SRE, Data-Engineering, ops-codfw, DC-Ops

Dec 7 2023

Papaul added a comment to T349934: Q2:rack/setup/install ceph200[1-3].codfw.wmnet.

@Jhancock.wm did you read my comment on Wed, Dec 6, 2:53 PM?

Dec 7 2023, 11:00 PM · SRE, Data-Engineering, ops-codfw, DC-Ops
Papaul added a comment to T350179: Reimage cookbook on new eqiad hosts stuck at PXE booting.

@Volans did the test 4 times. The first 2 times the server did PXE boot, but the last 2 times it didn't.

Dec 7 2023, 10:55 PM · Traffic, SRE-swift-storage, ops-codfw, DC-Ops, ops-eqiad
Papaul moved T352758: Move lvs2014 link to row A and connect to new row A/B vlans from Backlog to Codfw Switch migration on the ops-codfw board.
Dec 7 2023, 2:52 PM · Traffic, ops-codfw, netops, Infrastructure-Foundations, SRE
Papaul moved T352784: Move lvs2013 link to row A and connect to new row A/B vlans from Backlog to Codfw Switch migration on the ops-codfw board.
Dec 7 2023, 2:52 PM · Traffic, ops-codfw, netops, Infrastructure-Foundations, SRE
Papaul moved T352909: Move lvs2012 primary uplink and connect to new row A/B vlans from Backlog to Codfw Switch migration on the ops-codfw board.
Dec 7 2023, 2:52 PM · ops-codfw, Traffic, netops, Infrastructure-Foundations, SRE
Papaul moved T352912: Move lvs2011 primary uplink and connect to new row A/B vlans from Backlog to Codfw Switch migration on the ops-codfw board.
Dec 7 2023, 2:51 PM · ops-codfw, Traffic, netops, Infrastructure-Foundations, SRE
Papaul moved T352918: Move lvs2012 from private1-b-codfw (row) to private1-b2-codfw (rack) vlan from Backlog to Codfw Switch migration on the ops-codfw board.
Dec 7 2023, 2:51 PM · Traffic, netops, Infrastructure-Foundations, SRE
Papaul moved T352920: Move lvs2011 from private1-a-codfw (row) to private1-a2-codfw (rack) vlan from Backlog to Codfw Switch migration on the ops-codfw board.
Dec 7 2023, 2:51 PM · Traffic, netops, Infrastructure-Foundations, SRE