
Papaul (Papaul)
User

User Details

User Since
Dec 18 2014, 3:39 PM (370 w, 3 d)
Availability
Available
LDAP User
Papaul
MediaWiki User
Unknown

Recent Activity

Tue, Jan 18

Papaul updated the task description for T294946: (Need By: TBD) rack/setup/install ml-staging200[12].
Tue, Jan 18, 6:18 PM · SRE, Machine-Learning-Team, ops-codfw, DC-Ops
Papaul updated the task description for T294945: (Need By: TBD) rack/setup/install ml-serve200[5-8].
Tue, Jan 18, 6:11 PM · SRE, Machine-Learning-Team, ops-codfw, DC-Ops
Papaul added a comment to T299427: ml-serve2001 logged a corrected memory error.

confirmed all green in IDRAC

Tue, Jan 18, 5:09 PM · SRE, ops-codfw, Lift-Wing
Papaul added a comment to T299426: Possible cable issue on restbase2010 management interface.

@hnowlan looks like an IDRAC reset and a firmware upgrade on this server will fix the issue

Tue, Jan 18, 5:07 PM · SRE, ops-codfw
Papaul updated the task description for T283582: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))).
Tue, Jan 18, 5:03 PM · SRE, netops, ops-codfw, DC-Ops, Continuous-Integration-Infrastructure
Papaul added a comment to T283582: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))).

@hashar no problem you can close the task once all is back online.

Tue, Jan 18, 4:56 PM · SRE, netops, ops-codfw, DC-Ops, Continuous-Integration-Infrastructure
Papaul added a comment to T283582: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))).

reset IDRAC, upgrade BIOS and IDRAC.

Tue, Jan 18, 4:46 PM · SRE, netops, ops-codfw, DC-Ops, Continuous-Integration-Infrastructure
Papaul added a comment to T283582: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))).

@hashar let me know when this is offline so I can take over

Tue, Jan 18, 3:56 PM · SRE, netops, ops-codfw, DC-Ops, Continuous-Integration-Infrastructure

Fri, Jan 14

Papaul closed T299098: hw troubleshooting: IPMI Power Supply Failure (PS2) for wdqs2003.codfw.wmnet as Resolved.

PS2 replaced

Fri, Jan 14, 5:19 PM · SRE, Discovery-Search (Current work), ops-codfw, DC-Ops
Papaul added a comment to T283582: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))).

@hashar since Monday is a holiday, let's do this on the 18th at 10am CT. Thanks

Fri, Jan 14, 12:05 AM · SRE, netops, ops-codfw, DC-Ops, Continuous-Integration-Infrastructure

Thu, Jan 13

Papaul moved T299098: hw troubleshooting: IPMI Power Supply Failure (PS2) for wdqs2003.codfw.wmnet from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Thu, Jan 13, 8:17 PM · SRE, Discovery-Search (Current work), ops-codfw, DC-Ops

Wed, Jan 12

Papaul added a comment to T298861: contint2001.mgmt disappeared from Icinga.

The IDRAC on this server needs a reset. Please coordinate a day and time that is best for this server to be taken offline.

Wed, Jan 12, 7:55 PM · SRE, Release-Engineering-Team (Radar), serviceops-radar, ops-codfw, Continuous-Integration-Infrastructure
Papaul closed T298853: Degraded RAID on elastic2035 as Resolved.
Wed, Jan 12, 7:52 PM · Discovery-Search (Current work), SRE, ops-codfw
Papaul moved T298861: contint2001.mgmt disappeared from Icinga from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Wed, Jan 12, 3:46 PM · SRE, Release-Engineering-Team (Radar), serviceops-radar, ops-codfw, Continuous-Integration-Infrastructure
Papaul added a comment to T298980: Rack msw2-eqiad in new cage.

@Jclark-ctr thanks, makes sense

Wed, Jan 12, 3:40 PM · SRE, ops-eqiad, DC-Ops
Papaul added a comment to T298980: Rack msw2-eqiad in new cage.

@Jclark-ctr looking at the image you shared at https://usercontent.irccloud-cdn.com/file/5YslcsIX/1641945459.JPG I see you are using orange cables to msw2 and not to the console server. We use orange cables for console and green cables for mgmt. Any reason why we are using orange cables in this case?

Wed, Jan 12, 2:18 AM · SRE, ops-eqiad, DC-Ops

Tue, Jan 11

Papaul updated the task description for T296966: eqiad: Master Tracking Ticket for eqiad expansion cage.
Tue, Jan 11, 11:50 PM · SRE, ops-eqiad, DC-Ops

Mon, Jan 10

Papaul closed T298800: host ps1-d1-codfw down since a long time but still monitored as Resolved.

thanks @Dzahn

Mon, Jan 10, 7:51 PM · ops-codfw, SRE

Fri, Jan 7

Papaul closed T298674: Degraded RAID on elastic2051 as Resolved.

This is ready, but according to @jbond it still has some puppet errors; those look related to this puppet policy not being ready for Debian bullseye.

Fri, Jan 7, 8:44 PM · Discovery-Search (Current work), Patch-For-Review, SRE, ops-codfw
Papaul added a comment to T283582: DRAC firmware upgrades codfw (was: Flapping codfw management alarm ( contint2001.mgmt/SSH is CRITICAL ))).

@Dzahn no

Fri, Jan 7, 7:37 PM · SRE, netops, ops-codfw, DC-Ops, Continuous-Integration-Infrastructure
Papaul closed T298293: cp2029 crashed, hardware memory error as Resolved.

Checked the server today; no errors so far on DIMM B1, so closing the task. If the problem returns we can re-open the task.

Fri, Jan 7, 5:12 PM · SRE, Traffic, ops-codfw
Papaul closed T298301: Degraded RAID on db2147 as Resolved.

Disk replaced

Fri, Jan 7, 4:58 PM · DBA, SRE, ops-codfw

Thu, Jan 6

Papaul added a comment to T298674: Degraded RAID on elastic2051.

@Gehel we have some disks that we took out of decom servers. I will look when I am back on site tomorrow to see if we can find one.

Thu, Jan 6, 8:44 PM · Discovery-Search (Current work), Patch-For-Review, SRE, ops-codfw

Mon, Jan 3

Papaul triaged T298293: cp2029 crashed, hardware memory error as Medium priority.
Mon, Jan 3, 3:53 PM · SRE, Traffic, ops-codfw
Papaul added a comment to T298293: cp2029 crashed, hardware memory error.

I swapped DIMM A1 with DIMM B1 to see if the error shows on B1. I am leaving the task open for now.

Mon, Jan 3, 3:52 PM · SRE, Traffic, ops-codfw
Papaul claimed T298293: cp2029 crashed, hardware memory error.
Mon, Jan 3, 3:51 PM · SRE, Traffic, ops-codfw
Papaul added a comment to T298301: Degraded RAID on db2147.

You have successfully submitted request SR1080456967

Mon, Jan 3, 3:46 PM · DBA, SRE, ops-codfw
Papaul added a comment to T298293: cp2029 crashed, hardware memory error.

@Vgutierrez Happy New Year! Can I power this server off so I can swap DIMM A1 with DIMM B1?

Mon, Jan 3, 3:37 PM · SRE, Traffic, ops-codfw
Papaul removed a project from T297588: connect 2nd cloudcontrol200x-dev NIC to vlan 2105: ops-codfw.
Mon, Jan 3, 3:30 PM · SRE, Infrastructure-Foundations, netops, cloud-services-team (Kanban)
Papaul moved T298293: cp2029 crashed, hardware memory error from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mon, Jan 3, 3:30 PM · SRE, Traffic, ops-codfw
Papaul moved T298301: Degraded RAID on db2147 from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Mon, Jan 3, 3:30 PM · DBA, SRE, ops-codfw
Papaul added a comment to T297933: ms-be2065 failed drive sdq.

Tracking information

Mon, Jan 3, 3:29 PM · SRE, ops-codfw
Papaul closed T297933: ms-be2065 failed drive sdq as Resolved.

@fgiunchedi disk replaced

Mon, Jan 3, 3:26 PM · SRE, ops-codfw

Dec 22 2021

Papaul closed T267662: Rack new cloud-dev servers in same rack as Resolved.
Dec 22 2021, 4:45 PM · cloud-services-team (Kanban), SRE, ops-codfw
Papaul moved T297933: ms-be2065 failed drive sdq from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Dec 22 2021, 4:40 PM · SRE, ops-codfw

Dec 21 2021

Papaul added a comment to T297933: ms-be2065 failed drive sdq.

Current Status:

Dec 21 2021, 10:30 PM · SRE, ops-codfw
Papaul updated the task description for T294945: (Need By: TBD) rack/setup/install ml-serve200[5-8].
Dec 21 2021, 2:37 PM · SRE, Machine-Learning-Team, ops-codfw, DC-Ops
Papaul closed T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed as Resolved.

@MoritzMuehlenhoff no problem, closing this task now

Dec 21 2021, 2:30 PM · SRE, ops-codfw
Papaul added a comment to T297933: ms-be2065 failed drive sdq.

Create Dispatch: Success
You have successfully submitted request SR1079308386.

Dec 21 2021, 12:36 AM · SRE, ops-codfw

Dec 20 2021

Papaul closed T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed as Resolved.

@MoritzMuehlenhoff complete

Dec 20 2021, 10:18 PM · SRE, ops-codfw
Papaul updated the task description for T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.
Dec 20 2021, 10:17 PM · SRE, ops-codfw

Dec 17 2021

Papaul triaged T297933: ms-be2065 failed drive sdq as Medium priority.
Dec 17 2021, 4:03 PM · SRE, ops-codfw

Dec 16 2021

Papaul closed T297422: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed, a subtask of T296641: Upgrade kafka-main nodes to buster, as Resolved.
Dec 16 2021, 5:11 PM · Patch-For-Review, serviceops
Papaul closed T297422: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed as Resolved.
Dec 16 2021, 5:11 PM · SRE, ops-codfw, serviceops
Papaul updated the task description for T297422: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed.
Dec 16 2021, 4:58 PM · SRE, ops-codfw, serviceops
Papaul updated the task description for T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.
Dec 16 2021, 3:59 PM · SRE, ops-codfw
Papaul updated the task description for T297422: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed.
Dec 16 2021, 3:36 PM · SRE, ops-codfw, serviceops
Papaul updated the task description for T294152: Q2:(Need By: 2021-12-17) rack/setup/install elastic108[4-8].
Dec 16 2021, 3:04 PM · SRE, Discovery-Search, Elasticsearch, ops-eqiad, DC-Ops
jcrespo awarded T294973: (Need By: TBD) rack/setup/install backup2008 a Like token.
Dec 16 2021, 8:51 AM · Patch-For-Review, SRE, Data-Persistence-Backup, ops-codfw, DC-Ops

Dec 15 2021

Papaul closed T297422: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed as Resolved.

This is complete

Dec 15 2021, 5:45 PM · SRE, ops-codfw, serviceops
Papaul closed T297422: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed, a subtask of T296641: Upgrade kafka-main nodes to buster, as Resolved.
Dec 15 2021, 5:45 PM · Patch-For-Review, serviceops
Papaul closed T294973: (Need By: TBD) rack/setup/install backup2008 as Resolved.
Dec 15 2021, 5:44 PM · Patch-For-Review, SRE, Data-Persistence-Backup, ops-codfw, DC-Ops
Papaul updated the task description for T294973: (Need By: TBD) rack/setup/install backup2008.
Dec 15 2021, 5:44 PM · Patch-For-Review, SRE, Data-Persistence-Backup, ops-codfw, DC-Ops
Papaul updated subscribers of T294973: (Need By: TBD) rack/setup/install backup2008.

@jcrespo this is complete

Dec 15 2021, 5:44 PM · Patch-For-Review, SRE, Data-Persistence-Backup, ops-codfw, DC-Ops
Papaul claimed T297422: Installation issues on PowerEdge R440 Kafka main codfw servers with buster / firmware update needed.
Dec 15 2021, 5:24 PM · SRE, ops-codfw, serviceops
Papaul updated the task description for T294973: (Need By: TBD) rack/setup/install backup2008.
Dec 15 2021, 1:24 PM · Patch-For-Review, SRE, Data-Persistence-Backup, ops-codfw, DC-Ops

Dec 14 2021

Papaul updated the task description for T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.
Dec 14 2021, 10:37 PM · SRE, ops-codfw
Papaul closed T294009: cr2-eqdfw: PEM 1 Input Voltage Out Of Range flapping as Resolved.

There was a breaker problem. This is now resolved.

Dec 14 2021, 10:36 PM · SRE, ops-eqdfw
Papaul placed T297588: connect 2nd cloudcontrol200x-dev NIC to vlan 2105 up for grabs.
Dec 14 2021, 9:04 PM · SRE, Infrastructure-Foundations, netops, cloud-services-team (Kanban)
Papaul updated the task description for T294973: (Need By: TBD) rack/setup/install backup2008.
Dec 14 2021, 2:02 PM · Patch-For-Review, SRE, Data-Persistence-Backup, ops-codfw, DC-Ops
Papaul added a comment to T297588: connect 2nd cloudcontrol200x-dev NIC to vlan 2105.

@aborrero are we doing trunk so I can assign this task to netops?

Dec 14 2021, 1:56 PM · SRE, Infrastructure-Foundations, netops, cloud-services-team (Kanban)

Dec 13 2021

Papaul added a comment to T297588: connect 2nd cloudcontrol200x-dev NIC to vlan 2105.

The 2nd NIC is connected to port ge-1/0/34; the only thing left is to do the config in Netbox.

Dec 13 2021, 10:21 PM · SRE, Infrastructure-Foundations, netops, cloud-services-team (Kanban)
Papaul updated the task description for T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.
Dec 13 2021, 5:09 PM · SRE, ops-codfw
Papaul updated the task description for T294973: (Need By: TBD) rack/setup/install backup2008.
Dec 13 2021, 4:24 PM · Patch-For-Review, SRE, Data-Persistence-Backup, ops-codfw, DC-Ops
Papaul updated the task description for T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.
Dec 13 2021, 3:58 PM · SRE, ops-codfw
Papaul closed T265435: codfw: Testing Out Sample PDUs as Resolved.

This is complete.

Dec 13 2021, 3:16 PM · Patch-For-Review, User-fgiunchedi, observability, ops-codfw, SRE, DC-Ops
Papaul closed T294139: Q2:(Need By: TBD) rack/setup/install ganeti202[78].codfw.wmnet as Resolved.
Dec 13 2021, 3:16 PM · SRE, ops-codfw, DC-Ops
Papaul updated the task description for T297588: connect 2nd cloudcontrol200x-dev NIC to vlan 2105.
Dec 13 2021, 3:07 PM · SRE, Infrastructure-Foundations, netops, cloud-services-team (Kanban)
Papaul updated the task description for T297588: connect 2nd cloudcontrol200x-dev NIC to vlan 2105.
Dec 13 2021, 3:07 PM · SRE, Infrastructure-Foundations, netops, cloud-services-team (Kanban)
Papaul added a comment to T297588: connect 2nd cloudcontrol200x-dev NIC to vlan 2105.

@aborrero can we test this for now on only one server to see if it works before moving it to the other servers?

Dec 13 2021, 3:00 PM · SRE, Infrastructure-Foundations, netops, cloud-services-team (Kanban)

Dec 10 2021

Papaul closed T297518: eqad: asw2-c7-eqiad PEM1 not powered as Resolved.

John went back to check; it was a loose power cable. All good now. Resolving this.

Dec 10 2021, 8:52 PM · SRE, Infrastructure-Foundations, netbox, ops-eqiad
Papaul updated the task description for T297518: eqad: asw2-c7-eqiad PEM1 not powered.
Dec 10 2021, 8:35 PM · SRE, Infrastructure-Foundations, netbox, ops-eqiad
Papaul triaged T297518: eqad: asw2-c7-eqiad PEM1 not powered as Medium priority.
Dec 10 2021, 8:30 PM · SRE, Infrastructure-Foundations, netbox, ops-eqiad
Papaul created T297518: eqad: asw2-c7-eqiad PEM1 not powered.
Dec 10 2021, 8:29 PM · SRE, Infrastructure-Foundations, netbox, ops-eqiad

Dec 9 2021

Papaul added a comment to T294009: cr2-eqdfw: PEM 1 Input Voltage Out Of Range flapping.

I created Order # 1-214270167279 to be on site next week on the 14th at 3:00 PM to meet with the Equinix smart hands tech and perform the troubleshooting while I am on site.

Dec 9 2021, 4:53 PM · SRE, ops-eqdfw

Dec 8 2021

Papaul added a comment to T294009: cr2-eqdfw: PEM 1 Input Voltage Out Of Range flapping.

After putting in the new PDUs we still have the same problem.

Dec 8 2021, 10:58 PM · SRE, ops-eqdfw
Papaul closed T295921: eqdfw:pdus as Resolved.

This is complete

Dec 8 2021, 10:43 PM · SRE, ops-eqdfw, DC-Ops
Papaul updated the task description for T295921: eqdfw:pdus.
Dec 8 2021, 10:43 PM · SRE, ops-eqdfw, DC-Ops

Dec 7 2021

Papaul updated the task description for T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.
Dec 7 2021, 7:37 PM · SRE, ops-codfw
Papaul updated the task description for T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.
Dec 7 2021, 6:55 PM · SRE, ops-codfw
Papaul closed T294377: Q2:(Need By: TBD) rack/setup/install restbase202[456].codfw.wmnet as Resolved.

This is complete

Dec 7 2021, 4:51 PM · Platform Team Workboards (Platform Engineering Reliability), SRE, RESTBase, ops-codfw, DC-Ops
Papaul updated the task description for T294377: Q2:(Need By: TBD) rack/setup/install restbase202[456].codfw.wmnet.
Dec 7 2021, 4:50 PM · Platform Team Workboards (Platform Engineering Reliability), SRE, RESTBase, ops-codfw, DC-Ops
Papaul closed T296930: codfw: relocate servers in rack D6 as Resolved.

@Marostegui @Kormat all the servers are back up online from my end.

Dec 7 2021, 4:46 PM · SRE, DBA, ops-codfw
Papaul updated the task description for T296930: codfw: relocate servers in rack D6.
Dec 7 2021, 4:45 PM · SRE, DBA, ops-codfw
Papaul updated the task description for T294973: (Need By: TBD) rack/setup/install backup2008.
Dec 7 2021, 1:31 AM · Patch-For-Review, SRE, Data-Persistence-Backup, ops-codfw, DC-Ops
Papaul added a comment to T294973: (Need By: TBD) rack/setup/install backup2008.

This was shipped today.

Dec 7 2021, 1:30 AM · Patch-For-Review, SRE, Data-Persistence-Backup, ops-codfw, DC-Ops

Dec 6 2021

Papaul updated the task description for T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.
Dec 6 2021, 7:59 PM · SRE, ops-codfw
Papaul updated the task description for T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.
Dec 6 2021, 5:08 PM · SRE, ops-codfw
Papaul closed T297126: Possible faulty cable between asw-d-codfw and ml-serve2004 as Resolved.
Dec 6 2021, 4:11 PM · SRE, Machine-Learning-Team, ops-codfw
Papaul added a comment to T297126: Possible faulty cable between asw-d-codfw and ml-serve2004.

@elukey
Replaced the cable

Interface       Admin Link Description
ge-6/0/4        up    up   ml-serve2004

Note: if ml-serve200[1-4] are in service, can you please change the status in Network from Stage to active?

Dec 6 2021, 4:11 PM · SRE, Machine-Learning-Team, ops-codfw

Dec 3 2021

Papaul moved T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed from Backlog to Hardware Failure / Troubleshoot on the ops-codfw board.
Dec 3 2021, 4:31 PM · SRE, ops-codfw
Papaul moved T296930: codfw: relocate servers in rack D6 from Backlog to Racking Tasks on the ops-codfw board.
Dec 3 2021, 4:31 PM · SRE, DBA, ops-codfw
Papaul closed Unknown Object (Task), a subtask of T293012: Productionise mc20[38-55], as Resolved.
Dec 3 2021, 4:04 PM · serviceops
Papaul updated the task description for T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.
Dec 3 2021, 2:26 PM · SRE, ops-codfw
Papaul updated the task description for T294377: Q2:(Need By: TBD) rack/setup/install restbase202[456].codfw.wmnet.
Dec 3 2021, 2:07 AM · Platform Team Workboards (Platform Engineering Reliability), SRE, RESTBase, ops-codfw, DC-Ops

Dec 2 2021

Papaul updated the task description for T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.
Dec 2 2021, 9:26 PM · SRE, ops-codfw
Papaul added a comment to T296930: codfw: relocate servers in rack D6.

Thanks guys

Dec 2 2021, 3:15 PM · SRE, DBA, ops-codfw
Papaul created T296930: codfw: relocate servers in rack D6.
Dec 2 2021, 1:13 PM · SRE, DBA, ops-codfw
Papaul added a comment to T294377: Q2:(Need By: TBD) rack/setup/install restbase202[456].codfw.wmnet.

@ayounsi thank you

Dec 2 2021, 12:52 PM · Platform Team Workboards (Platform Engineering Reliability), SRE, RESTBase, ops-codfw, DC-Ops
Papaul added a comment to T296856: Installation issues on PowerEdge R440 Ganeti servers with buster / firmware update needed.

I prefer to just reuse/extend the task by adding the nodes to the description, like I did for ganeti2009 and 2010, so we keep better track.

Dec 2 2021, 12:50 PM · SRE, ops-codfw