wiki_willy
User

User Details

User Since
Apr 16 2019, 9:00 PM (30 w, 1 d)
Availability
Available
LDAP User
Wpao
MediaWiki User
Unknown

Recent Activity

Yesterday

wiki_willy closed T226778: Install new PDUs in rows A/B (Top level tracking task) as Resolved.

Resolving parent task for PDU upgrades. Many thanks to @Cmjohnson and @Jclark-ctr for taking care of these. Thanks, Willy

Wed, Nov 13, 5:16 PM · DC-Ops, Operations, ops-eqiad

Thu, Nov 7

wiki_willy assigned T236497: cp3056 hardware issue to RobH.
Thu, Nov 7, 7:20 PM · DC-Ops, ops-esams, Operations, Traffic
wiki_willy added a comment to T237582: frqueue1001 system battery needs replacement.

Thanks @Jgreen - much appreciated.

Thu, Nov 7, 5:52 PM · ops-eqiad, Operations
wiki_willy added a comment to T237582: frqueue1001 system battery needs replacement.

@Jgreen - Child task T237651 created to order the part. For any hardware repair requests going forward, can you follow the template here - https://phabricator.wikimedia.org/maniphest/task/edit/form/55/ . There are a few things in the description that we're trying to get in advance to help us with scheduling downtime, prioritization, etc. Much appreciated.

Thu, Nov 7, 5:20 PM · ops-eqiad, Operations
wiki_willy added a subtask for T237582: frqueue1001 system battery needs replacement: Unknown Object (Task).
Thu, Nov 7, 5:16 PM · ops-eqiad, Operations
wiki_willy assigned T237582: frqueue1001 system battery needs replacement to Jclark-ctr.

@Jgreen - looks like the warranty ended for the server a few months ago in May. Let me know if you're looking to decommission this server soon or if you would like us to purchase the replacement part.

Thu, Nov 7, 12:00 AM · ops-eqiad, Operations

Tue, Nov 5

wiki_willy added a comment to T236187: decom cobalt.

@Dzahn - sounds good to me.

Tue, Nov 5, 11:58 PM · serviceops, Operations

Mon, Nov 4

wiki_willy closed Unknown Object (Task), a subtask of T228606: Degraded RAID on elastic1046, as Resolved.
Mon, Nov 4, 11:21 PM · Patch-For-Review, Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy closed T228606: Degraded RAID on elastic1046 as Resolved.
Mon, Nov 4, 11:18 PM · Patch-For-Review, Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy added a comment to T228606: Degraded RAID on elastic1046.

Thanks @Gehel , thanks @Jclark-ctr - I'll go ahead and resolve this task. Thanks, Willy

Mon, Nov 4, 11:18 PM · Patch-For-Review, Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy assigned T227542: b7-eqiad pdu refresh (Tuesday 11/5 @12pm UTC) to Cmjohnson.
Mon, Nov 4, 6:28 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227542: b7-eqiad pdu refresh (Tuesday 11/5 @12pm UTC) from b7-eqiad pdu refresh (Tuesday 11/5 @10am UTC) to b7-eqiad pdu refresh (Tuesday 11/5 @12pm UTC).
Mon, Nov 4, 4:19 PM · DC-Ops, Operations, ops-eqiad

Fri, Nov 1

wiki_willy assigned T237133: analytics1062 lost one of its power supplies to Jclark-ctr.

@Jclark-ctr - looks like this one is from last Thursday's PDU upgrade. Can you check if it's maybe a loose cord? If not, we'll have to RMA it (server under warranty thru March 2020). Thanks, Willy

Fri, Nov 1, 10:35 PM · Analytics, Operations, Analytics-Cluster, ops-eqiad

Thu, Oct 31

wiki_willy added a comment to T237055: Terminate OE10,11,12,13 Racks.

Emailed Jim Buatti last Tuesday to provide an overview of what we're trying to do with the contract (have OE14,15,16 renew in May 2021 and terminate OE10,11,12,13 in Nov 2020 with a clause to term early if another customer can be found to lease the racks), and received confirmation that he'll help us put something together. Thanks, Willy

Thu, Oct 31, 9:45 PM · ops-esams, Operations
wiki_willy added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): T237055: Terminate OE10,11,12,13 Racks.
Thu, Oct 31, 9:42 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy added a parent task for T237055: Terminate OE10,11,12,13 Racks: T235805: ESAMS Refresh/Rebuild (October 2019).
Thu, Oct 31, 9:42 PM · ops-esams, Operations
wiki_willy created T237055: Terminate OE10,11,12,13 Racks.
Thu, Oct 31, 9:41 PM · ops-esams, Operations
wiki_willy reassigned T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) from Cmjohnson to RobH.

Cable reseated (clip was bent) by @Jclark-ctr - reassigning back to @RobH for configuration.

Thu, Oct 31, 7:51 PM · DC-Ops, Operations, ops-eqiad
wiki_willy reassigned T236437: rack/setup/install mw13[49-84].eqiad.wmnet from Joe to Jclark-ctr.

Assigning to @Jclark-ctr since he's going to be taking care of the install, but @Joe - let us know if there are any specific racking instructions for these. Thanks, Willy

Thu, Oct 31, 7:41 PM · Operations, ops-eqiad

Wed, Oct 30

wiki_willy added a comment to T227867: mw1239 memory errors .

@jijiki - just following up to see if this is still an issue or if we can resolve this. Thanks, Willy

Wed, Oct 30, 8:14 PM · ops-eqiad, DC-Ops, Operations, serviceops
wiki_willy reassigned T234076: (Need by Aug 1) rack/setup/install dumpsdata1003.eqiad.wmnet from ArielGlenn to Cmjohnson.

Reassigning to @Cmjohnson for Ariel's RAID question

Wed, Oct 30, 8:12 PM · Operations
wiki_willy closed T234785: Degraded RAID on analytics1049 as Resolved.
Wed, Oct 30, 8:07 PM · Patch-For-Review, ops-eqiad, Operations
wiki_willy added a comment to T234785: Degraded RAID on analytics1049.

Thanks @elukey I'll close out this request, if all the alerting is suppressed now.

Wed, Oct 30, 8:07 PM · Patch-For-Review, ops-eqiad, Operations
wiki_willy renamed T227542: b7-eqiad pdu refresh (Tuesday 11/5 @12pm UTC) from b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC) to b7-eqiad pdu refresh (Tuesday 11/5 @10am UTC).
Wed, Oct 30, 12:26 AM · DC-Ops, Operations, ops-eqiad

Mon, Oct 28

wiki_willy reassigned T236601: Degraded RAID on elastic1039 from Gehel to Jclark-ctr.
Mon, Oct 28, 6:54 PM · Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy added a comment to T236601: Degraded RAID on elastic1039.

Thanks @MoritzMuehlenhoff - no worries though, since this task looks like it was autogenerated. (I'll have to talk to Ricardo on how we can modify the autogenerated ones) @Gehel - child task T236725 created to order the replacement disk for the out of warranty system. Thanks, Willy

Mon, Oct 28, 6:54 PM · Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy added a subtask for T236601: Degraded RAID on elastic1039: Unknown Object (Task).
Mon, Oct 28, 6:51 PM · Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy assigned T227538: b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC) to Cmjohnson.
Mon, Oct 28, 6:04 PM · DC-Ops, Operations, ops-eqiad
wiki_willy assigned T236601: Degraded RAID on elastic1039 to Gehel.

Per my conversation with Guillaume, this system will be decommissioned, so assigning it to @Gehel for now.

Mon, Oct 28, 5:58 PM · Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy assigned T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) to Cmjohnson.
Mon, Oct 28, 5:22 PM · DC-Ops, Operations, ops-eqiad
wiki_willy assigned T227538: b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC) to Cmjohnson.
Mon, Oct 28, 5:21 PM · DC-Ops, Operations, ops-eqiad

Thu, Oct 24

wiki_willy assigned T227143: a7-eqiad pdu refresh to RobH.

@RobH - can you check if the configuration on this one is complete? It was one of the PDUs you and Chris upgraded, when you went out to eqiad. Thanks, Willy

Thu, Oct 24, 10:47 PM · DC-Ops, Operations, ops-eqiad
wiki_willy added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

Pointed this task out to our Dell account rep today. @Jclark-ctr - let me know if the steps they provided don't work, and then I'll forward our case number over to them...to see if we can just get a new server.

Thu, Oct 24, 10:21 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)
wiki_willy reassigned T230575: Degraded RAID on cloudvirt1018 from Bstorm to Jclark-ctr.

Talked to our Dell rep on this one, who can reach out to the Dell tech support rep directly, after we re-open the ticket. He basically confirmed the same thing @Bstorm had found from the earlier comments...that the 1.9TB drive was sent from Dell previously as an RMA. @Jclark-ctr - can you coordinate with Brooke to update the firmware on this (which might fix things, with all the drive failures), and then call in a request with Dell again, if the drive continues to fail? Shoot me the support case number as well, so I can forward it over to our account rep. Thanks, Willy

Thu, Oct 24, 10:19 PM · ops-eqiad, Operations

Wed, Oct 23

wiki_willy reassigned T236331: Degraded RAID on cloudvirt1018 from wiki_willy to Jclark-ctr.

@Bstorm - that's really weird. If it's just the drive size that Dell has on file for us, I'll just shoot this over to @Jclark-ctr to have that RMA'd. Thanks, Willy

Wed, Oct 23, 11:54 PM · ops-eqiad, Operations
wiki_willy claimed T236331: Degraded RAID on cloudvirt1018.
Wed, Oct 23, 11:51 PM · ops-eqiad, Operations
wiki_willy added a comment to T230575: Degraded RAID on cloudvirt1018.

Hey @Bstorm - thanks for tracking all these previous tasks down. It's definitely helpful...I'll bring it up to Dell tomorrow during my bi-weekly sync up call with them, and see if I can get more details. Worst case, we may just have to buy a replacement drive. Thanks, Willy

Wed, Oct 23, 9:25 PM · ops-eqiad, Operations
wiki_willy added a comment to T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC).

Hi @jijiki - I think there are a couple things that @Jclark-ctr needs to check and resolve, before @RobH can configure it. After that, the alert should go away. Thanks, Willy

Wed, Oct 23, 4:17 PM · DC-Ops, Operations, ops-eqiad

Tue, Oct 22

wiki_willy added a comment to T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).

@RobH - ps2 was swapped last Tuesday on 10/15

Tue, Oct 22, 4:43 PM · DC-Ops, Operations, ops-eqiad
wiki_willy reassigned T228606: Degraded RAID on elastic1046 from wiki_willy to Jclark-ctr.
Tue, Oct 22, 4:15 PM · Patch-For-Review, Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy added a comment to T228606: Degraded RAID on elastic1046.

Procurement task created for Rob to order replacement drive. Thanks, Willy

Tue, Oct 22, 4:11 PM · Patch-For-Review, Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy added a subtask for T228606: Degraded RAID on elastic1046: Unknown Object (Task).
Tue, Oct 22, 4:10 PM · Patch-For-Review, Discovery-Search (Current work), ops-eqiad, Operations

Mon, Oct 21

wiki_willy assigned T227540: b4-eqiad pdu refresh (Thursday 10/24 @11am UTC) to Cmjohnson.
Mon, Oct 21, 4:26 PM · DC-Ops, Operations, ops-eqiad
wiki_willy assigned T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) to Cmjohnson.
Mon, Oct 21, 4:25 PM · DC-Ops, Operations, ops-eqiad

Fri, Oct 18

wiki_willy assigned T235877: db1105 rebooted itself to Cmjohnson.
Fri, Oct 18, 4:52 PM · ops-eqiad, DC-Ops, Operations, DBA

Thu, Oct 17

wiki_willy added a comment to T234076: (Need by Aug 1) rack/setup/install dumpsdata1003.eqiad.wmnet.

Updating the Need by Date in the subject line, based on the procurement task. @Cmjohnson - can you provide an ETA on when this can be completed? Thanks, Willy

Thu, Oct 17, 11:23 PM · Operations
wiki_willy renamed T234076: (Need by Aug 1) rack/setup/install dumpsdata1003.eqiad.wmnet from rack/setup/install dumpsdata1003.eqiad.wmnet to (Need by Aug 1) rack/setup/install dumpsdata1003.eqiad.wmnet.
Thu, Oct 17, 10:54 PM · Operations
wiki_willy added a comment to T184064: Prepare racks OE14, OE15 and OE16 with new infrastructure.

Redundant power has been added to OE14,15,16 by Iron Mountain free of charge. Replacement Servertech PDUs have been ordered, and are scheduled to be arriving today.

Thu, Oct 17, 9:30 PM · Operations, ops-esams
wiki_willy added a parent task for T184061: SRE 2017-18 Q3 goal Cleanup esams and refresh servers and infrastructure (tracking): T235805: ESAMS Refresh/Rebuild (October 2019).
Thu, Oct 17, 9:28 PM · Operations, Epic, ops-esams
wiki_willy added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): T184061: SRE 2017-18 Q3 goal Cleanup esams and refresh servers and infrastructure (tracking).
Thu, Oct 17, 9:28 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): Unknown Object (Task).
Thu, Oct 17, 9:27 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): Unknown Object (Task).
Thu, Oct 17, 9:27 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): Unknown Object (Task).
Thu, Oct 17, 9:26 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): Unknown Object (Task).
Thu, Oct 17, 9:26 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): Unknown Object (Task).
Thu, Oct 17, 9:26 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): Unknown Object (Task).
Thu, Oct 17, 9:25 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): Unknown Object (Task).
Thu, Oct 17, 9:24 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): Unknown Object (Task).
Thu, Oct 17, 9:24 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy added a subtask for T235805: ESAMS Refresh/Rebuild (October 2019): Unknown Object (Task).
Thu, Oct 17, 9:24 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy created T235805: ESAMS Refresh/Rebuild (October 2019).
Thu, Oct 17, 9:23 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
wiki_willy added a comment to T235770: decommission eeden.

I just made one slight change - changed the point person to @Jclark-ctr for assigning eqiad decom tasks

Thu, Oct 17, 4:24 PM · Operations, DC-Ops, decommission
wiki_willy updated the task description for T235770: decommission eeden.
Thu, Oct 17, 4:23 PM · Operations, DC-Ops, decommission

Wed, Oct 16

wiki_willy assigned T234698: ms-be1020 - firmware upgrade: (was: host went down) to Cmjohnson.
Wed, Oct 16, 8:24 PM · ops-eqiad, User-fgiunchedi, SRE-swift-storage, Operations
wiki_willy reassigned T230289: Degraded RAID on cloudvirt1024 -- Filesystem mounted read-only from Cmjohnson to Jclark-ctr.
Wed, Oct 16, 7:33 PM · cloud-services-team, ops-eqiad, Operations
wiki_willy reassigned T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory from Cmjohnson to Jclark-ctr.

Hi @Andrew - apologies for the delay. Chris has been out, but @Jclark-ctr is going to follow up on this. Thanks, Willy

Wed, Oct 16, 7:33 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)
wiki_willy added a comment to T230289: Degraded RAID on cloudvirt1024 -- Filesystem mounted read-only.

@Andrew or @Bstorm - are you ok with us taking the machine down to troubleshoot? Thanks, Willy

Wed, Oct 16, 7:04 PM · cloud-services-team, ops-eqiad, Operations
wiki_willy added a comment to T235406: maps1002: Failed power supply.

Awesome, thanks @Jclark-ctr

Wed, Oct 16, 6:55 PM · Discovery-Search (Current work), Operations, ops-eqiad

Tue, Oct 15

wiki_willy assigned T235406: maps1002: Failed power supply to Jclark-ctr.

@Jclark-ctr - looks like this one is barely out of warranty. Before we order the part though, can you double-check that it's not something as simple as a loose power cord? Thanks, Willy

Tue, Oct 15, 10:12 PM · Discovery-Search (Current work), Operations, ops-eqiad
wiki_willy added a comment to T86541: setup wifi in codfw.

Hi @Papaul - if there aren't any objections from anyone, I think we can just resolve this. You have your primary connection via MIFI and a backup option via CyrusOne. And since it seems to have been working ok for the past 4-5 yrs without issue, I'm fine with not moving forward with a wifi setup. Thanks, Willy

Tue, Oct 15, 5:26 PM · DC-Ops, Operations, ops-codfw, netops

Oct 14 2019

wiki_willy added a comment to T233273: labsdb1009 broken PSU.

@Jclark-ctr @Marostegui - thanks guys

Oct 14 2019, 6:35 PM · Operations, DC-Ops, ops-eqiad, DBA

Oct 11 2019

wiki_willy added a comment to T227025: (Need By: August 31) rack/setup/install (3) new zookeeper nodes.

Great job @Papaul in troubleshooting this and tracking it down to the root cause. Thanks! ~Willy

Oct 11 2019, 4:55 PM · User-Elukey, Operations, ops-eqiad
wiki_willy added a comment to T233273: labsdb1009 broken PSU.

Yup, it should be a hot swap. So @Jclark-ctr - please reach out to @Marostegui before replacing it. Thanks, Willy

Oct 11 2019, 4:45 PM · Operations, DC-Ops, ops-eqiad, DBA
wiki_willy added a comment to T233273: labsdb1009 broken PSU.

@Jclark-ctr - this arrived Thursday via https://www.fedex.com/en-us/home.html. Just a heads up, this will need to be replaced before the PDU upgrade next Tuesday, to retain redundant power on labsdb1009. Thanks, Willy

Oct 11 2019, 4:42 PM · Operations, DC-Ops, ops-eqiad, DBA
wiki_willy added a comment to T227025: (Need By: August 31) rack/setup/install (3) new zookeeper nodes.

I'll dig around a bit and check with Dell to see if we can figure out why Com1 and Com2 have to be flipped to get it working. Talked to Luca and worst case, if we can't find any answers to why it's happening, then we'll just leave them as is. Thanks, Willy

Oct 11 2019, 8:39 AM · User-Elukey, Operations, ops-eqiad
wiki_willy added a comment to T226778: Install new PDUs in rows A/B (Top level tracking task).

Hi @ayounsi - I talked to a couple other people who had the same concern the other day, and I agree as well...so I started scheduling downtime for the PDU alerts in Icinga starting from today's B1 PDU upgrade, and will continue for the remaining PDU swaps. Thanks, Willy

Oct 11 2019, 8:18 AM · DC-Ops, Operations, ops-eqiad

Oct 9 2019

wiki_willy assigned T235125: Move kafka200[123] to logstash202[012] to Papaul.

Hi @Papaul - this task is to relabel the hosts, update them in Netbox, and update the switchport descriptions to the newly renamed hostnames. Thanks, Willy

Oct 9 2019, 8:24 PM · DC-Ops, Operations, ops-codfw
wiki_willy assigned T235124: Move kafka100[123] to logstash102[012] to Cmjohnson.

@Cmjohnson - this task is to relabel the hosts, update them in Netbox, and update the switchport descriptions to the newly renamed hostnames

Oct 9 2019, 8:23 PM · DC-Ops, Operations, ops-eqiad
wiki_willy reassigned T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC) from RobH to Jclark-ctr.

@Jclark-ctr - can you wrap up the netbox entries on this one, and then close out the task? Thanks, Willy

Oct 9 2019, 4:41 PM · DC-Ops, Operations, ops-eqiad
wiki_willy closed T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC), a subtask of T226778: Install new PDUs in rows A/B (Top level tracking task), as Resolved.
Oct 9 2019, 4:35 PM · DC-Ops, Operations, ops-eqiad
wiki_willy closed T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC) as Resolved.

Thanks for confirming, @ayounsi. Resolving task.

Oct 9 2019, 4:35 PM · DC-Ops, Operations, ops-eqiad
wiki_willy reassigned T231525: cp1085 - IPMI not working from Cmjohnson to Jclark-ctr.

Hi @Jclark-ctr - can you hit up @Vgutierrez when you get in during the AM sometime this week to depool the host? You guys have overlap in the mornings, until about 10am ET. Thanks, Willy

Oct 9 2019, 3:59 AM · ops-eqiad, Traffic, Operations

Oct 8 2019

wiki_willy reassigned T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) from Cmjohnson to RobH.

Re-assigning to @RobH to complete install/updating of new PDU. Thanks, Willy

Oct 8 2019, 6:16 PM · DC-Ops, Operations, ops-eqiad
wiki_willy reassigned T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet from Jclark-ctr to RobH.

@RobH - can you take care of DNS for this to get things completed from the dc-ops side for this install? This one's super urgent, so if you can complete in the AM, it would be much appreciated. Thanks, Willy

Oct 8 2019, 4:04 PM · User-fgiunchedi, Operations

Oct 7 2019

wiki_willy added a comment to T231525: cp1085 - IPMI not working.

Ok @Dzahn - just let us know when it's ready to go. Thanks, Willy

Oct 7 2019, 9:40 PM · ops-eqiad, Traffic, Operations
wiki_willy added a comment to T228606: Degraded RAID on elastic1046.

@Cmjohnson - let me know if we need to order a replacement drive (along with what type of disk), since it's out of warranty. Thanks, Willy

Oct 7 2019, 9:16 PM · Patch-For-Review, Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy added a comment to T231525: cp1085 - IPMI not working.

@Dzahn - just wanted to confirm that this has been depooled. Thanks, Willy

Oct 7 2019, 9:12 PM · ops-eqiad, Traffic, Operations
wiki_willy added a comment to T232069: analytics1045 - RAID failure and /var/lib/hadoop/data/j can't be mounted.

Thanks @elukey . Should we ignore/resolve this alert then? Thanks, Willy

Oct 7 2019, 9:03 PM · ops-eqiad, DC-Ops, Analytics, Operations, Analytics-Cluster
wiki_willy assigned T234785: Degraded RAID on analytics1049 to Cmjohnson.

Hi @elukey - looks like this host is out of warranty (ended in June 2018). Let me know if you want us to purchase a replacement part or if this system is close to being decommissioned. Thanks, Willy

Oct 7 2019, 8:58 PM · Patch-For-Review, ops-eqiad, Operations
wiki_willy assigned T227536: b1-eqiad pdu refresh (Thursday 10/10 @11am UTC) to Jclark-ctr.
Oct 7 2019, 3:42 PM · DC-Ops, Operations, ops-eqiad
wiki_willy assigned T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) to Cmjohnson.
Oct 7 2019, 3:41 PM · DC-Ops, Operations, ops-eqiad
wiki_willy added a comment to T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).

@Marostegui - it was ordered last Friday morning. We haven't received the tracking number from the vendor yet, but will update that in T233277 once provided. There's still a chance it arrives before the 15th, but we should have an ETA soon. Thanks, Willy

Oct 7 2019, 7:44 AM · DC-Ops, Operations, ops-eqiad

Oct 1 2019

wiki_willy added a comment to T196560: rack/setup/install LVS200[7-10].

Hi @Vgutierrez - just following up on this to see if there was an ETA, since these are supposed to replace lvs2001-2006...which are all past their 5yr mark, and have the following hardware issues associated with them:

Oct 1 2019, 9:25 PM · ops-codfw, Traffic, Operations
wiki_willy added a comment to T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).

@Marostegui - sure, will do. This week is the approval & ordering phase of the procurement cycle, so it shouldn't be an issue getting the PO submitted for labsdb1009. Thanks, Willy

Oct 1 2019, 7:24 AM · DC-Ops, Operations, ops-eqiad

Sep 30 2019

wiki_willy assigned T233578: hw troubleshooting: Memory correctable errors -EDAC- for elastic1029.eqiad.wmnet to Cmjohnson.
Sep 30 2019, 7:21 PM · Operations, ops-eqiad, DC-Ops
wiki_willy added a comment to T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC).

New target date for upgrading the PDUs on this network rack is Thursday 10/17 @11am UTC. @ayounsi will be in Europe this week to oversee, in case any potential issues occur. Thanks, Willy

Sep 30 2019, 5:20 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC) from a8-eqiad pdu refresh (Date TBA) to a8-eqiad pdu refresh (Thursday 10/17 @11am UTC).
Sep 30 2019, 5:18 PM · DC-Ops, Operations, ops-eqiad
wiki_willy added a comment to T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).

New date for upgrading the remaining PDU on the network rack A1 will be targeting Tuesday, 10/15 at 11am UTC. Thanks, Willy

Sep 30 2019, 5:15 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) from a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC to a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).
Sep 30 2019, 5:13 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) from a1-eqiad pdu refresh (Date TBD) to a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC.
Sep 30 2019, 5:13 PM · DC-Ops, Operations, ops-eqiad

Sep 25 2019

wiki_willy assigned T233642: apply hostname labels for krb1001/WMF5173 to Cmjohnson.
Sep 25 2019, 8:33 PM · Operations, ops-eqiad