Yesterday

wiki_willy added a comment to T233273: labsdb1009 broken PSU.

@Jclark-ctr @Marostegui - thanks guys

Mon, Oct 14, 6:35 PM · Operations, DC-Ops, ops-eqiad, DBA

Fri, Oct 11

wiki_willy added a comment to T227025: (Need By: August 31) rack/setup/install (3) new zookeeper nodes.

Great job @Papaul in troubleshooting this and tracking it down to the root cause. Thanks! ~Willy

Fri, Oct 11, 4:55 PM · User-Elukey, Operations, ops-eqiad
wiki_willy added a comment to T233273: labsdb1009 broken PSU.

Yup, it should be a hot swap. So @Jclark-ctr - please reach out to @Marostegui before replacing it. Thanks, Willy

Fri, Oct 11, 4:45 PM · Operations, DC-Ops, ops-eqiad, DBA
wiki_willy added a comment to T233273: labsdb1009 broken PSU.

@Jclark-ctr - this arrived Thursday via https://www.fedex.com/en-us/home.html. Just a heads up, this will need to be replaced before the PDU upgrade next Tuesday, to retain redundant power on labsdb1009. Thanks, Willy

Fri, Oct 11, 4:42 PM · Operations, DC-Ops, ops-eqiad, DBA
wiki_willy added a comment to T227025: (Need By: August 31) rack/setup/install (3) new zookeeper nodes.

I'll dig around a bit and check with Dell to see if we can figure out why Com1 and Com2 have to be flipped to get it working. Talked to Luca and worst case, if we can't find any answers to why it's happening, then we'll just leave them as is. Thanks, Willy

Fri, Oct 11, 8:39 AM · User-Elukey, Operations, ops-eqiad
wiki_willy added a comment to T226778: Install new PDUs in rows A/B (Top level tracking task).

Hi @ayounsi - I talked to a couple other people who had the same concern the other day, and I agree as well...so I started scheduling downtime for the PDU alerts in Icinga starting from today's B1 PDU upgrade, and will continue for the remaining PDU swaps. Thanks, Willy

Fri, Oct 11, 8:18 AM · DC-Ops, Operations, ops-eqiad

Wed, Oct 9

wiki_willy assigned T235125: Move kafka200[123] to logstash202[012] to Papaul.

Hi @Papaul - this task is to relabel, update in Netbox, and update switchport descriptions to the newly renamed hostnames. Thanks, Willy

Wed, Oct 9, 8:24 PM · DC-Ops, Operations, ops-codfw
wiki_willy assigned T235124: Move kafka100[123] to logstash102[012] to Cmjohnson.

@Cmjohnson - this task is to relabel, update in Netbox, and update switchport descriptions to the newly renamed hostnames

Wed, Oct 9, 8:23 PM · DC-Ops, Operations, ops-eqiad
wiki_willy reassigned T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC) from RobH to Jclark-ctr.

@Jclark-ctr - can you wrap up the netbox entries on this one, and then close out the task? Thanks, Willy

Wed, Oct 9, 4:41 PM · DC-Ops, Operations, ops-eqiad
wiki_willy closed T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC), a subtask of T226778: Install new PDUs in rows A/B (Top level tracking task), as Resolved.
Wed, Oct 9, 4:35 PM · DC-Ops, Operations, ops-eqiad
wiki_willy closed T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC) as Resolved.

Thanks for confirming, @ayounsi. Resolving task.

Wed, Oct 9, 4:35 PM · DC-Ops, Operations, ops-eqiad
wiki_willy reassigned T231525: cp1085 - IPMI not working from Cmjohnson to Jclark-ctr.

Hi @Jclark-ctr - can you hit up @Vgutierrez when you get in during the AM sometime this week to depool the host? You guys have overlap in the mornings, until about 10am ET. Thanks, Willy

Wed, Oct 9, 3:59 AM · ops-eqiad, Traffic, Operations

Tue, Oct 8

wiki_willy reassigned T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) from Cmjohnson to RobH.

Re-assigning to @RobH to complete install/updating of new PDU. Thanks, Willy

Tue, Oct 8, 6:16 PM · DC-Ops, Operations, ops-eqiad
wiki_willy reassigned T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet from Jclark-ctr to RobH.

@RobH - can you take care of DNS for this to get things completed from the dc-ops side for this install? This one's super urgent, so if you can complete in the AM, it would be much appreciated. Thanks, Willy

Tue, Oct 8, 4:04 PM · User-fgiunchedi, Operations

Mon, Oct 7

wiki_willy added a comment to T231525: cp1085 - IPMI not working.

Ok @Dzahn - just let us know when it's ready to go. Thanks, Willy

Mon, Oct 7, 9:40 PM · ops-eqiad, Traffic, Operations
wiki_willy added a comment to T228606: Degraded RAID on elastic1046.

@Cmjohnson - let me know if we need to order a replacement drive (along with what type of disk), since it's out of warranty. Thanks, Willy

Mon, Oct 7, 9:16 PM · Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy added a comment to T231525: cp1085 - IPMI not working.

@Dzahn - just wanted to confirm that this has been depooled. Thanks, Willy

Mon, Oct 7, 9:12 PM · ops-eqiad, Traffic, Operations
wiki_willy added a comment to T232069: analytics1045 - RAID failure and /var/lib/hadoop/data/j can't be mounted.

Thanks @elukey . Should we ignore/resolve this alert then? Thanks, Willy

Mon, Oct 7, 9:03 PM · ops-eqiad, DC-Ops, Analytics, Operations, Analytics-Cluster
wiki_willy assigned T234785: Degraded RAID on analytics1049 to Cmjohnson.

Hi @elukey - looks like this host is out of warranty (ended in June 2018). Let me know if you want us to purchase a replacement part or if this system is close to being decommissioned. Thanks, Willy

Mon, Oct 7, 8:58 PM · Patch-For-Review, ops-eqiad, Operations
wiki_willy assigned T227536: b1-eqiad pdu refresh (Thursday 10/10 @11am UTC) to Jclark-ctr.
Mon, Oct 7, 3:42 PM · DC-Ops, Operations, ops-eqiad
wiki_willy assigned T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) to Cmjohnson.
Mon, Oct 7, 3:41 PM · DC-Ops, Operations, ops-eqiad
wiki_willy added a comment to T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).

@Marostegui - it was ordered last Friday morning. We haven't received the tracking number from the vendor yet, but will update that in T233277 once provided. There's still a chance it arrives before the 15th, but we should have an ETA soon. Thanks, Willy

Mon, Oct 7, 7:44 AM · DC-Ops, Operations, ops-eqiad

Tue, Oct 1

wiki_willy added a comment to T196560: rack/setup/install LVS200[7-10].

Hi @Vgutierrez - just following up on this to see if there was an ETA, since these are supposed to replace lvs2001-2006...which are all past their 5yr mark, and have the following hardware issues associated with them:

Tue, Oct 1, 9:25 PM · ops-codfw, Traffic, Operations
wiki_willy added a comment to T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).

@Marostegui - sure, will do. This week is the approval & ordering phase of the procurement cycle, so it shouldn't be an issue getting the PO submitted for labsdb1009. Thanks, Willy

Tue, Oct 1, 7:24 AM · DC-Ops, Operations, ops-eqiad

Mon, Sep 30

wiki_willy assigned T233578: hw troubleshooting: Memory correctable errors -EDAC- for elastic1029.eqiad.wmnet to Cmjohnson.
Mon, Sep 30, 7:21 PM · Operations, ops-eqiad, DC-Ops
wiki_willy added a comment to T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC).

New target date for upgrading the PDUs on this network rack is Thursday 10/17 @11am UTC. @ayounsi will be in Europe this week to oversee, in case any potential issues occur. Thanks, Willy

Mon, Sep 30, 5:20 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC) from a8-eqiad pdu refresh (Date TBA) to a8-eqiad pdu refresh (Thursday 10/17 @11am UTC).
Mon, Sep 30, 5:18 PM · DC-Ops, Operations, ops-eqiad
wiki_willy added a comment to T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).

The new target date for upgrading the remaining PDU on network rack A1 is Tuesday, 10/15 at 11am UTC. Thanks, Willy

Mon, Sep 30, 5:15 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) from a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC to a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).
Mon, Sep 30, 5:13 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) from a1-eqiad pdu refresh (Date TBD) to a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC.
Mon, Sep 30, 5:13 PM · DC-Ops, Operations, ops-eqiad

Wed, Sep 25

wiki_willy assigned T233642: apply hostname labels for krb1001/WMF5173 to Cmjohnson.
Wed, Sep 25, 8:33 PM · ops-eqiad, Operations

Mon, Sep 23

herron awarded T233189: Requesting access to Ops Group for papaul@ a Love token.
Mon, Sep 23, 7:32 PM · Operations, SRE-Access-Requests
Dzahn awarded T233189: Requesting access to Ops Group for papaul@ a Love token.
Mon, Sep 23, 5:38 PM · Operations, SRE-Access-Requests
wiki_willy assigned T233534: db1075 (s3 master) crashed - BBU failure to Cmjohnson.
Mon, Sep 23, 2:20 AM · Wikimedia-Incident, ops-eqiad, Operations, DBA
wiki_willy added a subtask for T233534: db1075 (s3 master) crashed - BBU failure: Unknown Object (Task).
Mon, Sep 23, 2:19 AM · Wikimedia-Incident, ops-eqiad, Operations, DBA

Sat, Sep 21

Marostegui awarded T233189: Requesting access to Ops Group for papaul@ a Love token.
Sat, Sep 21, 2:10 PM · Operations, SRE-Access-Requests

Thu, Sep 19

wiki_willy assigned T233289: Decommission ms-be1027 to Cmjohnson.
Thu, Sep 19, 4:40 PM · decommission, Operations, ops-eqiad
akosiaris awarded T233189: Requesting access to Ops Group for papaul@ a Love token.
Thu, Sep 19, 3:37 PM · Operations, SRE-Access-Requests
wiki_willy added a subtask for T233273: labsdb1009 broken PSU: Unknown Object (Task).
Thu, Sep 19, 6:29 AM · Operations, DC-Ops, ops-eqiad, DBA
wiki_willy assigned T233273: labsdb1009 broken PSU to Jclark-ctr.
Thu, Sep 19, 6:22 AM · Operations, DC-Ops, ops-eqiad, DBA

Wed, Sep 18

wiki_willy created T233189: Requesting access to Ops Group for papaul@.
Wed, Sep 18, 7:14 AM · Operations, SRE-Access-Requests

Tue, Sep 17

wiki_willy updated subscribers of T227025: (Need By: August 31) rack/setup/install (3) new zookeeper nodes.

@Jclark-ctr - since Chris had to use a sick day, can one of you guys take a look at this for Luca? Thanks, Willy

Tue, Sep 17, 7:36 PM · User-Elukey, Operations, ops-eqiad

Mon, Sep 16

wiki_willy added a comment to T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC).

Checked with @Cmjohnson , who says he'll follow up to check the connections.

Mon, Sep 16, 8:38 PM · DC-Ops, Operations, ops-eqiad
wiki_willy updated subscribers of T232069: analytics1045 - RAID failure and /var/lib/hadoop/data/j can't be mounted.

Hi @Dzahn @jbond - looks like this host is out of warranty, and about 3/4 of a year away from a hardware refresh...so just wanted to double-check if you're considering retiring this system soon or if you'd like us to purchase the hardware part for replacement? Thanks, Willy

Mon, Sep 16, 7:28 PM · ops-eqiad, DC-Ops, Analytics, Operations, Analytics-Cluster
wiki_willy assigned T232069: analytics1045 - RAID failure and /var/lib/hadoop/data/j can't be mounted to Cmjohnson.
Mon, Sep 16, 7:24 PM · ops-eqiad, DC-Ops, Analytics, Operations, Analytics-Cluster
wiki_willy assigned T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC) to Cmjohnson.

Originally scheduled for Thursday 9/19, but will reschedule for a later date, since this is a network rack.

Mon, Sep 16, 5:04 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC) from a8-eqiad pdu refresh (Thursday 9/19 @11am UTC) to a8-eqiad pdu refresh (Date TBA).
Mon, Sep 16, 5:03 PM · DC-Ops, Operations, ops-eqiad
wiki_willy assigned T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC) to Cmjohnson.

@Cmjohnson - good to go for tomorrow's PDU upgrade, but please confirm with @Marostegui before you start that DBs have been depooled. Thanks, Willy

Mon, Sep 16, 5:01 PM · DC-Ops, Operations, ops-eqiad
wiki_willy added a comment to T232882: backup1001 failed disk (degraded RAID).

Thanks @Jclark-ctr , can you have the drive replaced this week? Also, you might need to coordinate with @jcrespo via IRC to get a couple other things completed to get backup1001 up and running. Thanks, Willy

Mon, Sep 16, 4:28 PM · ops-eqiad, Operations

Sep 13 2019

wiki_willy assigned T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet to Jclark-ctr.
Sep 13 2019, 10:02 PM · User-fgiunchedi, Operations
wiki_willy updated subscribers of T229452: db1114 crashed due to memory issues (server under warranty).

@Cmjohnson or @Jclark-ctr - can one of you guys check this out early next week? Thanks, Willy

Sep 13 2019, 9:18 PM · ops-eqiad, Operations, DBA
wiki_willy added a comment to T229612: asw2-c-eqiad:xe-2/0/45 inbound interface errors.

@Cmjohnson - can you provide an update on this one next week? Thanks, Willy

Sep 13 2019, 9:05 PM · netops, ops-eqiad, Operations
wiki_willy added a comment to T231525: cp1085 - IPMI not working.

Hi @Dzahn - just following up on this one, to see when the server can be taken down. Thanks, Willy

Sep 13 2019, 9:04 PM · ops-eqiad, Traffic, Operations
wiki_willy assigned T232882: backup1001 failed disk (degraded RAID) to Jclark-ctr.
Sep 13 2019, 6:24 PM · ops-eqiad, Operations

Sep 11 2019

wiki_willy closed T232591: helium array has slot 3 disk failed as Resolved.
Sep 11 2019, 4:31 PM · ops-eqiad, Operations

Sep 10 2019

wiki_willy added a comment to T224794: Degraded RAID on helium.

Talked to @akosiaris, who will open up a new task to replace the newly failed drive. We ordered a few of them last time, so hopefully we'll have more spares lying around.

Sep 10 2019, 7:58 PM · ops-eqiad, Operations
wiki_willy added a comment to T232505: Degraded RAID on db2060.

Thanks @Marostegui

Sep 10 2019, 6:58 PM · Operations, ops-codfw
wiki_willy added a comment to T222950: (OoW) cloudvirt1006 - RAID battery failed.

@Cmjohnson - just following up to see if we have the correct part

Sep 10 2019, 6:33 PM · cloud-services-team, ops-eqiad, Operations
wiki_willy added a comment to T228606: Degraded RAID on elastic1046.

@Cmjohnson - could be the drive isn't seated securely, or possibly a loose cable/connection

Sep 10 2019, 6:29 PM · Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy added a comment to T232505: Degraded RAID on db2060.

Looks like the warranty expired on Jan. 14, 2018. @Papaul - let me know if you have any spares lying around or if we need to purchase a new disk. Thanks, Willy

Sep 10 2019, 6:22 PM · Operations, ops-codfw
wiki_willy reassigned T232505: Degraded RAID on db2060 from Cmjohnson to Papaul.
Sep 10 2019, 6:21 PM · Operations, ops-codfw
wiki_willy assigned T232505: Degraded RAID on db2060 to Cmjohnson.
Sep 10 2019, 6:16 PM · Operations, ops-codfw

Sep 9 2019

wiki_willy reassigned T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory from wiki_willy to Cmjohnson.

Here's the response I got from Dell (pasted below). @Cmjohnson or @Jclark-ctr : can one of you guys call Dell at 1-800-456-3355, explain to them the numerous parts we've already replaced (and that it continues to crash on load) and get them to analyze the logs for the system? Let me know how it goes.

Sep 9 2019, 5:20 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)
wiki_willy moved T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet from Backlog to Racking Tasks on the ops-eqiad board.
Sep 9 2019, 5:05 PM · User-fgiunchedi, Operations
wiki_willy renamed T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) from a1-eqiad pdu refresh (Thursday 9/12 @11am UTC) to a1-eqiad pdu refresh (Date TBD).
Sep 9 2019, 4:44 PM · DC-Ops, Operations, ops-eqiad
wiki_willy added a comment to T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).

Per SRE meeting, we'll be rescheduling the PDU upgrades for this rack to a later date TBA due to a lot of the ongoing work related to the recent outages.

Sep 9 2019, 4:44 PM · DC-Ops, Operations, ops-eqiad
wiki_willy assigned T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) to Cmjohnson.
Sep 9 2019, 3:57 PM · DC-Ops, Operations, ops-eqiad
wiki_willy assigned T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC) to Cmjohnson.
Sep 9 2019, 3:57 PM · DC-Ops, Operations, ops-eqiad

Sep 4 2019

wiki_willy added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

Emailed our Dell account rep, who responded that they will look into what our options are and get back to us. Thanks, Willy

Sep 4 2019, 10:33 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)
wiki_willy reassigned T230575: Degraded RAID on cloudvirt1018 from wiki_willy to Bstorm.

Assigning to @Bstorm to follow up on the previous comment.

Sep 4 2019, 9:17 PM · ops-eqiad, Operations
wiki_willy added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

Thanks @Andrew - I'll reach out to our Account Rep, to see if something else can be done.

Sep 4 2019, 9:15 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)
wiki_willy added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

Hi @Andrew - I mentioned the ongoing issues with this machine to our Dell account rep last week, since we've basically replaced every CPU/DIMM/MB on this box. They mentioned we could install Live Optics to evaluate load, but I'm not sure this is something we want to run on our hardware. Do you have another cloudvirt machine up and running right now on the same hardware specs? Essentially running at the same CPU usage...mainly so we can compare and try to isolate any other type of config differences between them.

Sep 4 2019, 8:54 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)
wiki_willy added a comment to T231066: Host decommission improvements.

Hi @Volans - I was wondering, in the meantime, would it be possible to give all the FTE dc-ops engineers the necessary permissions to install and decom hosts from beginning to end? Maybe either by adding these rights to a dc-ops group or granting root access for Papaul? He's definitely going to need the ability to do all this in the next 1.5 months, since he'll be in Amsterdam refreshing the entire site. Thanks, Willy

Sep 4 2019, 2:05 AM · Operations, DC-Ops, SRE-tools

Aug 30 2019

wiki_willy reassigned T225121: (Need By: Sept 30) upgrade msw1-eqiad from EX4200 to EX4300 from Papaul to Cmjohnson.
Aug 30 2019, 6:19 PM · netops, Operations, ops-eqiad
wiki_willy reassigned T227025: (Need By: August 31) rack/setup/install (3) new zookeeper nodes from elukey to Cmjohnson.

Assigning over to @Cmjohnson for @elukey 's question.

Aug 30 2019, 6:17 PM · User-Elukey, Operations, ops-eqiad
wiki_willy assigned T231525: cp1085 - IPMI not working to Cmjohnson.
Aug 30 2019, 6:15 PM · ops-eqiad, Traffic, Operations
wiki_willy added a comment to T231638: db1074 crashed: Broken BBU.

Thanks for confirming @Cmjohnson , subtask T231670 created for Rob to order the part. Thanks, Willy

Aug 30 2019, 5:31 PM · ops-eqiad, Operations, DBA
wiki_willy assigned T231638: db1074 crashed: Broken BBU to Cmjohnson.

@Cmjohnson @Jclark-ctr - do you guys know offhand if we have a spare BBU lying around from a decom'd server by any chance? If not, let me know and we'll order the part.

Aug 30 2019, 5:24 PM · ops-eqiad, Operations, DBA

Aug 27 2019

wiki_willy added a comment to T230575: Degraded RAID on cloudvirt1018.

@Bstorm - I was able to confirm we originally ordered this machine to include 1.6tb drives via https://phabricator.wikimedia.org/T155075 , but wasn't able to find any other tasks that showed when/how they were replaced with 1.9tb drives (which Dell won't support). Do you have any details from previous records on where these 1.9tb disks came from? (i.e. swapped from another server, ordered separately, etc.) Thanks, Willy

Aug 27 2019, 9:53 PM · ops-eqiad, Operations
wiki_willy closed T229134: Degraded RAID on sulfur as Resolved.
Aug 27 2019, 8:48 PM · ops-eqiad, Operations
wiki_willy added a comment to T229134: Degraded RAID on sulfur.

@Volans - ah, that makes sense. Thanks, let's just resolve this task then.

Aug 27 2019, 8:28 PM · ops-eqiad, Operations
wiki_willy added a comment to T224794: Degraded RAID on helium.

@Jclark-ctr - can we resolve this task? Thanks, Willy

Aug 27 2019, 8:16 PM · ops-eqiad, Operations
wiki_willy updated subscribers of T229134: Degraded RAID on sulfur.

@Volans - hey Riccardo, not sure if you're the right person for this, but thought I'd try asking you. Is there a different output we can get for this alert, to help us isolate the disk issue a bit more?

Aug 27 2019, 8:13 PM · ops-eqiad, Operations

Aug 23 2019

wiki_willy updated subscribers of T200209: Decom graphite2001/WMF6160 .

@RobH - I'll leave it up to @Papaul, since he has a better idea on the chances of reusing the parts on this system. Thanks, Willy

Aug 23 2019, 7:22 PM · Patch-For-Review, decommission, ops-codfw, Operations, observability
wiki_willy assigned T231056: Degraded RAID on db2056 to Papaul.
Aug 23 2019, 12:07 AM · Operations, ops-codfw

Aug 20 2019

wiki_willy added a comment to T228606: Degraded RAID on elastic1046.

Confirmed by Chris that the drive arrived on August 8

Aug 20 2019, 6:47 PM · Discovery-Search (Current work), ops-eqiad, Operations
wiki_willy added a comment to T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC).

Thanks @Marostegui , I appreciate it.

Aug 20 2019, 7:47 AM · DC-Ops, Operations, ops-eqiad

Aug 19 2019

wiki_willy added a comment to T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC).

@Marostegui - I would say just go for it and fail out in advance, if it's not too much trouble. Master DBs are very critical, so my opinion is to just take the extra precautionary measures. Thanks, Willy

Aug 19 2019, 6:44 PM · Patch-For-Review, DC-Ops, Operations, ops-eqiad
wiki_willy added a comment to T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC).

@Marostegui - I'll defer to Faidon or Mark for their opinion, but my suggestion is to go ahead and fail out in advance if it's not too much of a hassle. The success rate of us upgrading PDUs without any issues is pretty good, but unexpected accidents can occur, and master DBs are very critical to the infrastructure.

Aug 19 2019, 6:41 PM · DC-Ops, Operations, ops-eqiad

Aug 16 2019

wiki_willy added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

Thanks Chris, hopefully this will solve things.

Aug 16 2019, 4:36 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)

Aug 15 2019

wiki_willy assigned T230575: Degraded RAID on cloudvirt1018 to Cmjohnson.
Aug 15 2019, 9:08 PM · ops-eqiad, Operations
wiki_willy renamed T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) from b8-eqiad pdu refresh to b8-eqiad pdu refresh (Thursday 10/31 @11am UTC).
Aug 15 2019, 5:39 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227542: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC) from b7-eqiad pdu refresh to b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC).
Aug 15 2019, 5:38 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC) from b6-eqiad pdu refresh to b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC).
Aug 15 2019, 5:37 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227540: b4-eqiad pdu refresh (Thursday 10/24 @11am UTC) from b4-eqiad pdu refresh to b4-eqiad pdu refresh (Thursday 10/24 @11am UTC).
Aug 15 2019, 5:36 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC) from b3-eqiad pdu refresh to b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC).
Aug 15 2019, 5:35 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227538: b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC) from b2-eqiad pdu refresh to b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC).
Aug 15 2019, 5:34 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227536: b1-eqiad pdu refresh (Thursday 10/10 @11am UTC) from b1-eqiad pdu refresh to b1-eqiad pdu refresh (Thursday 10/10 @11am UTC).
Aug 15 2019, 5:33 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC) from a8-eqiad pdu refresh to a8-eqiad pdu refresh (Thursday 9/19 @11am UTC).
Aug 15 2019, 5:32 PM · DC-Ops, Operations, ops-eqiad
wiki_willy renamed T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) from a6-eqiad pdu refresh to a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC).
Aug 15 2019, 5:32 PM · Patch-For-Review, DC-Ops, Operations, ops-eqiad