
wiki_willy
User

User Details

User Since
Apr 16 2019, 9:00 PM (181 w, 1 d)
Availability
Available
LDAP User
Wpao
MediaWiki User
Unknown

Recent Activity

Mon, Oct 3

wiki_willy reassigned T319166: decommission cp5001.eqsin.wmnet from wiki_willy to RobH.
Mon, Oct 3, 5:43 PM · Traffic, SRE, ops-eqsin, decommission-hardware

Wed, Sep 28

wiki_willy assigned T316285: decommission cloudservices1003.wikimedia..org to Cmjohnson.
Wed, Sep 28, 1:10 AM · SRE, ops-eqiad, cloud-services-team (Kanban), decommission-hardware
wiki_willy assigned T317804: Port with no description on access switch to Cmjohnson.
Wed, Sep 28, 1:09 AM · ops-eqiad

Tue, Sep 27

wiki_willy assigned T318691: decommission ms-be10[28-39].eqiad.wmnet to Jclark-ctr.
Tue, Sep 27, 6:14 PM · SRE, SRE-swift-storage, DC-Ops, ops-eqiad, decommission-hardware

Tue, Sep 20

wiki_willy updated subscribers of T318062: db2098 crashed.

Yeah, it looks like we're just past the warranty period. @Papaul - do you want to try and see if we're still able to submit an RMA? And if it won't let you, let me know if you have any spares around onsite or if we should purchase the parts. Thanks, Willy

Tue, Sep 20, 6:44 AM · Patch-For-Review, SRE, ops-codfw, Data-Persistence, Data-Persistence-Backup, database-backups, DBA

Tue, Sep 13

wiki_willy added a comment to T317662: db1189 broken memory.

@Cmjohnson - just a heads up, this was just recently installed, so it's under warranty for submitting an RMA with Dell. Thanks, Willy

Tue, Sep 13, 3:57 PM · SRE, ops-eqiad, DBA
wiki_willy reassigned T317662: db1189 broken memory from wiki_willy to Cmjohnson.
Tue, Sep 13, 3:56 PM · SRE, ops-eqiad, DBA

Wed, Sep 7

wiki_willy assigned T316996: Degraded RAID on logstash2027 to Papaul.
Wed, Sep 7, 5:06 PM · Observability-Logging, SRE, ops-codfw

Sep 2 2022

wiki_willy added a comment to T314256: cp5001 memory errors on DIMM A2.

Hi @Vgutierrez - yeah, it probably makes more sense to replace the host than to purchase a replacement part, since the new servers have already been ordered and are expected to arrive in October. Thanks, Willy

Sep 2 2022, 5:28 PM · ops-eqsin, SRE, DC-Ops, Traffic

Aug 26 2022

wiki_willy assigned T316194: Degraded RAID on ms-be2035 to Papaul.
Aug 26 2022, 1:45 AM · SRE-swift-storage, SRE, ops-codfw
wiki_willy assigned T316121: PDU sensor over limit to Jclark-ctr.
Aug 26 2022, 1:44 AM · ops-eqiad

Aug 23 2022

wiki_willy added a comment to T314256: cp5001 memory errors on DIMM A2.

Assigning over to Rob, who's currently working on getting the eqsin hardware refresh ordered.

Aug 23 2022, 11:16 PM · ops-eqsin, SRE, DC-Ops, Traffic
wiki_willy assigned T314256: cp5001 memory errors on DIMM A2 to RobH.
Aug 23 2022, 11:14 PM · ops-eqsin, SRE, DC-Ops, Traffic
wiki_willy assigned T315989: elastic2054 is down with memory error to Papaul.
Aug 23 2022, 11:13 PM · SRE, Discovery-Search, ops-codfw
wiki_willy assigned T315924: decommission frlog1001.frack.eqiad.wmnet to Jclark-ctr.
Aug 23 2022, 11:09 PM · SRE, ops-eqiad, decommission-hardware

Aug 22 2022

wiki_willy reassigned T315052: Link from lsw1-e1-eqiad to lsw1-f3-eqiad down from Cmjohnson to Jclark-ctr.
Aug 22 2022, 4:02 PM · SRE, netops, ops-eqiad, Infrastructure-Foundations

Aug 18 2022

wiki_willy moved T315437: Two failed disks in ms-be1071 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Aug 18 2022, 8:25 PM · SRE, ops-eqiad, SRE-swift-storage, DC-Ops
wiki_willy moved T315480: Degraded RAID on ms-be1054 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Aug 18 2022, 8:25 PM · SRE-swift-storage, DC-Ops, SRE, ops-eqiad
wiki_willy assigned T315480: Degraded RAID on ms-be1054 to Jclark-ctr.

Hi @Jclark-ctr - this one shows a purchase date of August 7, 2019. Technically, it's after the 3yr warranty, but can you try submitting an RMA to see if they'll take it? Thanks, Willy

Aug 18 2022, 8:13 PM · SRE-swift-storage, DC-Ops, SRE, ops-eqiad
wiki_willy assigned T315229: Degraded RAID on db2110 to Papaul.
Aug 18 2022, 4:02 PM · DBA, SRE, ops-codfw

Aug 17 2022

wiki_willy added a comment to T315229: Degraded RAID on db2110.

Most definitely, we'll get it procured in T315462

Aug 17 2022, 5:11 PM · DBA, SRE, ops-codfw
wiki_willy added a subtask for T315229: Degraded RAID on db2110: Unknown Object (Task).
Aug 17 2022, 5:10 PM · DBA, SRE, ops-codfw
wiki_willy assigned T315439: dbprov1002 lost power redundancy to Cmjohnson.
Aug 17 2022, 3:10 PM · SRE, ops-eqiad, DC-Ops
wiki_willy assigned T315437: Two failed disks in ms-be1071 to Jclark-ctr.

Looks like this is still under warranty. Since @Cmjohnson will be out on vacation soon, @Jclark-ctr - can you submit the RMA for this one? Thanks, Willy

Aug 17 2022, 3:07 PM · SRE, ops-eqiad, SRE-swift-storage, DC-Ops

Aug 16 2022

wiki_willy assigned T315344: PDU sensor over limit to Cmjohnson.
Aug 16 2022, 9:35 PM · ops-eqiad
wiki_willy assigned T314998: Inbound interface errors to Cmjohnson.
Aug 16 2022, 9:35 PM · ops-eqiad
wiki_willy assigned T315352: Port with no description on access switch to Cmjohnson.
Aug 16 2022, 9:34 PM · ops-eqiad

Aug 15 2022

wiki_willy assigned T315052: Link from lsw1-e1-eqiad to lsw1-f3-eqiad down to Cmjohnson.
Aug 15 2022, 6:01 PM · SRE, netops, ops-eqiad, Infrastructure-Foundations

Aug 10 2022

wiki_willy assigned T314951: SSH on cp1089.mgmt is flapping to Cmjohnson.
Aug 10 2022, 6:54 PM · Traffic, ops-eqiad, SRE

Aug 5 2022

wiki_willy assigned T314509: Degraded RAID on ms-be2035 to Papaul.
Aug 5 2022, 10:37 PM · DC-Ops, SRE-swift-storage, SRE, ops-codfw
wiki_willy assigned T314607: Degraded RAID on ms-be2035 to Papaul.
Aug 5 2022, 10:37 PM · SRE, ops-codfw
wiki_willy assigned T314628: db2135 (C6) lost power supply redundancy to Papaul.
Aug 5 2022, 10:36 PM · SRE, DC-Ops, ops-codfw
wiki_willy assigned T314427: Degraded RAID on ms-be2032 to Papaul.
Aug 5 2022, 10:36 PM · DC-Ops, SRE-swift-storage, SRE, ops-codfw
wiki_willy assigned T314517: decommission frauth1001.frack.eqiad.wmnet to Cmjohnson.
Aug 5 2022, 10:34 PM · SRE, ops-eqiad, decommission-hardware

Aug 3 2022

wiki_willy assigned T314413: cloudvirt1021 mgmt flapping to Cmjohnson.
Aug 3 2022, 6:02 PM · SRE, ops-eqiad, cloud-services-team (Kanban)

Jul 29 2022

wiki_willy assigned T311408: Decomission conf100[456] to Cmjohnson.
Jul 29 2022, 6:25 PM · SRE, serviceops-radar, DC-Ops, ops-eqiad

Jul 28 2022

wiki_willy assigned T314027: ps1-e4-eqiad alerts to Cmjohnson.
Jul 28 2022, 5:36 PM · SRE, ops-eqiad, DC-Ops
wiki_willy updated subscribers of T314027: ps1-e4-eqiad alerts.
Jul 28 2022, 5:20 PM · SRE, ops-eqiad, DC-Ops

Jul 27 2022

wiki_willy added a comment to T312745: cr2-eqiad:FPC3 partial failure (PIC2/3).

RMA shipped out by Chris on Tuesday, July 26

Jul 27 2022, 5:59 PM · ops-eqiad, SRE, netops, Infrastructure-Foundations

Jul 20 2022

wiki_willy assigned T313384: eqiad row C switch fabric recabling to Jclark-ctr.
Jul 20 2022, 8:39 PM · Sustainability (Incident Followup), SRE, Infrastructure-Foundations, ops-eqiad, netops

Jul 13 2022

wiki_willy assigned T215301: codfw spare pool system for partman testing to Papaul.

Looks like this one fell through the cracks without the "ops-codfw" project tag, so adding it back in. cc @Papaul

Jul 13 2022, 11:34 PM · ops-codfw, DC-Ops, SRE
wiki_willy added a project to T284126: Relabel db1183 to be dbstore1007: ops-eqiad.

Looks like this one fell through the cracks, so adding the "ops-eqiad" project tag. @Cmjohnson or @Jclark-ctr - can one of you guys see if this one already has the correct label on it? Thanks, Willy

Jul 13 2022, 11:30 PM · SRE, ops-eqiad, DC-Ops
wiki_willy added a project to T293111: Failed disk on analytics1069.eqiad.wmnet: ops-eqiad.

Looks like this was missing the "ops-eqiad" project tag, so it fell through the cracks. @BTullis - since the hardware was installed to refresh this host in T293922, do you still need this fixed? Thanks, Willy

Jul 13 2022, 11:27 PM · SRE, ops-eqiad, DC-Ops
wiki_willy added a project to T298621: Please verify location of an-master1001.eqiad.wmnet: ops-eqiad.

Adding the "ops-eqiad" project tag.

Jul 13 2022, 11:22 PM · SRE, ops-eqiad, DC-Ops
wiki_willy added a project to T298785: Please verify location of an-worker1111.eqiad.wmnet: ops-eqiad.

Hi @BTullis - I'm just coming across this request now. It was missing the "ops-eqiad" project tag, so looks like it fell through the cracks. I'll add the appropriate tag and either @Cmjohnson or @Jclark-ctr can take a look at it. Thanks, Willy

Jul 13 2022, 11:21 PM · SRE, ops-eqiad, DC-Ops

Jul 8 2022

wiki_willy reassigned T312626: Replace RAID controller battery in an-worker1082 from BTullis to Cmjohnson.

Looks like it's an R730 that's out of warranty. @Cmjohnson or @Jclark-ctr - do we still have any extra RAID controller batteries lying around? Thanks, Willy

Jul 8 2022, 9:41 PM · SRE, ops-eqiad, DC-Ops

Jul 5 2022

wiki_willy added a comment to T290899: Q1: eqiad: (32) PDUs for expansion.

Thanks @Papaul!

Jul 5 2022, 11:04 PM · SRE, ops-eqiad, DC-Ops

Jun 30 2022

wiki_willy removed a project from T310451: hdfs client packages for debian Bullseye: ops-eqiad.
Jun 30 2022, 4:47 PM · cloud-services-team (Kanban), Infrastructure-Foundations, SRE
wiki_willy removed a project from T309346: Replace labstore100[67] with clouddumps100[12]: ops-eqiad.
Jun 30 2022, 4:46 PM · Patch-For-Review, cloud-services-team (Kanban), Infrastructure-Foundations, SRE

Jun 29 2022

wiki_willy reassigned T309885: cloudstore1008 - eno2 reporting no carrier from AndrewBonamici to aborrero.
Jun 29 2022, 12:41 AM · ops-eqiad, SRE

Jun 24 2022

wiki_willy added a comment to T302870: Grant cn=nda some sort of read only access to Netbox.

Thanks for checking @ayounsi. My personal opinion on the contacts list is to restrict it if possible. I don't see any issues sharing the generic vendor email addresses and numbers, which is all publicly available anyways, but there are also some individual names of account reps and their contact information listed. If we were to share this (and continue to populate it with future contacts), I think it's appropriate that we check with each individual vendor rep to ensure they're ok with that. And since some reps might be fine with it and others not, it could end up being difficult maintaining the page. I also prefer to block out any specific data center addresses, cage numbers, and rack locations. While I have complete trust in 99% of those with the NDA read-access, I do worry a little about a smaller percentage of potential bad actors down the line. We have experienced threats in the past, so from a security standpoint, the less precise we are with the exact location of our equipment (cage, room, hall, rack, etc), the safer we can keep the hardware and our onsite employees.

Jun 24 2022, 2:10 AM · Patch-For-Review, SRE, Infrastructure-Foundations, netbox

Jun 17 2022

wiki_willy reassigned T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34] from cmooney to Cmjohnson.
Jun 17 2022, 10:41 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
wiki_willy moved T299574: Q3:(Need By: TBD) rack/setup/install cloudvirt10[48-50].eqiad.wmnet from Blocked to Racking Tasks on the ops-eqiad board.
Jun 17 2022, 10:40 PM · cloud-services-team (Hardware), SRE, ops-eqiad, DC-Ops

Jun 15 2022

wiki_willy reassigned T310595: db1173 won't boot up from wiki_willy to Cmjohnson.
Jun 15 2022, 9:46 PM · SRE, ops-eqiad, DBA

Jun 10 2022

wiki_willy assigned T309595: Degraded RAID on ms-be2066 to Papaul.
Jun 10 2022, 6:38 PM · SRE, ops-codfw

Jun 8 2022

wiki_willy assigned T309346: Replace labstore100[67] with clouddumps100[12] to Cmjohnson.
Jun 8 2022, 8:43 PM · Patch-For-Review, cloud-services-team (Kanban), Infrastructure-Foundations, SRE
wiki_willy assigned T310041: Failed PSU on ganeti1023 to Jclark-ctr.
Jun 8 2022, 8:43 PM · SRE, ops-eqiad
wiki_willy assigned T310160: Degraded RAID on ms-be1064 to Cmjohnson.
Jun 8 2022, 8:43 PM · SRE, ops-eqiad
wiki_willy assigned T310181: Degraded RAID on ms-be1064 to Cmjohnson.
Jun 8 2022, 8:42 PM · SRE, ops-eqiad

Jun 6 2022

wiki_willy closed T309576: Degraded RAID on cloudnet1004 as Resolved.
Jun 6 2022, 9:27 PM · cloud-services-team (Kanban), SRE, ops-eqiad
wiki_willy closed T309576: Degraded RAID on cloudnet1004, a subtask of T304888: Q4: (Need By: TBD) rack/setup/install 6 wmcs hosts, as Resolved.
Jun 6 2022, 9:26 PM · Patch-For-Review, SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
wiki_willy assigned T309741: Port with no description on access switch to Cmjohnson.
Jun 6 2022, 9:24 PM · ops-eqiad

May 26 2022

wiki_willy reassigned T309291: db1128 faulty memory from wiki_willy to Cmjohnson.

Hi @Marostegui - @Cmjohnson is going to check if we can pull one of the DIMMs from one of these retired pc* hosts:

May 26 2022, 8:23 PM · SRE, ops-eqiad

May 25 2022

wiki_willy added a comment to T308434: Replace RAID controller battery in an-worker1081.

Hi @BTullis - John typically gets into work a bit later in the day, but that should totally work. Thanks for checking!

May 25 2022, 4:13 PM · SRE, ops-eqiad, DC-Ops

May 24 2022

wiki_willy added a comment to T308434: Replace RAID controller battery in an-worker1081.

Hi @BTullis - I noticed analytics1068 has a failed status and is set to be refreshed after @Cmjohnson finishes up T293922. As a quick fix, would we be able to pull the RAID controller battery from analytics1068 and use it for an-worker1081?

May 24 2022, 7:18 PM · SRE, ops-eqiad, DC-Ops

May 17 2022

wiki_willy added a comment to T308434: Replace RAID controller battery in an-worker1081.

Hi @Jclark-ctr - this one is out of warranty, but let me know if you have any spares around or if we should purchase one. Thanks, Willy

May 17 2022, 8:19 AM · SRE, ops-eqiad, DC-Ops
wiki_willy assigned T308434: Replace RAID controller battery in an-worker1081 to Jclark-ctr.
May 17 2022, 8:17 AM · SRE, ops-eqiad, DC-Ops

May 13 2022

wiki_willy assigned T300485: cr3-eqsin:xe-0/1/1 interface errors to RobH.
May 13 2022, 9:13 PM · SRE, ops-eqsin
wiki_willy assigned T308246: db1164 power supply isn't redundant to Jclark-ctr.

Hi @Jclark-ctr - can you check this out, since @Cmjohnson will be on vacation? Thanks, Willy

May 13 2022, 9:11 PM · SRE, ops-eqiad, DBA

Apr 28 2022

wiki_willy updated subscribers of T304989: Finalise design extension of WMCS networks to new cloudsw in Eqiad rows E/F.
Apr 28 2022, 6:11 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
wiki_willy added a comment to T306130: Hypervisor hardware config for 2022 and beyond.

Sounds good, thanks @nskaggs

Apr 28 2022, 4:54 PM · cloud-services-team (Kanban)

Apr 27 2022

wiki_willy assigned T307035: Relocate hosts: aqs10[3-5] to Cmjohnson.
Apr 27 2022, 7:49 PM · SRE, DC-Ops, ops-eqiad, Cassandra, User-Eevans

Apr 26 2022

wiki_willy added a comment to T305102: Erroneous node placement (AQS Cassandra cluster).

Got it, that makes sense. Thanks for the details and the feedback @BTullis. It definitely gives us a bit more flexibility knowing we can use different racks in those same rows. We'll go ahead and re-rack some of the new aqs1016-1021 servers to follow the proposed plan. Feel free to submit a DC-Ops task (with the "ops-eqiad" project tag), along with some proposed timeframes for the physical move, and we'll get aqs1013-1015 migrated as well.

Apr 26 2022, 7:42 PM · Cassandra, User-Eevans

Apr 25 2022

wiki_willy added a comment to T306130: Hypervisor hardware config for 2022 and beyond.

Hi @nskaggs - we have confirmation from Dell that all the components currently used in the R640s can be purchased in an R440. For reference, here are the itemized components attached below on both 1g and 10g. The power supplies on the R440s are a bit lower at 550w (vs 750w on the R640s). However, the issue described in T306130#7871929 is no longer a problem, since the SSDs we use today do not have the same constraint as the ones used back in 2018. I'm still working with Dell on getting a lower price point, but if your team is good with the change and still wants all the same internal server specs, I'll update Config G to reflect the R440s going forward.

Apr 25 2022, 10:53 PM · cloud-services-team (Kanban)
wiki_willy updated subscribers of T305102: Erroneous node placement (AQS Cassandra cluster).

Hi - John already started racking some of the new aqs1016-1021 servers in T305570. The racking details in that task didn't specify servers needing to go into specific racks (only general distribution across rows, using the same rows as aqs1010-1015), so just confirming: do these servers need to be in the exact racks of A1, D1, B2, E2, C3, F3 outlined in the task description to function properly? Or is there some wiggle room to use other racks in these same rows?

Apr 25 2022, 8:24 PM · Cassandra, User-Eevans

Apr 22 2022

wiki_willy added a comment to T306654: Request sudo access for Jclark-ctr.

Thanks @Papaul. Access for John Clark to run these commands is all approved on my end as well. Thanks, Willy

Apr 22 2022, 3:45 AM · Infrastructure-Foundations (FY2021/2022-Q4), SRE, SRE-Access-Requests

Apr 21 2022

wiki_willy added a comment to T306130: Hypervisor hardware config for 2022 and beyond.

Hi @nskaggs - I asked Dell to get us the config and pricing for that on an R440 chassis, so we should hear back early next week.

Apr 21 2022, 5:12 PM · cloud-services-team (Kanban)

Apr 19 2022

wiki_willy assigned T306129: Port with no description on access switch to Cmjohnson.
Apr 19 2022, 6:08 PM · ops-eqiad

Apr 16 2022

wiki_willy added a comment to T284614: Netbox: define strategy to track standard server configurations.

Thanks for confirming @ayounsi. I don't know how much effort it would be to pull the CPU, memory, or hard drive specs on each server and compare them to the config version as an additional check, but I kind of wonder if it's worth the additional time. There would also be a lot of upkeep going forward, as the internal components continue to evolve for each config version every year. Since we would primarily reference the config version when either a) repurposing hosts or b) cannibalizing parts off a decom'd server (which I don't see happening too often), I think a partial check on the model seems good for now. I'm totally open though if folks want to go a different direction with this. Thanks!

Apr 16 2022, 12:37 AM · Infrastructure-Foundations, netbox

Apr 14 2022

wiki_willy added a comment to T306007: Avoid ghost hosts on the network.

Hi @ayounsi - can you provide a few recent examples of when this has triggered alerts? We're trying to align and find some patterns, to tweak things process wise.

Apr 14 2022, 9:27 PM · SRE, Infrastructure-Foundations, netbox, netops, DC-Ops
wiki_willy assigned T304934: Test port-block constraints on QFX5120 devices to Jclark-ctr.
Apr 14 2022, 9:20 PM · ops-eqiad, SRE, Infrastructure-Foundations, netops
wiki_willy assigned T306215: Degraded RAID on ms-be1068 to Cmjohnson.
Apr 14 2022, 7:36 PM · SRE, ops-eqiad
wiki_willy added a comment to T284614: Netbox: define strategy to track standard server configurations.

Thanks for checking @ayounsi. Just to confirm, will the previous Netbox alert error that Riccardo had fixed continue checking for any configuration mismatches?

Apr 14 2022, 6:35 PM · Infrastructure-Foundations, netbox
wiki_willy updated subscribers of T267219: Netbox: Add rack/U and asset tag fields to AssignIP script.

Adding @Papaul, @Jclark-ctr, @RobH, and @Cmjohnson - do you guys have any preferences or strong opinions on this proposal?

Apr 14 2022, 6:18 PM · Infrastructure-Foundations, netbox

Apr 12 2022

wiki_willy added a comment to T305423: cp5002 memory errors on DIMM A4.

Thanks @ssingh. Rob's working on sourcing the replacement DIMM, so we should have that sorted out soon, and will keep you in the loop via an adjacent procurement task. Thanks, Willy

Apr 12 2022, 7:49 PM · ops-eqsin, SRE, DC-Ops, Traffic

Apr 8 2022

wiki_willy added a comment to T303318: ganeti4002 dimm error.

Hi @RobH - just following up to see if they ever sent the DIMM for this. Thanks, Willy

Apr 8 2022, 7:57 PM · Traffic, SRE, ops-ulsfo, DC-Ops
wiki_willy assigned T305423: cp5002 memory errors on DIMM A4 to RobH.

Hi @ssingh - since this server is out of warranty and due to be refreshed in a few quarters, do you still want us to purchase a replacement DIMM to keep it up and running in the meantime or are you able to wait it out? Thanks, Willy

Apr 8 2022, 5:44 PM · ops-eqsin, SRE, DC-Ops, Traffic

Apr 5 2022

wiki_willy renamed T304888: Q4: (Need By: TBD) rack/setup/install 6 wmcs hosts from (Need By: TBD) rack/setup/install 6 wmcs hosts to Q4: (Need By: TBD) rack/setup/install 6 wmcs hosts.
Apr 5 2022, 7:09 PM · Patch-For-Review, SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops

Apr 4 2022

wiki_willy added a comment to T292095: Q2:(Need By: TBD) Rows E/F network racking task.

@Jclark-ctr - just following up on Cathal's last comment

Apr 4 2022, 5:20 PM · SRE, Infrastructure-Foundations, netops, ops-eqiad, DC-Ops
wiki_willy reassigned T292095: Q2:(Need By: TBD) Rows E/F network racking task from Cmjohnson to Jclark-ctr.
Apr 4 2022, 5:20 PM · SRE, Infrastructure-Foundations, netops, ops-eqiad, DC-Ops

Mar 29 2022

wiki_willy moved T304873: Degraded RAID on thanos-be1003 from Backlog to High Priority Task on the ops-eqiad board.
Mar 29 2022, 12:39 AM · SRE, ops-eqiad
wiki_willy assigned T303044: decommission kubernetes100[1-4] to Cmjohnson.
Mar 29 2022, 12:39 AM · SRE, ops-eqiad, decommission-hardware
wiki_willy assigned T304873: Degraded RAID on thanos-be1003 to Cmjohnson.
Mar 29 2022, 12:38 AM · SRE, ops-eqiad

Mar 21 2022

wiki_willy assigned T303242: ripe-atlas-esams down to RobH.
Mar 21 2022, 6:06 PM · SRE, DC-Ops, ops-esams
wiki_willy assigned T304280: db1175 not booting up to Cmjohnson.
Mar 21 2022, 4:46 PM · SRE, ops-eqiad

Mar 10 2022

wiki_willy assigned T303183: cp1085 memory errors on DIMM A5 to Cmjohnson.
Mar 10 2022, 8:40 PM · DC-Ops, SRE, ops-eqiad, Traffic
wiki_willy moved T297906: Change physical label from copernicum.wikimedia.org to mirror1001.wikimedia.org from Backlog to Lower Priority Items on the ops-eqiad board.
Mar 10 2022, 5:38 PM · ops-eqiad, Infrastructure-Foundations, DC-Ops
wiki_willy assigned T297906: Change physical label from copernicum.wikimedia.org to mirror1001.wikimedia.org to Cmjohnson.
Mar 10 2022, 5:36 PM · ops-eqiad, Infrastructure-Foundations, DC-Ops
wiki_willy added a comment to T293221: Agree how to document intra-DC patch panels in Netbox.

Thanks @ayounsi and @cmooney for all your feedback and forward-thinking suggestions around this. What I'm leaning towards on this is to only document the info for the patch panels we own in Netbox. I think @Jclark-ctr was playing around with that last week with the expansion-related stuff and seems to have that part down, so I think we should be good here. When it comes to patch panels that are on the vendor side, I'm thinking we should just continue adding the information under the patch panel section for circuits (example: https://netbox.wikimedia.org/circuits/circuits/6/). I'd rather not make too many changes in documenting things on the vendor side until we have a more recent version of Netbox in place. Once a new version of Netbox is in place, maybe we can revisit again and see how things look under circuits at that point? Does that work for everyone?

Mar 10 2022, 3:19 AM · netbox, Infrastructure-Foundations

Mar 8 2022

wiki_willy moved T294949: Q2:(Need By: TBD) rack/setup/install ml-serve100[5-8] from Procurement to Racking Tasks on the ops-eqiad board.
Mar 8 2022, 7:51 PM · Patch-For-Review, SRE, Machine-Learning-Team, ops-eqiad, DC-Ops