
Cmjohnson (cmjohnson)
User

Projects (11)


User Details

User Since
Dec 16 2014, 10:22 PM (310 w, 4 d)
Availability
Available
IRC Nick
cmjohnson1
LDAP User
Cmjohnson
MediaWiki User
Unknown

Recent Activity

Wed, Nov 25

Cmjohnson updated the task description for T266369: relocate/reimage cloudvirt1027 with 10G interfaces.
Wed, Nov 25, 8:23 PM · cloud-services-team (Hardware), ops-eqiad, DC-Ops, Operations
Cmjohnson closed T268102: decommission es1014.eqiad.wmnet as Resolved.

done and off the rack

Wed, Nov 25, 8:21 PM · DC-Ops, Operations, ops-eqiad, decommission-hardware
Cmjohnson closed T268101: decommission es1012.eqiad.wmnet as Resolved.

done and off the rack

Wed, Nov 25, 8:21 PM · DC-Ops, ops-eqiad, Operations, decommission-hardware
Cmjohnson closed T268100: decommission es1011.eqiad.wmnet as Resolved.

done and off the rack

Wed, Nov 25, 8:20 PM · DC-Ops, Operations, ops-eqiad, decommission-hardware
Cmjohnson added a comment to T267672: Interface errors on cr1-eqiad:xe-3/2/1.

noticed this today

Wed, Nov 25, 8:19 PM · Operations, ops-eqiad

Tue, Nov 24

Cmjohnson added a comment to T268281: Degraded RAID on labstore1006.

A case has been opened with HPE: 5351787485.

Tue, Nov 24, 9:00 PM · cloud-services-team (Hardware), ops-eqiad, Operations
Cmjohnson moved T268281: Degraded RAID on labstore1006 from Blocked to Hardware Failure / Troubleshoot on the ops-eqiad board.
Tue, Nov 24, 8:59 PM · cloud-services-team (Hardware), ops-eqiad, Operations
Cmjohnson moved T268101: decommission es1012.eqiad.wmnet from Backlog to Decommission on the ops-eqiad board.
Tue, Nov 24, 4:11 PM · DC-Ops, ops-eqiad, Operations, decommission-hardware
Cmjohnson moved T268281: Degraded RAID on labstore1006 from Backlog to Blocked on the ops-eqiad board.
Tue, Nov 24, 4:10 PM · cloud-services-team (Hardware), ops-eqiad, Operations
Cmjohnson assigned T268281: Degraded RAID on labstore1006 to wiki_willy.
Tue, Nov 24, 4:10 PM · cloud-services-team (Hardware), ops-eqiad, Operations
Cmjohnson updated subscribers of T268281: Degraded RAID on labstore1006.

Looking at the server, it's not abundantly clear which disk or disks are bad. I do know this server is out of warranty and a disk or 2 will need to be purchased. Looping in @wiki_willy to facilitate a disk purchase.

Tue, Nov 24, 4:10 PM · cloud-services-team (Hardware), ops-eqiad, Operations
Cmjohnson closed T268125: Invalid port info on eqiad switches as Resolved.

xe-2/0/15 and 16 did not have cables attached.

Tue, Nov 24, 4:07 PM · ops-eqiad, Operations
Cmjohnson added a comment to T267672: Interface errors on cr1-eqiad:xe-3/2/1.

I replaced the SFP+ at cr1-eqiad xe-3/2/1 and cleared the interface statistics on that port. Let's leave this open a few days to see if anything changes.

Tue, Nov 24, 4:05 PM · Operations, ops-eqiad

Thu, Nov 19

Cmjohnson moved T266164: eqiad: Physical moves for MediaWiki servers from Backlog to High Priority Task on the ops-eqiad board.
Thu, Nov 19, 6:06 PM · Operations, serviceops, ops-eqiad, DC-Ops
Cmjohnson moved T267065: eqiad: Server moves to free up space on 10g racks from Backlog to High Priority Task on the ops-eqiad board.
Thu, Nov 19, 6:05 PM · Platform Engineering, ops-eqiad, Operations, DC-Ops
Cmjohnson moved T267672: Interface errors on cr1-eqiad:xe-3/2/1 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Thu, Nov 19, 6:05 PM · Operations, ops-eqiad
Cmjohnson moved T268125: Invalid port info on eqiad switches from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Thu, Nov 19, 6:05 PM · ops-eqiad, Operations
Cmjohnson moved T268100: decommission es1011.eqiad.wmnet from Backlog to Decommission on the ops-eqiad board.
Thu, Nov 19, 6:05 PM · DC-Ops, Operations, ops-eqiad, decommission-hardware
Cmjohnson moved T268102: decommission es1014.eqiad.wmnet from Backlog to Decommission on the ops-eqiad board.
Thu, Nov 19, 6:05 PM · DC-Ops, Operations, ops-eqiad, decommission-hardware
Cmjohnson closed T267160: Degraded RAID on an-presto1004 as Resolved.

The backplane and raid controller were both replaced, all disks are showing online.

Thu, Nov 19, 6:03 PM · ops-eqiad, Operations
Cmjohnson moved T268036: Degraded RAID on ms-be1030 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Thu, Nov 19, 6:03 PM · ops-eqiad, Operations
Cmjohnson assigned T265093: (Need By: ASAP) rack/setup/install ms-be106[0-3] to RobH.

@RobH These are ready for you. The RAID still needs to be set up, but everything else is done.

Thu, Nov 19, 6:00 PM · Operations, ops-eqiad, DC-Ops
Cmjohnson updated the task description for T265093: (Need By: ASAP) rack/setup/install ms-be106[0-3].
Thu, Nov 19, 5:59 PM · Operations, ops-eqiad, DC-Ops
Cmjohnson closed T268171: Degraded RAID on an-presto1004 as Resolved.

It looks like all the disks are working from my end. I am resolving this task.

Thu, Nov 19, 5:53 PM · Analytics-Radar, ops-eqiad, Operations

Wed, Nov 18

Cmjohnson updated the task description for T268146: (Need By: TBD) rack/setup/install an-tool1010.eqiad.wmnet.
Wed, Nov 18, 7:56 PM · ops-eqiad, Operations, DC-Ops
Cmjohnson closed T268125: Invalid port info on eqiad switches as Resolved.

Cleaned up the restbase cables and disabled the ports. Also verified that the other 2 restbase hosts on that ticket didn't have multiple production cables connected.

Wed, Nov 18, 7:55 PM · ops-eqiad, Operations
Cmjohnson updated the task description for T265093: (Need By: ASAP) rack/setup/install ms-be106[0-3].
Wed, Nov 18, 7:13 PM · Operations, ops-eqiad, DC-Ops
Cmjohnson closed T268125: Invalid port info on eqiad switches as Resolved.

ge-3/0/22 is connected to eth1 and ge-3/0/23 is connected to eth3 on restbase1018 (this is reflected on the switch).

Wed, Nov 18, 4:44 PM · ops-eqiad, Operations

Tue, Nov 17

Cmjohnson moved T267870: ms-be1022 smart storage battery failure; disk sdb possibly bad from Hardware Failure / Troubleshoot to Blocked on the ops-eqiad board.
Tue, Nov 17, 3:36 PM · SRE-swift-storage, ops-eqiad, Operations
Cmjohnson added a comment to T266164: eqiad: Physical moves for MediaWiki servers.

@Dzahn I would need to move them to a 1G rack (B1, B3, B5, B6, or B8).

Tue, Nov 17, 3:35 PM · Operations, serviceops, ops-eqiad, DC-Ops
Cmjohnson closed T260448: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] as Resolved.

Added the new power supplies (I will keep the older ones as spares) and all the new memory sticks. Resolving this task; if something comes up related to the upgrade, please ping me and re-open.

Tue, Nov 17, 3:34 PM · Analytics-Clusters, Operations, ops-eqiad, DC-Ops
Cmjohnson assigned T267870: ms-be1022 smart storage battery failure; disk sdb possibly bad to wiki_willy.

I swapped the BBU with one from a decom'd ms-be host. The server shut down during the boot process. I put the old BBU back in and the server booted okay. If @fgiunchedi needs this server, then we need to purchase a new battery from HP. Assigning to @wiki_willy for the next steps.

Tue, Nov 17, 3:29 PM · SRE-swift-storage, ops-eqiad, Operations
Cmjohnson updated subscribers of T267160: Degraded RAID on an-presto1004.

Dell is sending a new backplane and a couple of disks with a technician. I am not sure when they will arrive. I received an email from Dell this morning that they are delayed. @elukey I will give you as much notice as I can to take this server down for maintenance.

Tue, Nov 17, 2:41 PM · ops-eqiad, Operations

Mon, Nov 16

Cmjohnson added a comment to T261405: db1139 memory errors on boot (issue continues after board change) 2020-08-27.

@wiki_willy I do not know what the Q number would be. All of the HP servers start with MXQ, and I confirmed MXQ91300JF is correct.

Mon, Nov 16, 6:13 PM · Operations, DBA, ops-eqiad
Cmjohnson moved T267870: ms-be1022 smart storage battery failure; disk sdb possibly bad from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Mon, Nov 16, 6:05 PM · SRE-swift-storage, ops-eqiad, Operations
Cmjohnson added a comment to T267870: ms-be1022 smart storage battery failure; disk sdb possibly bad.

@fgiunchedi The server is out of warranty. I have some decom'd HP servers and can most likely steal a BBU from one of them. I also have a decom'd host with 3TB disks that we can take from. This server will require downtime. Also worth noting: the 4 new ms-be hosts are here, in the rack, and will be ready for you by the end of the week at the latest, in case you want to decom ms-be1022.

Mon, Nov 16, 6:05 PM · SRE-swift-storage, ops-eqiad, Operations
Cmjohnson closed T267872: Degraded RAID on ms-be1022 as Resolved.
Mon, Nov 16, 6:02 PM · ops-eqiad, Operations
Cmjohnson closed T267827: an-worker1113 not in librenms and doesn't show up on juno's interface description as Resolved.

I am not sure how it was missed, but port 19 is an-worker1114 and port 18 is an-worker1113. I updated the switch ports.

Mon, Nov 16, 6:01 PM · ops-eqiad, Operations

Tue, Nov 10

Cmjohnson added a comment to T243390: Reclaim torrelay1001 to spares.

Network switch updated with the asset tag; the port was removed from the public VLAN and added to disabled.

Tue, Nov 10, 6:23 PM · ops-eqiad, Operations, decommission-hardware
Cmjohnson added a comment to T260448: (Need By: 2020-09-15) upgrade/replace memory in stat100[58].

@elukey Let's schedule this for next Tuesday at 1500 UTC (10:00 EST), please.

Tue, Nov 10, 6:10 PM · Analytics-Clusters, Operations, ops-eqiad, DC-Ops
Cmjohnson added a comment to T265113: Memory issue on elastic1063 caused elasticsearch to be killed.

Thanks, @dcausse. Still no h/w error in iDRAC. A ticket with Dell will need to be created; the server is under warranty.

Tue, Nov 10, 6:09 PM · ops-eqiad, Discovery-Search, Operations
Cmjohnson added a comment to T261405: db1139 memory errors on boot (issue continues after board change) 2020-08-27.

Reseated all of the DIMMs; the error remained the same.

Tue, Nov 10, 6:06 PM · Operations, DBA, ops-eqiad
Cmjohnson updated subscribers of T267392: analytics1046/analytics1057 stuck in booting.

@elukey @razzi @wiki_willy The servers are stuck and I cannot update the BIOS or firmware. Please decommission.

Tue, Nov 10, 5:22 PM · Analytics-Radar, Operations, ops-eqiad
Cmjohnson added a comment to T267160: Degraded RAID on an-presto1004.

After more investigation and trying to swap it with a known good 4TB disk, I see an amber light blinking on the backplane. I reached back out to Dell to let them know that they should also send me a backplane and 2 new disks.

Tue, Nov 10, 5:18 PM · ops-eqiad, Operations
Cmjohnson added a comment to T267392: analytics1046/analytics1057 stuck in booting.

Both servers are stuck at the same spot during POST. I tried rebooting an-1046, but it still sticks. One of the power supplies was bad and I replaced it with one from a spare, but there seems to be more of a problem. I am trying to update the BIOS and iDRAC now to see if that helps. The h/w log doesn't show anything wrong. These are both well out of warranty, and if this doesn't fix the issue we need to have them decommissioned.

Tue, Nov 10, 5:16 PM · Analytics-Radar, Operations, ops-eqiad

Mon, Nov 9

Cmjohnson added a comment to T267065: eqiad: Server moves to free up space on 10g racks.

@Jclark-ctr if you can give me the network ports you intend to use I will have them pre-configured as well.

Mon, Nov 9, 5:22 PM · Platform Engineering, ops-eqiad, Operations, DC-Ops

Thu, Nov 5

Cmjohnson closed T236327: replace onboard NIC in kafka-jumbo100[1-6] as Resolved.

this has been completed

Thu, Nov 5, 8:21 PM · Analytics-Clusters, ops-eqiad, Operations, User-Elukey
Cmjohnson closed T236327: replace onboard NIC in kafka-jumbo100[1-6], a subtask of T220700: Upgrade kafka-jumbo100[1-6] to 10G NICs (if possible), as Resolved.
Thu, Nov 5, 8:21 PM · Analytics-Radar, ops-eqiad, hardware-requests, Operations, User-Elukey
Cmjohnson closed T261348: (Need By: TBD) install new controller into frdb1001 OR add to spares as Resolved.

Had a conversation with Jeff about this and we're going to just hold on to the controller for now. There isn't any immediate need to replace it. I am resolving this task. The controller will have this phab task written on it for reference.

Thu, Nov 5, 6:26 PM · ops-eqiad, Operations, DC-Ops
Cmjohnson reassigned T228919: (Need by: 2020-06-30) replace scs-a8-eqiad from Cmjohnson to Jclark-ctr.

I am assigning this to @Jclark-ctr. John, the new SCS is in the flex space. All of the cable ends may need to be snipped and re-terminated to the standard TIA-568A. This may not be necessary for all of rows A and B; some of these have dongles on the system side of the connection. I recommend swapping the SCS, plugging in, and removing the dongles first.

Thu, Nov 5, 5:24 PM · ops-eqiad, Operations
Cmjohnson added a comment to T266164: eqiad: Physical moves for MediaWiki servers.

mw1267 issues have been fixed

Thu, Nov 5, 5:06 PM · Operations, serviceops, ops-eqiad, DC-Ops
Cmjohnson added a comment to T267043: (Need By: 2020-11-29) rack/setup/install db11[51-76].

@Marostegui Yes, db1091 is already gone from the racks. I did a more detailed count: right now, without removing any 1G servers from 10G racks, I can fit ten 1U DB hosts. Your servers are not here yet, so we have some time to make space, but it's going to require moving 1G servers out of 10G U space.

Thu, Nov 5, 4:27 PM · DBA, ops-eqiad, DC-Ops, Operations
Cmjohnson added a comment to T267043: (Need By: 2020-11-29) rack/setup/install db11[51-76].

When these arrive they will be sitting on the floor until we have space to rack them. At this time I may be able to get 4 or 5 racked in 10G racks.

Thu, Nov 5, 3:59 PM · DBA, ops-eqiad, DC-Ops, Operations
Cmjohnson added a comment to T260445: (Need By: TBD) rack/setup/install an-worker10[18-41].

@elukey To answer some of the earlier questions: @wiki_willy and I identified all the 1G servers in 10G racks that we could potentially move to create more space (T267065). Will it happen within a month? Probably not. Based on rack availability, row B has 8 openings in 2 racks. I now have 0 openings in row A, rack C2 has 2, and all of row D's availability is in one rack. I am also in a predicament because we have several more ms-be servers arriving that are 2U and need 10G, and new database servers that are 1U and need 10G space. Let me know what you want to do. I can rack a few of your servers for now and wait for space to open up, or I could just fill every hole I have and then rack the remainder when/if more space opens.

Thu, Nov 5, 3:58 PM · Analytics-Clusters, Operations, ops-eqiad, DC-Ops
Cmjohnson added a comment to T260445: (Need By: TBD) rack/setup/install an-worker10[18-41].

@elukey There are 2 480GB SSDs and 12 4TB disks in each of the servers. They are all unpacked and I can rack some but not all of them.

Thu, Nov 5, 3:43 PM · Analytics-Clusters, Operations, ops-eqiad, DC-Ops
Cmjohnson added a comment to T266192: Connect cloudstore1008 and cloudstore1009 directly via second 10G interface similar to labstore1004/5.

@wiki_willy The crossover cable needs to be made. We have Cat5 cable on-site that can be cut to the length needed. If you would rather purchase a crossover cable, then we need a blue 10M cable.

Thu, Nov 5, 3:37 PM · cloud-services-team (Hardware), ops-eqiad, Data-Services, Operations
Cmjohnson added a comment to T267160: Degraded RAID on an-presto1004.

Dell reached out and needed more information and the RAID log. I have sent them over.

Thu, Nov 5, 3:35 PM · ops-eqiad, Operations
Cmjohnson moved T267160: Degraded RAID on an-presto1004 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Thu, Nov 5, 2:45 PM · ops-eqiad, Operations
Cmjohnson moved T267242: Network blip for mw hosts in rack C3 (eqiad) from Backlog to Blocked on the ops-eqiad board.
Thu, Nov 5, 2:45 PM · ops-eqiad, Operations, netops

Wed, Nov 4

Cmjohnson added a comment to T267160: Degraded RAID on an-presto1004.

Sent the TSR report to Dell for a new disk

Wed, Nov 4, 4:30 PM · ops-eqiad, Operations
Cmjohnson added a comment to T254272: Update Documentation for dl360 Motherboard Swap.

John, you can use the db1139 swap to assist with the documentation.

Wed, Nov 4, 1:41 PM · Operations, ops-eqiad, DC-Ops
Cmjohnson reassigned T261405: db1139 memory errors on boot (issue continues after board change) 2020-08-27 from RobH to Jclark-ctr.

John, on Thursday can you please swap out the motherboard? The new one is in the flex space.

Wed, Nov 4, 1:40 PM · Operations, DBA, ops-eqiad

Tue, Nov 3

Cmjohnson closed T253438: an-presto1004 down as Resolved.

@elukey The an-presto1004 motherboard and backplane have been replaced. Everything came back up as normal, except I am not able to SSH into the server, so a fresh install may be needed. While it was down, I updated the iDRAC and BIOS. I am resolving this as the on-site work has been completed. Please reopen if there is still a problem.

Tue, Nov 3, 7:45 PM · Analytics-Radar, Operations, ops-eqiad
Cmjohnson added a comment to T267065: eqiad: Server moves to free up space on 10g racks.

@wiki_willy I had time to do this today while the Dell tech worked on an-presto1004. I am going to be utilizing a 2U space in A2 and B2 for the kafka-jumbo 10G updates, leaving only 15 2U spaces, so we will have fewer than I previously reported. I am also pasting what I put in the an-worker ticket here for better tracking.

Tue, Nov 3, 7:40 PM · Platform Engineering, ops-eqiad, Operations, DC-Ops
Cmjohnson claimed T267065: eqiad: Server moves to free up space on 10g racks.
Tue, Nov 3, 7:37 PM · Platform Engineering, ops-eqiad, Operations, DC-Ops
Cmjohnson reassigned T218734: Decommission asw-a-eqiad from Cmjohnson to Jclark-ctr.

@Jclark-ctr Please make sure all of these switches have been restored to factory defaults, unplugged, and removed from the racks. Please be very careful not to unplug anything else; there is a lot going on back there. Once they are off the racks, please pull all the old stacking cables.

Tue, Nov 3, 7:12 PM · decommission-hardware, Operations, ops-eqiad
Cmjohnson reassigned T234462: Reclaim labpuppetmaster1001 and 1002 from Cmjohnson to wiki_willy.

@wiki_willy @RobH Are we returning these to spares or decommissioning them?

Tue, Nov 3, 6:56 PM · cloud-services-team (Kanban), Operations, ops-eqiad, decommission-hardware
Cmjohnson closed T224475: Return sulfur to spares as Resolved.

@RobH This server is ready to go back to you for spares. Where are you tracking that?

Tue, Nov 3, 6:54 PM · Operations, decommission-hardware, ops-eqiad
Cmjohnson added a comment to T267065: eqiad: Server moves to free up space on 10g racks.

Racks D2 and D7 are 100% 10G, but they were initially built that way. D4 was just converted to 10G.

Tue, Nov 3, 6:46 PM · Platform Engineering, ops-eqiad, Operations, DC-Ops
Cmjohnson added a comment to T267065: eqiad: Server moves to free up space on 10g racks.

Row C

Tue, Nov 3, 6:42 PM · Platform Engineering, ops-eqiad, Operations, DC-Ops
Cmjohnson added a comment to T267065: eqiad: Server moves to free up space on 10g racks.

These are all 1G servers in 10G racks for row B

Tue, Nov 3, 6:36 PM · Platform Engineering, ops-eqiad, Operations, DC-Ops
Cmjohnson added a comment to T267065: eqiad: Server moves to free up space on 10g racks.

These are all 1G servers in 10G racks for row A

Tue, Nov 3, 6:32 PM · Platform Engineering, ops-eqiad, Operations, DC-Ops
Cmjohnson closed T265113: Memory issue on elastic1063 caused elasticsearch to be killed as Resolved.

I reseated all the DIMMs, and there were several. I am not getting any Dell h/w errors. Hopefully the reseat and flea power drain will correct the issue. I am resolving this task. If the problem persists, please re-open and tag me.

Tue, Nov 3, 5:58 PM · ops-eqiad, Discovery-Search, Operations
Cmjohnson added a comment to T261405: db1139 memory errors on boot (issue continues after board change) 2020-08-27.

The mainboard arrived

Tue, Nov 3, 5:38 PM · Operations, DBA, ops-eqiad
Cmjohnson closed T267088: decommission db1091.eqiad.wmnet, a subtask of T225060: db1091 crashed, as Resolved.
Tue, Nov 3, 4:56 PM · Patch-For-Review, ops-eqiad, Operations, DBA
Cmjohnson closed T267088: decommission db1091.eqiad.wmnet as Resolved.

done

Tue, Nov 3, 4:56 PM · DC-Ops, ops-eqiad, Operations, decommission-hardware
Cmjohnson closed T267088: decommission db1091.eqiad.wmnet, a subtask of T258361: Refresh and decommission db1074-db1095 (22 servers), as Resolved.
Tue, Nov 3, 4:56 PM · Operations, DBA
Cmjohnson closed T267088: decommission db1091.eqiad.wmnet, a subtask of T258386: db1080-95 batch possibly suffering BBU issues, as Resolved.
Tue, Nov 3, 4:56 PM · Operations, DBA
Cmjohnson updated the task description for T267088: decommission db1091.eqiad.wmnet.
Tue, Nov 3, 4:51 PM · DC-Ops, ops-eqiad, Operations, decommission-hardware
Cmjohnson closed T266709: an-coord1001 ram upgrade as Resolved.
Tue, Nov 3, 4:16 PM · Reading Epics (Analytics), ops-eqiad, Operations
Cmjohnson updated the task description for T266709: an-coord1001 ram upgrade.
Tue, Nov 3, 4:15 PM · Reading Epics (Analytics), ops-eqiad, Operations

Mon, Nov 2

Cmjohnson updated subscribers of T260445: (Need By: TBD) rack/setup/install an-worker10[18-41].

@wiki_willy and @elukey I do not have enough 10G rack space to fit 24 2U servers. Currently, I have 17 2U spaces in 10G racks, and this is all I have left for servers this size.

Mon, Nov 2, 5:16 PM · Analytics-Clusters, Operations, ops-eqiad, DC-Ops
Cmjohnson updated the task description for T260445: (Need By: TBD) rack/setup/install an-worker10[18-41].
Mon, Nov 2, 5:02 PM · Analytics-Clusters, Operations, ops-eqiad, DC-Ops
Cmjohnson added a comment to T266988: eqiad: Spare Drive Onsite for db1091.

@wiki_willy I do not have any spare SSDs that would match what is in that server now.

Mon, Nov 2, 4:45 PM · ops-eqiad, DC-Ops, Operations

Fri, Oct 30

Andrew awarded T263145: cloudvirt1033 psu redundancy alert a Like token.
Fri, Oct 30, 4:51 PM · Operations, cloud-services-team (Kanban), ops-eqiad
Cmjohnson assigned T265086: (Need By: ASAP) rack/setup/install frdb1004.frack.eqiad.wmnet to Jgreen.
Fri, Oct 30, 4:32 PM · fundraising-tech-ops, Operations
Cmjohnson added a comment to T265086: (Need By: ASAP) rack/setup/install frdb1004.frack.eqiad.wmnet.

@Jgreen All the on-site work has been completed. The iDRAC password is a temporary password.

Fri, Oct 30, 4:31 PM · fundraising-tech-ops, Operations
Cmjohnson updated the task description for T265086: (Need By: ASAP) rack/setup/install frdb1004.frack.eqiad.wmnet.
Fri, Oct 30, 4:30 PM · fundraising-tech-ops, Operations
Cmjohnson closed T265653: (Need By: TBD) setup/install deploy1002 as Resolved.

done

Fri, Oct 30, 4:22 PM · ops-eqiad, Operations, DC-Ops
Cmjohnson closed T265653: (Need By: TBD) setup/install deploy1002, a subtask of T265963: Replace production deployment servers and update them to Buster, as Resolved.
Fri, Oct 30, 4:22 PM · Patch-For-Review, Release-Engineering-Team, serviceops
Cmjohnson closed T263145: cloudvirt1033 psu redundancy alert as Resolved.

New PSU arrived and swapped. System reports healthy.

Fri, Oct 30, 4:18 PM · Operations, cloud-services-team (Kanban), ops-eqiad
Cmjohnson closed T266497: fix/replace cable ID 2648 on FB peering patch - cable report error as Resolved.

updated the cable number to 5226 20M.

Fri, Oct 30, 2:46 PM · ops-eqiad, DC-Ops, Operations
Cmjohnson closed T266497: fix/replace cable ID 2648 on FB peering patch - cable report error, a subtask of T265916: patch in FB peering into cr1-eqiad:xe-3/2/1, as Resolved.
Fri, Oct 30, 2:45 PM · netops, Operations, DC-Ops
Cmjohnson added a comment to T266164: eqiad: Physical moves for MediaWiki servers.

@Dzahn I moved mw1267 and mw1268 to rack A8 and confirmed they're up. Updated Netbox.

Fri, Oct 30, 2:41 PM · Operations, serviceops, ops-eqiad, DC-Ops

Oct 29 2020

Cmjohnson moved T265653: (Need By: TBD) setup/install deploy1002 from Racking Tasks to Backlog on the ops-eqiad board.
Oct 29 2020, 8:08 PM · ops-eqiad, Operations, DC-Ops
Cmjohnson added a comment to T263145: cloudvirt1033 psu redundancy alert.

Called Dell to open a ticket; they received the information and the TSR and are sending a new part.

Oct 29 2020, 4:55 PM · Operations, cloud-services-team (Kanban), ops-eqiad
Cmjohnson added a comment to T253438: an-presto1004 down .

Spoke with Dell tech Chris Bennet today. The ball was dropped by Dell: nobody ordered the new part, and our case was left open and not owned by anyone. Today a new case for the backplane was opened, and it's being elevated to L3 because it could be a safety issue, since we did have smoke inside the server. The outcome could be anything from a part replacement to a system exchange. Enterprise Service Request: 84193619.

Oct 29 2020, 4:51 PM · Analytics-Radar, Operations, ops-eqiad
Cmjohnson added a comment to T266709: an-coord1001 ram upgrade.

@elukey Great, I usually get to the data center around 1500 UTC.

Oct 29 2020, 4:07 PM · Reading Epics (Analytics), Operations, ops-eqiad
Cmjohnson updated subscribers of T266709: an-coord1001 ram upgrade.

@elukey I have the 2 DIMMs on-site. Does this need to be scheduled? If so, can we schedule it for Tuesday, 3 November? If not, let me know if I can take it down anytime.

Oct 29 2020, 3:47 PM · Reading Epics (Analytics), Operations, ops-eqiad

Oct 28 2020

Cmjohnson updated the task description for T265086: (Need By: ASAP) rack/setup/install frdb1004.frack.eqiad.wmnet.
Oct 28 2020, 6:34 PM · fundraising-tech-ops, Operations