Sep 13 2019

Cmjohnson added a comment to T227335: backup1001 can't address the disk shelf's drives.

Actually we need to close this task and open a separate task about the disk. Different issues should get different tasks.

Sep 13 2019, 1:09 PM · ops-eqiad, SRE, DC-Ops

Sep 12 2019

Cmjohnson added a comment to T228606: Degraded RAID on elastic1046.

I did notice that the SSDs are different types:
The new SSD is a DC3320 series
The old SSD is a DC3610 series

Sep 12 2019, 3:50 PM · Patch-For-Review, Discovery-Search (Current work), ops-eqiad, SRE
Cmjohnson added a comment to T228606: Degraded RAID on elastic1046.

@wiki_willy not really, but I reseated it anyway. As far as I can tell, everything looks normal in the BIOS. I did swap the 2 disks. @Gehel try again please.

Sep 12 2019, 3:47 PM · Patch-For-Review, Discovery-Search (Current work), ops-eqiad, SRE

Sep 10 2019

Cmjohnson updated the task description for T230746: (Aug 30th, 2019) rack/setup/install elastic10[53-67].eqiad.wmnet.
Sep 10 2019, 1:12 PM · Discovery-Search (Current work), Patch-For-Review, SRE
Cmjohnson added a comment to T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC).

The PDU has been swapped and the new PDUs are in netbox. @RobH can you help with the setup for serial console please?

Sep 10 2019, 12:44 PM · DC-Ops, SRE, ops-eqiad
Cmjohnson updated the task description for T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC).
Sep 10 2019, 12:42 PM · DC-Ops, SRE, ops-eqiad
Cmjohnson closed Unknown Object (Task), a subtask of T221636: Replace elastic1017-1031, as Resolved.
Sep 10 2019, 12:06 PM · Discovery-Search (Current work), SRE, hardware-requests
Cmjohnson closed Unknown Object (Task), a subtask of T219768: Get a third dumpsdata server, as Resolved.
Sep 10 2019, 12:05 PM · hardware-requests, SRE, Dumps-Generation

Sep 9 2019

Cmjohnson reassigned T227335: backup1001 can't address the disk shelf's drives from Cmjohnson to Jclark-ctr.

This got lost in the shuffle... will work on it this week. @Jclark-ctr can you contact HPE support and open a ticket please.

Sep 9 2019, 3:49 PM · ops-eqiad, SRE, DC-Ops

Sep 6 2019

Cmjohnson updated the task description for T228102: rack/setup/install cloudcephmon100[123].
Sep 6 2019, 5:48 PM · cloud-services-team (Kanban), SRE, Cloud-Services

Sep 5 2019

Cmjohnson added a comment to T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN..

@Ottomata the on-site work is done. They will need updated production DNS, but all are moved and connected.

Sep 5 2019, 7:31 PM · Analytics-Kanban, ops-eqiad, SRE, netops, Analytics
Cmjohnson reassigned T229871: relocate/reimage cloudvirt1023 with 10G interfaces from Cmjohnson to Andrew.

@Andrew the new MAC is in an earlier update. The server is moved, connected to the new port, and the RAID cfg is completed. It needs the DHCP file updated and is ready for you to re-image.

Sep 5 2019, 7:16 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson updated the task description for T229871: relocate/reimage cloudvirt1023 with 10G interfaces.
Sep 5 2019, 7:15 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson added a comment to T229871: relocate/reimage cloudvirt1023 with 10G interfaces.

B0:26:28:29:6A:E0

Sep 5 2019, 7:01 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson reassigned T229872: relocate/reimage cloudvirt1022 with 10G interfaces from Cmjohnson to Andrew.

@Andrew this is ready for you to re-image

Sep 5 2019, 6:50 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson updated the task description for T229872: relocate/reimage cloudvirt1022 with 10G interfaces.
Sep 5 2019, 6:50 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson reassigned T229873: relocate/reimage cloudvirt1021 with 10G interfaces from Cmjohnson to Andrew.

@Andrew this is ready for you to re-image

Sep 5 2019, 6:49 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson updated the task description for T229873: relocate/reimage cloudvirt1021 with 10G interfaces.
Sep 5 2019, 6:49 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)

Sep 4 2019

Cmjohnson added a comment to T228102: rack/setup/install cloudcephmon100[123].

@Jclark-ctr Please set up the idrac and add the mgmt dns. Let me know if you have any issues or questions. I also need the switch ports.

Sep 4 2019, 12:10 AM · cloud-services-team (Kanban), SRE, Cloud-Services

Sep 3 2019

Cmjohnson added a comment to T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN..

@Ottomata All the servers are moved and all of them but cloudvirtan1003 are connected to the switch in the correct vlan. @Jclark-ctr if you are still around can you verify that cloudvirtan is connected to switch in rack d7 xe-7/0/20, please.

Sep 3 2019, 11:58 PM · Analytics-Kanban, ops-eqiad, SRE, netops, Analytics
Cmjohnson added a comment to T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN..

@Ottomata Do you still need the 2nd port now that you're not doing the cloud thing? If so which vlan?

Sep 3 2019, 5:38 PM · Analytics-Kanban, ops-eqiad, SRE, netops, Analytics

Aug 30 2019

Cmjohnson added a comment to T231638: db1074 crashed: Broken BBU.

@wiki_willy negative, we do not have any spare BBUs lying around.

Aug 30 2019, 5:25 PM · ops-eqiad, SRE, DBA
Cmjohnson added a comment to T230289: Degraded RAID on cloudvirt1024 -- Filesystem mounted read-only.

updated the idrac and raid f/w

Aug 30 2019, 5:08 PM · Patch-For-Review, cloud-services-team, ops-eqiad, SRE

Aug 29 2019

Cmjohnson reassigned T228102: rack/setup/install cloudcephmon100[123] from RobH to Jclark-ctr.

@Jclark-ctr please rack 1 each in B2/B4/B7 and update netbox.

Aug 29 2019, 4:41 PM · cloud-services-team (Kanban), SRE, Cloud-Services
Cmjohnson added a comment to T224188: rack/setup/install (3) new osd ceph nodes.

@Jclark-ctr please rack 1 each in B2/B4/B7 and update netbox.

Aug 29 2019, 4:40 PM · SRE, cloud-services-team (Kanban), Cloud-Services
Cmjohnson moved T231525: cp1085 - IPMI not working from Procurement to Hardware Failure / Troubleshoot on the ops-eqiad board.
Aug 29 2019, 4:36 PM · ops-eqiad, Traffic, SRE
Cmjohnson moved T230746: (Aug 30th, 2019) rack/setup/install elastic10[53-67].eqiad.wmnet from Backlog to Racking Tasks on the ops-eqiad board.
Aug 29 2019, 4:36 PM · Discovery-Search (Current work), Patch-For-Review, SRE
Cmjohnson moved T231525: cp1085 - IPMI not working from Backlog to Procurement on the ops-eqiad board.
Aug 29 2019, 4:35 PM · ops-eqiad, Traffic, SRE
Cmjohnson added a comment to T231525: cp1085 - IPMI not working.

Looks like the mgmt is locked out and this server will require a hard reboot and flea power drain. Please let me know when it's safe to turn the server off for 5-10 mins.

Aug 29 2019, 4:28 PM · ops-eqiad, Traffic, SRE

Aug 27 2019

Cmjohnson reassigned T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. from Cmjohnson to Jclark-ctr.

@Jclark-ctr Can you move these servers as evenly as you can into rows B2/B4 and B7, and cable them with 10G DAC cables and the mgmt cable, please? Then update netbox and this task with their location and the port numbers you connected the servers to.

Aug 27 2019, 8:00 PM · Analytics-Kanban, ops-eqiad, SRE, netops, Analytics
Cmjohnson moved T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. from Cloud Tasks to Hardware Failure / Troubleshoot on the ops-eqiad board.
Aug 27 2019, 7:57 PM · Analytics-Kanban, ops-eqiad, SRE, netops, Analytics
Cmjohnson added a comment to T229871: relocate/reimage cloudvirt1023 with 10G interfaces.

@Andrew This server will require a physical move to B2, B4 or B7. I will do this one last; I am working on cabling 1021/1022 and updating the raid cfg so you can re-image.

Aug 27 2019, 7:57 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson reassigned T229872: relocate/reimage cloudvirt1022 with 10G interfaces from Cmjohnson to Jclark-ctr.

@Jclark-ctr Can you run 10G DAC cables in rack B7? Connect to the 10G ports on the server but do not plug into the switch. Be sure to label each cable.

Aug 27 2019, 7:56 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson reassigned T229873: relocate/reimage cloudvirt1021 with 10G interfaces from Cmjohnson to Jclark-ctr.

@Jclark-ctr Can you run 10G DAC cables in rack B4? Connect to the 10G ports on the server but do not plug into the switch. Be sure to label each cable.

Aug 27 2019, 7:55 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson moved T231199: Degraded RAID on db1063 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Aug 27 2019, 7:51 PM · DBA, ops-eqiad, SRE
Cmjohnson reassigned T230575: Degraded RAID on cloudvirt1018 from Cmjohnson to wiki_willy.

Regarding the reason the ticket was declined: I verified that the failed disk is indeed 1.9TB, but it is an SSD. The original order, and the disk caddy label, show an Intel 1.6TB SSD S3610. Assigning to @wiki_willy

Aug 27 2019, 7:51 PM · ops-eqiad, SRE
Cmjohnson added a comment to T231199: Degraded RAID on db1063.

@Marostegui Replaced the disk with one of the few remaining used spares. I did notice 2 more disks are starting to fail, so you may want to speed up the decom process.

Aug 27 2019, 7:50 PM · DBA, ops-eqiad, SRE

Aug 23 2019

Cmjohnson added a comment to T228606: Degraded RAID on elastic1046.

I replaced the failed disk

Aug 23 2019, 3:31 PM · Patch-For-Review, Discovery-Search (Current work), ops-eqiad, SRE
Cmjohnson added a comment to T230575: Degraded RAID on cloudvirt1018.

The ticket was declined by Dell, stating that the disks we have installed are not original to the server. This requires me to investigate.

Aug 23 2019, 2:58 PM · ops-eqiad, SRE
Cmjohnson closed T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory as Resolved.

Finished the idrac setup. On-site work is complete.

Aug 23 2019, 2:57 PM · cloud-services-team (Kanban), SRE, ops-eqiad, DC-Ops, User-Zppix

Aug 22 2019

Cmjohnson updated the task description for T221818: Decommission labnet1001 & labnet1002.
Aug 22 2019, 5:32 PM · Patch-For-Review, ops-eqiad, decommission-hardware, SRE
Cmjohnson added a comment to T221818: Decommission labnet1001 & labnet1002.

@Jclark-ctr did you add this to the tracking sheet?

Aug 22 2019, 5:28 PM · Patch-For-Review, ops-eqiad, decommission-hardware, SRE

Aug 21 2019

Cmjohnson added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

Board arrived DOA...need another one

Aug 21 2019, 6:17 PM · cloud-services-team (Kanban), SRE, ops-eqiad, DC-Ops, User-Zppix
Cmjohnson added a comment to T230289: Degraded RAID on cloudvirt1024 -- Filesystem mounted read-only.

The disk was replaced, but from what I can tell the raid configuration is not accepting the new disk. The raid utility shows that all the disks are good but the raid is missing a disk. This may need the raid config manually updated and a re-install. Let me know.

Aug 21 2019, 5:15 PM · Patch-For-Review, cloud-services-team, ops-eqiad, SRE
Cmjohnson added a comment to T230289: Degraded RAID on cloudvirt1024 -- Filesystem mounted read-only.

@Bstorm can you try rebooting the server and see if the disks get back to the correct order? I know that works for analytics. Please try that... I do have a disk but I'm not sure which disk is bad.

Aug 21 2019, 4:56 PM · Patch-For-Review, cloud-services-team, ops-eqiad, SRE

Aug 20 2019

Cmjohnson raised the priority of T221818: Decommission labnet1001 & labnet1002 from Medium to High.
Aug 20 2019, 7:13 PM · Patch-For-Review, ops-eqiad, decommission-hardware, SRE
Cmjohnson reassigned T221818: Decommission labnet1001 & labnet1002 from Cmjohnson to Jclark-ctr.

@Jclark-ctr Please wipe and remove these servers from the rack and update the task -- assign it back to me once done please.

Aug 20 2019, 7:13 PM · Patch-For-Review, ops-eqiad, decommission-hardware, SRE
Cmjohnson reassigned T189921: decom californium from Cmjohnson to Jclark-ctr.

Can you wipe this server and remove it from the rack as soon as you can? We need the space.

Aug 20 2019, 7:02 PM · Patch-For-Review, ops-eqiad, decommission-hardware, DC-Ops, SRE
Cmjohnson added a comment to T217556: Decommission old eqiad logstash hardware hosts logstash100[456].

@Jclark-ctr has this been done? We need the space in rack B2 so please make this a priority item. Thanks!

Aug 20 2019, 6:44 PM · observability, decommission-hardware, ops-eqiad, User-herron, SRE, Wikimedia-Logstash
Cmjohnson raised the priority of T220505: Decommission iron from Medium to High.
Aug 20 2019, 6:43 PM · Cloud-VPS, ops-eqiad, decommission-hardware, SRE
Cmjohnson updated the task description for T220505: Decommission iron.
Aug 20 2019, 6:43 PM · Cloud-VPS, ops-eqiad, decommission-hardware, SRE
Cmjohnson moved T228956: decommission db1072.eqiad.wmnet from Backlog to Decommission on the ops-eqiad board.
Aug 20 2019, 6:36 PM · Patch-For-Review, DC-Ops, ops-eqiad, decommission-hardware, SRE
Cmjohnson added a comment to T227025: (Need By: August 31) rack/setup/install (3) new zookeeper nodes.

@elukey the site specific portion is complete if you want to take over from here

Aug 20 2019, 3:14 PM · User-Elukey, SRE, ops-eqiad
Cmjohnson updated the task description for T227025: (Need By: August 31) rack/setup/install (3) new zookeeper nodes.
Aug 20 2019, 3:13 PM · User-Elukey, SRE, ops-eqiad
Cmjohnson moved T230682: Degraded RAID on db1063 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Aug 20 2019, 2:50 PM · DBA, ops-eqiad, SRE
Cmjohnson added a comment to T230682: Degraded RAID on db1063.

@Marostegui I had a used disk on-site and replaced it... it's currently in rebuild.

Aug 20 2019, 2:50 PM · DBA, ops-eqiad, SRE
Cmjohnson added a comment to T229452: db1114 crashed due to memory issues (server under warranty).

Swapped the DIMM B3 with A3 and B7 with A7. Powered on and cleared log. Let's see if the errors return or change.

Aug 20 2019, 2:46 PM · ops-eqiad, DBA, SRE
Cmjohnson added a comment to T230289: Degraded RAID on cloudvirt1024 -- Filesystem mounted read-only.

A ticket has been placed with Dell

Aug 20 2019, 2:37 PM · Patch-For-Review, cloud-services-team, ops-eqiad, SRE
Cmjohnson added a comment to T230575: Degraded RAID on cloudvirt1018.

Another ticket has been placed with Dell

Aug 20 2019, 2:37 PM · ops-eqiad, SRE
Cmjohnson moved T230575: Degraded RAID on cloudvirt1018 from Backlog to Cloud Tasks on the ops-eqiad board.
Aug 20 2019, 2:24 PM · ops-eqiad, SRE

Aug 16 2019

Cmjohnson added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

Dell approved my ticket. I talked to the technician today and he will be out Monday morning to replace the motherboard.

Aug 16 2019, 2:57 PM · cloud-services-team (Kanban), SRE, ops-eqiad, DC-Ops, User-Zppix

Aug 15 2019

Cmjohnson added a comment to T230518: elastic1017 lost network after reboot.

I will add that this server is out of warranty and would require a motherboard replacement if it is the nic. We typically do not do this after the warranty period and the host should be decommissioned.

Aug 15 2019, 5:42 PM · decommission-hardware, ops-eqiad, DC-Ops, SRE, Discovery-Search (Current work)
Cmjohnson moved T230518: elastic1017 lost network after reboot from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Aug 15 2019, 5:33 PM · decommission-hardware, ops-eqiad, DC-Ops, SRE, Discovery-Search (Current work)
Cmjohnson added a comment to T230518: elastic1017 lost network after reboot.
  • I checked the network switch and the port shows up/up, meaning that the link from the server to the network switch is up

ge-3/0/17 up up elastic1017

Aug 15 2019, 5:32 PM · decommission-hardware, ops-eqiad, DC-Ops, SRE, Discovery-Search (Current work)
Cmjohnson updated the task description for T226778: Install new PDUs in rows A/B (Top level tracking task).
Aug 15 2019, 3:18 PM · DC-Ops, SRE, ops-eqiad
Cmjohnson moved T230088: cloudelastic1002: SMART/disk error from Backlog to Cloud Tasks on the ops-eqiad board.
Aug 15 2019, 3:04 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson moved T230289: Degraded RAID on cloudvirt1024 -- Filesystem mounted read-only from Backlog to Cloud Tasks on the ops-eqiad board.
Aug 15 2019, 3:04 PM · Patch-For-Review, cloud-services-team, ops-eqiad, SRE
Cmjohnson moved T230442: Degraded RAID on cloudvirt1024 -- Filesystem mounted read-only from Backlog to Cloud Tasks on the ops-eqiad board.
Aug 15 2019, 3:03 PM · ops-eqiad, SRE
Cmjohnson moved T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory from Hardware Failure / Troubleshoot to Cloud Tasks on the ops-eqiad board.
Aug 15 2019, 3:03 PM · cloud-services-team (Kanban), SRE, ops-eqiad, DC-Ops, User-Zppix
Cmjohnson added a comment to T229452: db1114 crashed due to memory issues (server under warranty).

@Marostegui I see a potential issue with B3 as well. I will need to do a DIMM swap A -> B side and see if the errors stay with the DIMM or are caused by the CPU. Let's schedule this for early next week, please. Tuesday 1400UTC?

Aug 15 2019, 3:02 PM · ops-eqiad, DBA, SRE
Cmjohnson updated the task description for T224188: rack/setup/install (3) new osd ceph nodes.
Aug 15 2019, 12:51 AM · SRE, cloud-services-team (Kanban), Cloud-Services
Cmjohnson added a comment to T224188: rack/setup/install (3) new osd ceph nodes.

cloudcephosd1001 10.65.2.177
cloudcephosd1002 10.65.2.178
cloudcephosd1003 10.65.2.179

Aug 15 2019, 12:51 AM · SRE, cloud-services-team (Kanban), Cloud-Services
Cmjohnson added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

Submitted the ticket with Dell. We will see what happens

Aug 15 2019, 12:37 AM · cloud-services-team (Kanban), SRE, ops-eqiad, DC-Ops, User-Zppix

Aug 14 2019

Cmjohnson reassigned T228102: rack/setup/install cloudcephmon100[123] from RobH to Jclark-ctr.

@Jclark-ctr can you add asset tags and enter these servers into Netbox (T222916 is the procurement task). Leave them on the floor and the rack information blank in netbox until we know for sure where they're going. Once done, please re-assign back to Rob

Aug 14 2019, 3:06 PM · cloud-services-team (Kanban), SRE, Cloud-Services
Cmjohnson reassigned T224188: rack/setup/install (3) new osd ceph nodes from RobH to Jclark-ctr.

@Jclark-ctr can you add asset tags and enter these servers into Netbox (T221698 is the procurement task). Leave them on the floor and the rack information blank in netbox until we know for sure where they're going. Once done, please re-assign back to Rob

Aug 14 2019, 3:05 PM · SRE, cloud-services-team (Kanban), Cloud-Services
Cmjohnson added a comment to T227025: (Need By: August 31) rack/setup/install (3) new zookeeper nodes.

+an-conf1001 1H IN A 10.65.5.118
+an-conf1002 1H IN A 10.65.5.119
+an-conf1003 1H IN A 10.65.5.120

Aug 14 2019, 2:56 PM · User-Elukey, SRE, ops-eqiad
Cmjohnson updated the task description for T227025: (Need By: August 31) rack/setup/install (3) new zookeeper nodes.
Aug 14 2019, 2:55 PM · User-Elukey, SRE, ops-eqiad
Cmjohnson reassigned T230458: hw troubleshooting: power supply for db1129 from Cmjohnson to Jclark-ctr.

Please check to make sure that the power cables are fully seated. Update the task and let me know if I need to order a new PSU.

Aug 14 2019, 1:59 PM · DBA, SRE, ops-eqiad, DC-Ops
Cmjohnson added a comment to T228926: rack/setup/instal (4) CI ganeti nodes.

ganeti1019 10.65.5.114
ganeti1020 10.65.5.115
ganeti1021 10.65.5.116
ganeti1022 10.65.5.117

Aug 14 2019, 1:57 PM · SRE
Cmjohnson updated the task description for T228926: rack/setup/instal (4) CI ganeti nodes.
Aug 14 2019, 1:44 PM · SRE
Cmjohnson added a comment to T228924: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet.

@Jclark-ctr Mgmt IPs that need to be set up on the idrac

Aug 14 2019, 1:41 PM · serviceops, SRE

Aug 13 2019

Cmjohnson closed Unknown Object (Task), a subtask of T222950: (OoW) cloudvirt1006 - RAID battery failed, as Resolved.
Aug 13 2019, 4:44 PM · cloud-services-team (Hardware), User-jbond, ops-eqiad, SRE
Cmjohnson reassigned T228926: rack/setup/instal (4) CI ganeti nodes from akosiaris to Jclark-ctr.
Aug 13 2019, 4:37 PM · SRE
Cmjohnson updated subscribers of T228926: rack/setup/instal (4) CI ganeti nodes.

@Jclark-ctr Please rack 4 of the servers from the same ganeti stack in row D and label them as ganeti1019-1022. Please update netbox, and provide access switch port info.

Aug 13 2019, 4:37 PM · SRE
Cmjohnson closed T229156: Degraded RAID on cloudvirt1018 as Resolved.

Disks replaced. Please re-open and ping me if the disk fails.

Aug 13 2019, 3:25 PM · cloud-services-team (Kanban), ops-eqiad, SRE
Cmjohnson reassigned T228924: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet from akosiaris to Jclark-ctr.

Please rack, label and cable these servers at the racking locations above. Add them to netbox, making sure the status is set to planned and the asset tag/SN is in ALL CAPS. Please update the task with which network ports each server is attached to on the access switch.

Aug 13 2019, 2:41 PM · serviceops, SRE
Cmjohnson updated the task description for T228924: rack/setup/install ganeti10([09]|1[0-8]).eqiad.wmnet.
Aug 13 2019, 2:36 PM · serviceops, SRE

Aug 8 2019

Cmjohnson added a comment to T229134: Degraded RAID on sulfur.

This doesn't really tell me anything about the bad disk. I am not able to ssh into the host for more details. I will create a ticket and hope that there is something Dell can use in their TSR

Aug 8 2019, 3:23 PM · ops-eqiad, SRE
Cmjohnson moved T229156: Degraded RAID on cloudvirt1018 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Aug 8 2019, 3:18 PM · cloud-services-team (Kanban), ops-eqiad, SRE
Cmjohnson added a comment to T229156: Degraded RAID on cloudvirt1018.

The ticket was approved. The new SSD should arrive today or tomorrow.

Aug 8 2019, 3:18 PM · cloud-services-team (Kanban), ops-eqiad, SRE
Cmjohnson moved T229381: decommission db1071.eqiad.wmnet from Backlog to Decommission on the ops-eqiad board.
Aug 8 2019, 3:17 PM · Patch-For-Review, DC-Ops, ops-eqiad, decommission-hardware, SRE
Cmjohnson moved T229452: db1114 crashed due to memory issues (server under warranty) from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Aug 8 2019, 3:17 PM · ops-eqiad, DBA, SRE
Cmjohnson moved T229612: asw2-c-eqiad:xe-2/0/45 inbound interface errors from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Aug 8 2019, 3:17 PM · netops, ops-eqiad, SRE
Cmjohnson moved T229586: decommission cp1008, cp1071, cp1072, cp1073, cp1074, cp1099 from Backlog to Decommission on the ops-eqiad board.
Aug 8 2019, 3:17 PM · ops-eqiad, decommission-hardware, SRE
Cmjohnson moved T229871: relocate/reimage cloudvirt1023 with 10G interfaces from Backlog to Cloud Tasks on the ops-eqiad board.
Aug 8 2019, 3:16 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson moved T229872: relocate/reimage cloudvirt1022 with 10G interfaces from Backlog to Cloud Tasks on the ops-eqiad board.
Aug 8 2019, 3:16 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson moved T229873: relocate/reimage cloudvirt1021 with 10G interfaces from Backlog to Cloud Tasks on the ops-eqiad board.
Aug 8 2019, 3:16 PM · ops-eqiad, DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson added a comment to T227335: backup1001 can't address the disk shelf's drives.

This is odd, I am not getting a link light on the raid controller connections.

Aug 8 2019, 3:16 PM · ops-eqiad, SRE, DC-Ops
Cmjohnson updated the task description for T217556: Decommission old eqiad logstash hardware hosts logstash100[456].
Aug 8 2019, 3:08 PM · observability, decommission-hardware, ops-eqiad, User-herron, SRE, Wikimedia-Logstash
Cmjohnson reassigned T217556: Decommission old eqiad logstash hardware hosts logstash100[456] from Cmjohnson to Jclark-ctr.

@Jclark-ctr Please wipe logstash1004 and 1005 and then remove from rack and update netbox and the google tracking sheet.
https://docs.google.com/spreadsheets/d/1JhjeV3cXfIzIyekJrnA2nNFFDGTT4SeLmyAFvDa4HmA/edit#gid=2026042311

Aug 8 2019, 3:07 PM · observability, decommission-hardware, ops-eqiad, User-herron, SRE, Wikimedia-Logstash