
Jul 12 2019

Cmjohnson moved T224475: Return sulfur to spares from Backlog to Lower Priority Items on the ops-eqiad board.
Jul 12 2019, 12:43 AM · SRE, decommission-hardware, ops-eqiad
Cmjohnson moved T220700: Upgrade kafka-jumbo100[1-6] to 10G NICs (if possible) from Stalled to Blocked on the ops-eqiad board.
Jul 12 2019, 12:35 AM · Analytics-Radar, ops-eqiad, hardware-requests, SRE, User-Elukey
Cmjohnson moved T226599: (OoW) Degraded RAID on analytics1039 from Stalled to Blocked on the ops-eqiad board.
Jul 12 2019, 12:35 AM · ops-eqiad, SRE
Cmjohnson moved T196487: Upgrade eqiad rack D4 to 10G switch from Up next to High Priority Task on the ops-eqiad board.
Jul 12 2019, 12:34 AM · User-jijiki, User-Kormat, ops-eqiad, netops, SRE
Cmjohnson moved T206185: connect atlas-ulsfo to scs-ulsfo from Up next to High Priority Task on the ops-eqiad board.
Jul 12 2019, 12:34 AM · DC-Ops, SRE, ops-ulsfo, ops-eqiad

Jul 11 2019

Cmjohnson moved T222950: (OoW) cloudvirt1006 - RAID battery failed from Blocked to Cloud Tasks on the ops-eqiad board.
Jul 11 2019, 11:54 PM · cloud-services-team (Hardware), User-jbond, ops-eqiad, SRE
Cmjohnson closed T215411: (OoW) thumbor1004 memory errors as Declined.

Declining the task since the server is out of warranty.

Jul 11 2019, 11:35 PM · User-jijiki, Thumbor, ops-eqiad, serviceops, SRE
Cmjohnson closed T193628: (OoW) tungsten disk 1 and 8 SMART failure as Declined.

Since there is no need to replace these disks, I am declining the task.

Jul 11 2019, 11:07 PM · Performance-Team (Radar), ops-eqiad, SRE
Cmjohnson added a comment to T218751: Audit down ports.

All items on rows A, B and C have been updated. Row D will need some on-site verification.

Jul 11 2019, 10:47 PM · DC-Ops, SRE, ops-eqiad
Cmjohnson closed T209861: cloudvirt1007 predicted raid failure as Resolved.

This was completed a while ago; I never updated the task.

Jul 11 2019, 10:22 PM · SRE, ops-eqiad, DC-Ops, cloud-services-team (Kanban)
Cmjohnson added a comment to T224794: Degraded RAID on helium.
Jul 11 2019, 7:23 PM · ops-eqiad, SRE
Cmjohnson moved T226467: Broken disk on analytics1072 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Jul 11 2019, 7:10 PM · Analytics-Radar, ops-eqiad, SRE
Cmjohnson added a comment to T226467: Broken disk on analytics1072.

@elukey I am not sure which disk this is. I think it's a smaller SSD? Can you confirm the disk type and size, please?

Jul 11 2019, 7:10 PM · Analytics-Radar, ops-eqiad, SRE
Cmjohnson added a comment to T222050: db1107 (eventlogging db master) possibly memory issues.

I still need to move the DIMM around, so I need the server taken down. If this needs to be scheduled, please let me know when you can have the server down.

Jul 11 2019, 6:59 PM · Analytics, SRE, ops-eqiad, MediaWiki-extensions-EventLogging, DBA
Cmjohnson added a comment to T222960: Fix restbase1017's physical rack.

@Dzahn, I don't know what that means. What does DC-ops need to troubleshoot? Thanks

Jul 11 2019, 6:02 PM · Patch-For-Review, serviceops, Platform Team Workboards (Team 2), SRE, Services (doing), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), User-Eevans, Cassandra
Cmjohnson removed a project from T201342: rack/setup/install puppetmaster1003.eqiad.wmnet: ops-eqiad.

I am removing the ops-eqiad tag. If on-site work is still required, please add the ops-eqiad tag back.

Jul 11 2019, 5:56 PM · SRE
Cmjohnson closed T204491: Heating alerts / memory errors on mw1254 as Resolved.
Jul 11 2019, 5:55 PM · serviceops, SRE, ops-eqiad
Cmjohnson removed a project from T184293: rack/setup/install lvs101[3-6]: ops-eqiad.

I am removing the ops-eqiad tag on this task. If you need additional DC ops work, please add the tag back.

Jul 11 2019, 5:55 PM · SRE, Traffic
Cmjohnson closed T209139: (OoW) Broken memory on mw1239 as Declined.

The server is out of warranty and I do not have any spare DIMMs.

Jul 11 2019, 5:49 PM · ops-eqiad, SRE
Cmjohnson closed T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory as Resolved.

I am resolving this task

Jul 11 2019, 5:47 PM · cloud-services-team (Kanban), SRE, ops-eqiad, DC-Ops, User-Zppix
Cmjohnson added a comment to T222960: Fix restbase1017's physical rack.

I have removed the ops-eqiad tag. If you have an issue that requires DC ops, please add the ops-eqiad tag back to the task.

Jul 11 2019, 5:44 PM · Patch-For-Review, serviceops, Platform Team Workboards (Team 2), SRE, Services (doing), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), User-Eevans, Cassandra
Cmjohnson removed a project from T222960: Fix restbase1017's physical rack: ops-eqiad.
Jul 11 2019, 5:38 PM · Patch-For-Review, serviceops, Platform Team Workboards (Team 2), SRE, Services (doing), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), User-Eevans, Cassandra

Jul 9 2019

Cmjohnson added a comment to T222960: Fix restbase1017's physical rack.

@Eevans We did a test run for an install and the server was able to reach the installer without an issue. I did see on IRC something about stretch. I will leave that up to you if you like and the server can be installed whenever you need it.

Jul 9 2019, 6:36 PM · Patch-For-Review, serviceops, Platform Team Workboards (Team 2), SRE, Services (doing), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), User-Eevans, Cassandra
Cmjohnson added a comment to T222960: Fix restbase1017's physical rack.

restbase1017 has been moved to rack B5
network port updated
DNS updated

Jul 9 2019, 3:08 PM · Patch-For-Review, serviceops, Platform Team Workboards (Team 2), SRE, Services (doing), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), User-Eevans, Cassandra

Jul 3 2019

Cmjohnson added a comment to T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN..

@Ottomata
Please decommission the current servers to spare role
Please provide the new hostnames you want to use
These are all located in row B...will that be okay or do you need them spread out across the rows?

Jul 3 2019, 7:38 PM · Analytics-Kanban, ops-eqiad, SRE, netops, Analytics
Cmjohnson updated the task description for T226274: (Need By: June 30) rack/setup/install kafka-main100[1-5].
Jul 3 2019, 6:14 PM · User-herron, SRE

Jun 27 2019

Cmjohnson moved T224794: Degraded RAID on helium from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Jun 27 2019, 4:29 PM · ops-eqiad, SRE
Cmjohnson added a comment to T222960: Fix restbase1017's physical rack.

@Eevans Do you still want to move this server? Let's coordinate a day/time

Jun 27 2019, 4:29 PM · Patch-For-Review, serviceops, Platform Team Workboards (Team 2), SRE, Services (doing), Platform Engineering (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), User-Eevans, Cassandra
Cmjohnson moved T225121: (Need by: 2019-09-30) upgrade msw1-eqiad from EX4200 to EX4300 from Backlog to Racking Tasks on the ops-eqiad board.
Jun 27 2019, 4:28 PM · netops, SRE, ops-eqiad
Cmjohnson moved T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. from Backlog to Cloud Tasks on the ops-eqiad board.
Jun 27 2019, 4:28 PM · Analytics-Kanban, ops-eqiad, SRE, netops, Analytics
Cmjohnson moved T226188: relocate/reimage cloudvirt1014 with 10G interfaces from Backlog to Cloud Tasks on the ops-eqiad board.
Jun 27 2019, 4:28 PM · DC-Ops, SRE, cloud-services-team (Kanban)
Cmjohnson moved T226382: Hardware Request: puppet master eqiad from Backlog to Blocked on the ops-eqiad board.
Jun 27 2019, 4:27 PM · SRE, ops-eqiad, DC-Ops
Cmjohnson assigned T226382: Hardware Request: puppet master eqiad to RobH.

Assigning this to @RobH; he is able to allocate a spare.

Jun 27 2019, 4:27 PM · SRE, ops-eqiad, DC-Ops
Cmjohnson moved T226517: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 from UnRacking Tasks to Decommission on the ops-eqiad board.
Jun 27 2019, 4:25 PM · Analytics-Radar, ops-eqiad, DC-Ops, decommission-hardware, SRE
Cmjohnson moved T226517: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 from Backlog to UnRacking Tasks on the ops-eqiad board.
Jun 27 2019, 4:25 PM · Analytics-Radar, ops-eqiad, DC-Ops, decommission-hardware, SRE
Cmjohnson closed T226569: Degraded RAID on db1072 as Resolved.

@Marostegui disk swapped but this server is out of warranty. I would suggest moving masters to new servers.

Jun 27 2019, 4:24 PM · DBA, ops-eqiad, SRE
Cmjohnson moved T226599: (OoW) Degraded RAID on analytics1039 from Backlog to Stalled on the ops-eqiad board.
Jun 27 2019, 4:20 PM · ops-eqiad, SRE
Cmjohnson moved T226689: decommission db1068 from Backlog to Decommission on the ops-eqiad board.
Jun 27 2019, 4:19 PM · DC-Ops, ops-eqiad, decommission-hardware, SRE

Jun 24 2019

Cmjohnson closed Unknown Object (Task), a subtask of T90364: Test Ceph for instance storage, as Resolved.
Jun 24 2019, 5:25 PM · Sustainability (Incident Followup), Epic, Goal, cloud-services-team (Kanban), Cloud-Services

Jun 19 2019

Cmjohnson added a comment to T224188: rack/setup/install (3) new osd ceph nodes.

@Bstorm @ayounsi I will need very clear instructions on which racks/rows these servers can go in before I physically rack and cable. Once that is figured out please update the task.

Jun 19 2019, 4:57 PM · SRE, cloud-services-team (Kanban), Cloud-Services
Cmjohnson updated the task description for T224188: rack/setup/install (3) new osd ceph nodes.
Jun 19 2019, 4:56 PM · SRE, cloud-services-team (Kanban), Cloud-Services
Cmjohnson updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Jun 19 2019, 4:54 PM · Patch-For-Review, SRE, DBA
Cmjohnson reassigned T225704: eqiad: rack/setup/install (4) dbproxy systems. from Cmjohnson to RobH.

I updated the switch config to private1-d. Both servers are currently off and ready for installs. Assigning to @RobH to install.

Jun 19 2019, 4:54 PM · Patch-For-Review, SRE, DBA
Cmjohnson added a comment to T222731: Storage problems with new host db1133.

Dell is sending me a new RAID card, cables and backplane. Sorry it took so long; I had to call them after they denied my second request.

Jun 19 2019, 4:40 PM · ops-eqiad, SRE
Cmjohnson added a comment to T218544: ms-be1043 sdk failed.
Jun 19 2019, 4:38 PM · User-fgiunchedi, observability, SRE-tools, SRE, ops-eqiad
Cmjohnson closed T214283: Memory correctable errors -EDAC- elastic1029 as Resolved.

Closing this for now; let me know if there is another issue. Keep in mind this server is out of warranty.

Jun 19 2019, 4:10 PM · Discovery-Search (Current work), ops-eqiad, Discovery-ARCHIVED, DC-Ops, SRE
Cmjohnson added a comment to T214283: Memory correctable errors -EDAC- elastic1029.

The DIMM has been reseated and swapped to the opposite sides.

Jun 19 2019, 3:45 PM · Discovery-Search (Current work), ops-eqiad, Discovery-ARCHIVED, DC-Ops, SRE

Jun 17 2019

Cmjohnson closed T196697: rack/setup/add to spares tracking 2 single cpu misc class systems as Resolved.

these have been racked

Jun 17 2019, 6:48 PM · ops-eqiad, SRE
Cmjohnson closed T225219: eqiad: rack and setup (3) dual CPU servers as Resolved.

servers are set up and have been added to the tracking sheet

Jun 17 2019, 6:44 PM · SRE, ops-eqiad
Cmjohnson updated the task description for T225219: eqiad: rack and setup (3) dual CPU servers .
Jun 17 2019, 6:44 PM · SRE, ops-eqiad
Cmjohnson closed T219890: rack/setup 3 new single cpu spare pool systems as Resolved.

servers are ready as spares and in tracking sheet

Jun 17 2019, 6:43 PM · ops-eqiad, SRE
Cmjohnson added a comment to T225704: eqiad: rack/setup/install (4) dbproxy systems..

@ayounsi I would rather not move the servers. I racked them based on the instructions, and they're already in racks and set up.

Jun 17 2019, 6:39 PM · Patch-For-Review, SRE, DBA
Cmjohnson updated the task description for T219890: rack/setup 3 new single cpu spare pool systems.
Jun 17 2019, 6:33 PM · ops-eqiad, SRE
Cmjohnson added a comment to T225704: eqiad: rack/setup/install (4) dbproxy systems..

@Marostegui: Do they all go to the cloud vlan? If they do, then 1020 and 1021 are in row D, and that support-cloud vlan is not available on row D yet. I need Arzhel to copy the vlan over.

Jun 17 2019, 6:17 PM · Patch-For-Review, SRE, DBA
Cmjohnson moved T225704: eqiad: rack/setup/install (4) dbproxy systems. from Backlog to Racking Tasks on the ops-eqiad board.
Jun 17 2019, 6:11 PM · Patch-For-Review, SRE, DBA
Cmjohnson updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Jun 17 2019, 6:11 PM · Patch-For-Review, SRE, DBA
Cmjohnson reassigned T225704: eqiad: rack/setup/install (4) dbproxy systems. from Cmjohnson to ayounsi.

Assigning to @ayounsi to add cloud-support1-d-eqiad. Once that is done, the vlan for dbproxy1020 and 1021 will need to be set up. Switch port descriptions are done.

Jun 17 2019, 6:11 PM · Patch-For-Review, SRE, DBA
Cmjohnson updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Jun 17 2019, 6:04 PM · Patch-For-Review, SRE, DBA

Jun 11 2019

Cmjohnson moved T223518: ms-be1033 not powering up from Hardware Failure / Troubleshoot to Blocked on the ops-eqiad board.
Jun 11 2019, 4:16 PM · User-fgiunchedi, SRE, ops-eqiad
Cmjohnson moved T215411: (OoW) thumbor1004 memory errors from Hardware Failure / Troubleshoot to Stalled on the ops-eqiad board.
Jun 11 2019, 4:16 PM · User-jijiki, Thumbor, ops-eqiad, serviceops, SRE
Cmjohnson added a comment to T222731: Storage problems with new host db1133.

They declined my ticket, saying I didn't isolate the problem well enough.

Jun 11 2019, 4:15 PM · ops-eqiad, SRE
Cmjohnson closed T222922: wmf7622 wont powercycle (cannot be allocated from spares) as Resolved.

This server accepts all the racadm commands successfully. I verified on-site that these things actually happened

Jun 11 2019, 4:02 PM · SRE, ops-eqiad
Cmjohnson updated the task description for T222922: wmf7622 wont powercycle (cannot be allocated from spares).
Jun 11 2019, 4:01 PM · SRE, ops-eqiad
Cmjohnson updated the task description for T222922: wmf7622 wont powercycle (cannot be allocated from spares).
Jun 11 2019, 4:01 PM · SRE, ops-eqiad
Cmjohnson added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

@Andrew what parts? There is nothing that suggests that it is CPU on the server side of things. I reseated and moved the DIMM and that error has not returned. It may very well have been a poorly seated DIMM. I checked dmesg and do not see any more errors related to memory or CPU. Try putting it back into production and let's see if anything comes back. Unfortunately, I need to demonstrate and prove there is a problem for Dell to do anything, and right now I do not have anything to give them.

Jun 11 2019, 3:57 PM · cloud-services-team (Kanban), SRE, ops-eqiad, DC-Ops, User-Zppix
Cmjohnson moved T224260: restbase-dev1006 has a broken disk from Hardware Failure / Troubleshoot to Blocked on the ops-eqiad board.
Jun 11 2019, 3:50 PM · Platform Engineering (Needs Cleaning - Cassandra Operational), Cassandra, RESTBase, Services (watching), SRE
Cmjohnson closed T223825: Degraded RAID on restbase-dev1006 as Declined.

This is a duplicate task; declining.

Jun 11 2019, 3:50 PM · ops-eqiad, SRE
Cmjohnson assigned T223825: Degraded RAID on restbase-dev1006 to RobH.

This server's SSDs are not part of the original build and are not under HP warranty. They are Intel SSDs that I believe came from restbase1001-1003. Assigning to @RobH to order new SSDs.

Jun 11 2019, 3:49 PM · ops-eqiad, SRE
Cmjohnson moved T222950: (OoW) cloudvirt1006 - RAID battery failed from Hardware Failure / Troubleshoot to Blocked on the ops-eqiad board.
Jun 11 2019, 3:46 PM · cloud-services-team (Hardware), User-jbond, ops-eqiad, SRE
Cmjohnson closed T220880: Degraded RAID on analytics1039 as Resolved.

I found a spare disk and added the disk back; it's now online.

Jun 11 2019, 3:45 PM · ops-eqiad, SRE
Cmjohnson added a comment to T214283: Memory correctable errors -EDAC- elastic1029.

@Gehel you will need to take the server offline for a day so I can reseat the DIMM. The server logs do not indicate any memory errors. If you want to downtime it for Wednesday or Thursday let me know.

Jun 11 2019, 3:37 PM · Discovery-Search (Current work), ops-eqiad, Discovery-ARCHIVED, DC-Ops, SRE
Cmjohnson closed T225391: db1077 crashed as Resolved.

@Marostegui that log entry may have been old. The server has both power supplies connected and does not report any current errors. Resolving the task.

Jun 11 2019, 3:35 PM · ops-eqiad, DBA, SRE

Jun 10 2019

Cmjohnson reassigned T224260: restbase-dev1006 has a broken disk from Cmjohnson to RobH.

@RobH this disk will need to be ordered outside of the warranty. These servers were shipped without disks; the procurement task states that the disks from RBDEV1001-1003 will be used. They are 800GB Intel SSDs.

Jun 10 2019, 7:08 PM · Platform Engineering (Needs Cleaning - Cassandra Operational), Cassandra, RESTBase, Services (watching), SRE
Cmjohnson closed T223126: Install new PDUs into b5-eqiad as Resolved.

This has been completed

Jun 10 2019, 6:59 PM · ops-eqiad, SRE
Cmjohnson closed T224795: Degraded RAID on analytics1029 as Declined.

Declining since this server is out of warranty and @elukey said to skip replacing the disk. If the status changes and this needs to be done, please re-open the task.

Jun 10 2019, 6:54 PM · Analytics, ops-eqiad, SRE
Cmjohnson closed T224805: db1062 (s7 db primary master) disk with predictive failure, a subtask of T208323: Predictive failures on disk S.M.A.R.T. status, as Declined.
Jun 10 2019, 6:53 PM · SRE, DBA
Cmjohnson closed T224805: db1062 (s7 db primary master) disk with predictive failure as Declined.

declining this for now since it's out of warranty and the disk has not failed

Jun 10 2019, 6:53 PM · ops-eqiad, SRE, DBA
Cmjohnson reassigned T225391: db1077 crashed from Cmjohnson to Marostegui.

I updated with the service pack and powered on...reassigning to @Marostegui

Jun 10 2019, 6:48 PM · ops-eqiad, DBA, SRE

Jun 7 2019

Cmjohnson closed T223518: ms-be1033 not powering up as Resolved.

The motherboard was replaced and the server is back up

Jun 7 2019, 4:22 PM · User-fgiunchedi, SRE, ops-eqiad

Jun 6 2019

Cmjohnson added a comment to T224805: db1062 (s7 db primary master) disk with predictive failure.

The server is out of warranty and we will need to order more 600GB disks.

Jun 6 2019, 3:18 PM · ops-eqiad, SRE, DBA
Cmjohnson moved T225219: eqiad: rack and setup (3) dual CPU servers from Backlog to Racking Tasks on the ops-eqiad board.
Jun 6 2019, 3:17 PM · SRE, ops-eqiad
Cmjohnson added a parent task for T225219: eqiad: rack and setup (3) dual CPU servers : Unknown Object (Task).
Jun 6 2019, 2:59 PM · SRE, ops-eqiad
Cmjohnson created T225219: eqiad: rack and setup (3) dual CPU servers .
Jun 6 2019, 2:59 PM · SRE, ops-eqiad
Cmjohnson added a comment to T223518: ms-be1033 not powering up.

The HP technician will be here June 7 @1000 Ashburn time.

Jun 6 2019, 2:55 PM · User-fgiunchedi, SRE, ops-eqiad
Cmjohnson added a comment to T222731: Storage problems with new host db1133.

You have successfully submitted request SR991779294.

Jun 6 2019, 2:52 PM · ops-eqiad, SRE

Jun 5 2019

Cmjohnson added a comment to T222731: Storage problems with new host db1133.

Update on this server. I have updated all of the f/w including the raid card. I am able to isolate the problem to slot 0 right now. I moved the disks around and they do not report any errors only the slot. I have blown out the raid several times and re-configured but the error keeps coming back. I have reseated the raid card as well.

Jun 5 2019, 5:22 PM · ops-eqiad, SRE
Cmjohnson closed T225060: db1091 crashed as Resolved.

The bbu has been replaced.

Jun 5 2019, 5:10 PM · Patch-For-Review, ops-eqiad, SRE, DBA
Cmjohnson added a comment to T225060: db1091 crashed.

Good afternoon! db1091: I do have a spare BBU, but that spare has been helpful the last year or so. HP is slow to send out the batteries; they can take days to get because of their slow response time and then having to ship batteries via ground transportation only. If I use it for this server, then I am not able to quickly change out the BBU on something that may be more important in the future. The call is yours since you have the most BBU issues.

Jun 5 2019, 2:26 PM · Patch-For-Review, ops-eqiad, SRE, DBA
Cmjohnson moved T223217: Decommission db1064 from Backlog to Decommission on the ops-eqiad board.
Jun 5 2019, 2:19 PM · Patch-For-Review, SRE, ops-eqiad, decommission-hardware

Jun 4 2019

Cmjohnson updated the task description for T219890: rack/setup 3 new single cpu spare pool systems.
Jun 4 2019, 4:04 PM · ops-eqiad, SRE

May 31 2019

Cmjohnson closed T213422: es1019 IPMI and its management interface are unresponsive (again) as Resolved.

@jcrespo the server is back on and I am able to reach the mgmt interface.

May 31 2019, 6:41 PM · SRE, ops-eqiad
Cmjohnson closed T213422: es1019 IPMI and its management interface are unresponsive (again), a subtask of T167121: Several hosts return "internal IPMI error" in the check_ipmi_temp check, as Resolved.
May 31 2019, 6:41 PM · Patch-For-Review, SRE, observability
Cmjohnson closed T213422: es1019 IPMI and its management interface are unresponsive (again), a subtask of T193155: IPMI Audit 2018-04, as Resolved.
May 31 2019, 6:40 PM · SRE
Cmjohnson assigned T207707: contint1001 store docker images on separate partition or disk to greg.

@greg the disks have been added and assigned to you

May 31 2019, 6:14 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201907), serviceops, SRE, Continuous-Integration-Infrastructure

May 30 2019

Cmjohnson added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

Swapped DIMM B3 with DIMM A3 and cleared the log.

May 30 2019, 5:39 PM · cloud-services-team (Kanban), SRE, ops-eqiad, DC-Ops, User-Zppix
Cmjohnson added a comment to T221502: db1099 memory issues.

Swapped DIMM A5 with DIMM B5 and cleared the racadm log.

May 30 2019, 3:42 PM · ops-eqiad, SRE, DBA
Cmjohnson added a comment to T207707: contint1001 store docker images on separate partition or disk.

@greg @RobH I am just plugging these disks into the server, correct? Nothing else? This will not require downtime afaik.

May 30 2019, 2:38 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201907), serviceops, SRE, Continuous-Integration-Infrastructure
Cmjohnson closed Unknown Object (Task), a subtask of T207707: contint1001 store docker images on separate partition or disk, as Resolved.
May 30 2019, 2:36 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201907), serviceops, SRE, Continuous-Integration-Infrastructure

May 29 2019

Cmjohnson added a comment to T223825: Degraded RAID on restbase-dev1006.

A ticket has been created with HP for a replacement: 5338974144

May 29 2019, 7:06 PM · ops-eqiad, SRE
Cmjohnson added a comment to T223518: ms-be1033 not powering up.

Steps I have taken

May 29 2019, 7:00 PM · User-fgiunchedi, SRE, ops-eqiad