
Cmjohnson (cmjohnson)
User

User Details

User Since
Dec 16 2014, 10:22 PM (239 w, 1 d)
Availability
Available
IRC Nick
cmjohnson1
LDAP User
Cmjohnson
MediaWiki User
Unknown

Recent Activity

Yesterday

Cmjohnson reassigned T226188: relocate/reimage cloudvirt1014 with 10G interfaces from Cmjohnson to Andrew.

@andrewbogott This server is ready for you. I updated the RAID config to RAID 10 with 2 spare disks. The network switch has been updated (old switch info removed) with the new ports and correct VLANs. The Ethernet ports are disabled and the 10G ports are enabled. I sent you the MAC address via PM on IRC. Resolve this task once finished. I am removing the ops-eqiad tag and assigning the task to you. If there is an issue, please add the tag back and assign it to me.

Wed, Jul 17, 4:57 PM · DC-Ops, Operations, Epic, cloud-services-team (Kanban)
Cmjohnson updated the task description for T226188: relocate/reimage cloudvirt1014 with 10G interfaces.
Wed, Jul 17, 4:54 PM · DC-Ops, Operations, Epic, cloud-services-team (Kanban)
Cmjohnson updated the task description for T226188: relocate/reimage cloudvirt1014 with 10G interfaces.
Wed, Jul 17, 4:45 PM · DC-Ops, Operations, Epic, cloud-services-team (Kanban)

Tue, Jul 16

Cmjohnson added a comment to T226188: relocate/reimage cloudvirt1014 with 10G interfaces.

@andrewbogott: Is it safe to move forward with this task?

Tue, Jul 16, 8:27 PM · DC-Ops, Operations, Epic, cloud-services-team (Kanban)
Cmjohnson moved T224260: restbase-dev1006 has a broken disk from Blocked to Hardware Failure / Troubleshoot on the ops-eqiad board.
Tue, Jul 16, 8:25 PM · Cassandra, RESTBase, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Watching / External), Services (watching), Operations, DC-Ops, ops-eqiad
Cmjohnson added a comment to T224260: restbase-dev1006 has a broken disk.

@Volans I have the new ssd, are you positive that /dev/sda is in slot 0?

Tue, Jul 16, 8:24 PM · Cassandra, RESTBase, Core Platform Team (Security, stability, performance and scalability (TEC1)), Core Platform Team Backlog (Watching / External), Services (watching), Operations, DC-Ops, ops-eqiad
Cmjohnson added a comment to T226467: Broken disk on analytics1072.

cmjohnson@analytics1072:~$ sudo megacli -LdPdInfo -aall | grep -e 'Virtual Drive' -e Slot
Virtual Drive: 0 (Target Id: 0)
Slot Number: 12
Slot Number: 13
Virtual Drive: 2 (Target Id: 2)
Slot Number: 1
Virtual Drive: 3 (Target Id: 3)
Slot Number: 2
Virtual Drive: 4 (Target Id: 4)
Slot Number: 3
Virtual Drive: 5 (Target Id: 5)
Slot Number: 4
Virtual Drive: 6 (Target Id: 6)
Slot Number: 5
Virtual Drive: 7 (Target Id: 7)
Slot Number: 6
Virtual Drive: 8 (Target Id: 8)
Slot Number: 7
Virtual Drive: 9 (Target Id: 9)
Slot Number: 8
Virtual Drive: 10 (Target Id: 10)
Slot Number: 9
Virtual Drive: 11 (Target Id: 11)
Slot Number: 10
Virtual Drive: 12 (Target Id: 12)
Slot Number: 11
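
The listing above can be checked mechanically for gaps. The following is a minimal sketch, not part of the original comment: the function names `parse_vd_slots` and `missing_ids` are hypothetical, and it assumes the filtered output contains only `Virtual Drive:` and `Slot Number:` lines, numbered contiguously from 0. It parses the pasted text and reports which virtual drive and slot numbers do not appear.

```python
import re

# Filtered megacli output as pasted in the comment above.
SAMPLE = """\
Virtual Drive: 0 (Target Id: 0)
Slot Number: 12
Slot Number: 13
Virtual Drive: 2 (Target Id: 2)
Slot Number: 1
Virtual Drive: 3 (Target Id: 3)
Slot Number: 2
Virtual Drive: 4 (Target Id: 4)
Slot Number: 3
Virtual Drive: 5 (Target Id: 5)
Slot Number: 4
Virtual Drive: 6 (Target Id: 6)
Slot Number: 5
Virtual Drive: 7 (Target Id: 7)
Slot Number: 6
Virtual Drive: 8 (Target Id: 8)
Slot Number: 7
Virtual Drive: 9 (Target Id: 9)
Slot Number: 8
Virtual Drive: 10 (Target Id: 10)
Slot Number: 9
Virtual Drive: 11 (Target Id: 11)
Slot Number: 10
Virtual Drive: 12 (Target Id: 12)
Slot Number: 11
"""


def parse_vd_slots(text):
    """Map each virtual drive number to the slot numbers listed under it."""
    mapping = {}
    current = None
    for line in text.splitlines():
        vd = re.match(r"\s*Virtual Drive:\s*(\d+)", line)
        slot = re.match(r"\s*Slot Number:\s*(\d+)", line)
        if vd:
            current = int(vd.group(1))
            mapping[current] = []
        elif slot and current is not None:
            mapping[current].append(int(slot.group(1)))
    return mapping


def missing_ids(present):
    """Return the numbers in 0..max(present) that are absent (assumes contiguous numbering)."""
    present = set(present)
    return [n for n in range(max(present) + 1) if n not in present]


mapping = parse_vd_slots(SAMPLE)
all_slots = [s for slots in mapping.values() for s in slots]
print("missing virtual drives:", missing_ids(mapping.keys()))
print("missing slots:", missing_ids(all_slots))
```

On this paste the only gaps are virtual drive 1 and slot 0, which matches the earlier comment that the replaced disk needs to be mapped back to Virtual Drive 1, Slot Number 0.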

Tue, Jul 16, 8:22 PM · ops-eqiad, Operations, Analytics
Cmjohnson added a comment to T226467: Broken disk on analytics1072.

@elukey the disk has been replaced and is still in the unconfigured (good) state. It needs to be mapped back to Virtual Drive: 1 (Target Id: 1), Slot Number: 0.

Tue, Jul 16, 8:20 PM · ops-eqiad, Operations, Analytics
Cmjohnson closed T218544: ms-be1043 sdk failed as Resolved.

@godog replaced the disk, all should be good now. If you find that it's not please re-open the task and ping me.

Tue, Jul 16, 8:07 PM · User-fgiunchedi, observability, Operations-Software-Development, Operations, ops-eqiad
Cmjohnson closed T222050: db1107 (eventlogging db master) possibly memory issues as Resolved.

Resolving this task for now, if the error returns please re-open and ping me.

Tue, Jul 16, 7:59 PM · Analytics, Operations, ops-eqiad, Analytics-EventLogging, DBA
Cmjohnson moved T228102: rack/setup/install cloudcephmon100[123] from Backlog to Racking Tasks on the ops-eqiad board.
Tue, Jul 16, 7:56 PM · cloud-services-team (Kanban), Operations, Cloud-Services, ops-eqiad
Cmjohnson reassigned T224794: Degraded RAID on helium from Cmjohnson to RobH.

I do not have any spare 4TB SAS disks...this will need to go to @RobH and @wiki_willy for a procurement task.

Tue, Jul 16, 7:52 PM · ops-eqiad, Operations
Cmjohnson moved T227940: (OoW) Degraded RAID on analytics1032 from Hardware Failure / Troubleshoot to Blocked on the ops-eqiad board.
Tue, Jul 16, 3:27 PM · ops-eqiad, Operations
Cmjohnson closed T227867: mw1239 memory errors as Resolved.

I am resolving this ticket, please re-open and ping me if the problem returns.

Tue, Jul 16, 3:27 PM · ops-eqiad, DC-Ops, Operations, serviceops
Cmjohnson added a comment to T227867: mw1239 memory errors .

I swapped all the DIMMs from side A to side B, cleared the log, and powered back up. Please put the server back in service and let's see if the reseating worked.

Tue, Jul 16, 3:26 PM · ops-eqiad, DC-Ops, Operations, serviceops
Cmjohnson added a comment to T227867: mw1239 memory errors .

Last log paste before clearing the log

Tue, Jul 16, 3:26 PM · ops-eqiad, DC-Ops, Operations, serviceops
Cmjohnson added a comment to T222050: db1107 (eventlogging db master) possibly memory issues.

Swapped DIMM A3 with DIMM B3. Now we have to power the server back on and let it run for a few days to see if the error returns and where it returns.

Tue, Jul 16, 3:06 PM · Analytics, Operations, ops-eqiad, Analytics-EventLogging, DBA
Cmjohnson added a comment to T222050: db1107 (eventlogging db master) possibly memory issues.

one last paste of the idrac log

Tue, Jul 16, 3:01 PM · Analytics, Operations, ops-eqiad, Analytics-EventLogging, DBA
Cmjohnson added a comment to T218544: ms-be1043 sdk failed.

@godog I did get the new disk, but since it has not failed I am not sure which disk is actually bad on my end. Do you know which slot the disk is in? Otherwise, let's coordinate and see if you can make the disk blink.

Tue, Jul 16, 2:52 PM · User-fgiunchedi, observability, Operations-Software-Development, Operations, ops-eqiad
Cmjohnson added a comment to T226467: Broken disk on analytics1072.

I received the disk on-site but I cannot tell which disk has failed; they all have green LEDs. @elukey could you please let me know which disk slot it is, or let's coordinate to make the disk blink.

Tue, Jul 16, 2:51 PM · ops-eqiad, Operations, Analytics

Mon, Jul 15

Cmjohnson added a comment to T218544: ms-be1043 sdk failed.

@godog, no worries about the earlier comment. Dell approved the disk replacement. I will update task once it's been replaced.

Mon, Jul 15, 5:54 PM · User-fgiunchedi, observability, Operations-Software-Development, Operations, ops-eqiad
Cmjohnson added a comment to T226467: Broken disk on analytics1072.

Disk is on its way

Mon, Jul 15, 5:32 PM · ops-eqiad, Operations, Analytics
Cmjohnson added a comment to T218544: ms-be1043 sdk failed.

Thanks @godog, is there any way you can put some stress on that disk? It's hard for me to justify to Dell that we need a disk replacement when it shows that it is working.

Mon, Jul 15, 4:32 PM · User-fgiunchedi, observability, Operations-Software-Development, Operations, ops-eqiad

Fri, Jul 12

Cmjohnson merged task T205364: helium (bacula) - Device not healthy -SMART- into T224794: Degraded RAID on helium.
Fri, Jul 12, 7:05 PM · ops-eqiad, Operations
Cmjohnson merged T205364: helium (bacula) - Device not healthy -SMART- into T224794: Degraded RAID on helium.
Fri, Jul 12, 7:05 PM · ops-eqiad, Operations
Cmjohnson added a comment to T226467: Broken disk on analytics1072.

The Dell ticket has been created: "You have successfully submitted request SR994463766". I did see the disk in megacli so I am not sure the TSR report I sent them will include the disk. I did include what you pasted in this ticket showing sdb as failed. Hopefully, that's enough to get a new disk shipped.

Fri, Jul 12, 6:56 PM · ops-eqiad, Operations, Analytics
Cmjohnson added a comment to T224794: Degraded RAID on helium.

I am not sure what I was looking at yesterday, but this server is out of warranty. However, I think I have a 4TB disk that I can replace it with. I will confirm when I get back to eqiad next week.

Fri, Jul 12, 6:46 PM · ops-eqiad, Operations
Cmjohnson moved T227867: mw1239 memory errors from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Fri, Jul 12, 6:43 PM · ops-eqiad, DC-Ops, Operations, serviceops
Cmjohnson added a comment to T227867: mw1239 memory errors .

This server is out of warranty. I can reseat the DIMM but will need the server to be taken down.

Fri, Jul 12, 6:43 PM · ops-eqiad, DC-Ops, Operations, serviceops
Cmjohnson added a comment to T218544: ms-be1043 sdk failed.

Here is the Dell task: "You have successfully submitted request SR994463101".

Fri, Jul 12, 6:42 PM · User-fgiunchedi, observability, Operations-Software-Development, Operations, ops-eqiad
Cmjohnson added a comment to T218544: ms-be1043 sdk failed.

This is a Dell server. I will try to put in a ticket with Dell, but all h/w is showing that there isn't a problem, so I may have trouble with Dell giving me a disk for an issue that they do not think exists.

Fri, Jul 12, 6:39 PM · User-fgiunchedi, observability, Operations-Software-Development, Operations, ops-eqiad
Cmjohnson moved T191357: decom silver/WMF3434 from UnRacking Tasks to Decommission on the ops-eqiad board.
Fri, Jul 12, 12:52 AM · decommission, Operations, DC-Ops, ops-eqiad
Cmjohnson moved T170474: Decommisson and store old row D network gear. from UnRacking Tasks to Decommission on the ops-eqiad board.
Fri, Jul 12, 12:52 AM · Operations, ops-eqiad
Cmjohnson moved T196487: upgrade row d to have 3 10G switches from High Priority Task to Not urgent on the ops-eqiad board.
Fri, Jul 12, 12:43 AM · ops-eqiad, netops, Operations
Cmjohnson moved T206185: connect atlas-ulsfo to scs-ulsfo from High Priority Task to Not urgent on the ops-eqiad board.
Fri, Jul 12, 12:43 AM · DC-Ops, Operations, ops-ulsfo, ops-eqiad
Cmjohnson moved T224475: Return sulfur to spares from Backlog to Not urgent on the ops-eqiad board.
Fri, Jul 12, 12:43 AM · decommission, ops-eqiad, Operations
Cmjohnson moved T220700: Upgrade kafka-jumbo100[1-6] to 10G NICs (if possible) from Stalled to Blocked on the ops-eqiad board.
Fri, Jul 12, 12:35 AM · netops, ops-eqiad, hardware-requests, Operations, Analytics, User-Elukey
Cmjohnson moved T226599: (OoW) Degraded RAID on analytics1039 from Stalled to Blocked on the ops-eqiad board.
Fri, Jul 12, 12:35 AM · ops-eqiad, Operations
Cmjohnson moved T196487: upgrade row d to have 3 10G switches from Up next to High Priority Task on the ops-eqiad board.
Fri, Jul 12, 12:34 AM · ops-eqiad, netops, Operations
Cmjohnson moved T206185: connect atlas-ulsfo to scs-ulsfo from Up next to High Priority Task on the ops-eqiad board.
Fri, Jul 12, 12:34 AM · DC-Ops, Operations, ops-ulsfo, ops-eqiad

Thu, Jul 11

Cmjohnson moved T222950: (OoW) cloudvirt1006 - RAID battery failed from Blocked to Cloud Tasks on the ops-eqiad board.
Thu, Jul 11, 11:54 PM · cloud-services-team, ops-eqiad, Operations
Cmjohnson closed T215411: (OoW) thumbor1004 memory errors as Declined.

Declining the task since the server is out of warranty.

Thu, Jul 11, 11:35 PM · User-jijiki, Thumbor, ops-eqiad, serviceops, Operations
Cmjohnson closed T193628: (OoW) tungsten disk 1 and 8 SMART failure as Declined.

Since there is no need to replace these disks...declining the task

Thu, Jul 11, 11:07 PM · Performance-Team (Radar), ops-eqiad, Operations
Cmjohnson added a comment to T218751: Audit down ports.

All items on rows A, B and C have been updated. Row D will need some on-site verification.

Thu, Jul 11, 10:47 PM · DC-Ops, ops-ulsfo, ops-eqiad, Operations
Cmjohnson closed T209861: cloudvirt1007 predicted raid failure as Resolved.

This was completed awhile ago...never updated task

Thu, Jul 11, 10:22 PM · Operations, ops-eqiad, DC-Ops, cloud-services-team (Kanban)
Cmjohnson added a comment to T224794: Degraded RAID on helium.
Thu, Jul 11, 7:23 PM · ops-eqiad, Operations
Cmjohnson moved T226467: Broken disk on analytics1072 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Thu, Jul 11, 7:10 PM · ops-eqiad, Operations, Analytics
Cmjohnson added a comment to T226467: Broken disk on analytics1072.

@elukey I am not sure which disk this is. I think it's a smaller SSD? Can you confirm the disk type and size, please?

Thu, Jul 11, 7:10 PM · ops-eqiad, Operations, Analytics
Cmjohnson added a comment to T222050: db1107 (eventlogging db master) possibly memory issues.

I still need to move the DIMM around, so I need the server taken down. If this needs to be scheduled, please let me know when you can have the server down.

Thu, Jul 11, 6:59 PM · Analytics, Operations, ops-eqiad, Analytics-EventLogging, DBA
Cmjohnson added a comment to T222960: Fix restbase1017's physical rack.

@Dzahn, I don't know what that means. What does DC-ops need to troubleshoot? Thanks

Thu, Jul 11, 6:02 PM · Patch-For-Review, serviceops, Core Platform Team Workboards (Team 2), Operations, Services (doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), User-Eevans, Cassandra
Cmjohnson removed a project from T201342: rack/setup/install puppetmaster1003.eqiad.wmnet: ops-eqiad.

I am removing the ops-eqiad tag. If onsite work is still required, please add the ops-eqiad tag back.

Thu, Jul 11, 5:56 PM · Operations
Cmjohnson closed T204491: Heating alerts / memory errors on mw1254 as Resolved.
Thu, Jul 11, 5:55 PM · serviceops, Operations, ops-eqiad
Cmjohnson removed a project from T184293: rack/setup/install lvs101[3-6]: ops-eqiad.

I am removing the ops-eqiad tag on this task, if you need additional dc ops work please add the tag back.

Thu, Jul 11, 5:55 PM · Operations, Traffic
Cmjohnson closed T209139: (OoW) Broken memory on mw1239 as Declined.

The server is out of warranty and I do not have any spare DIMM

Thu, Jul 11, 5:49 PM · ops-eqiad, Operations
Cmjohnson closed T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory as Resolved.

I am resolving this task

Thu, Jul 11, 5:47 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)
Cmjohnson added a comment to T222960: Fix restbase1017's physical rack.

I have removed the ops-eqiad tag. If you have an issue that requires DC ops, please add the ops-eqiad tag back to the task.

Thu, Jul 11, 5:44 PM · Patch-For-Review, serviceops, Core Platform Team Workboards (Team 2), Operations, Services (doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), User-Eevans, Cassandra
Cmjohnson removed a project from T222960: Fix restbase1017's physical rack: ops-eqiad.
Thu, Jul 11, 5:38 PM · Patch-For-Review, serviceops, Core Platform Team Workboards (Team 2), Operations, Services (doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), User-Eevans, Cassandra

Tue, Jul 9

Cmjohnson added a comment to T222960: Fix restbase1017's physical rack.

@Eevans We did a test run for an install and the server was able to reach the installer without an issue. I did see on IRC something about stretch. I will leave that up to you if you like and the server can be installed whenever you need it.

Tue, Jul 9, 6:36 PM · Patch-For-Review, serviceops, Core Platform Team Workboards (Team 2), Operations, Services (doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), User-Eevans, Cassandra
Cmjohnson added a comment to T222960: Fix restbase1017's physical rack.

restbase1017 has been moved to rack B5
network port updated
DNS updated

Tue, Jul 9, 3:08 PM · Patch-For-Review, serviceops, Core Platform Team Workboards (Team 2), Operations, Services (doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), User-Eevans, Cassandra

Wed, Jul 3

Cmjohnson added a comment to T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN..

@Ottomata
Please decommission the current servers to spare role
Please provide the new hostnames you want to use
These are all located in row B...will that be okay or do you need them spread out across the rows?

Wed, Jul 3, 7:38 PM · Analytics-Kanban, ops-eqiad, Operations, netops, Analytics
Cmjohnson updated the task description for T226274: (Need By: June 30) rack/setup/install kafka-main100[1-5].
Wed, Jul 3, 6:14 PM · User-herron, Operations

Thu, Jun 27

Cmjohnson moved T224794: Degraded RAID on helium from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Thu, Jun 27, 4:29 PM · ops-eqiad, Operations
Cmjohnson added a comment to T222960: Fix restbase1017's physical rack.

@Eevans Do you still want to move this server? Let's coordinate a day/time

Thu, Jun 27, 4:29 PM · Patch-For-Review, serviceops, Core Platform Team Workboards (Team 2), Operations, Services (doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), User-Eevans, Cassandra
Cmjohnson moved T225121: (Need By: Sept 30) upgrade msw1-eqiad from EX4200 to EX4300 from Backlog to Racking Tasks on the ops-eqiad board.
Thu, Jun 27, 4:28 PM · netops, Operations, ops-eqiad
Cmjohnson moved T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. from Backlog to Cloud Tasks on the ops-eqiad board.
Thu, Jun 27, 4:28 PM · Analytics-Kanban, ops-eqiad, Operations, netops, Analytics
Cmjohnson moved T226188: relocate/reimage cloudvirt1014 with 10G interfaces from Backlog to Cloud Tasks on the ops-eqiad board.
Thu, Jun 27, 4:28 PM · DC-Ops, Operations, Epic, cloud-services-team (Kanban)
Cmjohnson moved T226382: Hardware Request: puppet master eqiad from Backlog to Blocked on the ops-eqiad board.
Thu, Jun 27, 4:27 PM · Operations, ops-eqiad, DC-Ops
Cmjohnson assigned T226382: Hardware Request: puppet master eqiad to RobH.

Assigning this to @RobH; he is able to allocate a spare.

Thu, Jun 27, 4:27 PM · Operations, ops-eqiad, DC-Ops
Cmjohnson moved T226517: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 from UnRacking Tasks to Decommission on the ops-eqiad board.
Thu, Jun 27, 4:25 PM · ops-eqiad, DC-Ops, Analytics, decommission, Operations
Cmjohnson moved T226517: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 from Backlog to UnRacking Tasks on the ops-eqiad board.
Thu, Jun 27, 4:25 PM · ops-eqiad, DC-Ops, Analytics, decommission, Operations
Cmjohnson closed T226569: Degraded RAID on db1072 as Resolved.

@Marostegui disk swapped but this server is out of warranty. I would suggest moving masters to new servers.

Thu, Jun 27, 4:24 PM · DBA, ops-eqiad, Operations
Cmjohnson moved T226599: (OoW) Degraded RAID on analytics1039 from Backlog to Stalled on the ops-eqiad board.
Thu, Jun 27, 4:20 PM · ops-eqiad, Operations
Cmjohnson moved T226689: decommission db1068 from Backlog to Decommission on the ops-eqiad board.
Thu, Jun 27, 4:19 PM · DC-Ops, ops-eqiad, decommission, Operations

Mon, Jun 24

Cmjohnson closed Unknown Object (Task), a subtask of T90364: Test Ceph for instance storage, as Resolved.
Mon, Jun 24, 5:25 PM · Wikimedia-Incident, cloud-services-team (Kanban), Cloud-Services

Wed, Jun 19

Cmjohnson added a comment to T224188: rack/setup/install (3) new osd ceph nodes.

@Bstorm @ayounsi I will need very clear instructions on which racks/rows these servers can go in before I physically rack and cable. Once that is figured out please update the task.

Wed, Jun 19, 4:57 PM · ops-eqiad, Operations, cloud-services-team (Kanban), Cloud-Services
Cmjohnson updated the task description for T224188: rack/setup/install (3) new osd ceph nodes.
Wed, Jun 19, 4:56 PM · ops-eqiad, Operations, cloud-services-team (Kanban), Cloud-Services
Cmjohnson updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Wed, Jun 19, 4:54 PM · Patch-For-Review, Operations, DBA
Cmjohnson reassigned T225704: eqiad: rack/setup/install (4) dbproxy systems. from Cmjohnson to RobH.

I updated the switch config to private1-d.....both servers are currently off and ready for installs. assigning to @RobH to install

Wed, Jun 19, 4:54 PM · Patch-For-Review, Operations, DBA
Cmjohnson added a comment to T222731: Storage problems with new host db1133.

Dell is sending me a new RAID card, cables and backplane. Sorry it took so long; I had to call them after they denied my second request.

Wed, Jun 19, 4:40 PM · ops-eqiad, Operations
Cmjohnson added a comment to T218544: ms-be1043 sdk failed.
Wed, Jun 19, 4:38 PM · User-fgiunchedi, observability, Operations-Software-Development, Operations, ops-eqiad
Cmjohnson closed T214283: Memory correctable errors -EDAC- elastic1029 as Resolved.

Closing this for now, let me know if there is another issue. Keep in mind this server is out of warranty

Wed, Jun 19, 4:10 PM · Discovery-Search (Current work), ops-eqiad, Discovery, DC-Ops, Operations
Cmjohnson added a comment to T214283: Memory correctable errors -EDAC- elastic1029.

The DIMM has been reseated and swapped to the opposite sides.

Wed, Jun 19, 3:45 PM · Discovery-Search (Current work), ops-eqiad, Discovery, DC-Ops, Operations

Jun 17 2019

Cmjohnson closed T196697: rack/setup/add to spares tracking 2 single cpu misc class systems as Resolved.

these have been racked

Jun 17 2019, 6:48 PM · ops-eqiad, Operations
Cmjohnson closed T225219: eqiad: rack and setup (3) dual CPU servers as Resolved.

servers are set up and have been added to the tracking sheet

Jun 17 2019, 6:44 PM · Operations, ops-eqiad
Cmjohnson updated the task description for T225219: eqiad: rack and setup (3) dual CPU servers .
Jun 17 2019, 6:44 PM · Operations, ops-eqiad
Cmjohnson closed T219890: rack/setup 3 new single cpu spare pool systems as Resolved.

servers are ready as spares and in tracking sheet

Jun 17 2019, 6:43 PM · ops-eqiad, Operations
Cmjohnson added a comment to T225704: eqiad: rack/setup/install (4) dbproxy systems..

@ayounsi I'd rather not move the servers...I racked them based on the instructions and they're already in racks and set up

Jun 17 2019, 6:39 PM · Patch-For-Review, Operations, DBA
Cmjohnson updated the task description for T219890: rack/setup 3 new single cpu spare pool systems.
Jun 17 2019, 6:33 PM · ops-eqiad, Operations
Cmjohnson added a comment to T225704: eqiad: rack/setup/install (4) dbproxy systems..

@Marostegui: do they all go to the cloud vlan? if they do then 1020 and 1021 are in row D...that support-cloud vlan is not available on row D yet. I need Arzhel to copy the vlan over.

Jun 17 2019, 6:17 PM · Patch-For-Review, Operations, DBA
Cmjohnson moved T225704: eqiad: rack/setup/install (4) dbproxy systems. from Backlog to Racking Tasks on the ops-eqiad board.
Jun 17 2019, 6:11 PM · Patch-For-Review, Operations, DBA
Cmjohnson updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Jun 17 2019, 6:11 PM · Patch-For-Review, Operations, DBA
Cmjohnson reassigned T225704: eqiad: rack/setup/install (4) dbproxy systems. from Cmjohnson to ayounsi.

Assigning to @ayounsi to add cloud-support1-d-eqiad. Once that is done, the vlan for dbproxy1020 and 1021 will need to be set up. Switch port descriptions are done.

Jun 17 2019, 6:11 PM · Patch-For-Review, Operations, DBA
Cmjohnson updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Jun 17 2019, 6:04 PM · Patch-For-Review, Operations, DBA

Jun 11 2019

Cmjohnson moved T223518: ms-be1033 not powering up from Hardware Failure / Troubleshoot to Blocked on the ops-eqiad board.
Jun 11 2019, 4:16 PM · User-fgiunchedi, Operations, ops-eqiad
Cmjohnson moved T215411: (OoW) thumbor1004 memory errors from Hardware Failure / Troubleshoot to Stalled on the ops-eqiad board.
Jun 11 2019, 4:16 PM · User-jijiki, Thumbor, ops-eqiad, serviceops, Operations
Cmjohnson added a comment to T222731: Storage problems with new host db1133.

They declined my ticket...they say I didn't isolate the problem well enough.

Jun 11 2019, 4:15 PM · ops-eqiad, Operations
Cmjohnson closed T222922: wmf7622 wont powercycle (cannot be allocated from spares) as Resolved.

This server accepts all the racadm commands successfully. I verified on-site that these things actually happened

Jun 11 2019, 4:02 PM · Operations, ops-eqiad
Cmjohnson updated the task description for T222922: wmf7622 wont powercycle (cannot be allocated from spares).
Jun 11 2019, 4:01 PM · Operations, ops-eqiad
Cmjohnson updated the task description for T222922: wmf7622 wont powercycle (cannot be allocated from spares).
Jun 11 2019, 4:01 PM · Operations, ops-eqiad
Cmjohnson added a comment to T220853: VMs on cloudvirt1015 crashing - bad mainboard/memory.

@Andrew what parts? There is nothing that suggests that it is CPU on the server side of things. I reseated and moved the DIMM and that error has not returned. It may very well have been poorly seated DIMM. I checked dmesg and do not see any more errors related to memory or CPU. Try putting it back into production and let's see if anything comes back. Unfortunately, I need to demonstrate and prove there is a problem for Dell to do anything and right now I do not have anything to give them.

Jun 11 2019, 3:57 PM · Operations, ops-eqiad, DC-Ops, User-Zppix, cloud-services-team (Kanban)