Cmjohnson (cmjohnson)
User

Projects (11)

User Details

User Since
Dec 16 2014, 10:22 PM (200 w, 22 h)
Availability
Available
IRC Nick
cmjohnson1
LDAP User
Cmjohnson
MediaWiki User
Unknown

Recent Activity

Yesterday

Cmjohnson assigned T202705: Degraded RAID on sodium to ArielGlenn.

@ArielGlenn Can you help get this disk back into rotation? It shows as unconfigured good.

Tue, Oct 16, 3:32 PM · ops-eqiad, Operations
Cmjohnson added a comment to T196507: Degraded RAID on cloudvirt1019.

Still waiting on a response

Tue, Oct 16, 3:31 PM · ops-eqiad, Operations
Cmjohnson added a comment to T206915: Degraded RAID on aqs1006.

A case has been opened up with HPE Support

Tue, Oct 16, 3:22 PM · ops-eqiad, Operations

Mon, Oct 15

Cmjohnson moved T206965: Degraded RAID on dbstore1002 from Backlog to Up next on the ops-eqiad board.
Mon, Oct 15, 4:49 PM · Analytics, ops-eqiad, Operations
Cmjohnson moved T206915: Degraded RAID on aqs1006 from Backlog to Being worked on on the ops-eqiad board.
Mon, Oct 15, 4:49 PM · ops-eqiad, Operations
Cmjohnson added a comment to T199125: rack/setup/install cloudvirt102[34].

Dell sent me a 10G NIC and not a raid card. They are rushing one out.

Mon, Oct 15, 4:48 PM · cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations
Cmjohnson closed T206651: Degraded RAID on cloudvirt1019 as Declined.

yes, we know! still working on it...there is a current task already.

Mon, Oct 15, 4:42 PM · cloud-services-team, ops-eqiad, Operations
Cmjohnson added a comment to T206972: asw2-a-eqiad FPC7 faulty PEM0.

swapped it with one from a spare switch....leaving ticket open to enter RMA details

Mon, Oct 15, 4:41 PM · netops, Operations, ops-eqiad
Cmjohnson moved T206972: asw2-a-eqiad FPC7 faulty PEM0 from Backlog to Up next on the ops-eqiad board.
Mon, Oct 15, 4:22 PM · netops, Operations, ops-eqiad

Wed, Oct 10

Cmjohnson added a comment to T196507: Degraded RAID on cloudvirt1019.

F/W updated and now I am getting new issues...missing several of the disks. I have to get another AHS report and send to HP....the saga continues

Wed, Oct 10, 5:44 PM · ops-eqiad, Operations
Cmjohnson added a comment to T196507: Degraded RAID on cloudvirt1019.

received the new raid controller and installed, updating the firmware now. Initially it is showing as failed raid

Wed, Oct 10, 4:30 PM · ops-eqiad, Operations
Cmjohnson moved T206626: Decommission conf100[1-3] from Backlog to Decommission on the ops-eqiad board.
Wed, Oct 10, 3:35 PM · ops-eqiad, decommission, Operations
Cmjohnson added a comment to T206394: cp1076 hardware failure.

@BBlack is there any action item for me?

Wed, Oct 10, 3:35 PM · Operations, ops-eqiad, Traffic
Cmjohnson added a comment to T206500: Degraded RAID on db1067.

Reseated the disk....let's see what happens

Wed, Oct 10, 3:34 PM · DBA, ops-eqiad, Operations

Tue, Oct 9

Cmjohnson added a comment to T206313: Degraded RAID on db1072.

new disk...trying it again

Tue, Oct 9, 5:39 PM · DBA, ops-eqiad, Operations
Cmjohnson added a comment to T206254: Degraded RAID on db1073.

it is a new disk...trying it again

Tue, Oct 9, 5:38 PM · DBA, ops-eqiad, Operations
Cmjohnson added a comment to T203244: analytics1068 doesn't boot.

@elukey I am in conversation with DELL about the server, getting them the info they need.....nothing has been decided yet but as soon as they tell me what they're sending (should be a new system board) I will let you know

Tue, Oct 9, 5:36 PM · ops-eqiad, Operations, Analytics
Cmjohnson added a comment to T198479: labvirt1009 HP Raid alert.

@Bstorm the disk has been swapped...resolve once it's back to normal please

Tue, Oct 9, 5:35 PM · cloud-services-team (Kanban), Operations, ops-eqiad, DC-Ops
Cmjohnson added a comment to T205514: db1092 crashed - BBU broken.

The battery was sent to our old office address in San Francisco, so they are shipping a new battery...because it's a battery it has to go ground and will take 3-5 days

Tue, Oct 9, 5:32 PM · User-Banyek, Operations, ops-eqiad, Patch-For-Review, DBA
Cmjohnson closed T206345: Degraded RAID on db1064 as Resolved.

raid looks to be back after disk swap ...resolving

Tue, Oct 9, 5:31 PM · ops-eqiad, Operations
Cmjohnson moved T206500: Degraded RAID on db1067 from Backlog to Being worked on on the ops-eqiad board.
Tue, Oct 9, 5:26 PM · DBA, ops-eqiad, Operations
Cmjohnson added a comment to T206500: Degraded RAID on db1067.

@Marostegui disk swapped

Tue, Oct 9, 5:26 PM · DBA, ops-eqiad, Operations
Cmjohnson added a comment to T199125: rack/setup/install cloudvirt102[34].

I replaced the old raid card with the new one

Tue, Oct 9, 4:24 PM · cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations

Mon, Oct 8

Cmjohnson moved T206345: Degraded RAID on db1064 from Backlog to Being worked on on the ops-eqiad board.
Mon, Oct 8, 1:05 PM · ops-eqiad, Operations
Cmjohnson moved T198479: labvirt1009 HP Raid alert from Blocked to Being worked on on the ops-eqiad board.
Mon, Oct 8, 1:05 PM · cloud-services-team (Kanban), Operations, ops-eqiad, DC-Ops
Cmjohnson added a comment to T203244: analytics1068 doesn't boot.

This is one of the issues we have with leasing.....Dell has it so Farnam is the owner, not us. I think it's sorted now and I am attempting to get it resolved.

Mon, Oct 8, 12:54 PM · ops-eqiad, Operations, Analytics

Fri, Oct 5

Cmjohnson moved T205364: helium (bacula) - Device not healthy -SMART- from Backlog to Being worked on on the ops-eqiad board.
Fri, Oct 5, 4:40 PM · ops-eqiad, Operations
Cmjohnson moved T206245: db1064 has disk smart error from Backlog to Being worked on on the ops-eqiad board.
Fri, Oct 5, 4:40 PM · DBA, Operations, ops-eqiad
Cmjohnson added a comment to T206245: db1064 has disk smart error.

Swapped the failed disk

Fri, Oct 5, 4:40 PM · DBA, Operations, ops-eqiad
Cmjohnson moved T206313: Degraded RAID on db1072 from Backlog to Being worked on on the ops-eqiad board.
Fri, Oct 5, 4:35 PM · DBA, ops-eqiad, Operations
Cmjohnson moved T206254: Degraded RAID on db1073 from Backlog to Being worked on on the ops-eqiad board.
Fri, Oct 5, 4:35 PM · DBA, ops-eqiad, Operations
Cmjohnson added a comment to T206254: Degraded RAID on db1073.

Failed disk has been swapped out

Fri, Oct 5, 4:35 PM · DBA, ops-eqiad, Operations
Cmjohnson added a comment to T206313: Degraded RAID on db1072.

Failed disk has been swapped out

Fri, Oct 5, 4:31 PM · DBA, ops-eqiad, Operations
Cmjohnson added a comment to T205364: helium (bacula) - Device not healthy -SMART-.

@akosiaris I found a spare 4TB SAS disk...replacing it now

Fri, Oct 5, 4:28 PM · ops-eqiad, Operations

Thu, Oct 4

Cmjohnson added a comment to T205364: helium (bacula) - Device not healthy -SMART-.

The disk was a spare...I didn't even look to see that it was a SATA disk.
This server is out of warranty and we'll need to buy 4TB SAS disks

Thu, Oct 4, 3:13 PM · ops-eqiad, Operations

Wed, Oct 3

Cmjohnson added a comment to T205364: helium (bacula) - Device not healthy -SMART-.

@Dzahn the disk was replaced but it's unconfigured good....I have tried to add it back but had no success. Can you give it a go please?

Wed, Oct 3, 7:31 PM · ops-eqiad, Operations
Cmjohnson closed T206004: Degraded RAID on helium as Declined.

T

Wed, Oct 3, 7:29 PM · ops-eqiad, Operations

Tue, Oct 2

Cmjohnson added a comment to T204970: setup/install an-coord1001/wmf7621.

The disks are now being seen by the controller; this server was the spare we borrowed a cable from to work on cloudvirt1023. Re-connected the cable and now disks are showing up. I also set the BIOS boot order to boot from disks first.

Tue, Oct 2, 6:01 PM · User-Elukey, Patch-For-Review, ops-eqiad, Analytics, Operations
Cmjohnson added a comment to T205364: helium (bacula) - Device not healthy -SMART-.

Swapped the failed disk

Tue, Oct 2, 5:46 PM · ops-eqiad, Operations
Cmjohnson added a comment to T205986: cloudnet1004: spontaneous reboot.
  • The power cables are tight and have green LEDs, from a physical aspect no amber lights are flashing and everything appears normal.
Tue, Oct 2, 4:48 PM · DC-Ops, cloud-services-team
Cmjohnson added a comment to T196886: Replace wtp1043's sda.

@mortzm Is it safe to say that this can be resolved? Thanks!

Tue, Oct 2, 4:46 PM · Parsing-Team, DC-Ops, ops-eqiad, Operations
Cmjohnson added a comment to T196507: Degraded RAID on cloudvirt1019.

Updated HP that the status remains the same and that the 3rd battery they sent us still does not fix the problem.

Tue, Oct 2, 4:44 PM · ops-eqiad, Operations
Cmjohnson added a comment to T205514: db1092 crashed - BBU broken.

Waiting on the part still

Tue, Oct 2, 4:42 PM · User-Banyek, Operations, ops-eqiad, Patch-For-Review, DBA
Cmjohnson moved T205780: db1067 (enwiki master) disk #7 with errors from Backlog to Being worked on on the ops-eqiad board.
Tue, Oct 2, 4:42 PM · ops-eqiad, Operations, DBA
Cmjohnson added a comment to T205780: db1067 (enwiki master) disk #7 with errors.

The disk has been swapped

Tue, Oct 2, 4:41 PM · ops-eqiad, Operations, DBA

Thu, Sep 27

Cmjohnson added a comment to T205514: db1092 crashed - BBU broken.

the HP required AHS log has been uploaded to their dropbox. Waiting on their response.

Thu, Sep 27, 5:06 PM · User-Banyek, Operations, ops-eqiad, Patch-For-Review, DBA
Cmjohnson added a comment to T205253: db1069 has errored disk in slot 7.

@Marostegui The disk on slot 7 has been replaced, please resolve after rebuild

Thu, Sep 27, 5:05 PM · User-Banyek, DBA, ops-eqiad, DC-Ops, Operations
Cmjohnson closed T205034: apply hostname labels to an-coord1001/wmf7621, a subtask of T204970: setup/install an-coord1001/wmf7621, as Resolved.
Thu, Sep 27, 5:00 PM · User-Elukey, Patch-For-Review, ops-eqiad, Analytics, Operations
Cmjohnson closed T205034: apply hostname labels to an-coord1001/wmf7621 as Resolved.
Thu, Sep 27, 5:00 PM · ops-eqiad, Operations
Cmjohnson assigned T193655: rack/setup/install cloudstore1008 & cloudstore1009 to Bstorm.

Both of these servers are able to be installed. assigning to @Bstorm

Thu, Sep 27, 4:20 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Cmjohnson moved T205034: apply hostname labels to an-coord1001/wmf7621 from Being worked on to Up next on the ops-eqiad board.
Thu, Sep 27, 3:30 PM · ops-eqiad, Operations
Cmjohnson closed T205284: Degraded RAID on rdb1004 as Resolved.

Replaced disk in slot 3

Thu, Sep 27, 3:23 PM · ops-eqiad, Operations
Cmjohnson added a comment to T196507: Degraded RAID on cloudvirt1019.

Sorry, they sent another new battery. Swapped the battery and let's see if it gets beyond recharging status

Thu, Sep 27, 3:22 PM · ops-eqiad, Operations
Cmjohnson moved T205507: Decommission analytics100[1,2] from Backlog to Decommission on the ops-eqiad board.
Thu, Sep 27, 3:15 PM · Patch-For-Review, Operations, ops-eqiad, decommission, User-Elukey, Analytics
Cmjohnson closed T204999: update label on an-master100[12].eqiad.wmnet as Resolved.
Thu, Sep 27, 3:15 PM · ops-eqiad, Operations
Cmjohnson closed T204999: update label on an-master100[12].eqiad.wmnet, a subtask of T201939: rack/setup/install an-master100[12].eqiad.wmnet, as Resolved.
Thu, Sep 27, 3:15 PM · Patch-For-Review, User-Elukey, Analytics, Operations
Cmjohnson placed T204970: setup/install an-coord1001/wmf7621 up for grabs.

@elukey everything looks good on our end I was able to access the server

Thu, Sep 27, 3:14 PM · User-Elukey, Patch-For-Review, ops-eqiad, Analytics, Operations
Cmjohnson updated the task description for T204970: setup/install an-coord1001/wmf7621.
Thu, Sep 27, 3:12 PM · User-Elukey, Patch-For-Review, ops-eqiad, Analytics, Operations
Cmjohnson moved T204999: update label on an-master100[12].eqiad.wmnet from Being worked on to Up next on the ops-eqiad board.
Thu, Sep 27, 2:50 PM · ops-eqiad, Operations

Wed, Sep 26

Cmjohnson moved T205514: db1092 crashed - BBU broken from Backlog to Being worked on on the ops-eqiad board.
Wed, Sep 26, 2:32 PM · User-Banyek, Operations, ops-eqiad, Patch-For-Review, DBA
Cmjohnson added a comment to T205514: db1092 crashed - BBU broken.

A support ticket has been submitted with HPE

Wed, Sep 26, 2:31 PM · User-Banyek, Operations, ops-eqiad, Patch-For-Review, DBA

Tue, Sep 25

Cmjohnson updated the task description for T193655: rack/setup/install cloudstore1008 & cloudstore1009.
Tue, Sep 25, 7:17 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Cmjohnson added a comment to T193655: rack/setup/install cloudstore1008 & cloudstore1009.

For cloudstore1008, I updated asw2-a5-eqiad to put this server in the public vlan. Everything was accepted like normal but when I display inheritance it's not showing up in that vlan. When I search the ports in the public vlan the port shows as being there.

Tue, Sep 25, 7:17 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Cmjohnson added a comment to T193655: rack/setup/install cloudstore1008 & cloudstore1009.

@Bstorm cloudstore1008 and 1009 were in the wrong vlans on the switch port. I updated the ports. you should be able to get the installer now

Tue, Sep 25, 6:11 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Cmjohnson placed T193655: rack/setup/install cloudstore1008 & cloudstore1009 up for grabs.

Please let me know the partman recipe you want; the current one for labstore1006/7 is dumps-distribution-100x.cfg

Tue, Sep 25, 5:26 PM · cloud-services-team (Kanban), Patch-For-Review, ops-eqiad, Cloud-VPS, Operations
Cmjohnson moved T204970: setup/install an-coord1001/wmf7621 from Backlog to Racking Tasks on the ops-eqiad board.
Tue, Sep 25, 4:46 PM · User-Elukey, Patch-For-Review, ops-eqiad, Analytics, Operations
Cmjohnson moved T204999: update label on an-master100[12].eqiad.wmnet from Backlog to Being worked on on the ops-eqiad board.
Tue, Sep 25, 4:46 PM · ops-eqiad, Operations
Cmjohnson moved T205253: db1069 has errored disk in slot 7 from Backlog to Being worked on on the ops-eqiad board.
Tue, Sep 25, 4:45 PM · User-Banyek, DBA, ops-eqiad, DC-Ops, Operations
Cmjohnson moved T205284: Degraded RAID on rdb1004 from Backlog to Being worked on on the ops-eqiad board.
Tue, Sep 25, 4:45 PM · ops-eqiad, Operations
Cmjohnson added a comment to T205284: Degraded RAID on rdb1004.

The disks are hardware raided, 500GB SATA. The server is out of warranty but I have spares on-site that I can use as replacements. It looks like it's the SATA disk in slot 3.

Tue, Sep 25, 4:45 PM · ops-eqiad, Operations
Cmjohnson moved T205034: apply hostname labels to an-coord1001/wmf7621 from Backlog to Being worked on on the ops-eqiad board.
Tue, Sep 25, 4:24 PM · ops-eqiad, Operations

Fri, Sep 21

Cmjohnson closed T205110: install additonal SSDs maps100[1-4] as Resolved.

Done

Fri, Sep 21, 3:37 PM · Operations, ops-eqiad
Cmjohnson created T205110: install additonal SSDs maps100[1-4].
Fri, Sep 21, 3:37 PM · Operations, ops-eqiad

Wed, Sep 19

Cmjohnson added a comment to T204812: mc1021 boot failure.

@MoritzMuehlenhoff I disabled icinga checks for this host

Wed, Sep 19, 5:28 PM · Operations, ops-eqiad
Cmjohnson added a comment to T204491: Heating alerts / memory errors on mw1254.

@MoritzMuehlenhoff I ended up just swapping the DIMM between side A and B....leaving open to see if it helps

Wed, Sep 19, 5:25 PM · Operations, ops-eqiad
Cmjohnson added a comment to T196507: Degraded RAID on cloudvirt1019.

latest update from HP...they are sending a new cable

Wed, Sep 19, 5:18 PM · ops-eqiad, Operations
Cmjohnson added a comment to T203244: analytics1068 doesn't boot.

Dell kicked it back again saying it's not our system. I will try calling them now

Wed, Sep 19, 5:17 PM · ops-eqiad, Operations, Analytics
Cmjohnson moved T204812: mc1021 boot failure from Backlog to Being worked on on the ops-eqiad board.
Wed, Sep 19, 5:17 PM · Operations, ops-eqiad
Cmjohnson closed T204743: Ensure scs-c1-eqiad:eth1 is not connected as Resolved.

I don't know or recall how, but I managed to plug in a second ethernet port on the mgmt ports of the scs....removed the eth1 connection and all is good now. Resolving

Wed, Sep 19, 5:16 PM · netops, ops-eqiad, Operations
Cmjohnson added a comment to T204812: mc1021 boot failure.

@MoritzMuehlenhoff This server was not set to legacy BIOS. I changed the setting to legacy BIOS and verified the SATA controller was enabled. I was able to boot to the HDD and the server had no issues going through the installer. I don't know how the setting was changed but it's fixed now. Please resolve once satisfied.

Wed, Sep 19, 5:12 PM · Operations, ops-eqiad

Tue, Sep 18

Cmjohnson added a comment to T203244: analytics1068 doesn't boot.

The Dell ticket was kicked back because the server was not owned by us but by Farnam...that has been resolved and I resubmitted the Dell service task.

Tue, Sep 18, 3:11 PM · ops-eqiad, Operations, Analytics

Sep 17 2018

Cmjohnson moved T199125: rack/setup/install cloudvirt102[34] from Being worked on to Blocked on the ops-eqiad board.
Sep 17 2018, 4:42 PM · cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations
jcrespo awarded T204493: db1061 management interface busy (no sessions allowed) a Mountain of Wealth token.
Sep 17 2018, 4:23 PM · ops-eqiad, Operations
Cmjohnson added a comment to T196507: Degraded RAID on cloudvirt1019.

sent HP an updated AHS log at their request

Sep 17 2018, 3:53 PM · ops-eqiad, Operations
Cmjohnson moved T204170: Rack/setup cr2-eqord from Backlog to Blocked on the ops-eqiad board.
Sep 17 2018, 3:52 PM · netops, ops-eqiad, Operations
Cmjohnson closed T204493: db1061 management interface busy (no sessions allowed) as Resolved.

I am done

Sep 17 2018, 3:52 PM · ops-eqiad, Operations
Cmjohnson closed T204493: db1061 management interface busy (no sessions allowed), a subtask of T204311: Upgrade all core (mediawiki) database servers to mariadb 10.1, as Resolved.
Sep 17 2018, 3:52 PM · Patch-For-Review, Operations, DBA
Cmjohnson moved T204462: Degraded disk on db1069 (x1 master) from Backlog to Being worked on on the ops-eqiad board.
Sep 17 2018, 3:51 PM · Operations, ops-eqiad, DBA
Cmjohnson added a comment to T204170: Rack/setup cr2-eqord.

@ayounsi I connected both MX204s I have in eqiad to the console and mgmt switch. cr2-eqord is on port 47 and the other is on port 48 and labeled cr2-eqsin.

Sep 17 2018, 3:50 PM · netops, ops-eqiad, Operations
Cmjohnson updated the task description for T204170: Rack/setup cr2-eqord.
Sep 17 2018, 3:49 PM · netops, ops-eqiad, Operations
Cmjohnson moved T204479: Heating alerts on kafka1014 from Backlog to Being worked on on the ops-eqiad board.
Sep 17 2018, 3:15 PM · Operations, ops-eqiad
Cmjohnson added a comment to T204479: Heating alerts on kafka1014.

@elukey yes please stop the host and I will apply thermal paste

Sep 17 2018, 3:15 PM · Operations, ops-eqiad
Cmjohnson moved T204491: Heating alerts / memory errors on mw1254 from Backlog to Being worked on on the ops-eqiad board.
Sep 17 2018, 3:15 PM · Operations, ops-eqiad
Cmjohnson moved T204493: db1061 management interface busy (no sessions allowed) from Backlog to Being worked on on the ops-eqiad board.
Sep 17 2018, 3:14 PM · ops-eqiad, Operations
Cmjohnson added a comment to T204493: db1061 management interface busy (no sessions allowed).

@jcrespo I need to power this server off. Let me know when I can do this.

Sep 17 2018, 3:14 PM · ops-eqiad, Operations
Cmjohnson added a comment to T196507: Degraded RAID on cloudvirt1019.

Updated HP that it still is giving me the same "recharging" message 4 days later.

Sep 17 2018, 3:12 PM · ops-eqiad, Operations

Sep 14 2018

Cmjohnson added a comment to T203244: analytics1068 doesn't boot.

created a ticket with Dell
You have successfully submitted request SR979751933.

Sep 14 2018, 5:06 PM · ops-eqiad, Operations, Analytics
Cmjohnson added a comment to T196507: Degraded RAID on cloudvirt1019.

icinga still shows battery recharging....let's give it the weekend

Sep 14 2018, 3:56 PM · ops-eqiad, Operations
Cmjohnson closed T204302: db1062 management interface busy (no sessions allowed), a subtask of T204311: Upgrade all core (mediawiki) database servers to mariadb 10.1, as Resolved.
Sep 14 2018, 3:20 PM · Patch-For-Review, Operations, DBA