RobH (Rob Halsell)
Operations Engineer

Projects (24)

User Details

User Since
Nov 24 2014, 1:43 PM (194 w, 4 d)
Availability
Available
IRC Nick
RobH
LDAP User
RobH
MediaWiki User
RobH

My GPG Key fingerprint = CB1F C7E7 0FF8 5DB2 6820 9C7E 75ED 14C7 0245 D22A

I am an Operations Engineer on Wikimedia's Datacenter Operations Team.

I am also the primary triage engineer for the hardware-requests project, as well as the private S4 procurement space and procurement project.

All questions involving allocation of hardware can be initially addressed on https://wikitech.wikimedia.org/wiki/Operations_requests.

Please note that private messages via Phabricator are not my preferred means of contact. Please feel free to contact me (robh) directly via IRC/Freenode, or email my @wikimedia.org address.

Recent Activity

Today

RobH moved T196690: rack/setup/install dbproxy101[2-7].eqiad.wmnet from Blocked external/Not db team to Triage on the DBA board.
Fri, Aug 17, 4:55 PM · DBA, Operations
RobH placed T196690: rack/setup/install dbproxy101[2-7].eqiad.wmnet up for grabs.

These systems are now ready for the DBA team to take over and press into service. This can be taken over by @jcrespo or @Marostegui. I've not assigned to either since the DBA team triages their DBA tag.

Fri, Aug 17, 4:55 PM · DBA, Operations
RobH added a comment to T199125: rack/setup/install cloudvirt102[34].

Ok,

@RobH let's assume we won't be using the 2x10G NICs in the short-mid term.
How many 1G NICs do these servers have? are they disabled in BIOS?

Fri, Aug 17, 4:23 PM · cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations
RobH added a comment to T199125: rack/setup/install cloudvirt102[34].

I've emailed Dell to see what our other 10G network card options are:

Fri, Aug 17, 4:21 PM · cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations

Yesterday

RobH added a comment to T196030: troubleshoot cr3/cr4 link.
Link-level type: Flexible-Ethernet, MTU: 9192, MRU: 9200, Speed: 40Gbps, BPDU Error: None, Loop Detect PDU Error: None, Loopback: Disabled, Source filtering: Disabled, Flow control: Enabled
Thu, Aug 16, 9:21 PM · Operations, ops-ulsfo, netops, Traffic
RobH reassigned T196886: Replace wtp1043's sda from RobH to Cmjohnson.

Dell fixed the ownership info for us, you can put in requests for support and parts now.

Thu, Aug 16, 6:53 PM · DC-Ops, ops-eqiad, Operations
RobH added a member for acl*sre-team: jijiki.
Thu, Aug 16, 6:19 PM
RobH renamed acl*sre-team from acl*operations-team to acl*sre-team.
Thu, Aug 16, 6:15 PM
RobH updated the task description for T196690: rack/setup/install dbproxy101[2-7].eqiad.wmnet.
Thu, Aug 16, 6:00 PM · DBA, Operations
RobH triaged T201444: Refresh switch ports descriptions for recently renamed cloud servers as Normal priority.
Thu, Aug 16, 4:01 PM · Operations, netops, cloud-services-team, DC-Ops
RobH added a comment to T196886: Replace wtp1043's sda.

I did not check this; I just didn't notice it was assigned to me. Tech Direct doesn't work; was normal support attempted? I've emailed our team and CCed Chris.

Thu, Aug 16, 3:01 PM · DC-Ops, ops-eqiad, Operations

Wed, Aug 15

RobH updated the task description for T196690: rack/setup/install dbproxy101[2-7].eqiad.wmnet.
Wed, Aug 15, 9:05 PM · DBA, Operations
RobH updated the task description for T168559: decom silver (was silver has trouble rebooting).
Wed, Aug 15, 8:48 PM · decommission, Operations
RobH reopened T168559: decom silver (was silver has trouble rebooting) as "Open".
Wed, Aug 15, 8:47 PM · decommission, Operations
RobH closed T168559: decom silver (was silver has trouble rebooting) as Resolved.
Wed, Aug 15, 8:47 PM · decommission, Operations
RobH updated the task description for T196690: rack/setup/install dbproxy101[2-7].eqiad.wmnet.
Wed, Aug 15, 8:44 PM · DBA, Operations
RobH reassigned T196690: rack/setup/install dbproxy101[2-7].eqiad.wmnet from RobH to Cmjohnson.

Ok, @ayounsi and I tracked down this issue.

Wed, Aug 15, 8:43 PM · DBA, Operations
RobH added a comment to T196690: rack/setup/install dbproxy101[2-7].eqiad.wmnet.

Odd issue attempting to PXE boot dbproxy1016. It gets no free leases from DHCP, so it cannot be served the TFTP image since it's not getting an IP address assignment.

Wed, Aug 15, 8:21 PM · DBA, Operations
RobH added a comment to T196690: rack/setup/install dbproxy101[2-7].eqiad.wmnet.

So dbproxy1015 drac isn't responsive to network, and dbproxy1017 has a media check failure when attempting to boot PXE.

Wed, Aug 15, 8:18 PM · DBA, Operations
RobH updated the task description for T196690: rack/setup/install dbproxy101[2-7].eqiad.wmnet.
Wed, Aug 15, 7:27 PM · DBA, Operations
RobH removed a project from T196690: rack/setup/install dbproxy101[2-7].eqiad.wmnet: Patch-For-Review.
Wed, Aug 15, 7:17 PM · DBA, Operations

Tue, Aug 14

RobH added a comment to T201939: rack/setup/install analytics-master100[12].eqiad.wmnet.

@Ottomata the name is entirely too long for labels and tracking. Can we shorten it a bit?

Tue, Aug 14, 6:03 PM · Patch-For-Review, ops-eqiad, Analytics, Operations
RobH renamed T201939: rack/setup/install analytics-master100[12].eqiad.wmnet from rack/setup/install 2 new hadoop master/standby systems in eqiad to rack/setup/install analytics-master100[12].eqiad.wmnet.
Tue, Aug 14, 4:58 PM · Patch-For-Review, ops-eqiad, Analytics, Operations
RobH placed T201942: jessie support for QLogic FastLinQ 41112 Dual Port 10Gb SFP+ Adapter up for grabs.
Tue, Aug 14, 4:49 PM · cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations
RobH reassigned T201939: rack/setup/install analytics-master100[12].eqiad.wmnet from elukey to Ottomata.
Tue, Aug 14, 4:46 PM · Patch-For-Review, ops-eqiad, Analytics, Operations
RobH added a comment to T201939: rack/setup/install analytics-master100[12].eqiad.wmnet.
Tue, Aug 14, 4:46 PM · Patch-For-Review, ops-eqiad, Analytics, Operations
RobH triaged T201939: rack/setup/install analytics-master100[12].eqiad.wmnet as Normal priority.
Tue, Aug 14, 4:42 PM · Patch-For-Review, ops-eqiad, Analytics, Operations

Fri, Aug 10

RobH reassigned T193420: Decommission hafnium from RobH to Cmjohnson.
Fri, Aug 10, 10:38 PM · decommission, Performance-Team (Radar), ops-eqiad, Operations
RobH removed a project from T193420: Decommission hafnium: Patch-For-Review.
Fri, Aug 10, 10:38 PM · decommission, Performance-Team (Radar), ops-eqiad, Operations
RobH updated the task description for T193420: Decommission hafnium.
Fri, Aug 10, 10:17 PM · decommission, Performance-Team (Radar), ops-eqiad, Operations
RobH claimed T193420: Decommission hafnium.
Fri, Aug 10, 10:14 PM · decommission, Performance-Team (Radar), ops-eqiad, Operations
RobH moved T201440: Decommission labtestnet2001.codfw.wmnet from Backlog to Decommission on the ops-codfw board.
Fri, Aug 10, 8:18 PM · Patch-For-Review, ops-codfw, decommission, cloud-services-team, Operations
RobH moved T201440: Decommission labtestnet2001.codfw.wmnet from Backlog to pending onsite steps (codfw) on the decommission board.
Fri, Aug 10, 8:18 PM · Patch-For-Review, ops-codfw, decommission, cloud-services-team, Operations
RobH reassigned T201440: Decommission labtestnet2001.codfw.wmnet from RobH to Papaul.
Fri, Aug 10, 8:18 PM · Patch-For-Review, ops-codfw, decommission, cloud-services-team, Operations
RobH triaged T201440: Decommission labtestnet2001.codfw.wmnet as Normal priority.
Fri, Aug 10, 8:07 PM · Patch-For-Review, ops-codfw, decommission, cloud-services-team, Operations
RobH added a comment to T199675: cp5001 unreachable since 2018-07-14 17:49:21.

Engineers (Wong Kee Heng & Kelvin Goh Keng Yew) from Unisys (sub-contracted by Dell for Pro support) will be onsite on Monday, August 13th between 1500 and 1700 Singapore local.

Fri, Aug 10, 3:49 PM · Operations, ops-eqsin, Traffic

Thu, Aug 9

RobH assigned T201522: Decommission chromium and hydrogen to Cmjohnson.
Thu, Aug 9, 9:43 PM · decommission, Traffic, ops-eqiad, Operations
RobH removed a project from T201522: Decommission chromium and hydrogen: Patch-For-Review.
Thu, Aug 9, 9:43 PM · decommission, Traffic, ops-eqiad, Operations
RobH removed a project from T201522: Decommission chromium and hydrogen: Patch-For-Review.
Thu, Aug 9, 9:32 PM · decommission, Traffic, ops-eqiad, Operations
RobH added a comment to T200203: labvirt1003 raid warning.

Ok, so we'll need to buy some 300GB SFF SAS disks, correct? I'll create a procurement task and link to this.

Thu, Aug 9, 5:08 PM · cloud-services-team, ops-eqiad, DC-Ops, Operations
RobH closed T199557: LDAP access to the wmf group for Pats Pena as Resolved.

modified entry of patspena to ppena, was a bad oversight/typo/mistake.

Thu, Aug 9, 5:04 PM · Patch-For-Review, LDAP-Access-Requests
RobH added a comment to T199675: cp5001 unreachable since 2018-07-14 17:49:21.

Scheduled visit for next Monday, EQ ticket 1-165318922260. Dell dispatch 91912127436.

Thu, Aug 9, 2:47 PM · Operations, ops-eqsin, Traffic

Wed, Aug 8

Dzahn awarded T201343: rack/setup/install mwmaint1002.eqiad.wmnet a Goat token.
Wed, Aug 8, 5:21 PM · Patch-For-Review, ops-eqiad, Operations
RobH updated the task description for T201522: Decommission chromium and hydrogen.
Wed, Aug 8, 4:43 PM · decommission, Traffic, ops-eqiad, Operations
RobH added a project to T201522: Decommission chromium and hydrogen: decommission.
Wed, Aug 8, 4:42 PM · decommission, Traffic, ops-eqiad, Operations
RobH reassigned T196685: rack/setup/install rdb10[09|10].eqiad.wmnet from RobH to elukey.

So this should likely get assigned to either @elukey or @Joe, and since Luca commented, to him it goes!

Wed, Aug 8, 12:11 AM · User-Joe, User-Elukey, Operations
RobH removed projects from T196685: rack/setup/install rdb10[09|10].eqiad.wmnet: Patch-For-Review, ops-eqiad.
Wed, Aug 8, 12:10 AM · User-Joe, User-Elukey, Operations

Tue, Aug 7

RobH added a comment to T196685: rack/setup/install rdb10[09|10].eqiad.wmnet.
Tue, Aug 7, 11:16 PM · User-Joe, User-Elukey, Operations
RobH closed T196787: Deactivate Chad's Racktables account as Resolved.

done

Tue, Aug 7, 9:48 PM · Operations
RobH added a comment to T199125: rack/setup/install cloudvirt102[34].

paste of the lspci output:

Tue, Aug 7, 9:46 PM · cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations
RobH created P7435 cloudvirt1023 lspci during install.
Tue, Aug 7, 9:45 PM
RobH updated the task description for T201439: rename/reimage labnodepool1002.eqiad.wmnet as cloudservices1003.wikimedia.org.
Tue, Aug 7, 7:21 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS, Operations
RobH changed the edit policy for P7432 Server Decommission Checklist.
Tue, Aug 7, 6:28 PM · decommission
RobH edited Description on decommission.
Tue, Aug 7, 6:28 PM
RobH edited Description on decommission.
Tue, Aug 7, 6:28 PM
RobH created P7432 Server Decommission Checklist.
Tue, Aug 7, 6:27 PM · decommission
RobH reassigned T196701: rack/setup/install torrelay1001.wikimedia.org from RobH to Dzahn.

IRC Sync/Update:

Tue, Aug 7, 6:26 PM · Tor, Operations
RobH removed a project from T196701: rack/setup/install torrelay1001.wikimedia.org: Patch-For-Review.
Tue, Aug 7, 6:25 PM · Tor, Operations
RobH added a project to T201444: Refresh switch ports descriptions for recently renamed cloud servers: netops.
Tue, Aug 7, 5:54 PM · Operations, netops, cloud-services-team, DC-Ops
RobH renamed T201439: rename/reimage labnodepool1002.eqiad.wmnet as cloudservices1003.wikimedia.org from Rename labnodepool1002.eqiad.wmnet as cloudservices1003.eqiad.wmnet to rename/reimage labnodepool1002.eqiad.wmnet as cloudservices1003.eqiad.wmnet.
Tue, Aug 7, 5:42 PM · Patch-For-Review, cloud-services-team (Kanban), Cloud-VPS, Operations
RobH renamed T201367: rack/setup/add to spares tracking 2 dual cpu misc system from rack/setup/add to spares tracking 1 dual cpu misc system to rack/setup/add to spares tracking 2 dual cpu misc system.
Tue, Aug 7, 5:28 PM · ops-eqiad, Operations
RobH updated the task description for T201341: rack/setup/install cloudservices1004.wikimedia.org.
Tue, Aug 7, 5:26 PM · ops-eqiad, Operations
RobH added a comment to T168407: rack/setup/install labnodepool1002.eqiad.wmnet.

I'm not exactly sure what is being done here? It seems that labnodepool1002 no longer needs to serve in that role, and will be assigned a new hostname for another role?

Tue, Aug 7, 4:59 PM · cloud-services-team (Kanban), Cloud-VPS, Operations
RobH removed a project from T199125: rack/setup/install cloudvirt102[34]: Patch-For-Review.
Tue, Aug 7, 4:52 PM · cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations
RobH added a comment to T199125: rack/setup/install cloudvirt102[34].

So, I've gone ahead and updated the puppet repo for the installation, and they successfully PXE boot into the jessie installer. Unfortunately, that is where we hit an issue.

Tue, Aug 7, 4:52 PM · cloud-services-team (Kanban), ops-eqiad, Cloud-VPS, Operations

Mon, Aug 6

RobH triaged T201367: rack/setup/add to spares tracking 2 dual cpu misc system as Normal priority.
Mon, Aug 6, 10:23 PM · ops-eqiad, Operations
RobH triaged T201366: rack/setup/install scandium.eqiad.wmnet (parsoid test box) as High priority.
Mon, Aug 6, 10:21 PM · Patch-For-Review, ops-eqiad, Parsoid, Operations
RobH triaged T201364: rack/setup/install sulfur.wikimedia.org as Normal priority.
Mon, Aug 6, 10:17 PM · ops-eqiad, Operations
RobH moved T191153: decom bast1001 from Backlog to pending onsite steps (eqiad) on the decommission board.
Mon, Aug 6, 7:20 PM · ops-eqiad, decommission, Operations
RobH assigned T191153: decom bast1001 to Cmjohnson.
Mon, Aug 6, 7:20 PM · ops-eqiad, decommission, Operations
RobH removed a project from T191153: decom bast1001: Patch-For-Review.
Mon, Aug 6, 7:16 PM · ops-eqiad, decommission, Operations
RobH triaged T201346: rack/setup/install clustermgmt1001.eqiad.wmnet (new cumin master) as Normal priority.
Mon, Aug 6, 6:58 PM · ops-eqiad, Operations-Software-Development, Operations
RobH triaged T201344: rack/setup/install icinga1001.wikimedia.org as High priority.
Mon, Aug 6, 6:55 PM · ops-eqiad, monitoring, Operations
RobH triaged T201343: rack/setup/install mwmaint1002.eqiad.wmnet as Normal priority.
Mon, Aug 6, 6:51 PM · Patch-For-Review, ops-eqiad, Operations
RobH updated the task description for T201342: rack/setup/install puppetmaster1003.eqiad.wmnet.
Mon, Aug 6, 6:48 PM · ops-eqiad, Operations
RobH triaged T201342: rack/setup/install puppetmaster1003.eqiad.wmnet as Normal priority.
Mon, Aug 6, 6:46 PM · ops-eqiad, Operations
RobH closed Unknown Object (Task), a subtask of T199578: Designate (DNS) integration with Neutron, as Resolved.
Mon, Aug 6, 6:24 PM · Patch-For-Review, Epic, Cloud-Services
RobH added a parent task for T201341: rack/setup/install cloudservices1004.wikimedia.org: Unknown Object (Task).
Mon, Aug 6, 6:23 PM · ops-eqiad, Operations
RobH triaged T201341: rack/setup/install cloudservices1004.wikimedia.org as Normal priority.
Mon, Aug 6, 6:23 PM · ops-eqiad, Operations
RobH closed Unknown Object (Task), a subtask of T196485: WDQS diskspace is low, as Resolved.
Mon, Aug 6, 6:06 PM · Operations, Discovery, Wikidata, Wikidata-Query-Service
RobH assigned T196252: Labservices1001 crashing, probable overheating to Andrew.

So, this thread is mildly confusing. From what I can see, labservices1001 (warranty expired 2017-04), had its thermal paste replaced at a previous time

Mon, Aug 6, 4:42 PM · Patch-For-Review, ops-eqiad, cloud-services-team, Operations
RobH added a comment to T199675: cp5001 unreachable since 2018-07-14 17:49:21.

Dell finally replied to me (3 days later), giving me a list of 4 engineers to go onsite. They keep doing that (listing more than are going). So now I have to figure out which of them they will send and file the proper ticket.

Mon, Aug 6, 3:25 PM · Operations, ops-eqsin, Traffic

Fri, Aug 3

RobH removed a project from T196691: rack/setup/install dns100[12].wikimedia.org: ops-eqiad.
Fri, Aug 3, 10:39 PM · Patch-For-Review, DNS, Operations, Traffic
RobH reassigned T196691: rack/setup/install dns100[12].wikimedia.org from RobH to BBlack.

So these two systems fail their puppet runs, but fail for the following:

Fri, Aug 3, 10:39 PM · Patch-For-Review, DNS, Operations, Traffic
RobH removed a project from T196691: rack/setup/install dns100[12].wikimedia.org: Patch-For-Review.
Fri, Aug 3, 10:16 PM · Patch-For-Review, DNS, Operations, Traffic
RobH added a comment to T200706: rack/setup/install centrallog1001.eqiad.wmnet.

@fgiunchedi: You were the SRE team member to provide feedback regarding the disk capacity, so I'm assuming you would be the service owner. If this isn't correct, please comment/assign back to me/assign to service owner as needed.

Fri, Aug 3, 9:55 PM · User-fgiunchedi, Operations
RobH reassigned T200706: rack/setup/install centrallog1001.eqiad.wmnet from RobH to fgiunchedi.
Fri, Aug 3, 9:54 PM · User-fgiunchedi, Operations
RobH removed a project from T200706: rack/setup/install centrallog1001.eqiad.wmnet: Patch-For-Review.
Fri, Aug 3, 8:11 PM · User-fgiunchedi, Operations
RobH updated the task description for T196484: rack/setup/install graphite1004.
Fri, Aug 3, 6:27 PM · Patch-For-Review, User-fgiunchedi, monitoring, Operations
RobH reassigned T196484: rack/setup/install graphite1004 from RobH to fgiunchedi.

Please note I set this to role spare, since I wasn't sure if setting it to any other role may produce logging spam/traffic/alerts to the other graphite hosts. When in doubt, go for the smaller impact role choice before service implementation.

Fri, Aug 3, 6:27 PM · Patch-For-Review, User-fgiunchedi, monitoring, Operations
RobH reassigned T200203: labvirt1003 raid warning from RobH to Cmjohnson.

I see we have a bunch of spares: Intel 320 Series SSDSA2CW300G3 2.5" 300GB

Fri, Aug 3, 5:47 PM · cloud-services-team, ops-eqiad, Operations, DC-Ops
RobH assigned T196698: rack/setup/install auth1002 to MoritzMuehlenhoff.

@MoritzMuehlenhoff: It is my understanding that you are the primary person that handles the authentication servers. (If not, please correct me!)

Fri, Aug 3, 5:39 PM · Patch-For-Review, Operations
RobH placed T196698: rack/setup/install auth1002 up for grabs.
Fri, Aug 3, 5:37 PM · Patch-For-Review, Operations
RobH updated the task description for T196698: rack/setup/install auth1002.
Fri, Aug 3, 4:37 PM · Patch-For-Review, Operations
RobH removed a project from T196698: rack/setup/install auth1002: Patch-For-Review.
Fri, Aug 3, 4:37 PM · Patch-For-Review, Operations
RobH added a comment to T201185: Jmorgan production ssh revokation/replacement (due to key in use in production and cloud).

Just to clarify, the fix for this is easy:

Fri, Aug 3, 4:13 PM · Patch-For-Review, Operations, SRE-Access-Requests
RobH removed a project from T201185: Jmorgan production ssh revokation/replacement (due to key in use in production and cloud): Patch-For-Review.
Fri, Aug 3, 4:07 PM · Patch-For-Review, Operations, SRE-Access-Requests
RobH moved T201185: Jmorgan production ssh revokation/replacement (due to key in use in production and cloud) from Backlog to Awaiting User Input on the SRE-Access-Requests board.
Fri, Aug 3, 4:07 PM · Patch-For-Review, Operations, SRE-Access-Requests
RobH triaged T201185: Jmorgan production ssh revokation/replacement (due to key in use in production and cloud) as High priority.
Fri, Aug 3, 3:59 PM · Patch-For-Review, Operations, SRE-Access-Requests
RobH added a comment to T201122: Cannot login to Wikitech w. my LDAP account.

It seems you are unable to log in because your 2FA token is invalid. Before we reset things on this end, it is my understanding that 2FA via authenticator apps can become unsynced if your phone (running the app) has an incorrect date/time.

Fri, Aug 3, 3:05 AM · User-Addshore, wikitech.wikimedia.org, WMDE-Analytics-Engineering, User-GoranSMilovanovic