RobH (Rob Halsell)
Senior Data Center Engineer · Administrator

Projects (18)

User Details

User Since
Nov 24 2014, 1:43 PM (402 w, 2 d)
Roles
Administrator
Availability
Available
IRC Nick
RobH
LDAP User
RobH
MediaWiki User
RobH [ Global Accounts ]

My GPG Key fingerprint = CB1F C7E7 0FF8 5DB2 6820 9C7E 75ED 14C7 0245 D22A

I am a Senior Data Center Engineer on Wikimedia's Data Center SRE Team.

Please note that private messages via Phabricator are not my preferred means of contact. Please feel free to contact me (robh) directly via IRC (freenode), or email my @wikimedia.org address.

Recent Activity

Thu, Aug 4

RobH assigned T314587: Q1:rack/setup/install new machine learning hosts to Jclark-ctr.
Thu, Aug 4, 3:37 PM · SRE, Machine-Learning-Team, ops-eqiad, DC-Ops
RobH updated the task description for T314587: Q1:rack/setup/install new machine learning hosts.
Thu, Aug 4, 3:37 PM · SRE, Machine-Learning-Team, ops-eqiad, DC-Ops
RobH added a parent task for T314587: Q1:rack/setup/install new machine learning hosts: Unknown Object (Task).
Thu, Aug 4, 3:34 PM · SRE, Machine-Learning-Team, ops-eqiad, DC-Ops
RobH created T314587: Q1:rack/setup/install new machine learning hosts.
Thu, Aug 4, 3:32 PM · SRE, Machine-Learning-Team, ops-eqiad, DC-Ops

Wed, Aug 3

RobH reassigned T313960: Q1:rack/setup/install kafka-logging100[45] from RobH to Jclark-ctr.
Wed, Aug 3, 4:27 PM · SRE Observability, observability, SRE, ops-eqiad, DC-Ops

Mon, Aug 1

RobH added a parent task for T314335: Q1:rack/setup/install druid10[09-11]: Unknown Object (Task).
Mon, Aug 1, 8:12 PM · Data Engineering Planning, SRE, ops-eqiad, DC-Ops
RobH updated the task description for T314335: Q1:rack/setup/install druid10[09-11].
Mon, Aug 1, 8:12 PM · Data Engineering Planning, SRE, ops-eqiad, DC-Ops
RobH moved T314335: Q1:rack/setup/install druid10[09-11] from Backlog to Racking Tasks on the ops-eqiad board.
Mon, Aug 1, 8:11 PM · Data Engineering Planning, SRE, ops-eqiad, DC-Ops
RobH created T314335: Q1:rack/setup/install druid10[09-11].
Mon, Aug 1, 8:10 PM · Data Engineering Planning, SRE, ops-eqiad, DC-Ops
RobH moved T314303: Q1:rack/setup/install ganeti103[34] from Backlog to Racking Tasks on the ops-eqiad board.
Mon, Aug 1, 2:52 PM · Infrastructure-Foundations, SRE, ops-eqiad, DC-Ops
RobH added a parent task for T314303: Q1:rack/setup/install ganeti103[34]: Unknown Object (Task).
Mon, Aug 1, 2:52 PM · Infrastructure-Foundations, SRE, ops-eqiad, DC-Ops
RobH created T314303: Q1:rack/setup/install ganeti103[34].
Mon, Aug 1, 2:52 PM · Infrastructure-Foundations, SRE, ops-eqiad, DC-Ops

Fri, Jul 29

RobH moved T314160: Q1:rack/setup/install kafka-stretch200[12] from Backlog to Racking Tasks on the ops-codfw board.
Fri, Jul 29, 3:20 PM · Data Engineering Planning, ops-codfw, SRE, DC-Ops
RobH created T314160: Q1:rack/setup/install kafka-stretch200[12].
Fri, Jul 29, 3:20 PM · Data Engineering Planning, ops-codfw, SRE, DC-Ops
RobH added a parent task for T314156: Q1:rack/setup/install kafka-stretch100[12]: Unknown Object (Task).
Fri, Jul 29, 3:12 PM · Data Engineering Planning, SRE, ops-eqiad, DC-Ops
RobH moved T314156: Q1:rack/setup/install kafka-stretch100[12] from Backlog to Racking Tasks on the ops-eqiad board.
Fri, Jul 29, 3:12 PM · Data Engineering Planning, SRE, ops-eqiad, DC-Ops
RobH created T314156: Q1:rack/setup/install kafka-stretch100[12].
Fri, Jul 29, 3:12 PM · Data Engineering Planning, SRE, ops-eqiad, DC-Ops

Thu, Jul 28

RobH added a parent task for T306939: Q4:(Need By: TBD) rack/setup/install kafka-jumbo101[0-5]: Unknown Object (Task).
Thu, Jul 28, 2:59 PM · SRE, ops-eqiad, DC-Ops

Wed, Jul 27

RobH moved T313983: Q1:rack/setup/install cloudvirt10[54-61].eqiad.wmnet from Backlog to Racking Tasks on the ops-eqiad board.
Wed, Jul 27, 7:54 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
RobH added a parent task for T313983: Q1:rack/setup/install cloudvirt10[54-61].eqiad.wmnet: Unknown Object (Task).
Wed, Jul 27, 7:53 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
RobH created T313983: Q1:rack/setup/install cloudvirt10[54-61].eqiad.wmnet.
Wed, Jul 27, 7:53 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
RobH added a parent task for T313979: Q1:rack/setup/install db218[34]: Unknown Object (Task).
Wed, Jul 27, 7:33 PM · Data-Persistence-Backup, SRE, ops-codfw, DC-Ops
RobH moved T313979: Q1:rack/setup/install db218[34] from Backlog to Racking Tasks on the ops-codfw board.
Wed, Jul 27, 7:33 PM · Data-Persistence-Backup, SRE, ops-codfw, DC-Ops
RobH created T313979: Q1:rack/setup/install db218[34].
Wed, Jul 27, 7:32 PM · Data-Persistence-Backup, SRE, ops-codfw, DC-Ops
RobH moved T313978: Q1:rack/setup/install db119[67] from Backlog to Racking Tasks on the ops-eqiad board.
Wed, Jul 27, 7:29 PM · SRE, Data-Persistence-Backup, ops-eqiad, DC-Ops
RobH added a parent task for T313978: Q1:rack/setup/install db119[67]: Unknown Object (Task).
Wed, Jul 27, 7:29 PM · SRE, Data-Persistence-Backup, ops-eqiad, DC-Ops
RobH created T313978: Q1:rack/setup/install db119[67].
Wed, Jul 27, 7:29 PM · SRE, Data-Persistence-Backup, ops-eqiad, DC-Ops
RobH moved T313963: Q1:rack/setup/install new eqiad memcached hosts from Backlog to Racking Tasks on the ops-eqiad board.
Wed, Jul 27, 6:37 PM · ops-eqiad, SRE, serviceops, DC-Ops
RobH removed a parent task for T313968: codfw (2) memcached host service implementation tracking: Unknown Object (Task).
Wed, Jul 27, 6:36 PM · serviceops, SRE
RobH added a parent task for T313966: Q1:rack/setup/install new codfw memcached hosts: Unknown Object (Task).
Wed, Jul 27, 6:36 PM · serviceops, SRE, ops-codfw, DC-Ops
RobH added a parent task for T313968: codfw (2) memcached host service implementation tracking: Unknown Object (Task).
Wed, Jul 27, 6:35 PM · serviceops, SRE
RobH created T313968: codfw (2) memcached host service implementation tracking.
Wed, Jul 27, 6:35 PM · serviceops, SRE
RobH moved T313966: Q1:rack/setup/install new codfw memcached hosts from Backlog to Racking Tasks on the ops-codfw board.

Reassigning this to you per our IRC discussion. Pending needs from you/serviceops:

Wed, Jul 27, 6:33 PM · serviceops, SRE, ops-codfw, DC-Ops
RobH created T313966: Q1:rack/setup/install new codfw memcached hosts.
Wed, Jul 27, 6:33 PM · serviceops, SRE, ops-codfw, DC-Ops
RobH renamed T313965: eqiad (2) memcached host for wikifunctions service implementation tracking from codfw (2) memcached host service implementation tracking to eqiad (2) memcached host service implementation tracking.
Wed, Jul 27, 6:32 PM · SRE, serviceops
RobH renamed T313963: Q1:rack/setup/install new eqiad memcached hosts from Q1:rack/setup/install new codfw memcached hosts to Q1:rack/setup/install new eqiad memcached hosts.
Wed, Jul 27, 6:31 PM · ops-eqiad, SRE, serviceops, DC-Ops
RobH updated subscribers of T313963: Q1:rack/setup/install new eqiad memcached hosts.
Wed, Jul 27, 6:31 PM · ops-eqiad, SRE, serviceops, DC-Ops
RobH edited projects for T313963: Q1:rack/setup/install new eqiad memcached hosts, added: ops-eqiad; removed ops-codfw.
Wed, Jul 27, 6:31 PM · ops-eqiad, SRE, serviceops, DC-Ops
RobH created T313965: eqiad (2) memcached host for wikifunctions service implementation tracking.
Wed, Jul 27, 6:29 PM · SRE, serviceops
RobH renamed T313963: Q1:rack/setup/install new eqiad memcached hosts from Q1:rack/setup/install new memcached hosts to Q1:rack/setup/install new codfw memcached hosts.
Wed, Jul 27, 6:28 PM · ops-eqiad, SRE, serviceops, DC-Ops
RobH added a parent task for T313963: Q1:rack/setup/install new eqiad memcached hosts: Unknown Object (Task).
Wed, Jul 27, 6:28 PM · ops-eqiad, SRE, serviceops, DC-Ops
RobH moved T313963: Q1:rack/setup/install new eqiad memcached hosts from Backlog to Racking Tasks on the ops-codfw board.

Reassigning this to you per our IRC discussion. Pending needs from you/serviceops:

Wed, Jul 27, 6:28 PM · ops-eqiad, SRE, serviceops, DC-Ops
RobH removed a project from T313963: Q1:rack/setup/install new eqiad memcached hosts: masz.
Wed, Jul 27, 6:25 PM · ops-eqiad, SRE, serviceops, DC-Ops
RobH created T313963: Q1:rack/setup/install new eqiad memcached hosts.
Wed, Jul 27, 6:25 PM · ops-eqiad, SRE, serviceops, DC-Ops
RobH moved T313960: Q1:rack/setup/install kafka-logging100[45] from Backlog to Racking Tasks on the ops-eqiad board.

The ordering task lacked racking details, but since we had all the info for the codfw kafka-logging order already, I was able to figure out most of them.

Wed, Jul 27, 5:46 PM · SRE Observability, observability, SRE, ops-eqiad, DC-Ops
RobH added a parent task for T313960: Q1:rack/setup/install kafka-logging100[45]: Unknown Object (Task).
Wed, Jul 27, 5:45 PM · SRE Observability, observability, SRE, ops-eqiad, DC-Ops
RobH created T313960: Q1:rack/setup/install kafka-logging100[45].
Wed, Jul 27, 5:45 PM · SRE Observability, observability, SRE, ops-eqiad, DC-Ops
RobH moved T313959: Q1:rack/setup/install kafka-logging200[45] from Backlog to Racking Tasks on the ops-codfw board.
Wed, Jul 27, 5:41 PM · SRE Observability, observability, SRE, ops-codfw, DC-Ops
RobH added a parent task for T313959: Q1:rack/setup/install kafka-logging200[45]: Unknown Object (Task).
Wed, Jul 27, 5:41 PM · SRE Observability, observability, SRE, ops-codfw, DC-Ops
RobH created T313959: Q1:rack/setup/install kafka-logging200[45].
Wed, Jul 27, 5:40 PM · SRE Observability, observability, SRE, ops-codfw, DC-Ops

Tue, Jul 26

RobH added a subtask for T209460: CloudVPS: network architecture: Unknown Object (Task).
Tue, Jul 26, 11:16 PM · SRE, cloud-services-team (Kanban), Epic
RobH closed Unknown Object (Task), a subtask of T270704: cloud: introduce new edge network architecture for eqiad1 and codfw1dev, as Declined.
Tue, Jul 26, 11:16 PM · Patch-For-Review, cloud-services-team (Kanban)
RobH created T313874: kubernetes102[01] implementation tracking.
Tue, Jul 26, 11:07 PM · SRE, serviceops
RobH added a parent task for T313873: Q1:rack/setup/install kubernetes102[01]: Unknown Object (Task).
Tue, Jul 26, 11:06 PM · SRE, serviceops, ops-eqiad, DC-Ops
RobH created T313873: Q1:rack/setup/install kubernetes102[01].
Tue, Jul 26, 11:04 PM · SRE, serviceops, ops-eqiad, DC-Ops
RobH created T313871: kubernetes202[01] implementation tracking.
Tue, Jul 26, 10:57 PM · SRE, serviceops, ops-codfw, DC-Ops
RobH added a parent task for T313870: Q1:rack/setup/install kubernetes202[01]: Unknown Object (Task).
Tue, Jul 26, 10:55 PM · SRE, serviceops, ops-codfw, DC-Ops
RobH created T313870: Q1:rack/setup/install kubernetes202[01].
Tue, Jul 26, 10:55 PM · SRE, serviceops, ops-codfw, DC-Ops
RobH added a parent task for T313867: Q1:rack/setup/install netmon2002: Unknown Object (Task).
Tue, Jul 26, 10:25 PM · SRE Observability, observability, ops-codfw, SRE, DC-Ops
RobH moved T313867: Q1:rack/setup/install netmon2002 from Backlog to Racking Tasks on the ops-codfw board.
Tue, Jul 26, 10:25 PM · SRE Observability, observability, ops-codfw, SRE, DC-Ops
RobH created T313867: Q1:rack/setup/install netmon2002.
Tue, Jul 26, 10:25 PM · SRE Observability, observability, ops-codfw, SRE, DC-Ops
RobH moved T313858: Q1:rack/setup/install centrallog1002 from Backlog to Racking Tasks on the ops-eqiad board.
Tue, Jul 26, 8:57 PM · SRE Observability, SRE, observability, ops-eqiad, DC-Ops
RobH added a parent task for T313858: Q1:rack/setup/install centrallog1002: Unknown Object (Task).
Tue, Jul 26, 8:55 PM · SRE Observability, SRE, observability, ops-eqiad, DC-Ops
RobH created T313858: Q1:rack/setup/install centrallog1002.
Tue, Jul 26, 8:55 PM · SRE Observability, SRE, observability, ops-eqiad, DC-Ops
RobH added a parent task for T313856: Q1:rack/setup/install ganeti203[12]: Unknown Object (Task).
Tue, Jul 26, 8:39 PM · Infrastructure-Foundations, SRE, ops-codfw, DC-Ops
RobH updated the task description for T313857: ganeti203[12] implementation tracking.
Tue, Jul 26, 8:39 PM · Infrastructure-Foundations, SRE
RobH updated the task description for T313857: ganeti203[12] implementation tracking.
Tue, Jul 26, 8:39 PM · Infrastructure-Foundations, SRE
RobH created T313857: ganeti203[12] implementation tracking.
Tue, Jul 26, 8:39 PM · Infrastructure-Foundations, SRE
RobH moved T313856: Q1:rack/setup/install ganeti203[12] from Backlog to Racking Tasks on the ops-codfw board.
Tue, Jul 26, 8:38 PM · Infrastructure-Foundations, SRE, ops-codfw, DC-Ops
RobH created T313856: Q1:rack/setup/install ganeti203[12].
Tue, Jul 26, 8:38 PM · Infrastructure-Foundations, SRE, ops-codfw, DC-Ops
RobH added a parent task for T313853: Q1:rack/setup/install graphite1005: Unknown Object (Task).
Tue, Jul 26, 8:15 PM · SRE Observability, observability, SRE, ops-eqiad, DC-Ops
RobH moved T313853: Q1:rack/setup/install graphite1005 from Backlog to Racking Tasks on the ops-eqiad board.
Tue, Jul 26, 8:14 PM · SRE Observability, observability, SRE, ops-eqiad, DC-Ops
RobH created T313853: Q1:rack/setup/install graphite1005.
Tue, Jul 26, 8:14 PM · SRE Observability, observability, SRE, ops-eqiad, DC-Ops
RobH moved T313851: Q1:rack/setup/install graphite2004 from Backlog to Racking Tasks on the ops-codfw board.
Tue, Jul 26, 8:09 PM · observability, ops-codfw, SRE, DC-Ops
RobH added a parent task for T313851: Q1:rack/setup/install graphite2004: Unknown Object (Task).
Tue, Jul 26, 8:09 PM · observability, ops-codfw, SRE, DC-Ops
RobH created T313851: Q1:rack/setup/install graphite2004.
Tue, Jul 26, 8:08 PM · observability, ops-codfw, SRE, DC-Ops
RobH added a parent task for T313849: Q1:rack/setup/install logstash103[67]: Unknown Object (Task).
Tue, Jul 26, 7:43 PM · SRE Observability, observability, ops-eqiad, SRE, DC-Ops
RobH moved T313849: Q1:rack/setup/install logstash103[67] from Backlog to Racking Tasks on the ops-eqiad board.
Tue, Jul 26, 7:43 PM · SRE Observability, observability, ops-eqiad, SRE, DC-Ops
RobH updated the task description for T313849: Q1:rack/setup/install logstash103[67].
Tue, Jul 26, 7:43 PM · SRE Observability, observability, ops-eqiad, SRE, DC-Ops
RobH created T313849: Q1:rack/setup/install logstash103[67].
Tue, Jul 26, 7:42 PM · SRE Observability, observability, ops-eqiad, SRE, DC-Ops
RobH added a parent task for T313848: Q1:rack/setup/install logstash203[67]: Unknown Object (Task).
Tue, Jul 26, 7:40 PM · observability, SRE, ops-codfw, DC-Ops
RobH updated the task description for T313848: Q1:rack/setup/install logstash203[67].
Tue, Jul 26, 7:40 PM · observability, SRE, ops-codfw, DC-Ops
RobH moved T313848: Q1:rack/setup/install logstash203[67] from Backlog to Racking Tasks on the ops-codfw board.
Tue, Jul 26, 7:40 PM · observability, SRE, ops-codfw, DC-Ops
RobH created T313848: Q1:rack/setup/install logstash203[67].
Tue, Jul 26, 7:39 PM · observability, SRE, ops-codfw, DC-Ops
RobH updated the task description for T313607: decommission frdb1002.frack.eqiad.wmnet.
Tue, Jul 26, 6:24 PM · SRE, ops-eqiad, decommission-hardware
RobH created T313832: contint1002 service implementation tracking.
Tue, Jul 26, 5:07 PM · serviceops-collab, SRE, serviceops
RobH added a parent task for T313830: Q1:rack/setup/install contint1002: Unknown Object (Task).
Tue, Jul 26, 5:00 PM · ops-codfw, SRE, serviceops, DC-Ops
RobH moved T313830: Q1:rack/setup/install contint1002 from Backlog to Racking Tasks on the ops-codfw board.
Tue, Jul 26, 5:00 PM · ops-codfw, SRE, serviceops, DC-Ops
RobH created T313830: Q1:rack/setup/install contint1002.
Tue, Jul 26, 4:59 PM · ops-codfw, SRE, serviceops, DC-Ops

Mon, Jul 18

RobH reopened Unknown Object (Task), a subtask of T297913: Confirm support of PERC 750 raid controller, as Open.
Mon, Jul 18, 4:52 PM · Patch-For-Review, DC-Ops, SRE

Fri, Jul 15

RobH added a comment to T313125: T166179 has attachments that perhaps shouldn't have been made public.

Easily fixed: shifted it back into S4 and it's now hidden again.

Fri, Jul 15, 3:29 PM · SecTeam-Processed, WMF-Legal, Vuln-Infoleak, SRE, Security, Security-Team

Wed, Jul 13

RobH updated the task description for T215301: codfw spare pool system for partman testing.
Wed, Jul 13, 11:52 PM · ops-codfw, DC-Ops, SRE
RobH closed T257337: OKR: DC Operations Documentation Update as Resolved.
Wed, Jul 13, 11:51 PM · Documentation, DC-Ops

Jun 30 2022

RobH claimed T299443: Q3: rack/setup/install dumpsdata100[67].

John fixed it, just pinged me in IRC. So I'll steal this back and open a case for the NIC issue.

Jun 30 2022, 5:20 PM · Dumps-Generation, SRE, ops-eqiad, DC-Ops
RobH added a comment to T299443: Q3: rack/setup/install dumpsdata100[67].

Summarizing yesterday's work:

Jun 30 2022, 4:30 PM · Dumps-Generation, SRE, ops-eqiad, DC-Ops
RobH reassigned T297913: Confirm support of PERC 750 raid controller from RobH to MoritzMuehlenhoff.

So I think this is now on Moritz to roll out the monitoring changes (as he is in the above patchset) and no longer blocked on my testing. I'm assigning this over to him until that rolls live, and then this can either come back to me for review of any other pending raid issues or be resolved entirely.

Jun 30 2022, 4:06 PM · Patch-For-Review, DC-Ops, SRE

Jun 29 2022

RobH updated the task description for T299443: Q3: rack/setup/install dumpsdata100[67].
Jun 29 2022, 7:31 PM · Dumps-Generation, SRE, ops-eqiad, DC-Ops
RobH added a comment to T297913: Confirm support of PERC 750 raid controller.

So post dumpsdata1007 install it fails puppet due to megaraid monitoring items it seems?

That's expected, we still need to adapt the "raid" fact in Puppet so that it installs perccli (but for that we needed a running system with Perc controller, so that we can figure out the device names which allow Puppet to detect the controller). Just leave the system in that state and we'll use dumpsdata1007 for that?

Jun 29 2022, 7:30 PM · Patch-For-Review, DC-Ops, SRE
RobH reassigned T299443: Q3: rack/setup/install dumpsdata100[67] from RobH to Jclark-ctr.

Ok, I updated the BIOS and then foolishly updated iDRAC, and now the HTTPS implementation is broken for iDRAC.

Jun 29 2022, 7:26 PM · Dumps-Generation, SRE, ops-eqiad, DC-Ops
RobH added a comment to T299443: Q3: rack/setup/install dumpsdata100[67].

Something is very wrong with dumps1006, when I go to set it up, it doesn't see a 10G NIC, only the 1G. Rather than pollute this setup task, if I cannot solve it quickly I'll create a high priority hw error task for investigation and link to this.

Jun 29 2022, 6:33 PM · Dumps-Generation, SRE, ops-eqiad, DC-Ops