RobH (Rob Halsell)
Operations Engineer · Administrator

Projects (21)

User Details

User Since
Nov 24 2014, 1:43 PM (287 w, 5 d)
Roles
Administrator
Availability
Available
IRC Nick
RobH
LDAP User
RobH
MediaWiki User
RobH

My GPG Key fingerprint = CB1F C7E7 0FF8 5DB2 6820 9C7E 75ED 14C7 0245 D22A

I am an Operations Engineer on Wikimedia's Datacenter Operations Team.

I am also the primary triage engineer for the hardware-requests project, as well as for the private S4 procurement space and the procurement project.

All questions involving allocation of hardware can be initially addressed on https://wikitech.wikimedia.org/wiki/Operations_requests.

Please note that private messages via Phabricator are not my preferred means of contact. Please feel free to contact me (robh) directly on IRC (freenode), or email my @wikimedia.org address.

Recent Activity

Tue, May 26

RobH added a project to T253694: (Need By:TBD) rack/setup/install rows C and D new PDUs: ops-eqiad.
Tue, May 26, 9:53 PM · Operations, ops-eqiad, DC-Ops
RobH created T253694: (Need By:TBD) rack/setup/install rows C and D new PDUs.
Tue, May 26, 9:41 PM · Operations, ops-eqiad, DC-Ops

Wed, May 20

RobH reassigned T251219: cp5012 memory errors from RobH to Vgutierrez.

So this ran the full suite of Dell tests, including extended memory testing, without failure. I did update the firmware before testing, though.

Wed, May 20, 9:12 PM · Operations, ops-eqsin, Traffic
RobH added a comment to T251219: cp5012 memory errors.

OK, for memory tests we need to clear the SEL, so I'm just dumping its output here for easy review later (it's still stored on the server, but not readable without a data dump and sorting):

Wed, May 20, 5:32 PM · Operations, ops-eqsin, Traffic
RobH added a parent task for T253246: (Need By: TBD) rack/setup/install cr3-eqsin.wikimedia.org: Unknown Object (Task).
Wed, May 20, 4:58 PM · Operations, netops, ops-eqsin, DC-Ops
RobH created T253246: (Need By: TBD) rack/setup/install cr3-eqsin.wikimedia.org.
Wed, May 20, 4:57 PM · Operations, netops, ops-eqsin, DC-Ops
RobH added a parent task for T243450: Audit & update spares part tracking for all sites: Unknown Object (Task).
Wed, May 20, 4:48 PM · ops-eqiad, ops-esams, ops-codfw, ops-eqsin, DC-Ops, Operations
RobH added a parent task for T244900: apply asset tags to s[12]-60[34]-eqsin: Unknown Object (Task).
Wed, May 20, 4:48 PM · Operations, ops-eqsin
RobH added a parent task for T250369: eqsin ganeti cable IDs: Unknown Object (Task).
Wed, May 20, 4:48 PM · Operations, ops-eqsin
RobH added a parent task for T251219: cp5012 memory errors: Unknown Object (Task).
Wed, May 20, 4:48 PM · Operations, ops-eqsin, Traffic

Tue, May 19

RobH added a subtask for T251644: Icinga refresh hardware selection (2020): Unknown Object (Task).
Tue, May 19, 5:33 PM · observability, Operations
RobH added a subtask for T251644: Icinga refresh hardware selection (2020): Unknown Object (Task).
Tue, May 19, 5:33 PM · observability, Operations

Mon, May 11

RobH added a project to T238957: decommission phab1003.eqiad.wmnet: decommission.
Mon, May 11, 2:31 PM · decommission, serviceops, Release-Engineering-Team
fgiunchedi awarded T251635: (Need By: TDB) rack/setup/install thanos-fe200[123] a Party Time token.
Mon, May 11, 8:08 AM · ops-codfw, Operations, DC-Ops

Fri, May 8

RobH moved T251219: cp5012 memory errors from Backlog to Hardware Failure / Repair on the ops-eqsin board.
Fri, May 8, 8:41 PM · Operations, ops-eqsin, Traffic

Fri, May 1

RobH reassigned T251639: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 from Papaul to jcrespo.

@jcrespo or @Marostegui:

Fri, May 1, 8:21 PM · DBA, ops-codfw, Operations, DC-Ops
RobH updated the task description for T251639: (Need By: 31st May) rack/setup/install db213[6-9] and db2140.
Fri, May 1, 8:20 PM · DBA, ops-codfw, Operations, DC-Ops
RobH moved T251639: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 from Backlog to Racking Tasks on the ops-codfw board.
Fri, May 1, 7:23 PM · DBA, ops-codfw, Operations, DC-Ops
RobH added a parent task for T251639: (Need By: 31st May) rack/setup/install db213[6-9] and db2140: Unknown Object (Task).
Fri, May 1, 7:22 PM · DBA, ops-codfw, Operations, DC-Ops
RobH created T251639: (Need By: 31st May) rack/setup/install db213[6-9] and db2140.
Fri, May 1, 7:22 PM · DBA, ops-codfw, Operations, DC-Ops
RobH moved T251635: (Need By: TDB) rack/setup/install thanos-fe200[123] from Backlog to Racking Tasks on the ops-codfw board.
Fri, May 1, 7:04 PM · ops-codfw, Operations, DC-Ops
RobH added a parent task for T251635: (Need By: TDB) rack/setup/install thanos-fe200[123]: Unknown Object (Task).
Fri, May 1, 7:04 PM · ops-codfw, Operations, DC-Ops
RobH created T251635: (Need By: TDB) rack/setup/install thanos-fe200[123].
Fri, May 1, 7:03 PM · ops-codfw, Operations, DC-Ops
RobH renamed T251634: (Need By: TBD) rack/setup/install thanos-be200[1-4] from (Need By: TBD) rack/setup/install thanos-be200[123] to (Need By: TBD) rack/setup/install thanos-be200[1-4].
Fri, May 1, 7:02 PM · ops-codfw, Operations, DC-Ops
RobH moved T251634: (Need By: TBD) rack/setup/install thanos-be200[1-4] from Backlog to Racking Tasks on the ops-codfw board.
Fri, May 1, 7:00 PM · ops-codfw, Operations, DC-Ops
RobH added a parent task for T251634: (Need By: TBD) rack/setup/install thanos-be200[1-4]: Unknown Object (Task).
Fri, May 1, 7:00 PM · ops-codfw, Operations, DC-Ops
RobH created T251634: (Need By: TBD) rack/setup/install thanos-be200[1-4].
Fri, May 1, 6:59 PM · ops-codfw, Operations, DC-Ops
RobH added parent tasks for T251632: (Need By: TBD) rack/setup/install WMCS 10G switches: Unknown Object (Task), Unknown Object (Task).
Fri, May 1, 6:28 PM · cloud-services-team (Hardware), Operations, netops, ops-eqiad, DC-Ops
RobH created T251632: (Need By: TBD) rack/setup/install WMCS 10G switches.
Fri, May 1, 6:27 PM · cloud-services-team (Hardware), Operations, netops, ops-eqiad, DC-Ops
RobH moved T251627: (Need By: TDB) rack/setup/install cloudvirt10[31-39]eqiad.wmnet from Backlog to Racking Tasks on the ops-eqiad board.
Fri, May 1, 5:38 PM · cloud-services-team (Hardware), ops-eqiad, Operations, DC-Ops
RobH added a parent task for T251627: (Need By: TDB) rack/setup/install cloudvirt10[31-39]eqiad.wmnet: Unknown Object (Task).
Fri, May 1, 5:38 PM · cloud-services-team (Hardware), ops-eqiad, Operations, DC-Ops
RobH created T251627: (Need By: TDB) rack/setup/install cloudvirt10[31-39]eqiad.wmnet.
Fri, May 1, 5:38 PM · cloud-services-team (Hardware), ops-eqiad, Operations, DC-Ops
RobH moved T251626: (Need By: TDB) rack/setup/install rdb200[78] from Backlog to Racking Tasks on the ops-codfw board.
Fri, May 1, 5:36 PM · Operations, ops-codfw, DC-Ops
RobH added a parent task for T251626: (Need By: TDB) rack/setup/install rdb200[78]: Unknown Object (Task).
Fri, May 1, 5:35 PM · Operations, ops-codfw, DC-Ops
RobH created T251626: (Need By: TDB) rack/setup/install rdb200[78].
Fri, May 1, 5:35 PM · Operations, ops-codfw, DC-Ops
RobH added a parent task for T251618: (Need By: ASAP) rack/setup/install thanos-be100[123]: Unknown Object (Task).
Fri, May 1, 5:21 PM · ops-eqiad, DC-Ops, Operations
RobH renamed T251618: (Need By: ASAP) rack/setup/install thanos-be100[123] from (Due Date: TBD) rack/setup/install thanos-be100[123] to (Need By: TBD) rack/setup/install thanos-be100[123].
Fri, May 1, 5:21 PM · ops-eqiad, DC-Ops, Operations
RobH added a parent task for T251620: (NEED BY: ASAP) rack/setup/install thanos-fe100[123].eqiad.wmnet: Unknown Object (Task).
Fri, May 1, 5:18 PM · ops-eqiad, Operations, DC-Ops
RobH moved T251622: (Need By: ASAP) install additional SSDs into prometheus200[34] from Backlog to Racking Tasks on the ops-codfw board.
Fri, May 1, 4:44 PM · Operations, ops-codfw, DC-Ops
RobH added a parent task for T251622: (Need By: ASAP) install additional SSDs into prometheus200[34]: Unknown Object (Task).
Fri, May 1, 4:44 PM · Operations, ops-codfw, DC-Ops
RobH created T251622: (Need By: ASAP) install additional SSDs into prometheus200[34].
Fri, May 1, 4:43 PM · Operations, ops-codfw, DC-Ops
RobH moved T251621: (Need By: ASAP) install additional SSDs into prometheus100[34] from Backlog to Racking Tasks on the ops-eqiad board.
Fri, May 1, 4:42 PM · ops-eqiad, DC-Ops, Operations
RobH added a parent task for T251621: (Need By: ASAP) install additional SSDs into prometheus100[34]: Unknown Object (Task).
Fri, May 1, 4:42 PM · ops-eqiad, DC-Ops, Operations
RobH created T251621: (Need By: ASAP) install additional SSDs into prometheus100[34].
Fri, May 1, 4:41 PM · ops-eqiad, DC-Ops, Operations
RobH moved T251620: (NEED BY: ASAP) rack/setup/install thanos-fe100[123].eqiad.wmnet from Backlog to Procurement on the ops-eqiad board.
Fri, May 1, 4:38 PM · ops-eqiad, Operations, DC-Ops
RobH created T251620: (NEED BY: ASAP) rack/setup/install thanos-fe100[123].eqiad.wmnet.
Fri, May 1, 4:38 PM · ops-eqiad, Operations, DC-Ops
RobH moved T251619: (Need By: TBD) rack/setup/install cloudcephosd10[04-15].wikimedia.org from Backlog to Racking Tasks on the ops-eqiad board.
Fri, May 1, 4:35 PM · cloud-services-team (Hardware), ops-eqiad, DC-Ops, Operations
RobH added a parent task for T251619: (Need By: TBD) rack/setup/install cloudcephosd10[04-15].wikimedia.org: Unknown Object (Task).
Fri, May 1, 4:34 PM · cloud-services-team (Hardware), ops-eqiad, DC-Ops, Operations
RobH created T251619: (Need By: TBD) rack/setup/install cloudcephosd10[04-15].wikimedia.org.
Fri, May 1, 4:34 PM · cloud-services-team (Hardware), ops-eqiad, DC-Ops, Operations
RobH moved T251616: (Due Date: ASAP) rack/setup/install replacement msw-c6-eqiad from Backlog to Racking Tasks on the ops-eqiad board.
Fri, May 1, 4:31 PM · Operations, ops-eqiad, DC-Ops
RobH moved T251618: (Need By: ASAP) rack/setup/install thanos-be100[123] from Backlog to Racking Tasks on the ops-eqiad board.
Fri, May 1, 4:31 PM · ops-eqiad, DC-Ops, Operations
RobH created T251618: (Need By: ASAP) rack/setup/install thanos-be100[123].
Fri, May 1, 4:31 PM · ops-eqiad, DC-Ops, Operations
RobH added a parent task for T251616: (Due Date: ASAP) rack/setup/install replacement msw-c6-eqiad: Unknown Object (Task).
Fri, May 1, 4:27 PM · Operations, ops-eqiad, DC-Ops
RobH created T251616: (Due Date: ASAP) rack/setup/install replacement msw-c6-eqiad.
Fri, May 1, 4:27 PM · Operations, ops-eqiad, DC-Ops
RobH moved T251614: (Need By: 31st May) rack/setup/install db114[1-9] from Backlog to Racking Tasks on the ops-eqiad board.
Fri, May 1, 4:13 PM · DBA, Operations, ops-eqiad, DC-Ops
RobH added a parent task for T251614: (Need By: 31st May) rack/setup/install db114[1-9]: Unknown Object (Task).
Fri, May 1, 4:13 PM · DBA, Operations, ops-eqiad, DC-Ops
RobH renamed T251614: (Need By: 31st May) rack/setup/install db114[1-9] from (<enter due date here>) rack/setup/install <insert FQDN/hostname of hardware here> to (Need By: ASAP) rack/setup/install db114[1-9].
Fri, May 1, 4:13 PM · DBA, Operations, ops-eqiad, DC-Ops
RobH created T251614: (Need By: 31st May) rack/setup/install db114[1-9].
Fri, May 1, 4:12 PM · DBA, Operations, ops-eqiad, DC-Ops

Apr 30 2020

RobH updated the task description for T250408: fix newly imported cable data in ulsfo.
Apr 30 2020, 5:37 PM · Operations, DC-Ops, ops-ulsfo

Apr 28 2020

RobH updated the task description for T250408: fix newly imported cable data in ulsfo.
Apr 28 2020, 6:36 PM · Operations, DC-Ops, ops-ulsfo

Apr 21 2020

RobH added a parent task for T250846: (Need By: TBD) rack/setup/install cloudceph200[123]-dev: Unknown Object (Task).
Apr 21 2020, 6:29 PM · Cloud-Services, Operations, ops-codfw, DC-Ops
RobH created T250846: (Need By: TBD) rack/setup/install cloudceph200[123]-dev.
Apr 21 2020, 6:29 PM · Cloud-Services, Operations, ops-codfw, DC-Ops
RobH updated the task description for T250816: (Need By: TBD) rack/setup/install backup1002 + array.
Apr 21 2020, 3:20 PM · DBA, ops-eqiad, Operations, DC-Ops
RobH added a comment to T250816: (Need By: TBD) rack/setup/install backup1002 + array.

Can you confirm racking and hostname details

Can they not be copied from the ones I gave for backup2002 (T248934)?

Apr 21 2020, 3:19 PM · DBA, ops-eqiad, Operations, DC-Ops
RobH closed T250817: (Need By: TBD) rack/setup/install backup2002 + array as Invalid.

This is a duplicate task; it was already done at T248934. You have quite a mix-up there.

Apr 21 2020, 3:18 PM · ops-codfw, DC-Ops, Operations
RobH assigned T250817: (Need By: TBD) rack/setup/install backup2002 + array to jcrespo.

We did not have the racking info for this before it arrived; I've made the above task. Can you confirm the racking and hostname details and then assign it to @Papaul for implementation? Thanks!

Apr 21 2020, 3:13 PM · ops-codfw, DC-Ops, Operations
RobH created T250817: (Need By: TBD) rack/setup/install backup2002 + array.
Apr 21 2020, 3:13 PM · ops-codfw, DC-Ops, Operations
RobH assigned T250816: (Need By: TBD) rack/setup/install backup1002 + array to jcrespo.

We did not have the racking info for this before it arrived; I've made the above task. Can you confirm the racking and hostname details and then assign it to @Jclark-ctr for implementation? Thanks!

Apr 21 2020, 3:11 PM · DBA, ops-eqiad, Operations, DC-Ops
RobH added a parent task for T250816: (Need By: TBD) rack/setup/install backup1002 + array: Unknown Object (Task).
Apr 21 2020, 3:10 PM · DBA, ops-eqiad, Operations, DC-Ops
RobH created T250816: (Need By: TBD) rack/setup/install backup1002 + array.
Apr 21 2020, 3:09 PM · DBA, ops-eqiad, Operations, DC-Ops

Apr 20 2020

RobH added a comment to T250652: msw1-a6-eqiad flopping up and down mgmt connections on A6.

T249048 was approved last Friday (today being Monday), and my plan is to place the info into Coupa later today for ordering. I don't think we'll need another task for just a one-off switch, as this should come in just as fast.

Apr 20 2020, 3:56 PM · Operations, ops-eqiad

Apr 16 2020

RobH updated the task description for T250408: fix newly imported cable data in ulsfo.
Apr 16 2020, 4:44 PM · Operations, DC-Ops, ops-ulsfo
RobH changed the status of T250408: fix newly imported cable data in ulsfo from Open to Stalled.

Please note this will NOT be fixed quickly, as we are limiting on-site visits due to COVID-19 shelter-in-place restrictions. Since this is just auditing cables, it can likely wait until we have to go on-site, or until we have to file another remote hands task with Digital Realty.

Apr 16 2020, 4:44 PM · Operations, DC-Ops, ops-ulsfo
RobH moved T250408: fix newly imported cable data in ulsfo from Backlog to Hardware Failure / Repair on the ops-ulsfo board.
Apr 16 2020, 4:43 PM · Operations, DC-Ops, ops-ulsfo
RobH triaged T250408: fix newly imported cable data in ulsfo as Low priority.
Apr 16 2020, 4:43 PM · Operations, DC-Ops, ops-ulsfo
RobH moved T250369: eqsin ganeti cable IDs from Backlog to Hardware Failure / Repair on the ops-eqsin board.
Apr 16 2020, 3:47 PM · Operations, ops-eqsin
RobH added a comment to T250369: eqsin ganeti cable IDs.

Acknowledged. We are waiting to have Jin go onsite once the new router arrives, so I'll add this to the list of items for him to tackle when onsite for that!

Apr 16 2020, 3:47 PM · Operations, ops-eqsin

Apr 13 2020

RobH updated subscribers of T166368: Wipe of spare/replacement disks.

If I understand it correctly, this task is specifically about a box that was returned to the spare pool and then reallocated for a new purpose but kept its old data. We should definitely wipe in those cases. I think that has been standard practice in the past, but perhaps not well documented or applied uniformly? I'm not sure; it's something to dig into more, for sure :)

Also, even in the cases where this is an "unracking" decom (which would be off-topic for this task, I think?), I'm not sure we ever talked about no longer wiping, but rather about shredding disks as an additional measure. I may be misremembering though :) Is there a wikitech diff, task, etc. where this is detailed?

Apr 13 2020, 5:49 PM · DC-Ops, Operations
RobH added a comment to T166368: Wipe of spare/replacement disks.

This task is over a year old (should we resolve/reject it?). Please note that we no longer require all disks to be wiped before decom (only before reuse), as we physically destroy all disks now.

Apr 13 2020, 3:38 PM · DC-Ops, Operations
RobH closed T175876: document all scs connections, a subtask of T175625: scs-c1-eqiad unresponsive, as Resolved.
Apr 13 2020, 3:36 PM · ops-eqiad, DC-Ops, Operations
RobH closed T175876: document all scs connections as Resolved.

A quick review of the scs devices on https://netbox.wikimedia.org/dcim/devices/?q=scs&status=active&mac_address=&has_primary_ip=&local_context_data=&virtual_chassis_member=&console_ports=&console_server_ports=&power_ports=&power_outlets=&interfaces=&pass_through_ports=&cf_owner=&cf_purchase_date=&cf_ticket= shows that all active devices have their ports documented.

Apr 13 2020, 3:36 PM · DC-Ops, Operations
RobH assigned T169286: labstore1005 A PCIe link training failure error on boot to Bstorm.

Please note this was NOT in ops-eqiad, and was likely being overlooked by on-site staff in eqiad for that reason. (It is also not assigned to anyone, so no one is touching it.)

Apr 13 2020, 3:34 PM · cloud-services-team (Kanban), DC-Ops, Operations
RobH closed T233039: hw troubleshooting: <type of hardware failre> for <fqhn of server> as Invalid.
Apr 13 2020, 3:32 PM · DC-Ops
RobH closed T214951: standardize workboards for dc-ops subprojects. as Resolved.

Updated the task description. I think removing 'high priority' and 'non urgent' is a good idea, since the entire workboard can be viewed by priority.

Apr 13 2020, 3:31 PM · DC-Ops
RobH moved T235805: ESAMS Refresh/Rebuild (October 2019) from Blocked to Backlog on the ops-esams board.
Apr 13 2020, 3:29 PM · Patch-For-Review, DC-Ops, Operations, ops-esams
RobH moved T237041: wipe backup-array1 from Blocked to Decommission on the ops-esams board.
Apr 13 2020, 3:29 PM · ops-esams, Operations
RobH moved T243088: esams: normalize the power outlet assignments from Procurement to Hardware Failure / Repair on the ops-esams board.
Apr 13 2020, 3:28 PM · Operations, ops-esams
RobH moved T244914: trace qfx5100-spare[12]-esams power cables from Procurement to Hardware Failure / Repair on the ops-esams board.
Apr 13 2020, 3:28 PM · Operations, ops-esams
RobH moved T243450: Audit & update spares part tracking for all sites from Procurement to Racking Tasks on the ops-esams board.
Apr 13 2020, 3:28 PM · ops-eqiad, ops-esams, ops-codfw, ops-eqsin, DC-Ops, Operations
RobH updated the task description for T214951: standardize workboards for dc-ops subprojects..
Apr 13 2020, 3:27 PM · DC-Ops
RobH updated the task description for T214951: standardize workboards for dc-ops subprojects..
Apr 13 2020, 3:25 PM · DC-Ops
RobH added a comment to T214951: standardize workboards for dc-ops subprojects..

Outsider comment: Instead of a "High priority Tasks" column on the ops-codfw workboard, which is not disjoint from the other columns, you could change the view to Sort by Priority. (In a few weeks we will likely also have grouping of tasks within columns, e.g. grouping by priority.)

Apr 13 2020, 3:22 PM · DC-Ops
RobH added a subtask for T250054: Netbox report coherence_rack Icinga alert: T249287: update rack location of decom wmf5801.
Apr 13 2020, 2:58 PM · DC-Ops, ops-ulsfo, Operations, ops-eqiad
RobH added a parent task for T249287: update rack location of decom wmf5801: T250054: Netbox report coherence_rack Icinga alert.
Apr 13 2020, 2:58 PM · Operations, ops-ulsfo

Apr 8 2020

RobH triaged T249757: update accounting report error output - Device with s/n X (duplicate) (N/A) not present in Netbox as Medium priority.
Apr 8 2020, 6:51 PM · DC-Ops, netbox
RobH changed the status of T244900: apply asset tags to s[12]-60[34]-eqsin, a subtask of T245056: snag asset tags from ulsfo, ship some to eqsin, from Open to Stalled.
Apr 8 2020, 6:26 PM · Operations, ops-eqsin
RobH changed the status of T244900: apply asset tags to s[12]-60[34]-eqsin, a subtask of T242250: rack/setup/install ps[12]-60[34]-eqsin, from Open to Stalled.
Apr 8 2020, 6:26 PM · Operations, ops-eqsin
RobH changed the status of T244900: apply asset tags to s[12]-60[34]-eqsin from Open to Stalled.
Apr 8 2020, 6:26 PM · Operations, ops-eqsin
RobH added a comment to T244900: apply asset tags to s[12]-60[34]-eqsin.

Please note that due to COVID-19 concerns, this task won't be accomplished until we have Jin on-site in late April to receive and install the router we ordered for that location.

Apr 8 2020, 6:26 PM · Operations, ops-eqsin

Apr 7 2020

RobH added a parent task for T249653: Netbox: restore two deleted entries from backups: Unknown Object (Task).
Apr 7 2020, 7:32 PM · netbox