
RobH (Rob Halsell)
Operations EngineerAdministrator

Projects (20)

User Details

User Since
Nov 24 2014, 1:43 PM (256 w, 2 d)
Roles
Administrator
Availability
Available
IRC Nick
RobH
LDAP User
RobH
MediaWiki User
RobH [ Global Accounts ]

My GPG Key fingerprint = CB1F C7E7 0FF8 5DB2 6820 9C7E 75ED 14C7 0245 D22A

I am an Operations Engineer on Wikimedia's Datacenter Operations Team.

I am also the primary triage engineer for the hardware-requests project, as well as for the private S4 procurement space and procurement project.

All questions involving allocation of hardware can be initially addressed on https://wikitech.wikimedia.org/wiki/Operations_requests.

Please note that private messages via Phabricator are not my preferred means of contact. Please feel free to contact me (robh) directly via IRC (Freenode), or email me at my @wikimedia.org address.

Recent Activity

Today

RobH updated the task description for T236294: rack/setup/install lvs300[567].
Wed, Oct 23, 4:43 PM · Traffic, Operations, ops-esams
RobH added a parent task for T236294: rack/setup/install lvs300[567]: Unknown Object (Task).
Wed, Oct 23, 4:42 PM · Traffic, Operations, ops-esams
RobH added projects to T236294: rack/setup/install lvs300[567]: ops-esams, Traffic.
Wed, Oct 23, 4:42 PM · Traffic, Operations, ops-esams
RobH triaged T236294: rack/setup/install lvs300[567] as Normal priority.
Wed, Oct 23, 4:42 PM · Traffic, Operations, ops-esams
RobH updated the task description for T184066: rack/setup/install ps[12]-oe1[456]-esams.
Wed, Oct 23, 4:35 PM · Operations, ops-esams
RobH renamed T236217: rack/setup/install dns300[12] from rack/setup/install dns300[123] to rack/setup/install dns300[12].
Wed, Oct 23, 3:33 PM · Traffic, DNS, ops-esams, Operations
RobH updated the task description for T236217: rack/setup/install dns300[12].
Wed, Oct 23, 3:33 PM · Traffic, DNS, ops-esams, Operations

Yesterday

RobH added a parent task for T236217: rack/setup/install dns300[12]: Unknown Object (Task).
Tue, Oct 22, 10:36 PM · Traffic, DNS, ops-esams, Operations
RobH triaged T236217: rack/setup/install dns300[12] as Normal priority.
Tue, Oct 22, 10:36 PM · Traffic, DNS, ops-esams, Operations
RobH added a parent task for T236216: rack/setup/install ganeti300[123]: Unknown Object (Task).
Tue, Oct 22, 10:34 PM · Operations, ops-esams
RobH triaged T236216: rack/setup/install ganeti300[123] as Normal priority.
Tue, Oct 22, 10:34 PM · Operations, ops-esams
RobH updated the task description for T184066: rack/setup/install ps[12]-oe1[456]-esams.
Tue, Oct 22, 9:53 PM · Operations, ops-esams
RobH updated the task description for T184066: rack/setup/install ps[12]-oe1[456]-esams.
Tue, Oct 22, 9:51 PM · Operations, ops-esams
RobH updated the task description for T184066: rack/setup/install ps[12]-oe1[456]-esams.
Tue, Oct 22, 9:50 PM · Operations, ops-esams
RobH added a parent task for T184066: rack/setup/install ps[12]-oe1[456]-esams: Unknown Object (Task).
Tue, Oct 22, 9:34 PM · Operations, ops-esams
RobH renamed T184066: rack/setup/install ps[12]-oe1[456]-esams from Procure and install new PDUs to rack/setup/install ps[12]-oe1[456]-esams.
Tue, Oct 22, 9:34 PM · Operations, ops-esams
RobH added a comment to T187456: Decommission labstore100[123] and their disk shelves.

IRC update with John:

Tue, Oct 22, 7:54 PM · decommission, cloud-services-team (Kanban), Data-Services, Operations, DC-Ops, ops-eqiad
RobH closed T233129: update puppet for new PDU models, a subtask of T226778: Install new PDUs in rows A/B (Top level tracking task), as Resolved.
Tue, Oct 22, 7:00 PM · DC-Ops, Operations, ops-eqiad
RobH closed T233129: update puppet for new PDU models as Resolved.

Please note this is now a checkbox on all PDU upgrade tasks, so I'm resolving this task.

Tue, Oct 22, 7:00 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC).

The Icinga downtime was set to expire in less than an hour, so I've extended it until 2300 GMT.

Tue, Oct 22, 5:15 PM · DC-Ops, Operations, ops-eqiad
RobH reassigned T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) from RobH to Jclark-ctr.

@wiki_willy requested that I step in and set up the software side of things, but I cannot do so because the serial connection to this PDU isn't currently working.

Tue, Oct 22, 5:10 PM · DC-Ops, Operations, ops-eqiad
RobH removed a project from T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC): Patch-For-Review.
Tue, Oct 22, 5:09 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).
Tue, Oct 22, 4:57 PM · DC-Ops, Operations, ops-eqiad
RobH closed T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC), a subtask of T226778: Install new PDUs in rows A/B (Top level tracking task), as Resolved.
Tue, Oct 22, 4:56 PM · DC-Ops, Operations, ops-eqiad
RobH closed T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) as Resolved.

OK, I just logged in and confirmed that ps1 sees ps2. The rest was already configured from our deployment of ps1, except that the model hadn't been updated.

Tue, Oct 22, 4:56 PM · DC-Ops, Operations, ops-eqiad
RobH reassigned T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) from RobH to Jclark-ctr.

My understanding of this task state is as follows:

Tue, Oct 22, 4:42 PM · DC-Ops, Operations, ops-eqiad
RobH closed T227536: b1-eqiad pdu refresh (Thursday 10/10 @11am UTC), a subtask of T226778: Install new PDUs in rows A/B (Top level tracking task), as Resolved.
Tue, Oct 22, 4:40 PM · DC-Ops, Operations, ops-eqiad
RobH closed T227536: b1-eqiad pdu refresh (Thursday 10/10 @11am UTC) as Resolved.
Tue, Oct 22, 4:40 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T236152: wmf-auto-reimage, decommission & Server_lifecycle documentation for virtual machines reimage confusing.

I will ask @RobH and @akosiaris if I can edit those pages; a one-line addition with a warning would probably suffice, but I didn't understand the state. Thanks for the clarifications. I wanted to stress that I didn't need the functionality, just a clarification of the current status.

Tue, Oct 22, 4:38 PM · SRE-tools, Documentation

Mon, Oct 21

RobH closed T235911: ps1-22-ulsfo & ps1-23-ulsfo as Resolved.
Mon, Oct 21, 6:35 PM · Operations, Traffic, ops-ulsfo
RobH updated the task description for T235911: ps1-22-ulsfo & ps1-23-ulsfo.
Mon, Oct 21, 6:35 PM · Operations, Traffic, ops-ulsfo
RobH added a comment to T235911: ps1-22-ulsfo & ps1-23-ulsfo.

Summary of work:

Mon, Oct 21, 6:34 PM · Operations, Traffic, ops-ulsfo
RobH added a comment to T235911: ps1-22-ulsfo & ps1-23-ulsfo.

Ok, I'm onsite and going to attempt the following on ps1-22-ulsfo:

Mon, Oct 21, 6:23 PM · Operations, Traffic, ops-ulsfo
RobH added a comment to T235303: Update authoritative nameservers for the toolforge.org domain to point to Designate.

Andrew CC'd.

Mon, Oct 21, 4:57 PM · Traffic, Operations, DNS, Toolforge, cloud-services-team (Kanban)
RobH added a comment to T235303: Update authoritative nameservers for the toolforge.org domain to point to Designate.

@RobH should I email Doneva myself and cc you for approval or do you want to reach out yourself?

@Andrew,
I'll email her right now and CC you, explaining what has to happen and that you are authorized to do so!

Mon, Oct 21, 4:54 PM · Traffic, Operations, DNS, Toolforge, cloud-services-team (Kanban)
RobH updated subscribers of T235911: ps1-22-ulsfo & ps1-23-ulsfo.

Mainly I'd like @BBlack's buy-in on a date/time for me to do this work, since option 2 requires Traffic approval in my opinion. (It would cause them work if any of the systems fail.)

Mon, Oct 21, 3:59 PM · Operations, Traffic, ops-ulsfo
RobH added a comment to T235911: ps1-22-ulsfo & ps1-23-ulsfo.
  1. Reseat the hot-swap NIC, which should reset it.
  2. Unplug ps1, leaving ps2 powered, to reset the NIC.
  3. Press the reset button; this requires reconfiguring the entire PDU (not ideal, since these have configs and switched ports; I'd rather not do this one).
Mon, Oct 21, 3:58 PM · Operations, Traffic, ops-ulsfo

Fri, Oct 18

RobH triaged T235911: ps1-22-ulsfo & ps1-23-ulsfo as Normal priority.
Fri, Oct 18, 6:14 PM · Operations, Traffic, ops-ulsfo
RobH reassigned T235770: decommission eeden from Jclark-ctr to Papaul.
Fri, Oct 18, 5:54 PM · Operations, DC-Ops, decommission
RobH reassigned T235770: decommission eeden from RobH to Jclark-ctr.
Fri, Oct 18, 5:48 PM · Operations, DC-Ops, decommission

Thu, Oct 17

RobH reassigned T225121: (Need By: Sept 30) upgrade msw1-eqiad from EX4200 to EX4300 from Cmjohnson to Jclark-ctr.

Please locate the new msw1-eqiad that I describe below and update the netbox asset tag entry. This will clear up our reporting errors for this device. Then either yourself or @Cmjohnson need to coordinate with @ayounsi on when this can be replaced.

Thu, Oct 17, 6:52 PM · netops, Operations, ops-eqiad
RobH updated subscribers of T225121: (Need By: Sept 30) upgrade msw1-eqiad from EX4200 to EX4300.

Please note this states it was racked, but it was never added into netbox, so I'm not sure where it is racked.

Thu, Oct 17, 6:50 PM · netops, Operations, ops-eqiad
RobH updated the task description for T225121: (Need By: Sept 30) upgrade msw1-eqiad from EX4200 to EX4300.
Thu, Oct 17, 6:49 PM · netops, Operations, ops-eqiad
RobH updated the task description for T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC).
Thu, Oct 17, 6:27 PM · DC-Ops, Operations, ops-eqiad
RobH claimed T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC).
Thu, Oct 17, 6:07 PM · DC-Ops, Operations, ops-eqiad
RobH closed T235785: decommission <FQDN of server> as Declined.
Thu, Oct 17, 4:58 PM · Operations, DC-Ops, decommission
RobH created T235785: decommission <FQDN of server>.
Thu, Oct 17, 4:58 PM · Operations, DC-Ops, decommission

Wed, Oct 16

RobH created T235716: update librenms report.
Wed, Oct 16, 10:13 PM · Operations, netbox
RobH updated the task description for T235685: rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet.
Wed, Oct 16, 6:41 PM · ops-eqiad, Operations
RobH updated the task description for T235685: rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet.
Wed, Oct 16, 6:16 PM · ops-eqiad, Operations
RobH reassigned T235685: rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet from Andrew to Jclark-ctr.

Racking Setup: These will all be cloudvirt-network-restricted hosts. They must go in 1G racks in Row B.
Network Setup: (2) 1G rack connections, similar to cloudvirt hosts but with the network being 1G and hostname being cloudvirt-wdqs100x.

Wed, Oct 16, 6:15 PM · ops-eqiad, Operations
RobH updated the task description for T235685: rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet.
Wed, Oct 16, 6:14 PM · ops-eqiad, Operations
RobH moved T235685: rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet from Backlog to Racking Tasks on the ops-eqiad board.

I'm not sure whether @Andrew or @Gehel would know this, but I've assigned it to @Gehel.

Wed, Oct 16, 5:41 PM · ops-eqiad, Operations
RobH added a parent task for T235685: rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet: Unknown Object (Task).
Wed, Oct 16, 5:39 PM · ops-eqiad, Operations
RobH triaged T235685: rack/setup/install cloudvirt-wdqs100[123].eqiad.wmnet as Normal priority.
Wed, Oct 16, 5:39 PM · ops-eqiad, Operations

Tue, Oct 15

RobH added a comment to T235303: Update authoritative nameservers for the toolforge.org domain to point to Designate.

@RobH should I email Doneva myself and cc you for approval or do you want to reach out yourself?

Tue, Oct 15, 3:12 PM · Traffic, Operations, DNS, Toolforge, cloud-services-team (Kanban)

Fri, Oct 11

RobH updated the task description for T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC).
Fri, Oct 11, 8:42 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T227542: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC).
Fri, Oct 11, 8:42 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T227540: b4-eqiad pdu refresh (Thursday 10/24 @11am UTC).
Fri, Oct 11, 8:41 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T227538: b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC).
Fri, Oct 11, 8:41 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC).
Fri, Oct 11, 8:41 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).
Fri, Oct 11, 8:40 PM · DC-Ops, Operations, ops-eqiad

Thu, Oct 10

RobH reopened T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC), a subtask of T226778: Install new PDUs in rows A/B (Top level tracking task), as Open.
Thu, Oct 10, 8:04 PM · DC-Ops, Operations, ops-eqiad
RobH reopened T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) as "Open".

I should not have resolved this.

Thu, Oct 10, 8:04 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T235125: Move kafka200[123] to logstash202[012].

Please note that the hostname mismatch in netbox versus puppet was causing reporting errors on the puppetdb netbox report.

Thu, Oct 10, 7:27 PM · DC-Ops, Operations, ops-codfw
RobH added a comment to T235124: Move kafka100[123] to logstash102[012].

Please note that the netbox mismatch caused netbox reporting errors. To fix this, I have done the following:

Thu, Oct 10, 7:19 PM · DC-Ops, Operations, ops-eqiad
RobH reassigned T187456: Decommission labstore100[123] and their disk shelves from Cmjohnson to Jclark-ctr.

The labstore1003-array[123] devices are all causing report errors on https://netbox.wikimedia.org/extras/reports/coherence.Coherence/ in the test_malformed_asset_tags section.

Thu, Oct 10, 6:39 PM · decommission, cloud-services-team (Kanban), Data-Services, Operations, DC-Ops, ops-eqiad
RobH triaged T235190: fix serial connection for ps1-a2-eqiad as Normal priority.
Thu, Oct 10, 3:42 PM · DC-Ops, Operations, ops-eqiad
RobH created T235190: fix serial connection for ps1-a2-eqiad.
Thu, Oct 10, 3:42 PM · DC-Ops, Operations, ops-eqiad
RobH closed T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) as Resolved.

Please note that with the temporary serial run, we went ahead and set up ps1-a2-eqiad.

Thu, Oct 10, 3:40 PM · DC-Ops, Operations, ops-eqiad
RobH closed T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC), a subtask of T226778: Install new PDUs in rows A/B (Top level tracking task), as Resolved.
Thu, Oct 10, 3:40 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC).

@Jclark-ctr and I went through the following to fix this issue:

Thu, Oct 10, 3:30 PM · DC-Ops, Operations, ops-eqiad

Wed, Oct 9

RobH reassigned T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet from RobH to fgiunchedi.

ms-be105[1-6].eqiad.wmnet are all online and calling into puppet. You can push them into service as you see fit.

Wed, Oct 9, 6:32 PM · User-fgiunchedi, Operations
RobH updated the task description for T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet.
Wed, Oct 9, 6:30 PM · User-fgiunchedi, Operations
RobH reassigned T227138: a2-eqiad pdu refresh (Tuesday 10/8 @11am UTC) from RobH to Jclark-ctr.

I've just attempted to connect to ps1-a2-eqiad via serial, and failed. To fix this, I'll outline the steps needed below. After coordinating with @wiki_willy, I determined it best to assign this to @Jclark-ctr to fix (though @Cmjohnson is also able to do so; either can take this task as needed).

Wed, Oct 9, 4:42 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC).

Clarification: https://netbox.wikimedia.org/dcim/devices/1394/ is the OLD ps1-b3-eqiad, which should have its hostname set to its asset tag and then be set to the offline state, as it's unracked.

Wed, Oct 9, 4:32 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC).

Please note that when I compare librenms output it seems like it sees both towers right now:

Wed, Oct 9, 4:30 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet.

The task description has been updated.

Wed, Oct 9, 1:39 AM · User-fgiunchedi, Operations
RobH updated the task description for T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet.
Wed, Oct 9, 12:28 AM · User-fgiunchedi, Operations
RobH updated the task description for T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet.
Wed, Oct 9, 12:12 AM · User-fgiunchedi, Operations

Tue, Oct 8

RobH updated the task description for T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet.
Tue, Oct 8, 11:56 PM · User-fgiunchedi, Operations
RobH updated the task description for T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet.
Tue, Oct 8, 11:54 PM · User-fgiunchedi, Operations
RobH removed a project from T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet: Patch-For-Review.
Tue, Oct 8, 5:28 PM · User-fgiunchedi, Operations
RobH updated the task description for T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet.
Tue, Oct 8, 5:11 PM · User-fgiunchedi, Operations
RobH updated the task description for T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet.
Tue, Oct 8, 5:02 PM · User-fgiunchedi, Operations

Thu, Oct 3

RobH closed T234541: decommission <FQDN of server> as Invalid.
Thu, Oct 3, 3:54 PM · DC-Ops, decommission
RobH created T234541: decommission <FQDN of server>.
Thu, Oct 3, 3:54 PM · DC-Ops, decommission

Wed, Oct 2

RobH added a comment to T231066: Host decommission improvements.
Wed, Oct 2, 6:38 PM · Operations, DC-Ops, SRE-tools
RobH updated the task description for T231066: Host decommission improvements.
Wed, Oct 2, 6:38 PM · Operations, DC-Ops, SRE-tools
RobH closed T227314: eqiad+codfw: 6x hardware request for swift backend (each site) as Resolved.

All requested hardware has been ordered, so this is resolved.

Wed, Oct 2, 5:02 PM · hardware-requests, Operations
RobH closed Unknown Object (Task), a subtask of T227314: eqiad+codfw: 6x hardware request for swift backend (each site), as Resolved.
Wed, Oct 2, 4:56 PM · hardware-requests, Operations

Tue, Oct 1

RobH closed Unknown Object (Task), a subtask of T209515: Renew Digicert Unified in 2019, as Resolved.
Tue, Oct 1, 5:33 PM · Operations, Traffic
RobH reopened T228919: replace scs-a8-eqiad as "Open".

@Jclark-ctr: Please note this task was not ready to be resolved, it has many, many steps left.

Tue, Oct 1, 2:56 PM · ops-eqiad, Operations

Fri, Sep 27

RobH removed a member for Wikimedia-Mailing-lists: RobH.
Fri, Sep 27, 11:11 PM
RobH removed a watcher for Wikimedia-Mailing-lists: RobH.
Fri, Sep 27, 11:11 PM
RobH reassigned T228919: replace scs-a8-eqiad from Cmjohnson to Jclark-ctr.

I see you received the SCS on the procurement task T228202. Can you go ahead and do the first two steps on this, so that it is in netbox and trackable? The remainder of the steps should be coordinated with @Cmjohnson.

Fri, Sep 27, 11:10 PM · ops-eqiad, Operations
RobH updated the task description for T228919: replace scs-a8-eqiad.
Fri, Sep 27, 11:09 PM · ops-eqiad, Operations
RobH updated the task description for T234076: (Need by Aug 1) rack/setup/install dumpsdata1003.eqiad.wmnet.
Fri, Sep 27, 5:52 PM · Patch-For-Review, ops-eqiad, Operations
RobH updated the task description for T234076: (Need by Aug 1) rack/setup/install dumpsdata1003.eqiad.wmnet.
Fri, Sep 27, 5:46 PM · Patch-For-Review, ops-eqiad, Operations
RobH moved T234076: (Need by Aug 1) rack/setup/install dumpsdata1003.eqiad.wmnet from Backlog to Racking Tasks on the ops-eqiad board.
Fri, Sep 27, 5:45 PM · Patch-For-Review, ops-eqiad, Operations