
Yesterday

RobH closed T229243: remote hands setups for ganeti500[123] as Resolved.

All the remote setup has been completed, installation to continue on T228099.

Fri, Sep 20, 9:08 PM · Operations, ops-eqsin
RobH added a comment to T228099: rack/setup/install ganeti500[123].eqsin.wmnet.

All remote hands setup on T229243 is done.

Fri, Sep 20, 9:08 PM · Operations, ops-eqsin

Thu, Sep 19

RobH reassigned T220505: Decommission iron from RobH to Cmjohnson.

Ready for disk wipes and continued decom process.

Thu, Sep 19, 4:19 PM · Cloud-VPS, ops-eqiad, decommission, Operations
RobH lowered the priority of T220505: Decommission iron from High to Normal.
Thu, Sep 19, 4:16 PM · Cloud-VPS, ops-eqiad, decommission, Operations
RobH added a project to T233318: scs monitoring missing in Icinga: Icinga.
Thu, Sep 19, 4:02 PM · Icinga, observability, Operations, netops
RobH added a comment to T233318: scs monitoring missing in Icinga.

Just FYI, it seems the serial consoles have some built-in Nagios support. I've attached a printout of the Nagios configuration screen below.

Thu, Sep 19, 4:01 PM · Icinga, observability, Operations, netops
RobH moved T229557: decommission lithium from Backlog to Decommission on the ops-eqiad board.
Thu, Sep 19, 3:43 PM · Operations, ops-eqiad, DC-Ops, decommission
RobH reassigned T229557: decommission lithium from RobH to Cmjohnson.

Ready for disk wipe and unracking.

Thu, Sep 19, 3:43 PM · Operations, ops-eqiad, DC-Ops, decommission
RobH updated the task description for T229557: decommission lithium.
Thu, Sep 19, 3:41 PM · Operations, ops-eqiad, DC-Ops, decommission

Wed, Sep 18

RobH moved T174616: set up cr3-esams from Backlog to Racking Tasks on the ops-esams board.
Wed, Sep 18, 5:30 PM · ops-esams, Operations, netops
RobH moved T184064: Prepare racks OE14, OE15 and OE16 with new infrastructure from Backlog to Racking Tasks on the ops-esams board.
Wed, Sep 18, 5:30 PM · Operations, ops-esams
RobH moved T198790: Relabel hooft to bast3002 from Backlog to Next visit on the ops-esams board.
Wed, Sep 18, 5:30 PM · Operations, ops-esams
RobH moved T203272: cp3038, cp3039 - power supply redundancy failure from Backlog to Break/Fix on the ops-esams board.
Wed, Sep 18, 5:30 PM · ops-esams, Operations
RobH moved T84700: Setup management switch in OE12 from Backlog to Racking Tasks on the ops-esams board.
Wed, Sep 18, 5:30 PM · DC-Ops, Operations, ops-esams
RobH moved T174637: Setup esams atlas anchor from Backlog to Racking Tasks on the ops-esams board.
Wed, Sep 18, 5:30 PM · Operations, netops, ops-esams
RobH moved T184065: Setup new access switches from Backlog to Racking Tasks on the ops-esams board.
Wed, Sep 18, 5:30 PM · Operations, ops-esams
RobH moved T202046: cp3032 PS Redundancy Lost from Backlog to Break/Fix on the ops-esams board.
Wed, Sep 18, 5:30 PM · ops-esams, Operations, Traffic
RobH moved T202627: cp3036 PS Redundancy Lost from Backlog to Break/Fix on the ops-esams board.
Wed, Sep 18, 5:29 PM · Traffic, ops-esams, Operations
RobH moved T222041: cp3037 is currently unreachable from Backlog to Break/Fix on the ops-esams board.
Wed, Sep 18, 5:29 PM · ops-esams, Operations, Traffic
RobH moved T225035: cp3035 PS Redundancy Lost from Backlog to Break/Fix on the ops-esams board.
Wed, Sep 18, 5:29 PM · Traffic, Operations, ops-esams
RobH moved T233242: rack/setup/install cp30[50-65].esams.wmnet from Backlog to Racking Tasks on the ops-esams board.
Wed, Sep 18, 5:29 PM · Traffic, Operations, ops-esams
RobH moved T208585: Decommission esams cache_misc hosts from Backlog to Decommission on the ops-esams board.
Wed, Sep 18, 5:29 PM · ops-esams, decommission, Operations, Traffic
RobH added a parent task for T233242: rack/setup/install cp30[50-65].esams.wmnet: Unknown Object (Task).
Wed, Sep 18, 5:29 PM · Traffic, Operations, ops-esams
RobH triaged T233242: rack/setup/install cp30[50-65].esams.wmnet as Normal priority.
Wed, Sep 18, 5:28 PM · Traffic, Operations, ops-esams

Tue, Sep 17

RobH added a parent task for T229328: ps1 eqiad Icinga UNKNOWNs: T226778: Install new PDUs in rows A/B (Top level tracking task).
Tue, Sep 17, 10:27 PM · Operations, ops-eqiad, DC-Ops
RobH added a subtask for T226778: Install new PDUs in rows A/B (Top level tracking task): T229328: ps1 eqiad Icinga UNKNOWNs.
Tue, Sep 17, 10:27 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T226778: Install new PDUs in rows A/B (Top level tracking task).

It seems that when the new PDU goes into place, it fails the icinga checks for:

Tue, Sep 17, 10:26 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC).

Please note this is an issue that is happening on ALL the new PDUs. I'll update the parent task.

Tue, Sep 17, 10:25 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC).

15:15 <@RobH> : So, I can confirm in librenms it sees both towers
15:15 <@RobH> : so, this seems to me to be an icinga issue
15:15 <@RobH> : Does this seem reasonable? If so, we need to likely involve someone with some icinga knowledge.

Tue, Sep 17, 10:16 PM · DC-Ops, Operations, ops-eqiad
RobH closed T227425: codfw: 1 misc node for the Kerberos KDC service, a subtask of T226089: Make the Kerberos infrastructure production ready, as Resolved.
Tue, Sep 17, 6:40 PM · Analytics-Kanban, User-Elukey, Analytics
RobH closed T227425: codfw: 1 misc node for the Kerberos KDC service as Resolved.

T233142 created for setup, resolving this request task!

Tue, Sep 17, 6:40 PM · hardware-requests, Operations, User-Elukey, Analytics
RobH closed T227288: eqiad: 1 misc node for the Kerberos KDC service, a subtask of T226089: Make the Kerberos infrastructure production ready, as Resolved.
Tue, Sep 17, 6:40 PM · Analytics-Kanban, User-Elukey, Analytics
RobH closed T227288: eqiad: 1 misc node for the Kerberos KDC service as Resolved.

T233141 created for setup, resolving this request task!

Tue, Sep 17, 6:40 PM · hardware-requests, Operations, User-Elukey, Analytics
RobH reassigned T233141: setup/install eqiad kerbos node WMF5173 from RobH to elukey.

Please note that both T233141 (eqiad) and T233142 (codfw) are nearly identical.

Tue, Sep 17, 6:38 PM · Operations, User-Elukey, Analytics
RobH reassigned T233142: setup/install codfw kerbos node WMF6577 from RobH to elukey.

Please note that both T233141 (eqiad) and T233142 (codfw) are nearly identical.

Tue, Sep 17, 6:38 PM · Operations, User-Elukey, Analytics
RobH created T233142: setup/install codfw kerbos node WMF6577.
Tue, Sep 17, 6:33 PM · Operations, User-Elukey, Analytics
RobH created T233141: setup/install eqiad kerbos node WMF5173.
Tue, Sep 17, 6:33 PM · Operations, User-Elukey, Analytics
RobH closed T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC), a subtask of T226778: Install new PDUs in rows A/B (Top level tracking task), as Resolved.
Tue, Sep 17, 6:21 PM · DC-Ops, Operations, ops-eqiad
RobH closed T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC) as Resolved.

I've gone ahead and set up remote access and settings identical to the other new PDUs. It is now online and ping/ssh/syslog accessible.

Tue, Sep 17, 6:21 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T233129: update puppet for new PDU models.

If that is the case, then no puppet updates are required, as sentry4 is already listed for all the PDU models in eqiad.

Tue, Sep 17, 5:24 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T233129: update puppet for new PDU models.
Tue, Sep 17, 5:24 PM · DC-Ops, Operations, ops-eqiad
RobH updated subscribers of T233129: update puppet for new PDU models.
Tue, Sep 17, 4:28 PM · DC-Ops, Operations, ops-eqiad
RobH created T233129: update puppet for new PDU models.
Tue, Sep 17, 4:27 PM · DC-Ops, Operations, ops-eqiad

Thu, Sep 12

RobH removed a member for LDAP-Access-Requests: RobH.
Thu, Sep 12, 3:23 PM
RobH removed a watcher for LDAP-Access-Requests: RobH.
Thu, Sep 12, 3:22 PM
RobH removed a watcher for SRE-Access-Requests: RobH.
Thu, Sep 12, 3:22 PM

Wed, Sep 11

RobH added a parent task for T232630: rack/setup/install frqueue2001: Unknown Object (Task).
Wed, Sep 11, 4:36 PM · Operations, fundraising-tech-ops, ops-codfw
RobH created T232630: rack/setup/install frqueue2001.
Wed, Sep 11, 4:36 PM · Operations, fundraising-tech-ops, ops-codfw
RobH reassigned T227288: eqiad: 1 misc node for the Kerberos KDC service from RobH to faidon.

Please note that T227425 & T227288 are for spare pool allocations for Kerberos in both codfw and eqiad. As such, I need approvals for allocating a single spare pool system in each location:

Wed, Sep 11, 3:04 PM · hardware-requests, Operations, User-Elukey, Analytics
RobH reassigned T227425: codfw: 1 misc node for the Kerberos KDC service from RobH to faidon.

Please note that T227425 & T227288 are for spare pool allocations for Kerberos in both codfw and eqiad. As such, I need approvals for allocating a single spare pool system in each location:

Wed, Sep 11, 3:04 PM · hardware-requests, Operations, User-Elukey, Analytics

Mon, Sep 9

RobH added a comment to T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet.

I've set the due date via the require by date on the ordering task:

Mon, Sep 9, 5:04 PM · Operations, ops-eqiad
RobH renamed T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet from rack/setup/install ms-be105[1-6].eqiad.wmnet to (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet.
Mon, Sep 9, 5:03 PM · Operations, ops-eqiad
RobH edited parent tasks for T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet, added: Unknown Object (Task); removed: Unknown Object (Task).
Mon, Sep 9, 5:03 PM · Operations, ops-eqiad
RobH added a parent task for T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet: Unknown Object (Task).
Mon, Sep 9, 5:02 PM · Operations, ops-eqiad
RobH triaged T232367: (2019-09-15) rack/setup/install ms-be105[1-6].eqiad.wmnet as Normal priority.
Mon, Sep 9, 5:02 PM · Operations, ops-eqiad
RobH renamed T230746: (Aug 30th, 2019) rack/setup/install elastic10[53-67].eqiad.wmnet from rack/setup/install elastic10[53-67].eqiad.wmnet to (Aug 30th, 2019) rack/setup/install elastic10[53-67].eqiad.wmnet.
Mon, Sep 9, 4:54 PM · Patch-For-Review, ops-eqiad, Operations

Fri, Sep 6

RobH placed T227540: b4-eqiad pdu refresh (Thursday 10/24 @11am UTC) up for grabs.
Fri, Sep 6, 3:35 PM · DC-Ops, Operations, ops-eqiad

Thu, Sep 5

RobH renamed T232137: rack/setup/install frnetmon1001 from rack/setup/install new eqiad netsec server to rack/setup/install frnetmon1001.
Thu, Sep 5, 6:22 PM · fundraising-tech-ops, Operations, ops-eqiad
RobH added a parent task for T232137: rack/setup/install frnetmon1001: Unknown Object (Task).
Thu, Sep 5, 6:21 PM · fundraising-tech-ops, Operations, ops-eqiad
RobH triaged T232137: rack/setup/install frnetmon1001 as Normal priority.
Thu, Sep 5, 6:21 PM · fundraising-tech-ops, Operations, ops-eqiad
RobH closed T225137: codfw humidity too high as Resolved.

I went ahead and pulled the CyrusOne report for this month, and humidity seems to be in the 50% range. It started high, but it seems CyrusOne rebalanced and now it's back to normal.

Thu, Sep 5, 3:22 PM · Operations, ops-codfw

Wed, Sep 4

RobH added a member for acl*procurement-review: Dwisehaupt.
Wed, Sep 4, 5:28 PM

Fri, Aug 30

RobH added a parent task for T231687: refresh/replace scs-c1-codfw: Unknown Object (Task).
Fri, Aug 30, 8:34 PM · Operations, ops-codfw
RobH added a parent task for T231686: refresh/replace scs-a1-codfw: Unknown Object (Task).
Fri, Aug 30, 8:34 PM · ops-codfw, Operations
RobH updated the task description for T231686: refresh/replace scs-a1-codfw.
Fri, Aug 30, 8:34 PM · ops-codfw, Operations
RobH triaged T231687: refresh/replace scs-c1-codfw as Normal priority.
Fri, Aug 30, 8:34 PM · Operations, ops-codfw
RobH triaged T231686: refresh/replace scs-a1-codfw as Normal priority.
Fri, Aug 30, 8:30 PM · ops-codfw, Operations

Thu, Aug 29

RobH updated the task description for T227540: b4-eqiad pdu refresh (Thursday 10/24 @11am UTC).
Thu, Aug 29, 6:08 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T227540: b4-eqiad pdu refresh (Thursday 10/24 @11am UTC).
Thu, Aug 29, 4:24 PM · DC-Ops, Operations, ops-eqiad
RobH added a comment to T214283: Memory correctable errors -EDAC- elastic1029.

Also, in the future, please open a new task for hardware troubleshooting and follow all directions on:

Thu, Aug 29, 3:47 PM · Discovery-Search (Current work), ops-eqiad, Discovery, DC-Ops, Operations
RobH added a comment to T214283: Memory correctable errors -EDAC- elastic1029.

This shows no errors in the service event log for the memory:

Thu, Aug 29, 3:45 PM · Discovery-Search (Current work), ops-eqiad, Discovery, DC-Ops, Operations

Wed, Aug 28

RobH updated subscribers of T231365: Trouble accessing Jupyter Lab (SWAP).
Wed, Aug 28, 7:56 PM · Analytics-Kanban, Analytics-SWAP, Analytics, Jupyter-Hub
RobH placed T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC) up for grabs.
Wed, Aug 28, 6:42 PM · DC-Ops, Operations, ops-eqiad
RobH placed T228859: dbproxy1012 and dbprov1001 alerting on PS Redundancy up for grabs.
Wed, Aug 28, 6:41 PM · DC-Ops, Operations, ops-eqiad
RobH placed T227141: a5-eqiad pdu refresh up for grabs.
Wed, Aug 28, 6:40 PM · DC-Ops, Operations, ops-eqiad
RobH placed T227140: a4-eqiad pdu refresh up for grabs.
Wed, Aug 28, 6:40 PM · DC-Ops, Operations, ops-eqiad
RobH placed T227139: a3-eqiad pdu refresh up for grabs.
Wed, Aug 28, 6:40 PM · DC-Ops, Operations, ops-eqiad
RobH changed Due Date from Nov 15 2019, 12:00 AM to Nov 15 2019, 11:00 AM on T227542: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC).
Wed, Aug 28, 6:40 PM · DC-Ops, Operations, ops-eqiad
RobH set Due Date to Nov 15 2019, 12:00 AM on T227542: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC).
Wed, Aug 28, 6:39 PM · DC-Ops, Operations, ops-eqiad
RobH placed T227542: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC) up for grabs.
Wed, Aug 28, 6:39 PM · DC-Ops, Operations, ops-eqiad
RobH triaged T227542: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC) as High priority.
Wed, Aug 28, 6:31 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T227542: b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC).
Wed, Aug 28, 6:31 PM · DC-Ops, Operations, ops-eqiad
RobH triaged T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) as High priority.
Wed, Aug 28, 6:30 PM · DC-Ops, Operations, ops-eqiad
RobH placed T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC) up for grabs.
Wed, Aug 28, 6:30 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC).
Wed, Aug 28, 6:30 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC).
Wed, Aug 28, 6:28 PM · DC-Ops, Operations, ops-eqiad
RobH updated the task description for T227543: b8-eqiad pdu refresh (Thursday 10/31 @11am UTC).
Wed, Aug 28, 6:27 PM · DC-Ops, Operations, ops-eqiad
RobH placed T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC) up for grabs.

Removing myself as assignee since this has all the servers populated in the task description.

Wed, Aug 28, 6:17 PM · DC-Ops, Operations, ops-eqiad

Fri, Aug 23

RobH reassigned T223216: Decommission db2034 from RobH to Papaul.
Fri, Aug 23, 11:42 PM · Operations, decommission, ops-codfw
RobH removed a project from T223216: Decommission db2034: Patch-For-Review.
Fri, Aug 23, 11:42 PM · Operations, decommission, ops-codfw
RobH added a comment to T223216: Decommission db2034.

I cannot locate the labeled switch port on the switch, so @Papaul will need to trace and disable this via on-site work.

Fri, Aug 23, 11:36 PM · Operations, decommission, ops-codfw
RobH reassigned T228281: decommission db2045.codfw.wmnet from RobH to Papaul.
Fri, Aug 23, 11:13 PM · Operations, ops-codfw, DC-Ops, decommission
RobH updated the task description for T228281: decommission db2045.codfw.wmnet.
Fri, Aug 23, 11:11 PM · Operations, ops-codfw, DC-Ops, decommission
RobH updated the task description for T228281: decommission db2045.codfw.wmnet.
Fri, Aug 23, 11:02 PM · Operations, ops-codfw, DC-Ops, decommission
RobH reassigned T200209: Decom graphite2001/WMF6160 from RobH to Papaul.
Fri, Aug 23, 7:41 PM · Patch-For-Review, decommission, ops-codfw, Operations, observability
RobH updated the task description for T200209: Decom graphite2001/WMF6160.
Fri, Aug 23, 7:40 PM · Patch-For-Review, decommission, ops-codfw, Operations, observability
RobH added a comment to T200209: Decom graphite2001/WMF6160.

statsd.codfw.wmnet points to graphite2001.codfw.wmnet, so I'm not sure what to point this at.

Fri, Aug 23, 7:35 PM · Patch-For-Review, decommission, ops-codfw, Operations, observability
RobH added a comment to T200209: Decom graphite2001/WMF6160.

Ok, I synced with @wiki_willy about this and the comment above.

Fri, Aug 23, 7:34 PM · Patch-For-Review, decommission, ops-codfw, Operations, observability
RobH renamed T200209: Decom graphite2001/WMF6160 from Decom graphite2001 to Decom graphite2001/WMF6160.
Fri, Aug 23, 7:19 PM · Patch-For-Review, decommission, ops-codfw, Operations, observability
RobH updated the task description for T200209: Decom graphite2001/WMF6160.
Fri, Aug 23, 7:13 PM · Patch-For-Review, decommission, ops-codfw, Operations, observability