
scs-c1-eqiad CPU usage over 85%
Closed, Resolved · Public

Description

Since 01:08 UTC, scs-c1-eqiad has been reporting 100% CPU usage, as can be seen in LibreNMS: https://librenms.wikimedia.org/device/device=158/tab=health/metric=processor/

CPU usage suddenly jumped from an average of 15% to 100%.

Event Timeline

Restricted Application added a project: Operations. · Nov 12 2019, 4:02 AM
Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2019-11-12T16:21:02Z] <XioNoX> reboot scs-c1-eqiad.mgmt.eqiad.wmnet - T238036

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND    
1855 root      20   0  4576 2084  876 S  2.7  0.8  26623:27 portmanager

Did a kill -9 on portmanager just in case, but it didn't change anything (the process restarted with the same 2% CPU load).
Then killed snmpd, which lowered the CPU usage for a bit, but it then went back up.
Trying a reboot.
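
The `top` row above shows portmanager at 2.7% CPU, well short of the 100% reported in LibreNMS. As a rough sketch (not something that was run on the console server), the per-process figures from `top` output could be summed and compared against the device-level reading. The function name `parse_top_rows` is hypothetical, and the parsing assumes the 12-column layout shown above.

```python
def parse_top_rows(text):
    """Parse process rows of `top` output into (command, cpu_percent) pairs.

    Assumes the column layout shown in the task:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A process row has at least 12 fields and starts with a numeric PID;
        # this skips the header line and any blank lines.
        if len(fields) >= 12 and fields[0].isdigit():
            rows.append((fields[11], float(fields[8])))
    return rows


sample = """PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1855 root      20   0  4576 2084  876 S  2.7  0.8  26623:27 portmanager
"""

rows = parse_top_rows(sample)
total_cpu = sum(cpu for _, cpu in rows)
print(rows, total_cpu)
```

A large gap between the summed per-process usage and the SNMP-reported value would suggest the load is not attributable to any single visible process, which is consistent with killing portmanager having no effect.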

ayounsi closed this task as Resolved. · Nov 12 2019, 4:33 PM
ayounsi claimed this task.

CPU is back to normal.

ayounsi reopened this task as Open. · Jul 22 2020, 5:13 AM
ayounsi removed ayounsi as the assignee of this task.
ayounsi triaged this task as High priority.
ayounsi edited projects, added DC-Ops; removed netops.

This has been alerting again for the past few days. It might be worth following up with the vendor instead of rebooting the console servers every few months.

wiki_willy added a project: ops-eqiad.
Cmjohnson reassigned this task from Cmjohnson to RobH. · Wed, Sep 2, 6:57 PM
Cmjohnson added subscribers: RobH, Cmjohnson.

@ayounsi I am not sure there is a vendor to follow up with on this. Checking with @RobH.

Mentioned in SAL (#wikimedia-operations) [2020-09-02T19:12:46Z] <robh> updating firmware on scs-c1-eqiad via T238036

Mentioned in SAL (#wikimedia-operations) [2020-09-02T19:14:52Z] <robh> updating firmware on scs-c1-eqiad via T238036

RobH added a comment. · Wed, Sep 2, 7:16 PM

scs-c1-eqiad firmware was 3.16.6u4; the newest stable release at this time is 4.9.0u1. Updating.
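
For checking whether other console servers are behind on firmware, a version comparison along these lines might help. This is a minimal sketch: `parse_firmware` is a hypothetical helper, and it assumes the `uN` suffix is an update number that sorts after the base `major.minor.patch` release, which is inferred from the two version strings above rather than from vendor documentation.

```python
import re


def parse_firmware(version):
    """Turn a string like '3.16.6u4' into a comparable tuple (3, 16, 6, 4).

    Assumption: the optional 'uN' suffix is an update number; a version
    without it is treated as update 0.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:u(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised firmware version: {version!r}")
    major, minor, patch, update = match.groups()
    return (int(major), int(minor), int(patch), int(update or 0))


installed = parse_firmware("3.16.6u4")
latest = parse_firmware("4.9.0u1")

# Tuples compare element by element, so 3.16.x sorts after 3.9.x,
# which a plain string comparison would get wrong.
needs_update = installed < latest
print(installed, latest, needs_update)
```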

Mentioned in SAL (#wikimedia-operations) [2020-09-02T19:20:14Z] <robh> scs-c1-eqiad firmware update complete and back online T238036

RobH closed this task as Resolved. · Wed, Sep 2, 7:50 PM
RobH removed RobH as the assignee of this task.

Firmware updated to the newest version. If it happens again, we can reopen and investigate with OpenGear.