
kafka1013 shows a failed PSU
Closed, Duplicate (Public)

Description

Hi!

Icinga complains about a failed PSU for kafka1013:

-------------------------------------------------------------------------------
Record:      192
Date/Time:   01/03/2019 10:37:25
Source:      system
Severity:    Critical
Description: An under voltage fault detected on power supply 2.
-------------------------------------------------------------------------------
Record:      193
Date/Time:   01/03/2019 10:37:29
Source:      system
Severity:    Critical
Description: The power input for power supply 2 is lost.
-------------------------------------------------------------------------------
Record:      194
Date/Time:   01/03/2019 10:37:34
Source:      system
Severity:    Critical
Description: Power supply redundancy is lost.
-------------------------------------------------------------------------------
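For anyone triaging similar alerts, here is a minimal sketch of how records in the layout shown above could be parsed and filtered for critical power supply events. It assumes only what is visible in the paste: dash-line separators and `Key: value` fields. The names `parse_event_log` and `critical_psu_events` are hypothetical helpers, not part of Icinga or any vendor tool.

```python
import re

# Assumption: records are separated by lines made only of dashes,
# as in the log pasted in this task.
SEPARATOR = re.compile(r"^-{5,}[ \t]*$", re.MULTILINE)

SAMPLE = """\
-------------------------------------------------------------------------------
Record:      192
Date/Time:   01/03/2019 10:37:25
Source:      system
Severity:    Critical
Description: An under voltage fault detected on power supply 2.
-------------------------------------------------------------------------------
Record:      194
Date/Time:   01/03/2019 10:37:34
Source:      system
Severity:    Critical
Description: Power supply redundancy is lost.
-------------------------------------------------------------------------------
"""


def parse_event_log(text):
    """Split dash-separated log text into a list of field dictionaries."""
    records = []
    for chunk in SEPARATOR.split(text):
        fields = {}
        for line in chunk.strip().splitlines():
            # Split on the first colon only, so timestamps stay intact.
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


def critical_psu_events(records):
    """Keep records marked Critical whose description mentions a power supply."""
    return [
        r for r in records
        if r.get("Severity") == "Critical"
        and "power supply" in r.get("Description", "").lower()
    ]


for event in critical_psu_events(parse_event_log(SAMPLE)):
    print(event["Record"], event["Date/Time"], event["Description"])
```

The substring match on "power supply" is deliberately loose; a real check would probably key on whatever event identifiers the hardware exposes.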

The host is out of warranty (OOW), but we are not ready to decom it yet (we might be at the end of the quarter), so there are two things that we could do:

  • check with @Cmjohnson whether any spare/old PSU is available for the host and, if so, replace the failed one
  • if no PSU is available, reduce the Kafka cluster to 5 nodes and decom the host. This requires a bit of manual work, but it should be doable.

Event Timeline

elukey created this task. Jan 3 2019, 11:15 AM