
restbase2018 down
Closed, Resolved (Public)

Description

restbase2018 is down and can't be reached via SSH

I could still get on mgmt and saw it sitting at the login prompt, which at first suggested a cable or switch port problem.

But logging in as root on the mgmt console did not work either.

So this seems to be more broken than that.

This host has had previous tickets.

purchase date: 2018-11-14

https://netbox.wikimedia.org/dcim/devices/2012/

Please take a look at it. Unfortunately the system event log shows nothing useful:

/admin1-> racadm getsel
Record:      1
Date/Time:   06/09/2021 13:54:04
Source:      system
Severity:    Ok
Description: Log cleared.
-------------------------------------------------------------------------------

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2022-07-01T22:12:38Z] <mutante> restbase2018 - attempting power cycle via mgmt - /admin1-> racadm serveraction powercycle (T311890)

Power cycling via mgmt brought it back as if nothing had happened.

Nothing obvious in syslog or in the separate restbase syslog.

Dzahn claimed this task.

Feel free to reopen if you see any issue with this host again.

22:20 <+icinga-wm> RECOVERY - cassandra-a CQL 10.192.48.124:9042 on restbase2018 is OK: TCP OK - 0.033 second response time on 10.192.48.124 port 9042 https://phabricator.wikimedia.org/T93886
22:20 < mutante> I glanced at syslog as well
22:21 < mutante> there is a separate syslog just for restbase too, but:
22:21 < mutante> May 17 14:08:38 restbase2018 restbase[27229]: #033]0;firejail /usr/bin/nodejs restbase/server.js -c /etc/restbase/config.yaml #007Child process initialized in 98.93 ms
22:21 < mutante> Jul  1 22:14:34 restbase2018 restbase[937]: Reading profile /etc/firejail/default.profile
22:21 <+icinga-wm> RECOVERY - puppet last run on restbase2018 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
22:36 <+icinga-wm> RECOVERY - cassandra-b CQL 10.192.48.125:9042 on restbase2018 is OK: TCP OK - 0.037 second response time on 10.192.48.125 port 9042 https://phabricator.wikimedia.org/T93886
22:38 <+icinga-wm> RECOVERY - cassandra-b SSL 10.192.48.125:7001 on restbase2018 is OK: SSL OK - Certificate restbase2018-b valid until 2022-10-08 10:54:09 +0000 (expires in 98 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
22:41 <+icinga-wm> RECOVERY - cassandra-c CQL 10.192.48.126:9042 on restbase2018 is OK: TCP OK - 0.033 second response time on 10.192.48.126 port 9042 https://phabricator.wikimedia.org/T93886
22:43 <+icinga-wm> RECOVERY - cassandra-c SSL 10.192.48.126:7001 on restbase2018 is OK: SSL OK - Certificate restbase2018-c valid until 2022-10-08 10:54:12 +0000 (expires in 98 days) https://wikitech.wikimedia.org/wiki/Cassandra%23Installing_and_generating_certificates
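
For anyone who wants to re-check the Cassandra instances by hand after a future power cycle, here is a rough sketch of a plain TCP connect check similar in spirit to the "TCP OK" recoveries in the log above. It is not the Icinga check itself. The addresses and port 9042 are copied from the log; the names `tcp_check`, `report` and `CASSANDRA_CQL` are made up for this example, and the 5 second timeout is an assumption.

```python
import socket
import time


def tcp_check(host, port, timeout=5.0):
    """Try a plain TCP connect and return (ok, elapsed_seconds).

    Hypothetical helper: it only confirms that the port accepts
    connections, it does not speak CQL or verify certificates.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False, time.monotonic() - start


# Instance names, addresses and ports as they appear in the recovery messages.
CASSANDRA_CQL = {
    "cassandra-a": ("10.192.48.124", 9042),
    "cassandra-b": ("10.192.48.125", 9042),
    "cassandra-c": ("10.192.48.126", 9042),
}


def report(targets=CASSANDRA_CQL, timeout=5.0):
    """Return one status line per target, loosely modelled on the log format."""
    lines = []
    for name, (host, port) in targets.items():
        ok, elapsed = tcp_check(host, port, timeout)
        state = "OK" if ok else "CRITICAL"
        lines.append(f"{name} CQL {host}:{port} is {state} ({elapsed:.3f}s)")
    return lines
```

Running `print("\n".join(report()))` from a host that can reach those addresses would print one line per instance. The instances came back several minutes apart in the log above, so a CRITICAL result shortly after boot may only mean that Cassandra is still starting.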