
http://extdist.wmflabs.org/ 502's (Bad Gateway)
Closed, Resolved (Public)

Description

curl extdist-01.extdist.eqiad.wmflabs and curl extdist-02.extdist.eqiad.wmflabs from tools-bastion-01 seem to fail as well.

  • extdist-01.extdist.eqiad.wmflabs does not respond to SSH
  • extdist-02.extdist.eqiad.wmflabs closes connection immediately after initial handshake (ssh_exchange_identification: Connection closed by remote host)

Event Timeline

I'm having trouble reading the console log ("Your account is not in the project extdist.", even though it is), but I assume this is another instance of T141673: Track labs instances hanging.

Mentioned in SAL [2016-08-17T22:20:21Z] <legoktm> rebooting extdist-01 T143209

Mentioned in SAL [2016-08-17T22:26:11Z] <legoktm> rebooting extdist-02 T143209

-02 is definitely a hanging instance:

[321000.360669] INFO: task nscd:611 blocked for more than 120 seconds.
[321000.361327]       Not tainted 4.4.0-1-amd64 #1
[321000.361736] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[321000.362977] INFO: task nscd:612 blocked for more than 120 seconds.
[321000.363927]       Not tainted 4.4.0-1-amd64 #1
[321000.366289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
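
For anyone triaging similar console logs, here is a minimal Python sketch that pulls the hung-task warnings out of kernel output like the above. The regex is based only on the lines quoted in this task, and the names `HUNG_TASK` and `find_hung_tasks` are made up for illustration; other kernel versions may word the message differently.

```python
import re

# Matches the hung-task warning as it appears in the log above, e.g.
# "[321000.360669] INFO: task nscd:611 blocked for more than 120 seconds."
# Assumption: the format is inferred from this task's log excerpt only.
HUNG_TASK = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return one (uptime_seconds, command, pid, threshold_seconds) tuple per warning."""
    hits = []
    for line in log_text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            hits.append(
                (float(m["uptime"]), m["comm"], int(m["pid"]), int(m["secs"]))
            )
    return hits


# Sample input copied from the console log quoted in this task.
SAMPLE = """\
[321000.360669] INFO: task nscd:611 blocked for more than 120 seconds.
[321000.361327]       Not tainted 4.4.0-1-amd64 #1
[321000.362977] INFO: task nscd:612 blocked for more than 120 seconds.
[321000.363927]       Not tainted 4.4.0-1-amd64 #1
"""

for uptime, comm, pid, secs in find_hung_tasks(SAMPLE):
    print(f"{comm} (pid {pid}) blocked for more than {secs}s at uptime {uptime:.0f}s")
```

The bracketed number is seconds since boot, so it gives the uptime at which the warning fired rather than a wall-clock time.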

With respect to -01, I can only find this in /var/log/syslog:

Aug 17 10:19:37 extdist-01 puppet-agent[28289]: Finished catalog run in 8.55 seconds
Aug 17 22:22:57 extdist-01 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="559" x-info="http://www.rsyslog.com"] start

which narrows down when the host locked up completely: sometime after the last entry at 10:19:37. (The 22:22 entry is from when the host was restarted.)
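
The same check can be done mechanically by looking for the largest silence between consecutive syslog entries. The following is a rough Python sketch, not an established tool: `largest_gap` is a hypothetical helper, and because traditional syslog timestamps carry no year, the year is passed in explicitly (2016 here, taken from the SAL entries above).

```python
from datetime import datetime


def largest_gap(lines, year):
    """Return (earlier, later) datetimes around the longest silence, or None.

    Assumes each line starts with a traditional 15-character syslog
    timestamp such as "Aug 17 10:19:37"; lines that do not parse are skipped.
    """
    stamps = []
    for line in lines:
        try:
            stamps.append(
                datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
            )
        except ValueError:
            continue  # not a timestamped line

    best = None
    for earlier, later in zip(stamps, stamps[1:]):
        if best is None or later - earlier > best[1] - best[0]:
            best = (earlier, later)
    return best


# The two syslog lines quoted in this task (second one shortened).
SAMPLE = [
    "Aug 17 10:19:37 extdist-01 puppet-agent[28289]: Finished catalog run in 8.55 seconds",
    "Aug 17 22:22:57 extdist-01 rsyslogd: start",
]

gap = largest_gap(SAMPLE, 2016)
if gap:
    print(f"silent from {gap[0]} to {gap[1]} ({gap[1] - gap[0]})")
```

For the two lines above, the silence is 12:03:20. This only bounds the lockup: the host stopped logging sometime after the first timestamp, and the second timestamp is the reboot.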

chasemp claimed this task.
chasemp added a subscriber: chasemp.

This seems to be back now.