tools-webgrid-lighttpd-1201 webservices and ssh inaccessible
Closed, Resolved, Public

Description

13:27 <phe> valhallasw`cloud, can you get a look if tools-webgrid-lighttpd-1201 has trouble?
13:28 <valhallasw`cloud> phe: what's wrong?
13:28 <phe> my tools return 404, web server seems up from qstat but webservice restart timeout trying to kill the server
13:28 <phe> it runs on tools-webgrid-lighttpd-1201 which seems down, I can't ssh to
13:29 <phe> other tools on 1201 seems to freeze too

Event Timeline

valhallasw raised the priority of this task from to Needs Triage.
valhallasw updated the task description.
valhallasw added a project: Toolforge.
valhallasw added subscribers: valhallasw, Phe.
Restricted Application added a project: Cloud-Services. Jan 1 2016, 12:30 PM
Restricted Application added subscribers: StudiesWorld, Aklapper.

root login also hangs:

debug1: Offering RSA public key: labs-root.id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: Authentication succeeded (publickey).
Authenticated to tools-webgrid-lighttpd-1201. (via proxy).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.

Currently-running webservices are

valhallasw@tools-bastion-01:~/accountingtools$ qhost -h tools-webgrid-lighttpd-1201 -j
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
tools-webgrid-lighttpd-1201.eqiad.wmflabs lx26-amd64      4     -    7.8G       -   23.9G       -
   job-ID  prior   name       user         state submit/start at     queue      master ja-task-ID
   ----------------------------------------------------------------------------------------------
       596 0.30601 lighttpd-p tools.potd-f r     12/30/2015 04:08:02 webgrid-li MASTER
       640 0.30601 lighttpd-d tools.dexbot r     12/30/2015 04:08:51 webgrid-li MASTER
       678 0.30600 lighttpd-w tools.wikida r     12/30/2015 04:09:29 webgrid-li MASTER
       710 0.30600 lighttpd-c tools.catmon r     12/30/2015 04:10:05 webgrid-li MASTER
       775 0.30600 lighttpd-w tools.wiktio r     12/30/2015 04:10:48 webgrid-li MASTER
       818 0.30600 lighttpd-s tools.sdbot  r     12/30/2015 04:11:35 webgrid-li MASTER
       855 0.30600 lighttpd-t tools.tools- r     12/30/2015 04:12:07 webgrid-li MASTER
       876 0.30600 lighttpd-y tools.yadkar dr    12/30/2015 04:12:38 webgrid-li MASTER
       909 0.30600 lighttpd-t tools.tree-o r     12/30/2015 04:13:13 webgrid-li MASTER
       959 0.30600 lighttpd-g tools.geohac r     12/30/2015 04:14:14 webgrid-li MASTER
      1053 0.30599 lighttpd-h tools.heimda r     12/30/2015 04:15:17 webgrid-li MASTER
      1102 0.30599 lighttpd-b tools.blahma r     12/30/2015 04:16:10 webgrid-li MASTER
      1148 0.30599 lighttpd-b tools.betabo r     12/30/2015 04:17:05 webgrid-li MASTER
      1182 0.30599 lighttpd-w tools.wikili r     12/30/2015 04:18:02 webgrid-li MASTER
      1218 0.30599 lighttpd-c tools.cats-p r     12/30/2015 04:18:32 webgrid-li MASTER
      1272 0.30599 lighttpd-p tools.projek dr    12/30/2015 04:19:33 webgrid-li MASTER
      1371 0.30598 lighttpd-c tools.cobain r     12/30/2015 04:20:56 webgrid-li MASTER
      1409 0.30598 lighttpd-u tools.url-co r     12/30/2015 04:21:44 webgrid-li MASTER
      1443 0.30598 lighttpd-t tools.transl r     12/30/2015 04:22:19 webgrid-li MASTER
      1462 0.30598 lighttpd-i tools.icalen r     12/30/2015 04:22:51 webgrid-li MASTER
     52246 0.30300 lighttpd-p tools.phetoo dr    12/31/2015 08:23:28 webgrid-li MASTER

After

qmod -rq "webgrid-lighttpd@tools-webgrid-lighttpd-1201.eqiad.wmflabs"

these are left:

  876 0.30600 lighttpd-y tools.yadkar dr    12/30/2015 04:12:38 webgrid-li MASTER
 1272 0.30599 lighttpd-p tools.projek dr    12/30/2015 04:19:33 webgrid-li MASTER
52246 0.30301 lighttpd-p tools.phetoo dr    12/31/2015 08:23:28 webgrid-li MASTER

I force-deleted those with

valhallasw@tools-bastion-01:~/accountingtools$ qdel -f 876 1272 52246
warning: valhallasw forced the deletion of job 876
warning: valhallasw forced the deletion of job 1272
warning: valhallasw forced the deletion of job 52246

which should bring the tools back online.
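
For reference, the manual step above (pick the jobs still in a deletion state after the `qmod -rq`, then force-delete them) could be scripted roughly as follows. This is only a sketch under assumptions: it assumes job rows in `qhost -j` output start with a numeric job ID and carry the state in the fifth column, as in the listing above, and the helper names `stuck_job_ids` and `force_delete_command` are made up for illustration. It only builds the `qdel -f` command line and does not run it.

```python
import shlex


def stuck_job_ids(qhost_output):
    """Return the IDs of jobs whose state includes 'd' (deletion pending).

    Assumes the column layout shown in this task:
    job-ID, prior, name, user, state, date, time, queue, master.
    """
    ids = []
    for line in qhost_output.splitlines():
        fields = line.split()
        # Host, header and separator rows do not start with a numeric job ID.
        if len(fields) >= 5 and fields[0].isdigit():
            if "d" in fields[4]:
                ids.append(fields[0])
    return ids


def force_delete_command(job_ids):
    """Build (but do not execute) the qdel -f command line for the given jobs."""
    return shlex.join(["qdel", "-f", *job_ids])


# Sample rows copied from the listing in this task.
SAMPLE = """\
       855 0.30600 lighttpd-t tools.tools- r     12/30/2015 04:12:07 webgrid-li MASTER
       876 0.30600 lighttpd-y tools.yadkar dr    12/30/2015 04:12:38 webgrid-li MASTER
      1272 0.30599 lighttpd-p tools.projek dr    12/30/2015 04:19:33 webgrid-li MASTER
     52246 0.30300 lighttpd-p tools.phetoo dr    12/31/2015 08:23:28 webgrid-li MASTER
"""

if __name__ == "__main__":
    print(force_delete_command(stuck_job_ids(SAMPLE)))
```

Printing the command instead of executing it leaves the final decision to the admin, since a forced delete removes the job from the scheduler even if the exec host never confirms it.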

Phe added a comment. Jan 12 2016, 3:20 PM

Same trouble, but on tools-webgrid-lighttpd-1202: ssh and my tools running on it freeze.

Phe added a comment. Jan 12 2016, 3:48 PM

Working now; someone restarted it, as far as I can see.

scfc added a subscriber: scfc. Jan 20 2016, 6:18 PM

At https://wikitech.wikimedia.org/wiki/Special:NovaInstance, "get console output" gives "Failed to get console output for instance tools-webgrid-lighttpd-1201.". Trying to reboot gives "Failed to reboot instance tools-webgrid-lighttpd-1201."

chasemp closed this task as Resolved. Jan 20 2016, 7:49 PM
chasemp claimed this task.
scfc reopened this task as Open. Jan 20 2016, 9:04 PM

I still cannot ssh into that instance:

[tim@passepartout ~]$ ssh -v tools-webgrid-lighttpd-1201.tools.eqiad.wmflabs
OpenSSH_7.1p2, OpenSSL 1.0.2e-fips 3 Dec 2015
debug1: Reading configuration data /home/tim/.ssh/config
debug1: /home/tim/.ssh/config line 10: Applying options for *.eqiad.wmflabs
debug1: /home/tim/.ssh/config line 16: Applying options for *.wmflabs
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 56: Applying options for *
debug1: Control socket "/home/tim/.ssh/scfc@tools-webgrid-lighttpd-1201.tools.eqiad.wmflabs:22" does not exist
debug1: Executing proxy command: exec ssh -a -q -W tools-webgrid-lighttpd-1201.tools.eqiad.wmflabs:22 bastion.wmflabs.org
debug1: permanently_drop_suid: 1000
debug1: identity file /home/tim/.ssh/id_rsa type 1
debug1: key_load_public: No such file or directory
debug1: identity file /home/tim/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/tim/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/tim/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/tim/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/tim/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/tim/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/tim/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.1
ssh_exchange_identification: Connection closed by remote host
[tim@passepartout ~]$
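
The two `ssh -v` traces in this task fail at different points: the earlier root login authenticates and then hangs at the interactive session, while this one is closed during the identification exchange. A rough way to tell such traces apart is sketched below. The function name `ssh_failure_stage` and the returned labels are hypothetical, and the marker strings are simply the ones that appear in the logs above; other OpenSSH versions may word these messages differently.

```python
def ssh_failure_stage(debug_log):
    """Guess how far an `ssh -v` session got, based on marker strings.

    The markers are taken from the traces in this task and are not an
    exhaustive or version-independent list.
    """
    if "Entering interactive session" in debug_log:
        # Authentication worked; if no prompt follows, the hang is on the host.
        return "session requested"
    if "Authentication succeeded" in debug_log:
        return "authenticated"
    if "ssh_exchange_identification" in debug_log:
        # The peer closed the connection before sending its version banner.
        return "closed before banner"
    return "unknown"


if __name__ == "__main__":
    early = "debug1: Authentication succeeded (publickey).\ndebug1: Entering interactive session."
    late = "ssh_exchange_identification: Connection closed by remote host"
    print(ssh_failure_stage(early))
    print(ssh_failure_stage(late))
```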