
No file system on toollabs, unable to login, web service broken
Closed, Resolved, Public

Assigned To
Authored By
Multichill
Aug 30 2015, 9:31 AM

Description

multichill@tools-bastion-01:~/queries/wikidata$ ls
(nothing happens)

ssh tools-login.wmflabs.org

(just times out)

The web service on http://tools.wmflabs.org/ is also broken. It returned 500 Internal Server Error (currently replaced by an "Our servers are currently experiencing a technical problem." placeholder).

Event Timeline

Multichill raised the priority of this task to Unbreak Now!.
Multichill updated the task description.
Multichill added a project: Toolforge.
Multichill subscribed.
Restricted Application added a subscriber: Aklapper.

The NFS server is down due to some kernel issues, we're working on it.

The last successful backup of tools is from 2015-08-30T01:59:35.787Z, so at least we have a very recent backup.

Multichill renamed this task from "No file system on toollabs, unable to login" to "No file system on toollabs, unable to login, web service broken". Aug 30 2015, 11:00 AM
Multichill updated the task description.

I wonder why you don't have a redundant NFS server system yet.

Romaine rescinded a token.
Romaine awarded a token.

The NFS server and tool labs are back online.

Tool operators are getting a lot of failed-job emails, all because of the outage. The emails should all have a timestamp that falls within the outage period.
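
For tool operators who want to check this, here is a minimal Python sketch that tests whether an email's Date header falls inside an outage window. The window bounds are placeholders: this task does not state exact start and end times, so substitute the real ones. The function names are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Placeholder outage window in UTC. These bounds are assumptions,
# NOT the actual outage times; replace them with the real values.
OUTAGE_START = datetime(2015, 8, 30, 2, 0, tzinfo=timezone.utc)
OUTAGE_END = datetime(2015, 8, 30, 16, 0, tzinfo=timezone.utc)


def in_outage(date_header, start=OUTAGE_START, end=OUTAGE_END):
    """Return True if an RFC 2822 Date header with an explicit UTC offset
    falls inside the window (bounds inclusive)."""
    sent = parsedate_to_datetime(date_header)
    return start <= sent <= end


def split_by_outage(date_headers):
    """Partition Date headers into (during_outage, outside_outage)."""
    during = [d for d in date_headers if in_outage(d)]
    outside = [d for d in date_headers if not in_outage(d)]
    return during, outside


if __name__ == "__main__":
    headers = [
        "Sun, 30 Aug 2015 12:15:00 +0200",  # inside the placeholder window
        "Mon, 31 Aug 2015 09:00:00 +0200",  # after the placeholder window
    ]
    during, outside = split_by_outage(headers)
    print(len(during), "during the outage,", len(outside), "outside")
```

Failure emails with a timestamp outside the window are probably not caused by the outage and are worth a closer look.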

valhallasw claimed this task.

The initial issue was resolved on Sunday afternoon (CEST), but I forgot to close the task at that point.