
Can't log in to Wikimedia Space
Closed, Resolved · Public

Description

Hi,

I'm trying to create my account (or log in) today and I get an Error 500 (the screenshot is localized in Spanish).

Judging by the URL, the login is trying to authenticate with MediaWiki: https://discuss-space.wmflabs.org/auth/mediawiki

Kindly

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 30 2019, 2:10 PM
Qgil triaged this task as High priority. · Sep 30 2019, 2:27 PM
Qgil edited projects, added Space (Jul-Sep-2019); removed Space.
Qgil added a subscriber: Qgil.

Thank you for reporting this problem @Superzerocool

I can see several messages like this in Discourse's error log:

SocketError (Failed to open TCP connection to meta.wikimedia.org:443 (getaddrinfo: Temporary failure in name resolution))
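For anyone reproducing this, the "Temporary failure in name resolution" part of that message can be checked from the affected host independently of Discourse. The following is a minimal sketch, not what Discourse itself does: `can_resolve` and its `resolver` parameter are hypothetical names introduced here, and the only real API used is Python's `socket.getaddrinfo`.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves, False on a name-resolution failure.

    `resolver` is injectable (an assumption of this sketch) so the function
    can be exercised without network access.
    """
    try:
        resolver(host, port)
    except socket.gaierror:
        # getaddrinfo failures such as "Temporary failure in name resolution"
        # are raised as socket.gaierror.
        return False
    return True
```

Calling `can_resolve("meta.wikimedia.org")` and `can_resolve("github.com")` on the server would show whether DNS lookups fail for both hosts mentioned in this task.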

Wikimedia Cloud's proxies were changed today and Discourse was down because of it, so I wonder whether the two problems are related.

In the meantime, I am going to restart Discourse in case it helps.

Mmm... I have to run now and I don't want to do this in a rush.

I'm adding the Cloud-Services tag for now, just in case.

Qgil added a comment. · Sep 30 2019, 2:34 PM

Meh. I have to rebuild after all...

Qgil added a comment. · Sep 30 2019, 2:35 PM

OK, now the problem is bigger and Discourse is down. :(

fatal: unable to access 'https://github.com/discourse/pups.git/': Could not resolve host: github.com

Ok, I'll wait for it.

Anyway, it is returning a lot of 502 errors right now D:

Qgil raised the priority of this task from High to Unbreak Now!. · Sep 30 2019, 2:37 PM
Restricted Application added a subscriber: Liuxinyu970226. · Sep 30 2019, 2:37 PM
Qgil added a comment. · Sep 30 2019, 2:38 PM

I have raised this problem in the wikimedia-cloud chat room. Let's see if someone can help.

I see some issues with Docker on the discuss-space server:

root@discuss-space:~# sudo systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-07-29 12:51:44 UTC; 2 months 2 days ago
     Docs: https://docs.docker.com
 Main PID: 572 (dockerd)
    Tasks: 13
   Memory: 424.6M
      CPU: 1h 19min 52.136s
   CGroup: /system.slice/docker.service
           └─572 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Sep 30 14:34:03 discuss-space dockerd[572]: time="2019-09-30T14:34:03.996626400Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 30 14:34:21 discuss-space dockerd[572]: time="2019-09-30T14:34:21.311702774Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 30 14:43:49 discuss-space dockerd[572]: time="2019-09-30T14:43:49.259395181Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 30 14:43:49 discuss-space dockerd[572]: time="2019-09-30T14:43:49.864165633Z" level=error msg="Error setting up exec command in container app: Container 7b319f93596d2a67ec2ed9d5596ba86585d913f97cc8a6470454381e7891868c is not running"
Sep 30 14:44:00 discuss-space dockerd[572]: time="2019-09-30T14:44:00.574590474Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 30 14:44:09 discuss-space dockerd[572]: time="2019-09-30T14:44:09.106721491Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 30 14:45:03 discuss-space dockerd[572]: time="2019-09-30T14:45:03.391822155Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 30 14:45:04 discuss-space dockerd[572]: time="2019-09-30T14:45:04.401662158Z" level=warning msg="Failed to allocate and map port 443-443:  (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 443 -j DNAT --to-destination 172.17.0.2:443 ! -i docker0: iptables
Sep 30 14:45:04 discuss-space dockerd[572]: time="2019-09-30T14:45:04.483043170Z" level=error msg="7b319f93596d2a67ec2ed9d5596ba86585d913f97cc8a6470454381e7891868c cleanup: failed to delete container from containerd: no such container"
Sep 30 14:45:04 discuss-space dockerd[572]: time="2019-09-30T14:45:04.484040870Z" level=error msg="Handler for POST /v1.39/containers/app/start returned error: driver failed programming external connectivity on endpoint app (c11cb5d0f7fefb9d6cc37ec6950f78dfce53eaedeeaa93539[..]
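Most of the lines above are `level=info` noise. As a rough sketch of how the relevant entries could be pulled out of such a dump, the snippet below keeps only warnings and errors. It assumes the `level=... msg="..."` layout seen in this log; `extract_problems` is a hypothetical helper, not part of Docker or systemd.

```python
import re

# Matches the `level=<name> msg="<text>` portion of a dockerd journal line.
# The closing quote is not required, so truncated lines still match.
LEVEL_MSG = re.compile(r'level=(\w+) msg="([^"]*)')


def extract_problems(log_text, levels=("warning", "error")):
    """Return (level, message) pairs for lines whose level is in `levels`."""
    problems = []
    for line in log_text.splitlines():
        match = LEVEL_MSG.search(line)
        if match and match.group(1) in levels:
            problems.append((match.group(1), match.group(2)))
    return problems
```

Applied to the output above, this would leave the port 443 warning and the three `level=error` entries.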

Please restart the Docker service by hand: sudo systemctl restart docker. I can do that myself, but I want to coordinate with you first to make sure I'm not breaking anything else.

Per https://lists.wikimedia.org/pipermail/cloud-announce/2019-September/000213.html we installed ferm on every VM (including this one) by mistake, and then removed it. Ferm is a tool that manages iptables rules. Removing ferm wiped the iptables configuration, and Docker requires a very specific iptables setup to work properly.
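A quick way to confirm this kind of breakage on a VM is to check whether the `DOCKER` chain that dockerd appends rules to (see the `-t nat -A DOCKER` command in the warning above) still exists in the nat table. The sketch below parses the text printed by `iptables -t nat -S`, where user-defined chains appear as `-N <name>` lines. `has_chain` is a hypothetical helper, and the idea that a missing chain is what made the rule fail here is an assumption, since the log message is truncated.

```python
def has_chain(rules_text, chain="DOCKER"):
    """Return True if `rules_text` (output of `iptables -t nat -S`)
    declares the user-defined chain `chain` with a `-N <chain>` line."""
    for line in rules_text.splitlines():
        if line.split()[:2] == ["-N", chain]:
            return True
    return False
```

On a real host the input could come from `subprocess.run(["iptables", "-t", "nat", "-S"], capture_output=True, text=True).stdout`, run as root. Restarting the Docker service, as requested above, makes dockerd recreate its chains.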

Mentioned in SAL (#wikimedia-cloud) [2019-09-30T14:59:55Z] <tgr> restarted docker on discuss-space for T234218

Tgr closed this task as Resolved. · Sep 30 2019, 3:03 PM
Tgr claimed this task.

Verified working.

Thanks everyone for your help, now it's working as expected :)

Qgil added a comment. · Sep 30 2019, 3:13 PM

Phew! Thank you so much for this very very quick response and fix!!!

Qgil moved this task from Backlog to Evaluated on the Space (Jul-Sep-2019) board. · Sep 30 2019, 9:29 PM