502s and kubernetes-based tool labs service not restarting
Closed, Resolved · Public


Hey all, we noticed consistent 502s on our tool Monumental. We checked the logs and found no errors. When we went to restart the service, webservice said it wasn't running and attempted to start it, but no luck. Is there an ongoing Kubernetes issue or something similar?

The command we used to restart was:

webservice --backend=kubernetes python2 restart

I'd expect to at least see something in the logs if there was something wrong on our side.

Also, I tried to join Cloud-Services on IRC, but was blocked with an error message telling me I wasn't invited to the channel. Is there a policy change here? Or someone I could talk to?

Event Timeline

mahmoud created this task. Jun 29 2017, 8:11 PM
Restricted Application added a project: Cloud-Services. Jun 29 2017, 8:11 PM
Restricted Application added a subscriber: Aklapper.

I can explain the IRC issue: there's been a rebranding. The new channel is #wikimedia-cloud.

@bd808 informed me they were looking into it. Don't know if there's a task yet.

The Kubernetes cluster is not allowing new job submissions globally right now. We are in the midst of an NFS-related event following today's planned server reboot for kernel updates. We will email the labs-l/labs-announce mailing lists when things are better. Sorry for the inconvenience.

We finally got an announcement of the ongoing issue up on the mailing lists:

We will follow up with an all-clear announcement once things are in better shape. We will then work on an outage report, which we will share, explaining what happened and the causes and corrections we found.

Phamhi claimed this task. Mon, Oct 28, 12:36 PM
Phamhi closed this task as Resolved. Mon, Oct 28, 12:38 PM

This ticket is closed as part of scheduled clean-up. Please let us know if this issue needs to be re-opened.