Tools Docker Registry is Dead
Closed, Resolved · Public

Description

I can't ssh into the node - it got killed in the middle of a push. Rebooting it has no effect, and I tried to create a new node and that only gives me a password prompt, forever.

Event Timeline

Restricted Application added a subscriber: Aklapper.

You didn't put down the name of the node you tried to create :)

I'm guessing:

3e0dc2bb-4e99-4e84-aca4-f3be6a4134b0 | tools-docker-registry-02 | ACTIVE | public=10.68.21.176
OS-SRV-USG:launched_at | 2016-11-16T05:33:10.000000

It seems up to me (I can log in via the root key), but there are issues with the NFS mounts. I actually ran into a bit of trouble here this morning. I believe https://gerrit.wikimedia.org/r/#/c/321875/ may resolve this. The VM is currently up but in an odd mount state: /home is fine, but /data/project currently is not. @Andrew is helping me take a look.
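
For anyone hitting a similar half-mounted state, here is a minimal sketch of how one might check which paths are actually backed by a separate filesystem. It is not what was run on the VM; the function name `path_state` is hypothetical, and it assumes that a healthy NFS-backed path (directly mounted or a symlink into a mount) lives on a different device than /.

```python
import os


def path_state(path, root="/"):
    """Classify a path as 'unavailable', 'local', or 'separate-fs'.

    Assumption: an NFS-backed path resolves to a different device
    than the root filesystem. os.stat follows symlinks, so a link
    into a mount is resolved before the comparison.
    """
    try:
        st = os.stat(path)
    except OSError:
        # Missing path, dangling symlink, or a stale/unreachable mount.
        return "unavailable"
    if st.st_dev == os.stat(root).st_dev:
        return "local"
    return "separate-fs"


if __name__ == "__main__":
    # Paths taken from the comment above.
    for p in ("/home", "/data/project"):
        print(p, path_state(p))
```

A result of "local" or "unavailable" for a path that should come from NFS would match the kind of odd state described above, though it does not say why the mount or link is missing.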

Change 321883 had a related patch set uploaded (by Rush):
tools: when establishing /home from NFS force creation

https://gerrit.wikimedia.org/r/321883

Change 321883 merged by Andrew Bogott:
tools: when establishing links to NFS force creation

https://gerrit.wikimedia.org/r/321883

Change 321886 had a related patch set uploaded (by Rush):
nfs-exportd: ensure running and start on boot

https://gerrit.wikimedia.org/r/321886

Change 321886 merged by Rush:
nfs-exportd: ensure running and start on boot

https://gerrit.wikimedia.org/r/321886

The VM is up and running. I'm not resolving this task, since I'm not sure what remains here re: the registry itself.

yuvipanda claimed this task.

This is all sorted out now. For some reason my ssh setup borked at exactly the same time as this happened (see T150896). I was able to ssh into the first docker registry and clean out the log files that had filled its / (nginx was logging in debug mode), and all is well now.
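
As a rough illustration of the diagnosis above (a full / caused by oversized logs), the sketch below reports how full a filesystem is and lists the largest files under a directory. It is not the procedure that was actually used; the helper names and the choice of /var/log as the directory to scan are assumptions.

```python
import os
import shutil


def usage_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is used."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def largest_files(directory, limit=5):
    """Return up to `limit` (size_in_bytes, path) tuples, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                # lstat: report the link itself rather than following symlinks.
                sizes.append((os.lstat(full).st_size, full))
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return sorted(sizes, reverse=True)[:limit]


if __name__ == "__main__":
    print(f"/ is {usage_percent('/'):.1f}% full")
    # Assumed log location; adjust for the host in question.
    for size, path in largest_files("/var/log"):
        print(f"{size:>14d}  {path}")
```

Turning nginx's error log level back down from debug would be the longer-term fix; the scan only helps find what to clean up.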