Become failed for newly created tool
Closed, Duplicate, Public

Description

Created a new tool https://toolsadmin.wikimedia.org/tools/id/dspull

The last Puppet run was at Fri Jul 19 16:22:24 UTC 2019 (8 minutes ago). 
Last login: Fri Jul 19 16:24:44 2019 from 8.8.8.8
ghuron@tools-sgebastion-07:~$ become dspull
become: no such tool 'dspull'

Event Timeline

@Krenair, I assume that only the home directory was not created:

ghuron@tools-sgebastion-07:~$ ls /data/project/dspull/
ls: cannot access '/data/project/dspull/': No such file or directory
ghuron@tools-sgebastion-07:~$ id
uid=19300(ghuron) gid=500(wikidev) groups=500(wikidev),50380(project-tools),53381(tools.mw2sparql),53738(tools.wdml),54111(tools.dspull)
Ghuron triaged this task as High priority. Jul 22 2019, 12:47 PM

Mentioned in SAL (#wikimedia-cloud) [2019-07-22T23:44:30Z] <bd808> Restarted maintain-kubeusers on tools-k8s-master-01 (T228529)

bd808 claimed this task.
bd808 edited projects, added cloud-services-team (Kanban); removed Tool-admin.
bd808 added subscribers: yuvipanda, bd808.

The service that creates home directories for new tools had gotten stuck. Immediately after being restarted it created the directories for dspull and two other tools that were pending.
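
The half-created state reported in this task (the user is already a member of the tools.<name> group, but /data/project/<name> does not exist yet) can be checked with a small shell helper. This is only a sketch: the function name tool_state and its output words are hypothetical and not part of any Toolforge tooling; the /data/project path and the tools.<name> group naming are taken from the transcript in the description.

```shell
#!/bin/sh
# Hypothetical helper (not a Toolforge command): classify a tool as
#   ready   - home directory exists
#   pending - caller is in the tools.<name> group, but the home directory is missing
#   unknown - neither is present
# Usage: tool_state TOOL [BASE_DIR] [GROUP_LIST]
tool_state() {
    tool="$1"
    base="${2:-/data/project}"       # home directory root seen in this task
    grouplist="${3:-$(id -Gn)}"      # space-separated group names; defaults to the caller's

    in_group=no
    for g in $grouplist; do
        if [ "$g" = "tools.$tool" ]; then
            in_group=yes
        fi
    done

    if [ -d "$base/$tool" ]; then
        echo "ready"
    elif [ "$in_group" = yes ]; then
        echo "pending"
    else
        echo "unknown"
    fi
}
```

Run as `tool_state dspull` on the bastion. With the `ls` and `id` output shown in the description, it would be expected to report pending, which points at the home directory creation step rather than at tool registration.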

MusikAnimal subscribed.

I'm guessing the tool creation service is stuck again. I have the same symptoms with the [[ https://toolsadmin.wikimedia.org/tools/id/wikiwho | wikiwho ]] tool. become says it doesn't exist and the home directory is missing.

I just saw wikiwho get created:

Sep  6 15:10:41 tools-k8s-master-01 maintain-kubeusers[5844]: starting a run
Sep  6 15:10:41 tools-k8s-master-01 maintain-kubeusers[5844]: Wrote config in /data/project/wikiwho/.kube/config
Sep  6 15:10:41 tools-k8s-master-01 maintain-kubeusers[5844]: (b'namespace "wikiwho" created\n', b'')
Sep  6 15:10:41 tools-k8s-master-01 maintain-kubeusers[5844]: Provisioned creds for tool wikiwho
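
To see which tools a run actually provisioned, the log lines above can be filtered for the "Provisioned creds for tool" message. A minimal sketch, assuming the log format is exactly as quoted here; the function name provisioned_tools is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read maintain-kubeusers log lines on stdin and print
# the name of each tool for which credentials were provisioned.
provisioned_tools() {
    # Keep only "Provisioned creds for tool NAME" lines and print NAME.
    sed -n 's/.*maintain-kubeusers\[[0-9]*]: Provisioned creds for tool \(.*\)$/\1/p'
}
```

On a host that logs to the systemd journal, something like `journalctl -t maintain-kubeusers | provisioned_tools` could feed it, assuming the syslog identifier matches the one shown in the excerpt.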

Since maintain-kubeusers was turned into a systemd timer, I think there's a problem with its scheduling. It hung for a while and then started again without human intervention.

I had to hit it with a hammer -- https://phabricator.wikimedia.org/T194859#5471477