tools-sgegrid-master needs to be a submit host for tool deletion
Closed, Resolved, Public

Description

I wrote disable-tools assuming that I could qdel on a grid master -- it seems that this is no longer the case:

root@tools-sgegrid-master:/srv/disable-tool# /usr/bin/qdel 3119
denied: host "tools-sgegrid-master.tools.eqiad1.wikimedia.cloud" is not a submit host

I can probably rearrange the code if necessary, but can we just make this a submit host?
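For anyone hitting the same error, here is a minimal sketch of how a script could check up front whether the host it runs on is a submit host, rather than finding out when qdel is denied. It assumes that `qconf -ss` (show submit hosts) prints one host name per line and that the calling host is allowed to query the qmaster; the function names are made up for illustration and are not part of disable-tool.

```python
import socket
import subprocess


def parse_host_list(qconf_output: str) -> set:
    """Return the host names from `qconf -ss` style output (assumed one per line)."""
    return {line.strip().lower() for line in qconf_output.splitlines() if line.strip()}


def is_submit_host(hostname: str, qconf_output: str) -> bool:
    """True if hostname appears in the given submit host listing."""
    return hostname.lower() in parse_host_list(qconf_output)


def current_host_can_submit() -> bool:
    """Ask the grid for its submit hosts and look for this host's FQDN.

    Assumption: qconf is on PATH and this host may run `qconf -ss`.
    """
    result = subprocess.run(
        ["qconf", "-ss"], capture_output=True, text=True, check=True
    )
    return is_submit_host(socket.getfqdn(), result.stdout)
```

Only `current_host_can_submit()` talks to the grid; the parsing is kept separate so it can be exercised without a qmaster.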

Event Timeline

I think that would probably be OK. Brooke made fewer nodes in the grid system submit hosts when she redesigned everything, I think mostly to reduce the attack surface. The master node isn't user accessible, so I'm not sure what new attack vector letting it do things like qdel would create. We certainly have docs on wikitech showing past use of qdel on the grid master node.

@aborrero do you remember any specific reason that this would be a very bad idea?

Other than reducing attack surface, I don't remember any specific concern.

I think I remembered the specific attack surface concerns, but for WP:BEANS reasons I will not enumerate them here.

@Andrew I was wondering if all of the grid bits could be handled from the grid's cron server? It is by necessity a submit host.

If we can run qconf as root on the cron servers, then yes, moving everything to the cron server is the right thing. If not, we can split things up, but we'll probably need more lockfiles.
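To make the lockfile idea concrete, here is a rough sketch of a non-blocking exclusive lock in Python using `fcntl.flock`. The helper name `exclusive_lock` and the lock path are hypothetical, and this is not how disable-tool actually does its locking. It also assumes the lock only needs to guard runs on a single host; `flock` behaviour over NFS varies and would need checking separately.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def exclusive_lock(path):
    """Hold an exclusive lock on path for the duration of the with-block.

    Raises BlockingIOError immediately if another holder already has it.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # LOCK_NB makes this fail fast instead of waiting for the other run.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield fd
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

A split-up agent could wrap each of its passes in `with exclusive_lock("/path/to/agent.lock"):` (path is a placeholder) and simply skip the pass when `BlockingIOError` is raised.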

$ ssh tools-sgecron-01.tools.eqiad.wmflabs
$ sudo qconf --help
denied: host "tools-sgecron-01.tools.eqiad.wmflabs" is not an admin host

The security model that Brooke adopted isolates "admin" functions from "submit" functions. We could:

  1. undo this separation
  2. add more agents for grid bits (one for admin and one for submit)
  3. add a special host just for the cleanup actions that is both an admin host and a submit host, and try really hard not to let normal users interact with it
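As a rough illustration of option 2, each agent could be told which role its host has and only run the grid commands that role permits, leaving the rest for the agent on the other host. The mapping below is taken only from the two "denied" errors in this task (qdel wanted a submit host, qconf wanted an admin host); the function name and the sample steps are made up for illustration.

```python
# Role each command needs, based on the errors quoted in this task.
REQUIRED_ROLE = {"qdel": "submit", "qconf": "admin"}


def split_steps(steps, host_roles):
    """Split argv-style steps into (run on this host, leave for another agent).

    steps: list of argv lists, e.g. [["qdel", "3119"]]
    host_roles: set of roles this host has, e.g. {"submit"}
    """
    here, elsewhere = [], []
    for step in steps:
        role = REQUIRED_ROLE[step[0]]  # KeyError for commands not listed above
        (here if role in host_roles else elsewhere).append(step)
    return here, elsewhere


# Example: an agent on a submit-only host such as the cron server.
# ["qconf", "-sh"] is only a placeholder for whatever admin step is needed.
run_now, deferred = split_steps(
    [["qdel", "3119"], ["qconf", "-sh"]], {"submit"}
)
```

With this split, the deferred steps still need some way to reach the admin-side agent, which is where the extra lockfiles or other coordination would come in.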

Change 751474 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[cloud/toolforge/disable-tool@master] Move _kill_grid_jobs from the grid master to the cron host

https://gerrit.wikimedia.org/r/751474

Change 751474 merged by Andrew Bogott:

[cloud/toolforge/disable-tool@master] Move _kill_grid_jobs from the grid master to the cron host

https://gerrit.wikimedia.org/r/751474