Page MenuHomePhabricator

Review and productionize webservice manifest monitor
Closed, ResolvedPublic

Description

Because it is developed on github (https://github.com/yuvipanda/tools-manager), it needs to go through review before being moved to gerrit. And then it can go through normal review practices.

Also since it'll run as a privileged user it'll need more review!

Event Timeline

yuvipanda raised the priority of this task from to Medium.
yuvipanda updated the task description. (Show Details)
yuvipanda added subscribers: Elitre, Matanya, Pathoschild and 8 others.

collector.py:

  • toollog: just append to the file ('a') instead of 'w'riting to it
  • collect: use yaml.safe_load

manifest.py:

  • missing space between =( in WEBSERVICE_TYPES

I have imported it from github to gerrit, is at operations/software/tools-manifest now. I'll move it to python 3.4 shortly as well.

Not complete:

  • collector.py:
    • toollog():
      • Files created in the tool's directory by root should be deletable and even modifyable by the tool due to the directory's setgid bit. So you don't need to care about the permissions.
      • You need to ensure that you don't follow symlinks.
      • If you stick to the sudo approach, you could probably just open a bidirectional pipe to sudo -iu tools.something tee -a /data/project/tools.something/services.log and throw away stdout.
    • collect():
      • You check where the symlink points, only to then read the original symlink again. labs/toollabs's take goes to great lengths to avoid such race conditions, and I assume Python has the tools for that as well.
      • self.manifests is only reset in __init__() so multiple invocations will amass an ever growing number of manifests, and deleted manifests will live forever.
  • servicemonitor.py:
    • /usr/bin/qstat -u * -xml can produce invalid XML; you need a similar fix as in … webservice2 (?).
    • It looks to me as if you are throwing away the output of webservice.

Avoiding symlink race conditions seems to be fun :)

Not complete:

  • collector.py:
    • toollog():
      • Files created in the tool's directory by root should be deletable and even modifyable by the tool due to the directory's setgid bit. So you don't need to care about the permissions.

Maybe, but I guess it is nicer to just have it be owned by the tool? But if this means I can get rid of the touch and help fix the symlink issues...

    • You need to ensure that you don't follow symlinks.
    • If you stick to the sudo approach, you could probably just open a bidirectional pipe to sudo -iu tools.something tee -a /data/project/tools.something/services.log and throw away stdout.
  • collect():
    • You check where the symlink points, only to then read the original symlink again. labs/toollabs's take goes to great lengths to avoid such race conditions, and I assume Python has the tools for that as well.

Boo, you're right. I wasn't thinking about the race at all.

    • self.manifests is only reset in __init__() so multiple invocations will amass an ever growing number of manifests, and deleted manifests will live forever.
  • servicemonitor.py:
    • /usr/bin/qstat -u * -xml can produce invalid XML; you need a similar fix as in … webservice2 (?).

Actually, webservice2 used to use qstat -j *, which produces invalid XML. -u * does not seem to produce invalid XML...

  • It looks to me as if you are throwing away the output of webservice.

I am! :D Needs error handling and what not.

One of the differences from take is that it ignores symlinks, while I don't want to.

Change 202291 had a related patch set uploaded (by Yuvipanda):
Clear manifests before every collect

https://gerrit.wikimedia.org/r/202291

Change 202291 merged by jenkins-bot:
Clear manifests before every collect

https://gerrit.wikimedia.org/r/202291

Change 202297 had a related patch set uploaded (by Yuvipanda):
Better symlink race protection

https://gerrit.wikimedia.org/r/202297

@scfc ^ should help with that race condition checking, I think. Take a look? You also have merge rights on that repo.

That looks okay to me, but my knowledge about race conditions and symlinks is based on @coren's work in take. @coren, do you want to chime in?

Unrelated I just found out that yaml.safe_load() apparently always succeeds (or that everything is YAML):

scfc@tools-bastion-01:~$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> print repr(yaml.safe_load('key: value'))
{'key': 'value'}
>>> print repr(yaml.safe_load('some arbitrary string'))
'some arbitrary string'
>>> 
scfc@tools-bastion-01:~$

So the program needs to handle that manifest can be a string (and I would add a test suite rather sooner than later :-)).

Totally planning on doing a test suite at least for Manifest :) I am
not sure how exactly to test the other parts, however.

Also do take a look at the other open patches too :) https://gerrit.wikimedia.org/r/#/projects/operations/software/tools-manifest,dashboards/default

Everything there *should* be clear even to pythonnewbies...

@scfc: The protections to take against symlink races depend on whether it is possible to create the file or not; when it is possible that the open() call creates the file, then the simple uid check won't help as it remains possible to create a file owned by the user in an arbitrary location (which may be problematic for many places) even though it won't allow simple overwriting.

Normally, the only perfectly safe way of creating files as root in a directory where endusers have write access at all is to be careful to: use O_NOFOLLOW during the open() call; never have unsanitized user input injected into the filename; and never create a file outside of the CWD (that latter one is critical - otherwise no amount of checking your target will protect you from someone manipulating "bar" in /foo/bar/baz/the_file and doing really nasty things.

At the very least, doing an open() on the target directory, doing an fstat() to make sure the directory is suitable, doing a fchdir() to it then creating the file without a full path is necessary.

Better than all this is to drop filesystem privileges before doing any write operation when it is possible.

As a note, the code in take.cc is indeed very carefully crafted to avoid symlink shennanigans but wouldn't be quite enough to protect if it tried to create files - it suffices only because the only alteration it makes to its targets is to fchown() them.

Change 202297 merged by jenkins-bot:
Better symlink race protection

https://gerrit.wikimedia.org/r/202297

yuvipanda renamed this task from Review and productionize service manifest monitor to Review and productionize webservice manifest monitor.Apr 9 2015, 12:35 AM

Change 202978 had a related patch set uploaded (by Yuvipanda):
tools: Add role / class for tools manifest services

https://gerrit.wikimedia.org/r/202978

Change 202978 merged by Yuvipanda:
tools: Add role / class for tools manifest services

https://gerrit.wikimedia.org/r/202978

Change 202981 had a related patch set uploaded (by Yuvipanda):
tools: Make services host a submit host

https://gerrit.wikimedia.org/r/202981

Change 202981 merged by Yuvipanda:
tools: Make services host a submit host

https://gerrit.wikimedia.org/r/202981

Change 202982 had a related patch set uploaded (by Yuvipanda):
tools: Make toollabs::services inherit from toollabs

https://gerrit.wikimedia.org/r/202982

Change 202982 merged by Yuvipanda:
tools: Make toollabs::services inherit from toollabs

https://gerrit.wikimedia.org/r/202982

Wheee, it is puppetized and running on tools-services-01 now \o/

Ok, so this just needs an email to labs-l and then we're all good!

I need to write the email, and that somehow feels like a lot more work than writing all the code. Also needs documentaiton on what service manifests are now, and what they will become, etc.

yuvipanda claimed this task.