bigbrother doesn't stop
Closed, Resolved, Public

Description

Deleting .bigbrotherrc does not stop it from attempting to restart old jobs. There is no other obvious way to stop it.

Event Timeline

Earwig raised the priority of this task from to High.
Earwig updated the task description.
Earwig added a project: Toolforge.
Earwig subscribed.
scfc lowered the priority of this task from High to Lowest. Apr 6 2015, 5:53 AM
scfc added subscribers: yuvipanda, scfc.

bigbrother is structured in a way that makes it almost impossible to solve this issue and similar ones (cf. T88122) without effectively rewriting it. Conveniently, @yuvipanda is currently doing that in T90561, but that also means it is unlikely that someone will work on this task in parallel.

@bd808, how does the new bigbrother react to a .bigbrotherrc having been deleted?

This looks like something that did not get addressed in the rewrite. By my reading of the code, update_db checks for configuration for each running job's owner by calling read_config(owner). read_config will then:

  1. Check for an existing record in self.watchdb for the owner
    1. Exit if no homedir is found
    2. Add a basic record otherwise
  2. Check to see if the record needs to be refreshed and exit if the rcfile has been read "recently" (randomized refresh time for each record in the range 0-60m)
    1. This actually has a bug that needs to be fixed where ~/.bigbrotherrc won't be read on the first pass
  3. If no rcfile is found for a given tool then read_config will exit silently
  4. Read the rcfile and merge its contents with the existing job tracking state
    1. This step looks like it will drop old jobs that are no longer listed in the rcfile

I think the retry forever behavior could be fixed by having read_config remove all 'jobs' entries for a tool account when the rcfile is missing as well as when it is empty.
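The flow described above, together with the proposed fix, can be sketched roughly as follows. This is a minimal illustration and not the actual bigbrother code: the class name WatchdogSketch, the home_root parameter, the 'restarts' counter, and the one-job-name-per-line rcfile format are all assumptions made for the example. Only read_config, watchdb, 'jobs' and 'refresh' are names taken from the discussion.

```python
import os
import random
import time


class WatchdogSketch:
    """Hypothetical stand-in for the bigbrother job tracker."""

    def __init__(self, home_root):
        # Assumed layout: <home_root>/<tool>/.bigbrotherrc
        self.home_root = home_root
        self.watchdb = {}

    def read_config(self, tool, now=None):
        now = time.time() if now is None else now
        home = os.path.join(self.home_root, tool)

        # Step 1: make sure a record exists for this tool.
        if tool not in self.watchdb:
            if not os.path.isdir(home):
                return  # 1.1: no homedir, nothing to track
            # 1.2: basic record that is due for a refresh immediately
            self.watchdb[tool] = {'jobs': {}, 'refresh': now}
        record = self.watchdb[tool]

        # Step 2: skip the read if the rcfile was read "recently".
        if now < record['refresh']:
            return
        # Randomized next refresh, 0-60 minutes from now (in seconds).
        record['refresh'] = now + random.randint(0, 60 * 60)

        rcfile = os.path.join(home, '.bigbrotherrc')
        try:
            with open(rcfile) as f:
                lines = [line.strip() for line in f]
        except FileNotFoundError:
            # Step 3 with the proposed fix: instead of exiting silently,
            # forget every tracked job so nothing is restarted forever.
            record['jobs'] = {}
            return

        # Step 4: merge. Jobs still listed keep their state, new jobs get
        # a fresh entry, and jobs no longer listed are dropped. An empty
        # rcfile therefore also results in an empty 'jobs' mapping.
        wanted = [ln for ln in lines if ln and not ln.startswith('#')]
        record['jobs'] = {
            name: record['jobs'].get(name, {'restarts': 0})
            for name in wanted
        }
```

With this shape, a deleted rcfile and an empty rcfile end in the same state (no tracked jobs), which is the behavior the fix is after.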

"This actually has a bug that needs to be fixed where ~/.bigbrotherrc won't be read on the first pass"

That was a poor reading of the code on my part. The early-exit condition is now < self.watchdb[tool]['refresh'], and in the first-pass situation self.watchdb[tool]['refresh'] == now, so the early exit will not fire and the rcfile will be read.
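The first-pass case can be checked in isolation. The helper name should_skip_read and the record layout here are hypothetical; only the comparison itself comes from the comment above.

```python
import time


def should_skip_read(record, now):
    """Return True when the rcfile was read recently enough to skip."""
    # Strict less-than: a record whose refresh time equals `now`
    # is NOT skipped, so a freshly created record is read right away.
    return now < record['refresh']


now = time.time()

# First pass: the record was just created with refresh == now.
first_pass = {'jobs': {}, 'refresh': now}

# Later pass: the next refresh was scheduled ten minutes ahead.
recently_read = {'jobs': {}, 'refresh': now + 600}
```

Here should_skip_read(first_pass, now) is False and should_skip_read(recently_read, now) is True, which matches the corrected reading.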

Change 330265 had a related patch set uploaded (by BryanDavis):
toollabs: bigbrother: stop tracking jobs when rcfile is deleted

https://gerrit.wikimedia.org/r/330265

Change 330265 merged by Andrew Bogott:
toollabs: bigbrother: stop tracking jobs when rcfile is deleted

https://gerrit.wikimedia.org/r/330265

@Andrew does this resolve this task then?

Andrew claimed this task.

Yes, I think this is resolved.