
[Epic] Write basic process-control, something good enough to run all CRM jobs.
Closed, Resolved · Public

Description

This is urgent, blocking work so please help limit scope. This task is finished when the following features are verified working:

Job migration status:
https://docs.google.com/spreadsheets/d/1qfXaBmhW45qSbFRqgJs_zpeEt6ZIo2hrZcBtRhVXS9w/edit#gid=0

process-control

  • Ops can package and deploy the tool.
  • stdout logfiles are written one file per job run. (T161155)
  • Devs can run jobs one-off.
  • Runs jobs according to a code-generated crontab.
  • Never drop logs even (especially!) if the process is killed unexpectedly. (T161571)
  • Failmail when job exits with non-zero return code--let not perfect be thine enemy.
  • Nobody can accidentally run the script as their own user.
  • Working workaround for specific chained jobs. (T161035)
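
As a rough illustration of two of the items above (one logfile per job run, and a failure notice on a non-zero return code), here is a minimal Python sketch. It is not the actual process-control code: the names run_job and failure_report, the logfile naming scheme, and the message text are all assumptions made for the example.

```python
import subprocess
import time
from pathlib import Path


def run_job(name, command, log_dir):
    """Run one job and write its output to a logfile unique to this run.

    Returns (return_code, log_path).
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: job name plus start timestamp.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    log_path = log_dir / f"{name}-{stamp}.log"
    with open(log_path, "w") as log:
        # The child writes straight to the file descriptor, so whatever it
        # has already written stays on disk even if this wrapper is killed.
        proc = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode, log_path


def failure_report(name, return_code, log_path):
    """Return the text of a failure notice, or None if the job succeeded."""
    if return_code == 0:
        return None
    return f"Job {name} exited with code {return_code}; see {log_path}"
```

A caller would pass the returned report to whatever sends the failmail. Mail delivery itself is left out here because the task does not say how it is done.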

puppet

  • Jobs configuration is sync'ed to /var/lib/process-control with localsettings. Read-only.
  • Global configuration file is synced to /etc/process-control.yaml
  • Devs have sudo access to the scripts and can pass any CLI params.
  • /var/log/process-control is group jenkins, mode g+ws.
  • cron-generate can somehow write to /etc/cron.d/process-control
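
To make the "code-generated crontab" idea concrete, here is a small Python sketch of turning job definitions into /etc/cron.d-style lines (five time fields, then user, then command). The job dictionary keys, the jenkins user, and the /usr/bin/run-job path are assumptions for illustration only; they are not taken from the real cron-generate.

```python
def crontab_line(job, user="jenkins", runner="/usr/bin/run-job"):
    """Format one /etc/cron.d entry: schedule, user, command.

    `user` and `runner` are hypothetical defaults for this sketch.
    """
    return f"{job['schedule']} {user} {runner} {job['name']}"


def generate_crontab(jobs):
    """Build the full crontab text from a list of job dicts.

    Jobs without a 'schedule' key are skipped, on the assumption that
    they are only ever run one-off.
    """
    lines = ["# Generated file, do not edit by hand."]
    for job in sorted(jobs, key=lambda j: j["name"]):
        if "schedule" not in job:
            continue
        lines.append(crontab_line(job))
    return "\n".join(lines) + "\n"
```

Sorting by job name keeps the generated file stable between runs, so a diff of /etc/cron.d/process-control only shows real changes.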

Not in MVP scope

  • Devs can kill jobs.
  • Log actions and errors to syslog. Echo to console when os.isatty()
  • script to list all jobs and statuses (T161584)
  • should be able to disable groups of jobs (T160699)
  • repeated failure handling (T161567)
  • Turn process-control lock module into a context manager (T161536)
  • Clean up deb packaging once we're on Jessie.
  • 100% test coverage coziness.
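
For the lock-module item (T161536), one possible shape for a context manager is sketched below in Python. It assumes a simple PID lockfile created with O_EXCL; the names job_lock and LockError are hypothetical, and the real lock module may work differently (for instance, this sketch does not detect stale locks).

```python
import os
from contextlib import contextmanager


class LockError(RuntimeError):
    """Raised when the lockfile already exists."""


@contextmanager
def job_lock(path):
    """Hold an exclusive lockfile at `path` for the duration of the block."""
    try:
        # O_EXCL makes creation fail atomically if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        raise LockError(f"lock already held: {path}") from None
    with os.fdopen(fd, "w") as lockfile:
        lockfile.write(str(os.getpid()))
    try:
        yield path
    finally:
        # Release the lock even if the body raised.
        os.unlink(path)
```

With this shape, a job body wrapped in `with job_lock(...)` releases the lock on both normal exit and exceptions, which is the main benefit over explicit acquire and release calls.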

Event Timeline

awight created this task. Mar 27 2017, 10:55 PM
Restricted Application removed a project: Patch-For-Review. Mar 27 2017, 10:55 PM
Restricted Application added a subscriber: Aklapper.
awight updated the task description. Mar 27 2017, 11:08 PM
awight updated the task description. Mar 27 2017, 11:12 PM
awight updated the task description. Mar 27 2017, 11:16 PM
awight updated the task description.
awight updated the task description. Mar 27 2017, 11:19 PM
awight updated the task description. Mar 28 2017, 4:54 AM

@Jgreen @cwdent
I noticed that we're only provisioning to the new CRM server and not the old one. Wasn't the plan to migrate jobs on the old server first, to minimize changes as we go? Sorry if we already discussed and I was talked out of this.

We can abandon that approach at any point, if it looks like backports stuff will cause extra work...

awight added a comment. Edited Mar 28 2017, 5:09 AM

Job files are being provisioned as group www-data, 640, but I don't see why the webservers should be able to read these. We could use the jenkins service group just as well.

I'm not sure why the job files would be 640 yet the /etc config would be 555... looks like a default puppet mode?

awight triaged this task as High priority. Mar 28 2017, 5:15 AM

We packaged it for Precise and puppetflung it to barium last week. Is something absent?

When I looked last night, there was no /srv/p-c...

Ah, you're right--I just hadn't rsyncblastered it. It's there now.

awight updated the task description. Mar 29 2017, 12:32 AM
awight updated the task description. Mar 30 2017, 6:24 PM
awight updated the task description. Mar 30 2017, 6:28 PM
awight updated the task description.
awight updated the task description. Mar 30 2017, 6:45 PM
awight updated the task description. Mar 30 2017, 6:49 PM
awight updated the task description. Mar 30 2017, 6:54 PM
awight updated the task description. Mar 30 2017, 7:01 PM
awight updated the task description. Mar 30 2017, 8:53 PM
awight updated the task description. Mar 30 2017, 8:57 PM
awight updated the task description. Mar 30 2017, 9:05 PM
awight updated the task description. Mar 30 2017, 9:29 PM
awight updated the task description. Apr 4 2017, 12:20 AM
awight updated the task description. Apr 4 2017, 12:28 AM
awight renamed this task from [Epic] Basic process-control good enough to run all CRM jobs to [Epic] Write basic process-control, something good enough to run all CRM jobs. Apr 4 2017, 12:31 AM
Jgreen updated the task description. Apr 4 2017, 12:29 PM

I checked off "cron-generate can somehow write to /etc/cron.d/process-control". We have a puppetized sudo wrapper, /usr/local/bin/cron-generate, which uses sudo to run /usr/bin/cron-generate as root. Also, rsync_blaster will optionally trigger this from the deployment host after syncing changes.

awight added a comment. Apr 4 2017, 3:42 PM

@Jgreen
Great, thanks! I've verified that it works.

awight closed this task as Resolved. Apr 4 2017, 4:30 PM
awight claimed this task.
awight updated the task description.

Marking this task as done. Next step is to convert and test all the jobs.

mmodell removed a subscriber: awight. Jun 22 2017, 9:44 PM