
Remove support for precise OGE exec hosts
Closed, Resolved, Public

Description

Migrate all precise jobs on the job grid to trusty in several steps.

Announcements

Details

Timeline

  • late August 2016: Jobs started without -l release=... and webservices started with --release=precise will print a warning.
  • mid October 2016: Jobs started with jsub will no longer run on Precise hosts by default, and will instead run on Trusty hosts. This means jobs started via cron without an explicit release will migrate to Trusty automatically.
  • October 2016-January 2017: Maintainers of tools that start jobs with jsub -l release=precise will receive emails urging them to migrate to Trusty.
  • March 14 2017: Jobs started with jsub -l release=precise will no longer run and will fail with an error.
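To make the phases concrete, here is a hypothetical sketch of how a jsub-like wrapper could decide which release to request from the value of -l release=... (empty when the option is not given). It is not the actual jsub code; the function names resolve_release and before, the messages, and the day-level date handling are assumptions. The dates come from this task: the default switched to trusty on 2016-10-26, and release=precise stops working on March 14 2017.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the real jsub: map the value of "-l release=..."
# to the release a job should request, following the migration timeline.

SWITCH_DATE="2016-10-26"   # default changed from precise to trusty
CUTOFF_DATE="2017-03-14"   # release=precise stops working

# before DATE1 DATE2: succeed if DATE1 (YYYY-MM-DD) is earlier than DATE2.
# Dropping the dashes turns ISO dates into integers that compare correctly.
before() {
    (( ${1//-/} < ${2//-/} ))
}

# resolve_release RELEASE TODAY
#   RELEASE: value given with -l release=..., or "" if none was given
#   TODAY:   current date as YYYY-MM-DD
# Prints the release to request; returns 1 if the job must be rejected.
resolve_release() {
    local release="$1" today="$2"
    case "$release" in
        "")
            if before "$today" "$SWITCH_DATE"; then
                echo "warning: no release given, defaulting to precise" >&2
                echo "precise"
            else
                echo "trusty"
            fi
            ;;
        trusty)
            echo "trusty"
            ;;
        precise)
            if before "$today" "$CUTOFF_DATE"; then
                echo "precise"
            else
                echo "error: release=precise is no longer supported" >&2
                return 1
            fi
            ;;
        *)
            echo "error: unknown release '$release'" >&2
            return 1
            ;;
    esac
}
```

For example, resolve_release "" 2016-11-01 prints trusty, while resolve_release precise 2017-03-14 prints an error and returns 1. A real wrapper would pass today's date, e.g. from date +%F.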

Progress

(Image: oge-precise-vs-trusty.png)

Event Timeline

yuvipanda raised the priority of this task to Medium.
yuvipanda updated the task description.
yuvipanda added a project: Toolforge.
yuvipanda added subscribers: faidon, Aklapper, coren and 2 others.

I think this is now a reasonable step, but we probably want a month's warning or so. I'm going to make an announcement about this; the change itself is trivial enough.

During the last discussion on this, @scfc suggested skipping trusty and going straight to jessie. That would limit the number of times users are forced to upgrade.

On the other hand, our bastions are trusty (except for precise-dev), so it's odd to run jobs on precise while everyone tests stuff on trusty...

Jessie is a non-starter while we rely on gridengine, which is going to be the case for a while still (k8s provides a superior alternative for many, but not all, scenarios, and users will migrate in that direction gradually).

Let's try this again, maybe. I'll send out a message saying this is going to happen in about a month.

Un-cookie-licking (releasing my claim on this task) for now.

bd808 updated the task description.
bd808 set Security to None.

Switched default to trusty at 2016-10-26T16:48Z

With the default switched, we are now in the long-tail phase of prodding people who have pinned their jobs with -l release=precise to switch. We need to create a nag system that looks at the precise job runners once a week or so, makes a list of running processes, maps them to tools, and emails the maintainers.
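The mapping step of such a nag system could look roughly like the following. This is a hypothetical illustration, not an existing tool: it assumes tool accounts are named tools.<toolname> (consistent with job names such as mail.tools.drtrigonbot) and that process listings from the precise exec hosts have already been collected as "USER COMMAND" lines, for example with ps. Looking up maintainers and sending the emails are left out; the function name tools_from_ps is made up for this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of one step of a weekly nag job: turn a process listing
# from the precise exec hosts into a per-tool summary.
# Assumes input lines of the form "USER COMMAND" and tool accounts named
# "tools.<toolname>"; processes of other users (root, daemons) are ignored.

# tools_from_ps: read "USER COMMAND" lines on stdin and print
# "<toolname> <number of processes>" for each tool, sorted by tool name.
tools_from_ps() {
    awk '$1 ~ /^tools\./ { count[substr($1, 7)]++ }   # strip the "tools." prefix
         END { for (t in count) print t, count[t] }' | sort
}
```

For example, feeding it the lines "tools.foo python", "tools.foo bash" and "root sshd" prints "foo 2"; each tool name in the output is one candidate for a reminder email.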

bd808 renamed this task from "Make jsub / qsub default to trusty instances" to "Remove support for precise OGE exec hosts". (Oct 27 2016, 7:15 PM)
bd808 updated the task description.

Change 335569 had a related patch set uploaded (by BryanDavis):
Ignore lighttpd-precise in service.manifest

https://gerrit.wikimedia.org/r/335569

Change 335569 abandoned by BryanDavis:
Ignore lighttpd-precise in service.manifest

https://gerrit.wikimedia.org/r/335569

jmail (T158722) submits jobs with:

/usr/bin/qsub -N mail.$(/usr/bin/id -nu) \
        -sync y -b y -m n \
        -o "$email.out" -j y -i "$email" \
        -q mailq -l h_vmem=500M -r n \
        "$exe" "$@" >/dev/null

Currently this causes jobs like mail.tools.drtrigonbot to run (sometimes) on Precise hosts. I believe running on Precise is not an explicit requirement; it just depends on which host has the lowest load, i.e. these jobs will switch to Trusty automatically once the Precise hosts are removed. It is probably still worth checking after the switch.
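The expectation in the comment above, that an unpinned job simply lands on the least-loaded eligible host, can be illustrated with a simplified model. This is not how gridengine is implemented: the real scheduler's choice depends on its configuration (for example the queue_sort_method and load_formula settings described in sched_conf(5)). The host names, the "HOST RELEASE LOAD" input format and the function name pick_host are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical model of load-based host selection; real gridengine
# scheduling is configurable and considers more than a single load value.

# pick_host [RELEASE]: read "HOST RELEASE LOAD" lines on stdin and print the
# host with the lowest load, restricted to RELEASE if one is given.
# Returns 1 if no host matches. Ties go to the first host listed.
pick_host() {
    awk -v want="${1:-}" '
        want != "" && $2 != want { next }              # wrong release: skip
        best == "" || $3 + 0 < min { min = $3 + 0; best = $1 }
        END { if (best == "") exit 1; print best }'
}
```

While both precise and trusty hosts are listed, an unpinned call may return either kind of host; once the precise lines disappear from the input, the same call can only return a trusty host, which is the automatic switch expected above.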

Change 341666 merged by jenkins-bot:
[labs/toollabs] jsub: Remove support for release=precise

https://gerrit.wikimedia.org/r/341666

Change 342061 merged by jenkins-bot:
[operations/software/tools-webservice] Remove support for Precise

https://gerrit.wikimedia.org/r/342061

Change 342161 merged by Rush:
[operations/puppet] toolschecker: remove precise checks

https://gerrit.wikimedia.org/r/342161