Create a stretch and Son of Grid Engine grid in toolsbeta
Closed, Resolved (Public)

Description

SGE 6 and 8 are confirmed to be entirely incompatible, despite working almost exactly alike. Build a parallel grid with bastion, master, and exec nodes in toolsbeta so that we don't break Toolforge during the experimental build phase.

This should be refactored to use current Puppet standards, ideally puppetized in general, and should stay out of NFS for basic operation, partly to prevent collisions with SGE 6.2.

Event Timeline

Change 472743 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: correct typo in template

https://gerrit.wikimedia.org/r/472743

Change 472743 merged by Bstorm:
[operations/puppet@production] sonofgridengine: correct typo in template

https://gerrit.wikimedia.org/r/472743

Change 473293 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: bring k8s bastion role information into bastion profile

https://gerrit.wikimedia.org/r/473293

Change 473293 merged by Bstorm:
[operations/puppet@production] sonofgridengine: bring k8s bastion role information into bastion profile

https://gerrit.wikimedia.org/r/473293

Change 473574 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: stretch bastions want libboost-dev

https://gerrit.wikimedia.org/r/473574

Change 473574 merged by Bstorm:
[operations/puppet@production] sonofgridengine: stretch bastions want libboost-dev

https://gerrit.wikimedia.org/r/473574

Change 473628 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: fighting through the dependency quirks

https://gerrit.wikimedia.org/r/473628

Change 473628 merged by Bstorm:
[operations/puppet@production] sonofgridengine: fighting through the dependency quirks

https://gerrit.wikimedia.org/r/473628

Change 473641 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: Try directly setting a docker install

https://gerrit.wikimedia.org/r/473641

Change 473641 merged by Bstorm:
[operations/puppet@production] sonofgridengine: Try directly setting a docker install

https://gerrit.wikimedia.org/r/473641

Change 473647 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: reworking bastion for stretch and docker

https://gerrit.wikimedia.org/r/473647

Change 473647 merged by Bstorm:
[operations/puppet@production] sonofgridengine: reworking bastion for stretch and docker

https://gerrit.wikimedia.org/r/473647

Change 473788 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: remove storage mentions from docker config for bastion

https://gerrit.wikimedia.org/r/473788

Change 473788 merged by Bstorm:
[operations/puppet@production] sonofgridengine: remove storage mentions from docker config for bastion

https://gerrit.wikimedia.org/r/473788

Change 473793 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: remove dependency on storage part

https://gerrit.wikimedia.org/r/473793

Change 473793 merged by Bstorm:
[operations/puppet@production] sonofgridengine: remove dependency on storage part

https://gerrit.wikimedia.org/r/473793

Change 473833 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: add cronrunner role for stretch grid

https://gerrit.wikimedia.org/r/473833

Change 473833 merged by Bstorm:
[operations/puppet@production] sonofgridengine: add cronrunner role for stretch grid

https://gerrit.wikimedia.org/r/473833

Change 474400 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: configure grid hosts from OpenStack

https://gerrit.wikimedia.org/r/474400

Change 474400 merged by Bstorm:
[operations/puppet@production] sonofgridengine: configure grid hosts from OpenStack

https://gerrit.wikimedia.org/r/474400

Change 474755 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: Fix up the grid_configurator script to parse exec configs

https://gerrit.wikimedia.org/r/474755

Change 474755 merged by Bstorm:
[operations/puppet@production] sonofgridengine: Fix up the grid_configurator script to parse exec configs

https://gerrit.wikimedia.org/r/474755

Change 474776 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: observer env and the openstack client libs for SGE master

https://gerrit.wikimedia.org/r/474776

Change 474776 merged by Bstorm:
[operations/puppet@production] sonofgridengine: observer env and the openstack client libs for SGE master

https://gerrit.wikimedia.org/r/474776

Change 474795 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] openstack client: Install python3 stuff on stretch

https://gerrit.wikimedia.org/r/474795

Change 474801 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: fix the path of the exechosts config dir

https://gerrit.wikimedia.org/r/474801

Change 474801 merged by Bstorm:
[operations/puppet@production] sonofgridengine: fix the path of the exechosts config dir

https://gerrit.wikimedia.org/r/474801

Change 474811 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: sonofgridengine still gives nonzero status for "none found"

https://gerrit.wikimedia.org/r/474811

Change 474811 merged by Bstorm:
[operations/puppet@production] sonofgridengine: sonofgridengine still gives nonzero status for "none found"

https://gerrit.wikimedia.org/r/474811

Change 474816 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: fix missing instantiation

https://gerrit.wikimedia.org/r/474816

Change 474816 merged by Bstorm:
[operations/puppet@production] sonofgridengine: fix missing instantiation

https://gerrit.wikimedia.org/r/474816

Change 474827 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: Fix tests and errors in the script

https://gerrit.wikimedia.org/r/474827

Change 474827 merged by Bstorm:
[operations/puppet@production] sonofgridengine: Fix tests and errors in the script

https://gerrit.wikimedia.org/r/474827

Change 474831 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: Read in horrible configuration output

https://gerrit.wikimedia.org/r/474831

Change 474831 merged by Bstorm:
[operations/puppet@production] sonofgridengine: Read in horrible configuration output

https://gerrit.wikimedia.org/r/474831

Change 474925 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: one last bug -- works now

https://gerrit.wikimedia.org/r/474925

Change 474925 merged by Bstorm:
[operations/puppet@production] sonofgridengine: one last bug -- works now

https://gerrit.wikimedia.org/r/474925

Change 474795 merged by Bstorm:
[operations/puppet@production] openstack client: Install python3 stuff on stretch

https://gerrit.wikimedia.org/r/474795

Change 474975 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: handle when an exec host is new and lacks a config

https://gerrit.wikimedia.org/r/474975

Change 474975 merged by Bstorm:
[operations/puppet@production] sonofgridengine: handle when an exec host is new and lacks a config

https://gerrit.wikimedia.org/r/474975

Change 474996 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: take unnecessary remove out of script

https://gerrit.wikimedia.org/r/474996

Change 474996 merged by Bstorm:
[operations/puppet@production] sonofgridengine: take unnecessary remove out of script

https://gerrit.wikimedia.org/r/474996

Change 475103 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/toollabs@master] jsub: Add stretch to acceptable releases

https://gerrit.wikimedia.org/r/475103

Change 475118 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: cronrunners need hba

https://gerrit.wikimedia.org/r/475118

Change 475118 merged by Bstorm:
[operations/puppet@production] sonofgridengine: cronrunners need hba

https://gerrit.wikimedia.org/r/475118

Change 475140 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: correct hba manifest for this grid variant

https://gerrit.wikimedia.org/r/475140

Change 475140 merged by Bstorm:
[operations/puppet@production] sonofgridengine: correct hba manifest for this grid variant

https://gerrit.wikimedia.org/r/475140

Change 475103 merged by jenkins-bot:
[labs/toollabs@master] jsub: Make release a deprecated noop

https://gerrit.wikimedia.org/r/475103

Change 476092 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/toollabs@master] jsub: add the changelog for stretch changes

https://gerrit.wikimedia.org/r/476092

Change 476092 merged by Bstorm:
[labs/toollabs@master] jsub: add the changelog for stretch changes

https://gerrit.wikimedia.org/r/476092

Change 476430 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: set up shadow_master profile

https://gerrit.wikimedia.org/r/476430

JJMC89 added a subscriber: JJMC89. Nov 29 2018, 1:54 AM

Change 475103 merged by jenkins-bot:
[labs/toollabs@master] jsub: Make release a deprecated noop
https://gerrit.wikimedia.org/r/475103

On 475103 @bd808 wrote

There are going to be a LOT of crontabs and helper scripts that add release=trusty from the prior migration.

I wish this had been announced before being deployed so the cron daemon spam could be avoided.

Sorry about that, @JJMC89, I missed that the warning message would come through in cron spam.

How about we change this to be a true no-op for now and worry about getting people to actually remove the flag from their crons, etc. later?

Yeah, I just found 469 cron entries with the word trusty in them. I can fix it faster right now by deploying the last version to the trusty hosts.

Once that's out, I'll try putting up a new version that doesn't produce warning output.

In fact, I just need to install it on the cron hosts. Doing that. This way, the warning will display on bastions, but not during cron runs.

Change 476572 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/toollabs@master] jsub: Correct generator issue and silence warnings when not on tty

https://gerrit.wikimedia.org/r/476572

Change 476572 merged by Bstorm:
[labs/toollabs@master] jsub: Correct generator issue and silence warnings when not on tty

https://gerrit.wikimedia.org/r/476572

Thanks

Yeah, I just found 469 cron entries with the word trusty in them.

It can also be in .jsubrc instead of directly in cron.
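A check that covers both places the flag can live (crontab entries and .jsubrc files) could look roughly like the sketch below. It is only an illustration: the helper name `lines_mentioning` and the sample text are hypothetical, and it is not the command that produced the count of 469 above.

```python
import re


def lines_mentioning(text, word="trusty"):
    """Return the non-comment lines of `text` that contain `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    hits = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments; only active entries matter here.
        if not stripped or stripped.startswith("#"):
            continue
        if pattern.search(stripped):
            hits.append(stripped)
    return hits


# Hypothetical crontab / .jsubrc content, for illustration only.
sample = (
    "# old note about release=trusty\n"
    "0 * * * * jsub -l release=trusty -once mytool\n"
    "-mem 512m\n"
)
matches = lines_mentioning(sample)
```

The same function can be run over the contents of a crontab and of a .jsubrc file, since both are plain text with one entry per line.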

Ok. I confirmed on a cron host that the logic I used in the 1.35 version of jobutils/jsub does not produce output in a cron. It should be safe to deploy across the cluster without a rising tide of spam.
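The kind of guard described here can be sketched as follows. This is not the actual jsub code, which is not shown in this task; the function name `warn_if_interactive` is hypothetical, and the sketch assumes the check is simply whether the output stream is attached to a terminal.

```python
import io
import sys


def warn_if_interactive(message, stream=None):
    """Write a warning only when the stream is attached to a terminal.

    Returns True if the warning was written, False if it was suppressed.
    """
    stream = sys.stderr if stream is None else stream
    # Jobs started by cron have no terminal attached, so isatty() is False
    # there and nothing is written (which means no cron mail is generated).
    if not stream.isatty():
        return False
    stream.write("warning: " + message + "\n")
    return True


# A StringIO buffer is never a tty, so it behaves like a cron run here.
suppressed = warn_if_interactive("release is deprecated", io.StringIO())
```

On a bastion, where a user runs the command from an interactive shell, the same call would print the warning.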

Change 476430 merged by Bstorm:
[operations/puppet@production] sonofgridengine: set up shadow_master profile

https://gerrit.wikimedia.org/r/476430

@JJMC89 I've deployed the latest version of jsub to the grid. Please let me know if my guard against cron spam doesn't work. I tested it on a cron server, and it worked well for me.

Change 476902 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: remove useless comment and fix the shadow profile

https://gerrit.wikimedia.org/r/476902

Change 476902 merged by Bstorm:
[operations/puppet@production] sonofgridengine: remove useless comment and fix the shadow profile

https://gerrit.wikimedia.org/r/476902

Confirmed. I temporarily added the option back to one of my tools and didn't get any cron spam.

Change 476914 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: move systemd file to correct location

https://gerrit.wikimedia.org/r/476914

Change 476914 merged by Bstorm:
[operations/puppet@production] sonofgridengine: move systemd file to correct location

https://gerrit.wikimedia.org/r/476914

Change 476990 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: correct shadow and master init issues

https://gerrit.wikimedia.org/r/476990

Change 476990 merged by Bstorm:
[operations/puppet@production] sonofgridengine: correct shadow and master init issues

https://gerrit.wikimedia.org/r/476990

Change 477002 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: found where the unconfigurable pid file is

https://gerrit.wikimedia.org/r/477002

Change 477002 merged by Bstorm:
[operations/puppet@production] sonofgridengine: found where the unconfigurable pid file is

https://gerrit.wikimedia.org/r/477002

Ok. There is now a fully working, systemd-controlled version of a shadow master in beta. It makes me tempted to replace the bad grid master init script that came with the package with a similar systemd unit, but I might just leave well enough alone for now. On Monday, I'll test failover. I have a doubt about the way our shadow master file list is arranged: the docs say to put the real master at the top of the file, and the existing Puppet code in tools and toolsbeta does not do that. Hence the need to actually test failover.
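The ordering rule from the docs (real master first, shadow hosts after it) can be illustrated with a small sketch. It is not the Puppet code from the patches in this task; the function name `build_shadow_masters` and the hostnames are hypothetical, and the one-host-per-line layout is an assumption.

```python
def build_shadow_masters(master, shadows):
    """Return shadow_masters file content with the real master listed first."""
    ordered = [master]
    for host in shadows:
        # Avoid listing a host twice if the master also appears among the shadows.
        if host not in ordered:
            ordered.append(host)
    return "\n".join(ordered) + "\n"


# Hypothetical hostnames, for illustration only.
content = build_shadow_masters(
    "grid-master.example", ["grid-shadow.example", "grid-master.example"]
)
```

Generating the file from an ordered list, rather than from an unordered set of hosts, keeps the master at the top regardless of how the shadow hosts are collected.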

Change 477358 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: build a correct shadow_masters file

https://gerrit.wikimedia.org/r/477358

Change 477358 merged by Bstorm:
[operations/puppet@production] sonofgridengine: build a correct shadow_masters file

https://gerrit.wikimedia.org/r/477358

Change 477413 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: remove pointless dependency

https://gerrit.wikimedia.org/r/477413

Change 477413 merged by Bstorm:
[operations/puppet@production] sonofgridengine: remove pointless dependency

https://gerrit.wikimedia.org/r/477413

Change 477423 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: file_line corrections

https://gerrit.wikimedia.org/r/477423

Change 477423 merged by Bstorm:
[operations/puppet@production] sonofgridengine: file_line corrections

https://gerrit.wikimedia.org/r/477423

Change 477437 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: stop the gridengine-master service on shadow nodes

https://gerrit.wikimedia.org/r/477437

Change 477437 merged by Bstorm:
[operations/puppet@production] sonofgridengine: stop the gridengine-master service on shadow nodes

https://gerrit.wikimedia.org/r/477437

Change 477446 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: remove weird accounting link

https://gerrit.wikimedia.org/r/477446

Change 477446 merged by Bstorm:
[operations/puppet@production] sonofgridengine: remove weird accounting link

https://gerrit.wikimedia.org/r/477446

Change 477700 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] sonofgridengine: explicit dependency for shadow_masters file

https://gerrit.wikimedia.org/r/477700

Change 477700 merged by Bstorm:
[operations/puppet@production] sonofgridengine: explicit dependency for shadow_masters file

https://gerrit.wikimedia.org/r/477700

Bstorm closed this task as Resolved. Dec 19 2018, 3:35 PM

This is done!