Migrate all jobs to labs slaves
Closed, Resolved, Public

Description

The architecture for the disposable VMs (T47499) will have Zuul mergers in the labs infrastructure. None of the production slaves (gallium, lanthanum) will be able to reach them due to security constraints.

Thus we need to migrate all jobs tied to the productionSlaves label to labs instances (the contintLabSlaves label).
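
To see which jobs are still pinned to a given label, something like the following could be run against a Jenkins jobs directory. This is only a sketch: the function name jobs_tied_to_label is made up for illustration, and it assumes each job lives in <jobs_dir>/<job>/config.xml with the label stored verbatim in an <assignedNode> element. Jobs pinned with a compound label expression would not match.

```shell
#!/bin/sh
# Hypothetical helper: print the names of jobs pinned to exactly one label.
# Assumes the layout <jobs_dir>/<job>/config.xml and that a pinned job
# carries <assignedNode>LABEL</assignedNode> in its config.xml.
jobs_tied_to_label() {
    jobs_dir=$1
    label=$2
    for config in "$jobs_dir"/*/config.xml; do
        # Skip the unexpanded glob when the directory holds no jobs.
        [ -f "$config" ] || continue
        if grep -qF "<assignedNode>$label</assignedNode>" "$config"; then
            # The job name is the directory that contains config.xml.
            basename "$(dirname "$config")"
        fi
    done
}
```

For example: jobs_tied_to_label /var/lib/jenkins/jobs productionSlaves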

Tied to gallium

To keep on gallium:

  • integration-docroot-deploy: keep, needs gallium for the web server
  • publish-doc: keep, needs gallium for the web server

To be moved:

  • jsduck - T86175
    • mediawiki-core-jsduck
    • mediawiki-core-jsduck-publish
    • oojs-core-jsduck
    • oojs-core-jsduck-publish
    • oojs-ui-jsduck
    • oojs-ui-jsduck-publish
    • unicodejs-jsduck
    • unicodejs-jsduck-publish - https://gerrit.wikimedia.org/r/187798
    • VisualEditor-jsduck
    • mwext-GuidedTour-jsduck
    • mwext-GuidedTour-jsduck-publish
    • mwext-MultimediaViewer-jsduck
    • mwext-MultimediaViewer-jsduck-publish
    • mwext-VisualEditor-jsduck
    • mwext-VisualEditor-jsduck-publish

Tied to production slaves (gallium or lanthanum):

  • jobs which have no node: stanza in JJB configuration file. Important

Jobs roaming on any slaves

ssh gallium.wikimedia.org "grep -l '<canRoam>true</canRoam>'  /var/lib/jenkins/jobs/*/config.xml | cut -d\/ -f6"

DONE as of July 3rd 2015.
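
The same check can be written as a small function that works on any copy of the jobs directory. A minimal sketch, assuming the usual <jobs_dir>/<job>/config.xml layout; the name roaming_jobs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the names of jobs that may roam on any slave,
# i.e. whose config.xml contains <canRoam>true</canRoam>.
roaming_jobs() {
    jobs_dir=$1
    # grep -l prints only the paths of matching files; errors (for example
    # an empty jobs directory) are silenced.
    grep -l '<canRoam>true</canRoam>' "$jobs_dir"/*/config.xml 2>/dev/null |
    while IFS= read -r config; do
        # Derive the job name from the parent directory.
        basename "$(dirname "$config")"
    done
}
```

Taking the job name from the parent directory rather than from a fixed field number means the function does not depend on how deep the jobs directory sits in the filesystem.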

Search jobs: https://gerrit.wikimedia.org/r/217507

DONE

  • search-extra
  • search-extra-javadoc
  • search-extra-javadoc-publish
  • search-highlighter
  • search-repository-swift

hashar edited the task description. Jan 16 2015, 11:04 AM
hashar triaged this task as "Normal" priority. Feb 10 2015, 8:57 PM
Krinkle edited the task description. Mar 3 2015, 11:04 PM
Krinkle edited the task description.
Krinkle changed the title from "Migrate all jobs depending on Zuul git repos out of production slaves" to "Migrate all jobs to labs slaves".

Change 198655 had a related patch set uploaded (by Krinkle):
Migrate mobile-qunit jobs to new CI slaves in labs

https://gerrit.wikimedia.org/r/198655

Change 198655 merged by jenkins-bot:
Migrate mobile-qunit jobs to new CI slaves in labs

https://gerrit.wikimedia.org/r/198655

Krinkle edited the task description. Mar 25 2015, 1:18 AM
hashar edited the task description. Apr 10 2015, 10:22 PM

Change 204706 had a related patch set uploaded (by Legoktm):
Use generic phpunit job for operations/mediawiki-config

https://gerrit.wikimedia.org/r/204706

Change 204707 had a related patch set uploaded (by Legoktm):
Pin generic 'phpunit' job to labs slaves

https://gerrit.wikimedia.org/r/204707

Change 204706 merged by jenkins-bot:
Use generic phpunit job for operations/mediawiki-config

https://gerrit.wikimedia.org/r/204706

Change 204707 merged by jenkins-bot:
Pin generic 'phpunit' job to labs slaves

https://gerrit.wikimedia.org/r/204707

Legoktm edited the task description. Apr 18 2015, 1:23 AM

Change 204980 had a related patch set uploaded (by Legoktm):
Convert 'mediawiki-vagrant-puppet-doc' job to run on a labs slave

https://gerrit.wikimedia.org/r/204980

Change 204982 had a related patch set uploaded (by Legoktm):
Convert 'operations-puppet-doc' job to run on a labs slave

https://gerrit.wikimedia.org/r/204982

hashar edited the task description. Apr 20 2015, 1:35 PM

Change 204980 merged by jenkins-bot:
Convert 'mediawiki-vagrant-puppet-doc' job to run on a labs slave

https://gerrit.wikimedia.org/r/204980

hashar edited the task description. Apr 22 2015, 1:35 PM
hashar edited the task description. May 4 2015, 1:33 PM

I have added some jobs that do not have any node: stanza and hence roam on any of our slaves (the Jenkins XML has: <canRoam>true</canRoam>).

hashar edited the task description. Jun 8 2015, 11:46 AM

Change 217507 had a related patch set uploaded (by Hashar):
Tie search* jobs to labs and Trusty

https://gerrit.wikimedia.org/r/217507

hashar edited the task description. Jun 11 2015, 1:31 PM

Change 217507 merged by jenkins-bot:
Tie search* jobs to labs and Trusty

https://gerrit.wikimedia.org/r/217507

Change 217508 had a related patch set uploaded (by Hashar):
Phase out wikimedia/bugzilla/wikibugs

https://gerrit.wikimedia.org/r/217508

hashar edited the task description. Jun 11 2015, 1:35 PM

Change 217508 merged by jenkins-bot:
Phase out wikimedia/bugzilla/wikibugs

https://gerrit.wikimedia.org/r/217508

Change 217509 had a related patch set uploaded (by Hashar):
Migrate *gembuild to Precise labs

https://gerrit.wikimedia.org/r/217509

hashar edited the task description. Jun 11 2015, 1:44 PM

Change 217509 merged by jenkins-bot:
Migrate *gembuild to Precise labs

https://gerrit.wikimedia.org/r/217509

hashar edited the task description. Jul 3 2015, 1:12 PM
hashar edited the task description. Jul 3 2015, 1:24 PM
hashar edited the task description. Jul 3 2015, 1:31 PM
hashar added a comment. Jul 3 2015, 2:20 PM

I believe nothing will run on lanthanum.eqiad.wmnet anymore. What is left to migrate are the jobs currently running on gallium. Some might be migrated to labs, some not.

Krinkle edited the task description. Jul 6 2015, 8:43 PM
greg edited the task description. Aug 22 2015, 12:48 AM
hashar lowered the priority of this task from "Normal" to "Low".

Almost completed. There are still a few jobs on gallium.wikimedia.org though.

Change 204982 merged by jenkins-bot:
Convert 'operations-puppet-doc' job to run on a labs slave

https://gerrit.wikimedia.org/r/204982

hashar edited the task description. Oct 27 2015, 11:24 AM
hashar added a comment. (Edited) Oct 28 2015, 1:59 PM

Status as of October 28th 2015

$ ssh gallium.wikimedia.org ls -1 /srv/ssd/jenkins-slave/workspace
integration-docroot-deploy
mediawiki-vagrant-puppet-doc
mwext-VisualEditor-sync-gerrit
publish-on-gallium

I suppose we could turn integration-docroot-deploy into an rsync job like the doc ones? Is it worth it?

I'm guessing mediawiki-vagrant-puppet-doc is straightforward, similar to the ops puppet one.

Is it even possible to move publish-on-gallium onto labs?

Paladox edited the task description. Feb 29 2016, 3:29 PM
Stashbot added a subscriber: Stashbot.

Mentioned in SAL [2016-09-08T10:02:05Z] <hashar> Delete mwext-VisualEditor-sync-gerrit job, already got removed by ostriches in 139d17c8f1c4bcf2bb761e13a6501e4d85684066 . The issue in Gerrit (T51846) has been fixed. Poke T86659 , one less job on slaves.

hashar edited the task description. Sep 8 2016, 10:02 AM

Mentioned in SAL [2016-09-08T10:03:29Z] <hashar> Delete Jenkins job https://integration.wikimedia.org/ci/job/mwext-VisualEditor-sync-gerrit/ that has been left behind. It is no more needed. T51846 T86659

This is almost finished; we just need to do

jobs which have no node: stanza in JJB configuration file. Important

now.

Mentioned in SAL (#wikimedia-releng) [2016-11-24T20:49:59Z] <hashar> make contint1001 Jenkins slave to only builds jobs with a label matching the node https://integration.wikimedia.org/ci/computer/contint1001/configure T86659

hashar closed this task as "Resolved". Nov 24 2016, 8:51 PM
hashar claimed this task.

Paladox pointed out:

jobs which have no node: stanza in JJB configuration file. Important

I have changed contint1001 slave to:

Only build jobs with label restrictions matching this node

In this mode, Jenkins will only build a project on this node when that project is restricted to certain nodes using a label expression, and that expression matches this node's name and/or labels. This allows a slave to be reserved for certain kinds of jobs. For example, to run performance tests continuously from Jenkins, you can use this setting with # of executors as 1, so that only one performance test runs at any given time, and that one executor won't be blocked by other builds that can be done on other nodes.

This way it only runs jobs that have node: contint1001 :]
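
For anyone wanting to double check the setting from the command line, the usage mode is also recorded in the node's XML configuration. A rough sketch, assuming the node config lives under $JENKINS_HOME/nodes/<name>/config.xml and stores the mode in a <mode> element (EXCLUSIVE for "Only build jobs with label restrictions matching this node", NORMAL otherwise); the function name node_mode is made up.

```shell
#!/bin/sh
# Hypothetical helper: print the usage mode found in a Jenkins node
# config.xml. Assumes the file contains an element such as
# <mode>EXCLUSIVE</mode> or <mode>NORMAL</mode>.
node_mode() {
    # -n together with the trailing p prints only lines where the
    # substitution matched, reduced to the captured mode value.
    sed -n 's:.*<mode>\([A-Z]*\)</mode>.*:\1:p' "$1"
}
```

For example: node_mode /var/lib/jenkins/nodes/contint1001/config.xml (the path is an assumption based on the default Jenkins home used elsewhere in this task).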

The rest of the jobs running on the production machine contint1001 publish artifacts for doc.wikimedia.org. They will eventually be moved to another host via another task (can't find it).