
Toolforge: add Debian Buster to the grid and eliminate Debian Stretch
Closed, Resolved (Public)

Description

Migrate tools/toolsbeta grid to Debian Buster since Stretch is deprecated. This is mostly just adding Buster queues, but then Stretch queues need to go away.

About the migration itself, this will be handled by (@komla):

Details

Other Assignee
taavi
Repo                                  Branch      Lines +/-
operations/puppet                     production  +0 -13
operations/software/tools-webservice  master      +2 -2
labs/toollabs                         master      +1 -1
operations/puppet                     production  +6 -6
operations/puppet                     production  +2 -0
operations/puppet                     production  +11 -0
labs/toollabs                         master      +15 -15
operations/software/tools-webservice  master      +2 -2
operations/puppet                     production  +47 -0
operations/puppet                     production  +20 -0
operations/puppet                     production  +8 -8
operations/cookbooks                  wmcs        +154 -3
operations/cookbooks                  wmcs        +120 -0
operations/puppet                     production  +8 -9
operations/puppet                     production  +1 -1
operations/puppet                     production  +10 -0
operations/puppet                     production  +299 -140
operations/puppet                     production  +256 -151
operations/puppet                     production  +57 -60
operations/puppet                     production  +11 -11
operations/puppet                     production  +11 -18
operations/puppet                     production  +2 -0
operations/puppet                     production  +0 -3

Related Objects

Status     Assigned
Resolved   Bstorm
Resolved   MoritzMuehlenhoff
Resolved   Bstorm
Open       None
Resolved   Andrew
Resolved   taavi
Resolved   taavi
Resolved   taavi
Resolved   hashar
Resolved   taavi
Stalled    None
Resolved   aborrero
Resolved   aborrero
Resolved   aborrero
Resolved   aborrero
Resolved   aborrero
Resolved   taavi
Resolved   aborrero
Resolved   taavi
Duplicate  None
Resolved   taavi
Declined   None
Resolved   aborrero
Declined   None
Resolved   aborrero
Resolved   taavi
Resolved   taavi
Resolved   nskaggs
Declined   taavi

Event Timeline

aborrero changed the task status from Stalled to In Progress. Dec 15 2021, 10:37 AM
aborrero claimed this task.

Change 747503 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P::toolforge::grid: fix tomcat on buster

https://gerrit.wikimedia.org/r/747503

Change 747503 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] P::toolforge::grid: fix tomcat on buster

https://gerrit.wikimedia.org/r/747503

Mentioned in SAL (#wikimedia-cloud) [2021-12-21T11:06:13Z] <arturo> bump quotas, instances from 50 to 55, CPU from 100 to 150, RAM from 200GB to 250GB (T277653)

Change 749221 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/cookbooks@wmcs] wmcs: toolforge: grid: add cookbook to depool a node

https://gerrit.wikimedia.org/r/749221

Change 749222 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/cookbooks@wmcs] wmcs: toolforge: grid: add remove instance cookbook

https://gerrit.wikimedia.org/r/749222

Change 749221 merged by Arturo Borrero Gonzalez:

[operations/cookbooks@wmcs] wmcs: toolforge: grid: add cookbook to depool a node

https://gerrit.wikimedia.org/r/749221

Change 749222 merged by Arturo Borrero Gonzalez:

[operations/cookbooks@wmcs] wmcs: toolforge: grid: add remove instance cookbook

https://gerrit.wikimedia.org/r/749222

Change 753446 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] toolforge: grid: weblight: support debian buster

https://gerrit.wikimedia.org/r/753446

Change 753446 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: grid: weblight: support debian buster

https://gerrit.wikimedia.org/r/753446

Mentioned in SAL (#wikimedia-cloud) [2022-01-20T12:56:50Z] <arturo> scaling up the grid with 10 buster exec nodes (T277653)

Mentioned in SAL (#wikimedia-cloud) [2022-01-24T15:22:59Z] <arturo> scaling up the grid with 10 buster exec nodes (T277653)

Mentioned in SAL (#wikimedia-cloud) [2022-01-26T13:55:33Z] <arturo> scaling up the buster web grid with 5 lighttd and 2 generic nodes (T277653)

Change 757499 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] toolforge: automated-tests: introduce check to verify default grid release

https://gerrit.wikimedia.org/r/757499

Change 757499 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: automated-tests: introduce check to verify default grid release

https://gerrit.wikimedia.org/r/757499

aborrero updated the task description.
aborrero updated the task description.

Mentioned in SAL (#wikimedia-cloud) [2022-02-09T18:56:13Z] <wm-bot> pooled 10 grid nodes tools-sgeweblight-10-[1-5],tools-sgewebgen-10-[1,2],tools-sgeexec-10-[1-10] (T277653) - cookbook ran by arturo@nostromo

Change 762811 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] toolforge: automated-tests: introduce some TJF checks

https://gerrit.wikimedia.org/r/762811

Change 762811 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] toolforge: automated-tests: introduce some TJF checks

https://gerrit.wikimedia.org/r/762811

On 1st April, emails were sent to individual maintainers who still have projects running on Stretch GridEngine.

Some have started migrating away from Stretch GridEngine.

One user sent a technical enquiry to the Cloud mailing list, but the post is currently being held for moderation because of its size.
Can this be reviewed?

Apparently somebody already let the message in.

I am seeing services that have been migrated from the Stretch grid still appearing on the grid-deprecation.toolforge.org portal.
Example services: srwiki and linedwell

Does this mean the maintainers have left these services running on the Stretch grid, or is it an artifact of the way the data is fetched?

  • The srwiki tool's crontab does not specify a -release for either job, so it is still running its jobs on the stretch grid. I don't see any sign of that tool attempting to use the new kubernetes job service.
  • The linedwell tool's report shows its grid engine webservice job shutting down on 2022-04-03 09:32. The grid-deprecation app has some code (R2043:b18c8d8eff90: app: remove workloads migrated to Kubernetes) that tries to remove things that are now seen running on Kubernetes, but apparently the matching logic is not currently working for webservices.

If you have other tools like this, I don't think there is any way to know whether it is a tool issue or a reporting issue until someone goes and manually checks each tool.
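
The crontab part of that manual check could be partly scripted. What follows is only a minimal sketch, not the grid-deprecation app's logic: it assumes grid jobs are submitted from the crontab through `jsub` or `jstart`, and that a line without a `-release` flag ends up on the default grid (Stretch at the time of this comment). The function name and the sample crontab are hypothetical.

```python
import shlex

# Assumption: these are the wrappers a tool's crontab uses to submit grid jobs.
SUBMIT_COMMANDS = {"jsub", "jstart"}


def lines_without_release(crontab_text):
    """Return crontab lines that call jsub/jstart without a -release flag."""
    flagged = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        try:
            tokens = shlex.split(stripped)
        except ValueError:
            # Unbalanced quotes: fall back to naive whitespace splitting.
            tokens = stripped.split()
        # Compare on the basename so /usr/bin/jsub also matches.
        names = {token.rsplit("/", 1)[-1] for token in tokens}
        if names & SUBMIT_COMMANDS and "-release" not in tokens:
            flagged.append(stripped)
    return flagged


# Hypothetical crontab, for illustration only.
SAMPLE_CRONTAB = """\
# m h dom mon dow command
0 * * * * jsub -once -quiet ./hourly.sh
30 2 * * * /usr/bin/jsub -release buster -once ./nightly.sh
"""

for entry in lines_without_release(SAMPLE_CRONTAB):
    print(entry)
```

A check like this only covers cron-submitted jobs; webservices and jobs started by hand would still need to be inspected separately.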

Thanks!
I will remind maintainers to make sure they shut down all services on the Stretch grid after migration.

Some maintainers are asking for their projects (either abandoned or very old) to be shut down.
Is this something I can do? If not, how do I mark such projects for shutdown?

Change 779462 had a related patch set uploaded (by Majavah; author: Majavah):

[labs/toollabs@master] jsub: set buster as default release

https://gerrit.wikimedia.org/r/779462

Mentioned in SAL (#wikimedia-cloud) [2022-04-29T14:22:35Z] <andrewbogott> changing login.toolforge.org, bastion.toolforge.org, and dev.toolforge.org dns entries to refer to the new Buster bastions T277653 https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Timeline

Mentioned in SAL (#wikimedia-cloud) [2022-05-30T08:24:26Z] <taavi> depool tools-sgeexec-[0901-0909] (7 nodes total) T277653

Change 801385 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] sonofgridengine: grid_configurator: make the grid master a submit host

https://gerrit.wikimedia.org/r/801385

Change 802470 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/software/tools-webservice@master] gridengine: default to buster

https://gerrit.wikimedia.org/r/802470

Change 802470 merged by jenkins-bot:

[operations/software/tools-webservice@master] gridengine: default to buster

https://gerrit.wikimedia.org/r/802470

Mentioned in SAL (#wikimedia-cloud) [2022-06-02T10:16:57Z] <taavi> publish tools-webservice 0.84 that updates the grid default from stretch to buster T277653

Change 779462 merged by jenkins-bot:

[labs/toollabs@master] jsub: set buster as default release

https://gerrit.wikimedia.org/r/779462

Mentioned in SAL (#wikimedia-cloud) [2022-06-02T11:17:36Z] <taavi> publish jobutils 1.44 that updates the grid default from stretch to buster T277653

Change 807168 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:toolforge::checker: add buster endpoints

https://gerrit.wikimedia.org/r/807168

Change 807169 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] icinga::monitor::toollabs: replace stretch with buster

https://gerrit.wikimedia.org/r/807169

Change 807170 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:toolforge::checker: remove stretch endpoints

https://gerrit.wikimedia.org/r/807170

Change 807168 merged by David Caro:

[operations/puppet@production] P:toolforge::checker: add buster endpoints

https://gerrit.wikimedia.org/r/807168

Change 807182 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:toolforge::checker: add missing endpoint config

https://gerrit.wikimedia.org/r/807182

Change 807184 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/software/tools-webservice@master] Remove stretch support

https://gerrit.wikimedia.org/r/807184

Change 807185 had a related patch set uploaded (by Majavah; author: Majavah):

[labs/toollabs@master] jsub: Remove stretch support

https://gerrit.wikimedia.org/r/807185

Change 807182 merged by David Caro:

[operations/puppet@production] P:toolforge::checker: add missing endpoint config

https://gerrit.wikimedia.org/r/807182

Change 807169 merged by Filippo Giunchedi:

[operations/puppet@production] icinga::monitor::toollabs: replace stretch with buster

https://gerrit.wikimedia.org/r/807169

The following SGE jobs are still running on the exec grid:

tools.alchimista 8787914 tlgm
tools.ammarbot 9058295 cine_review
tools.avicbot 400742 uaaby5min
tools.avicbot 6444029 avicbotirc
tools.cewbot 3280296 cron-tools.cewbot-IRC
tools.earwigbot 9782320 earwigbot
tools.germancon-mobile 400540 germancon-mobile-parser
tools.hat-collector 3965351 go
tools.listeria 3216144 bot
tools.nokib-bot 1593642 daily
tools.patest 6882116 script_wui
tools.phetools 2742445 ws_ocr_daemon
tools.phetools 2742447 verify_match
tools.phetools 2742450 extract_text_layer
tools.sdzerobot 6303775 stream
tools.signature-checker 2297341 cron-20170515.signature_check.wikinews
tools.signature-checker 2297344 cron-20170515.signature_check.wikisource
tools.signature-checker 2297345 cron-20170515.signature_check.wikiversity
tools.signature-checker 4615899 cron-20170515.signature_check.zh
tools.signature-checker 5086089 cron-20170515.signature_check.simple
tools.signature-checker 5969401 cron-20170515.signature_check.wiktionary
tools.signature-checker 6091396 cron-20170515.signature_check.zh-classical
tools.signature-manquante-bot 4769872 signature-manquante
tools.telegram-wikilinksbot 5699376 wikilinksbot
tools.toc 1826241 cron-20170915.topic_list.ja
tools.toc 3609056 cron-20170915.topic_list.wikinews
tools.toc 3609088 cron-20170915.topic_list.wikisource
tools.toc 3609172 cron-20170915.topic_list.wikiversity
tools.toc 5086111 cron-20170915.topic_list.wiktionary
tools.toc 5287268 cron-20170915.topic_list.moegirl
tools.urbanecmbot 2811129 patrolAfterPatrol
tools.urbanecmbot 6443111 patrolSandbox
tools.wikidata-todo 2840123 wc_all
tools.wikidata-todo 2840124 wc_rc
tools.wikitanvirbot 1366931 doubredi
tools.wikitanvirbot 5512585 sand-wbbn
tools.wikitanvirbot 5512746 corona
tools.wikitanvirbot 5816767 sand-wtbn

I'm going to kill those sometime tomorrow (so 2022-06-23).
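
For bookkeeping, a listing like the one above can be tallied per tool before anything is killed. This is a minimal sketch, assuming each line has exactly the three whitespace-separated fields shown (tool account, job id, job name). The function name is hypothetical, and only a few lines of the list are reproduced as sample input.

```python
from collections import defaultdict


def jobs_by_tool(listing):
    """Group 'tool job_id job_name' lines into {tool: [(job_id, job_name), ...]}."""
    grouped = defaultdict(list)
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        tool, job_id, job_name = parts
        grouped[tool].append((int(job_id), job_name))
    return dict(grouped)


# A few lines copied from the listing above.
SAMPLE_LISTING = """\
tools.avicbot 400742 uaaby5min
tools.avicbot 6444029 avicbotirc
tools.listeria 3216144 bot
"""

for tool, jobs in jobs_by_tool(SAMPLE_LISTING).items():
    print(tool, len(jobs))
```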

Mentioned in SAL (#wikimedia-cloud) [2022-06-23T13:59:45Z] <taavi> removing remaining continuous jobs from the stretch grid T277653

Change 807185 merged by jenkins-bot:

[labs/toollabs@master] jsub: Remove stretch support

https://gerrit.wikimedia.org/r/807185

Change 807184 merged by jenkins-bot:

[operations/software/tools-webservice@master] Remove stretch support

https://gerrit.wikimedia.org/r/807184

taavi updated Other Assignee, added: taavi.

Change 807170 merged by David Caro:

[operations/puppet@production] P:toolforge::checker: remove stretch endpoints

https://gerrit.wikimedia.org/r/807170