
Migrate mix-n-match from Toolforge GridEngine to Toolforge Kubernetes
Closed, Resolved, Public

Description

Kindly migrate your tool (https://grid-deprecation.toolforge.org/t/mix-n-match) from Toolforge GridEngine to Toolforge Kubernetes.

Toolforge GridEngine is getting deprecated.
See: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/

Please note that a volunteer may perform this migration if this has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
Migrating Web Services from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
Python
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users

Event Timeline

My apologies if this ticket comes as a surprise to you. In order to ensure WMCS can provide a stable, secure and supported platform, it’s important we migrate away from GridEngine. I want to assure you that while it is WMCS’s intention to shut down GridEngine as outlined in the blog post https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/, a shutdown date for GridEngine has not yet been set. The goal of the migration is to move as many tools as possible onto Kubernetes and ensure as smooth a transition as possible for everyone. Once the majority of tools have migrated, discussion on a shutdown date is more appropriate. See T314664: [infra] Decommission the Grid Engine infrastructure.

As noted in https://techblog.wikimedia.org/2022/03/16/toolforge-gridengine-debian-10-buster-migration/ some use cases are already supported by Kubernetes and should be migrated. If your tool can migrate, please do plan a migration. Reach out if you need help or find you are blocked by missing features. Most of all, WMCS is here to support you.

However, it’s possible your tool needs a mixed runtime environment or some other features that aren't yet present in https://techblog.wikimedia.org/2022/03/18/toolforge-jobs-framework/. We’d love to hear of this or any other blocking issues so we can work with you once a migration path is ready. Thanks for your hard work as volunteers and help in this migration!

Hi, as you may know, I am a massive user of the Toolforge environment. I am also an eager early adopter of new technologies. I will use this ticket (and not the other dozen or so I got for other tools) for general feedback, unless the feedback is specific to another tool.

For this particular tool, I have 33 active cronjobs that need migrating (there are more, but not all are active). I have used regexps and manual editing to convert the cronfile to a shell script that sets up the schedules. A few points:

  • Many of these jobs work on "catalogs" (lists of entries) of vastly different length, from a few dozen to millions. This often (to some degree at least) correlates with memory use. Therefore, I have to pick a high memory value, otherwise larger catalogs never get processed. I average ~3Gi per job, which eats very quickly through my "allowance".
  • One of these jobs is a "wrapper", which on the grid engine ran every minute (!) to pick a new job from a to-do list and execute it. Usually, ~5 of these ran in parallel on the grid engine. As I understand it, this is no longer possible with the toolforge-jobs utility, as only one job with a given "name" can run at any given time. I can think of other ways to do this, but they involve some major rewriting, unless you have a better idea (yes, I can keep using grid engine for this, and I might, but I'd rather not). Even without the name limit, this would eat up my available memory quickly and block other jobs from running. This is a major step backwards from grid engine for me!
  • FYI, many of the above jobs are low-CPU, but spend a lot of time on I/O (mostly HTTP requests).
  • This tool is about the worst when it comes to cronjobs, so the others will be easier...
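
One possible shape for the "other ways to do this" mentioned above, sketched in Python under loud assumptions: instead of starting a wrapper every minute, a single job runs several worker threads that pull entries from the to-do list. The names run_pool, todo and handle are hypothetical, the in-memory queue is only a stand-in for the tool's real to-do list, and nothing here reflects how mix-n-match is actually implemented. Threads are used because the jobs are described as low-CPU and I/O-heavy; all workers share one process, so the memory limit of that one job applies to them combined.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def run_pool(todo, handle, workers=5):
    """Drain the queue `todo` using `workers` threads in a single process.

    `handle` is a hypothetical callable that processes one to-do entry.
    Returns a list of (entry, outcome) pairs in completion order.
    """
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                entry = todo.get_nowait()
            except queue.Empty:
                return  # nothing left to do, so this worker exits
            try:
                outcome = handle(entry)
            except Exception as exc:  # record the failure, keep the worker alive
                outcome = exc
            with lock:
                results.append((entry, outcome))

    # Leaving the `with` block waits for every worker to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(workers):
            pool.submit(worker)
    return results


# Usage with a stand-in to-do list of three catalog IDs.
todo = queue.Queue()
for catalog_id in (1, 2, 3):
    todo.put(catalog_id)
print(sorted(run_pool(todo, lambda c: f"processed {c}")))
```

A long-running variant would presumably sleep and poll again when the list is empty rather than exit; that part is left out of the sketch.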

Update: I migrated most of the cronjobs (except two) to Kubernetes. That is the best I can do for now; I'll have a look at the remaining ones again once I migrate all my other tools.

Moved all the jobs now except one, which depends on mysqldump, but that command is not in tf-bullseye-std. Is there any way to access that command from k8s?

Never mind, I just copied the binary to the tools directory and now run it from there.
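
For anyone taking the same route, here is a minimal Python sketch of calling a copied mysqldump binary from a job. Everything specific in it is an assumption for illustration: the function names, the binary location, the credentials file, the host and the database name are all hypothetical placeholders, not values taken from this tool. A copied binary may also depend on shared libraries that are missing from the container image, so this may not work everywhere.

```python
import subprocess


def build_dump_command(binary, defaults_file, host, database):
    """Build the argument list for a mysqldump call.

    --defaults-file has to be the first option on the command line.
    """
    return [
        binary,
        f"--defaults-file={defaults_file}",
        f"--host={host}",
        database,
    ]


def dump_database(binary, defaults_file, host, database, out_path):
    """Run the copied binary and write the dump to `out_path`."""
    cmd = build_dump_command(binary, defaults_file, host, database)
    with open(out_path, "wb") as out:
        # check=True raises CalledProcessError if mysqldump exits non-zero.
        subprocess.run(cmd, stdout=out, check=True)


# Example call with hypothetical placeholder values (not executed here):
# dump_database(
#     binary="/path/to/tool/bin/mysqldump",
#     defaults_file="/path/to/tool/replica.my.cnf",
#     host="db.example.invalid",
#     database="example_db",
#     out_path="/path/to/tool/dumps/example_db.sql",
# )
```

Passing the command as a list avoids shell quoting issues, and keeping the command construction in its own function makes it easy to check without touching a database.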

All my tools should now be running jobs without the grid, k8s only.