
Migrate yifeibot from Toolforge GridEngine to Toolforge Kubernetes
Closed, ResolvedPublic

Description

Kindly migrate your tool (https://grid-deprecation.toolforge.org/t/yifeibot) from Toolforge GridEngine to Toolforge Kubernetes.

Toolforge GridEngine is being deprecated.
See: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/

Please note that a volunteer may perform this migration if it has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
Migrating Web Services from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
Python
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users

Event Timeline

My apologies if this ticket comes as a surprise to you. In order to ensure WMCS can provide a stable, secure and supported platform, it’s important we migrate away from GridEngine. I want to assure you that while it is WMCS’s intention to shut down GridEngine as outlined in the blog post https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/, a shutdown date for GridEngine has not yet been set. The goal of the migration is to move as many tools as possible onto Kubernetes and to ensure as smooth a transition as possible for everyone. Once the majority of tools have migrated, it will be more appropriate to discuss a shutdown date. See T314664: [infra] Decommission the Grid Engine infrastructure.

As noted in https://techblog.wikimedia.org/2022/03/16/toolforge-gridengine-debian-10-buster-migration/, some use cases are already supported by Kubernetes and should be migrated. If your tool can migrate, please do plan a migration. Reach out if you need help or find you are blocked by missing features. Most of all, WMCS is here to support you.

However, it’s possible your tool needs a mixed runtime environment or some other features that aren't yet present in https://techblog.wikimedia.org/2022/03/18/toolforge-jobs-framework/. We’d love to hear of this or any other blocking issues so we can work with you once a migration path is ready. Thanks for your hard work as volunteers and help in this migration!

Multichill added a subscriber: zhuyifei1999.

I'll handle it during this new year break.

> I'll handle it during this new year break.

Thanks!

A direct conversion from the crontab (which has about 40 entries for various bot jobs) hits a quota error. However, a lot of the scripts have been dead for years, so let me check which ones to keep.
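A "direct conversion" of the kind described above could be scripted roughly as follows. This is a minimal, hypothetical helper, not a Toolforge tool: it assumes each crontab line is five schedule fields followed by a command (any grid-specific jsub wrapper already removed), names each job after the first *.py or *.sh script in its command, and uses python3.9 (the image mentioned later in this thread) for every job. It writes the YAML by hand, relying on JSON-quoted strings being valid YAML double-quoted scalars, so it needs no third-party library.

```python
import json
import re

# Assumption: one image for every job; python3.9 is the image used in this thread.
DEFAULT_IMAGE = "python3.9"


def to_job_name(raw):
    """Lowercase raw, collapse anything outside [a-z0-9-] into '-', and cap the
    length at 63 characters so the result looks like a DNS-label-style name."""
    name = re.sub(r"[^a-z0-9-]+", "-", raw.lower())[:63].strip("-")
    return name or "job"


def guess_name(command, index):
    """Hypothetical naming convention: use the first *.py or *.sh script in the
    command, falling back to the crontab line index."""
    for token in command.split():
        base = token.rsplit("/", 1)[-1]
        if base.endswith((".py", ".sh")):
            return to_job_name(base.rsplit(".", 1)[0])
    return f"job-{index}"


def parse_crontab(text):
    """Turn '<min> <hour> <dom> <mon> <dow> <command>' lines into job dicts."""
    jobs, seen = [], set()
    for index, line in enumerate(text.splitlines()):
        line = line.strip()
        # Skip blank lines, comments and variable assignments such as MAILTO=...
        if not line or line.startswith("#") or re.match(r"^\w+=", line):
            continue
        fields = line.split(None, 5)
        if len(fields) < 6:
            # '@daily'-style shortcuts and malformed lines are not handled here.
            raise ValueError(f"unparsable crontab line: {line!r}")
        base = guess_name(fields[5], index)
        name, n = base, 2
        while name in seen:  # keep names unique when a script appears twice
            name, n = f"{base}-{n}", n + 1
        seen.add(name)
        jobs.append({"name": name, "command": fields[5],
                     "image": DEFAULT_IMAGE, "schedule": " ".join(fields[:5])})
    return jobs


def to_yaml(jobs):
    """Emit the jobs as a YAML list of mappings, one 'key: "value"' per line."""
    lines = []
    for job in jobs:
        for i, (key, value) in enumerate(job.items()):
            lines.append(f"{'- ' if i == 0 else '  '}{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"
```

Converting everything mechanically is exactly what trips the quota, so in practice the output is a starting point to prune by hand.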

Also, I'm trying to run a job that currently works on the grid via k8s, and I'm getting "No module named 'pkg_resources'". I suspect the venv is too old and needs a rebuild. Is there a supported way to spawn a shell for a given container image?
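One way to check the "venv is too old" suspicion: a venv records the Python version it was built with in its pyvenv.cfg file, and its packages live under lib/pythonX.Y/site-packages. If the venv was built with an older grid Python and is then run under the python3.9 image, that interpreter is likely not to see those packages at all, which is one plausible way for setuptools (the package that provides pkg_resources) to appear missing. The sketch below compares the recorded major.minor version with the running interpreter; the venv path in the usage note is hypothetical.

```python
import sys
from pathlib import Path


def venv_python_version(venv_dir):
    """Return the (major, minor) recorded in <venv_dir>/pyvenv.cfg, or None."""
    cfg = Path(venv_dir) / "pyvenv.cfg"
    if not cfg.is_file():
        return None
    for line in cfg.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        # The stdlib venv module writes "version = 3.7.3"; the third-party
        # virtualenv tool writes e.g. "version_info = 3.9.2.final.0".
        if sep and key.strip() in ("version", "version_info"):
            parts = value.strip().split(".")
            return int(parts[0]), int(parts[1])
    return None


def venv_matches_interpreter(venv_dir):
    """True if the venv was built for the running interpreter's major.minor."""
    return venv_python_version(venv_dir) == tuple(sys.version_info[:2])
```

Run inside the container image, venv_matches_interpreter("/data/project/yifeibot/pyvenv") (hypothetical path) returning False would point to a rebuild rather than a missing requirement.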

I guess you need to rebuild the Python venv. Current instructions are here: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Python#Kubernetes_python_jobs No shell container in the jobs framework though.

A convenience method for this venv bootstrap process is on our radar, but that's what we have today.
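The bootstrap step discussed above amounts to recreating the venv from inside the target container image, so that it is built against that image's Python. The sketch below is a Python rendering of that idea, not the script from the linked wikitech page; the venv and requirements paths are hypothetical, and dry_run=True only returns the pip commands without creating or running anything.

```python
import subprocess
import venv
from pathlib import Path


def bootstrap_venv(venv_dir, requirements, dry_run=False):
    """Recreate venv_dir with the running interpreter, then install requirements.

    Returns the pip commands that are (or, with dry_run=True, would be) run.
    """
    venv_dir = Path(venv_dir)
    python = venv_dir / "bin" / "python"  # POSIX venv layout
    commands = [
        [str(python), "-m", "pip", "install", "--upgrade", "pip", "wheel"],
        [str(python), "-m", "pip", "install", "-r", str(requirements)],
    ]
    if dry_run:
        return commands
    # clear=True wipes a stale venv, e.g. one built for an older grid Python.
    venv.create(venv_dir, clear=True, with_pip=True)
    for command in commands:
        subprocess.run(command, check=True)
    return commands
```

Wrapped in a small script (say a hypothetical bootstrap_venv.py that calls this function), it could be started as a one-off job with something like toolforge-jobs run bootstrap-venv --command "python3 bootstrap_venv.py" --image python3.9, using the same --command and --image options that appear in this thread.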

Did that. I used

  toolforge-jobs run testshell --command "sleep infinity" --image python3.9

followed by

  kubectl exec -it testshell-l5qw2 -- /bin/bash
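The pod name in that workaround (testshell-l5qw2) carries a generated suffix, so it has to be looked up before running kubectl exec. The hypothetical helper below picks the pod out of the output of kubectl get pods -o name (which prints lines such as pod/testshell-l5qw2) and builds the same exec command used above. Matching on the "<job>-" prefix is an assumption of this sketch and would be ambiguous if another job's name started with "testshell-".

```python
import subprocess


def find_job_pod(job_name, kubectl_output):
    """Return the single pod name for job_name from 'kubectl get pods -o name'."""
    prefix = f"pod/{job_name}-"
    matches = [line.strip()[len("pod/"):] for line in kubectl_output.splitlines()
               if line.strip().startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"expected one pod for {job_name!r}, found {matches}")
    return matches[0]


def exec_command(pod, shell="/bin/bash"):
    """Build the interactive 'kubectl exec' command for a pod."""
    return ["kubectl", "exec", "-it", pod, "--", shell]


def open_shell(job_name):
    """Look up the job's pod and attach an interactive shell to it."""
    listing = subprocess.run(["kubectl", "get", "pods", "-o", "name"],
                             check=True, capture_output=True, text=True).stdout
    subprocess.run(exec_command(find_job_pod(job_name, listing)), check=True)
```

Since the job runs "sleep infinity", it keeps running until removed, so it is worth cleaning up afterwards with toolforge-jobs delete testshell.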

I converted the crontab into /data/project/yifeibot/jobs.yaml, removing quite a few ancient, defunct scripts and fixing a few others. It has 30 jobs, including 7 continuous jobs.

This needs a quota increase. I deployed only the most critical jobs.
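Deploying "only the most critical jobs" means loading a subset of jobs.yaml. A minimal sketch, assuming the file has already been parsed into a list of job dicts (for instance with yaml.safe_load from the third-party PyYAML package, not imported here): summarize counts continuous, scheduled and one-off jobs and rejects duplicate names, and select_jobs keeps a hypothetical list of critical job names in file order.

```python
def summarize(jobs):
    """Count job kinds in a parsed jobs.yaml (a list of job dicts)."""
    names = [job["name"] for job in jobs]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate job names: {duplicates}")
    continuous = sum(1 for job in jobs if job.get("continuous"))
    scheduled = sum(1 for job in jobs if "schedule" in job)
    return {"total": len(jobs), "continuous": continuous,
            "scheduled": scheduled, "one_off": len(jobs) - continuous - scheduled}


def select_jobs(jobs, wanted):
    """Keep only the jobs named in wanted, preserving their order in the file."""
    unknown = set(wanted) - {job["name"] for job in jobs}
    if unknown:
        raise KeyError(f"not in jobs.yaml: {sorted(unknown)}")
    return [job for job in jobs if job["name"] in wanted]
```

For the file described above, summarize would be expected to report 30 jobs with 7 continuous; the selected subset can then be written to a separate file and deployed with toolforge-jobs load.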

> No shell container in the jobs framework though.

webservice shell works fine here too; we should probably rename the tool (T311917).

> This needs a quota increase. I deployed only the most critical jobs.

Please file a task in Toolforge (Quota-requests) for that.

All deployed via:

  tools.yifeibot@tools-sgebastion-10:~$ toolforge-jobs load jobs.yaml

https://k8s-status.toolforge.org/namespaces/tool-yifeibot/ looks good to me. I'm going to see whether any jobs start failing when they get scheduled.