Migrate pbbot from Toolforge GridEngine to Toolforge Kubernetes
Closed, Resolved (Public)


Kindly migrate your tool from Toolforge GridEngine to Toolforge Kubernetes.

Toolforge GridEngine is being deprecated.

Please note that a volunteer may perform this migration if this has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
Migrating Web Services from GridEngine to Kubernetes

Event Timeline

My apologies if this ticket comes as a surprise to you. To ensure WMCS can provide a stable, secure and supported platform, it’s important that we migrate away from GridEngine. I want to assure you that while it is WMCS’s intention to shut down GridEngine as outlined in the blog post, a shutdown date for GridEngine has not yet been set. The goal of the migration is to move as many tools as possible onto Kubernetes and to ensure as smooth a transition as possible for everyone. Once the majority of tools have migrated, a discussion of a shutdown date will be more appropriate. See T314664: Toolforge: Decomission the Grid Engine infrastructure.

As noted, some use cases are already supported by Kubernetes and should be migrated. If your tool can migrate, please plan a migration. Reach out if you need help or find you are blocked by missing features. Most of all, WMCS is here to support you.

However, it’s possible your tool needs a mixed runtime environment or some other features that aren't yet present. We’d love to hear of this or any other blocking issues so we can work with you once a migration path is ready. Thanks for your hard work as volunteers and for your help in this migration!

Hello, thank you for the heads-up. No surprise at all, I had already taken some initial steps toward the migration and thus found and reported a few issues about the new framework I'd like to see solved before that happens: T301901, T302211, T304421. Anyway, those are no real blockers for me, so let me analyze the matter once again to see how I can adapt my jobs. Most of them should work fine; I'm mostly worried about the "dumpWatcher" one, which needs to start jobs from within another job (T315729).

After some testing, I'm considering T317998 a blocker.

Most cron jobs are now moved to k8s and email notifications have been fixed, but I noticed that failed jobs are re-run, which is highly undesirable for me: T304893, T315114.

My cron jobs are now fully migrated to k8s. I am still inclined to use the grid for one-off jobs because of the current retry policy, but I will certainly move those to k8s as well once that is solved.