
Migrate billsbots from Toolforge GridEngine to Toolforge Kubernetes
Closed, Resolved (Public)

Description

Kindly migrate your tool (https://grid-deprecation.toolforge.org/t/billsbots) from Toolforge GridEngine to Toolforge Kubernetes.

Toolforge GridEngine is getting deprecated.
See: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/

Please note that a volunteer may perform this migration if this has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
Migrating Web Services from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
Python
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users

Event Timeline

My apologies if this ticket comes as a surprise to you. In order to ensure WMCS can provide a stable, secure and supported platform, it’s important we migrate away from GridEngine. I want to assure you that while it is WMCS’s intention to shut down GridEngine as outlined in the blog post https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/, a shutdown date for GridEngine has not yet been set. The goal of the migration is to move as many tools as possible onto Kubernetes and ensure as smooth a transition as possible for everyone. Once the majority of tools have migrated, discussion of a shutdown date will be more appropriate. See T314664: [infra] Decommission the Grid Engine infrastructure.

As noted in https://techblog.wikimedia.org/2022/03/16/toolforge-gridengine-debian-10-buster-migration/ some use cases are already supported by Kubernetes and should be migrated. If your tool can migrate, please do plan a migration. Reach out if you need help or find you are blocked by missing features. Most of all, WMCS is here to support you.

However, it’s possible your tool needs a mixed runtime environment or some other features that aren't yet present in https://techblog.wikimedia.org/2022/03/18/toolforge-jobs-framework/. We’d love to hear of this or any other blocking issues so we can work with you once a migration path is ready. Thanks for your hard work as volunteers and help in this migration!

Dear komla and nskaggs,

After I was notified of this Phabricator task, my efforts to automate my bot using Grid Engine stopped dead in their tracks and I began working on automation using the Kubernetes "jobs framework". I've asked for help solving my problem at Wikipedia:Bots/Requests for approval/Bot1058 8#Job logs as well as at Phabricator Task T301901, but help has yet to arrive there.

I've also pinged Arturo Borrero Gonzalez, author of the "custom log files" instructions, at Help talk:Toolforge/Jobs framework.

Sorry I'm not particularly fluent in Linux; my bots so far run only on my Windows PC. There I use Windows Task Scheduler to automate jobs running on PowerShell. This decade-plus old system running on my Windows 7 PC seems light years ahead of the primitive system I have to use on the Toolforge.

If I don't get past this roadblock of being unable to set up "custom log files" I will need to either attempt to run raw Kubernetes jobs using the Kubernetes API or go back and re-start my attempt to automate my bot using the Grid Engine.

@Wbm1058 Thanks for the update. It looks like you are already involved on T301901: Allow specifying the path for log files for jobs executed on the new toolforge Jobs framework. I believe some development activity has been occurring there and you should continue to participate in that ticket. As you note, you can also develop the tool with Kubernetes directly in mind. Either way, best of luck on your automation efforts!

Status update. I only started working on GridEngine on October 1, 2022 by submitting simple one-off jobs using 'jsub' and never got to the point of having any fully automated jobs running on that platform, as just five days later, on October 6, I got your email informing me of this Phabricator task. From that point I stopped even running one-offs on GridEngine.

What I have been doing since before June 2022 is running automated bots on my own desktop PC. So this Phab might either be closed or renamed to "Migrate billsbots from Bill's PC to Toolforge Kubernetes".

I hope you're following my English Wikipedia bot request for approval (BRFA) as that's the primary place I use for reporting my progress.

In that BRFA I've twice noted that there is no automatic way to prune log files, so tool users must take care of such files growing too large.

I also expressed my disappointment that T301901 had closed without any response to my comment:

Per the documentation:

Subsequent same-name job runs will append to the same files. NOTE: as of this writing there is no automatic way to prune log files, so tool users must take care of such files growing too large.

Supporting standard shell command output redirection will avoid this issue.
The >> syntax says to append to the named log file, while the > syntax says to supersede the contents of the previous file.

Using > allows me to fully automate my bot by avoiding the need for babysitting to "take care" of my log file growing too large.
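As a small illustration of the difference between the two redirection operators, the following Python sketch mimics them with file open modes: mode "a" behaves like >> and mode "w" behaves like >. The file names and helper functions are hypothetical and are not part of the jobs framework.

```python
import os
import tempfile


def run_job(log_path: str, mode: str, message: str) -> None:
    """Write one line of job output; mode "a" appends, mode "w" overwrites."""
    with open(log_path, mode, encoding="utf-8") as log:
        log.write(message + "\n")


def line_count(log_path: str) -> int:
    """Count the lines currently stored in the log file."""
    with open(log_path, encoding="utf-8") as log:
        return sum(1 for _ in log)


# Hypothetical log files in a scratch directory.
workdir = tempfile.mkdtemp()
append_log = os.path.join(workdir, "append.out")
overwrite_log = os.path.join(workdir, "overwrite.out")

for run in range(3):
    run_job(append_log, "a", f"run {run}")     # like: command >> append.out
    run_job(overwrite_log, "w", f"run {run}")  # like: command > overwrite.out

appended_lines = line_count(append_log)
overwritten_lines = line_count(overwrite_log)
print(appended_lines, overwritten_lines)
```

After three runs the appended log holds one line per run, while the overwritten log holds only the output of the most recent run.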

I see >> in the code that JJMC89 linked to in the previous post in this thread, which is forcing this need for tending to our logs.

This was either overlooked or ignored. You have the feature allowing job logs to be suppressed entirely; why not also support superseding job logs?

But I don't know whether it's worth my trouble to create a new Phab in order to raise this question again.

Since then I've taken advantage of the --continuous feature to start a continuous job on Kubernetes. That job's log file has grown to over 32,000 KB and there are now long delays for me as I wait for it to open, watching the spinning wheel go 'round. I tried renaming the file but the continuous job just kept appending to the renamed file.
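The rename behaviour described above is what one would expect on Linux: a running process writes through an open file handle, and that handle stays attached to the same file when the file's name changes. A minimal sketch, assuming a POSIX filesystem and using hypothetical file names (the real job's log handling may differ in detail):

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "job.out")          # hypothetical log name
renamed_path = os.path.join(workdir, "job.out.old")  # hypothetical new name

# Stand-in for the continuous job: it opens its log once and keeps it open.
log = open(log_path, "a", encoding="utf-8")
log.write("before rename\n")
log.flush()

# Renaming changes the directory entry only; the open handle is unaffected.
os.rename(log_path, renamed_path)

log.write("after rename\n")
log.flush()
log.close()

with open(renamed_path, encoding="utf-8") as f:
    renamed_lines = f.read().splitlines()

# No file exists under the original name until something reopens it by name.
new_file_exists = os.path.exists(log_path)
print(renamed_lines, new_file_exists)
```

Both lines end up in the renamed file. A fresh file under the original name appears only once the writer reopens the log by name, for example after a restart.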

What means should I use to take care that this log does not grow too large? I suppose I could shut down the continuous job, but that kind of defeats its purpose.

What means should I use to take care that this log does not grow too large? I suppose I could shut down the continuous job, but that kind of defeats its purpose.

That is exactly what is needed today to implement log rotation on both the Kubernetes jobs service and the legacy grid engine.

It would be reasonable to create a Phabricator task requesting some log rotation functionality in the new jobs service system. Implementation may be tricky, but we should at least have a place to talk about the use case and what would be ideal from a general perspective. It would help to state whether you have a preference for file naming, rotation based on dates vs file size, rotated log retention, and compression. Writing all that out makes me think that it would be nice to hand off to something like logrotate that already knows how to deal with these variations, but I expect that implementation will be a bit trickier than just plugging together a pipeline of existing Unix tools in the end.
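To make those choices concrete, here is a minimal sketch of size-based rotation that copies the log into a compressed, numbered archive and then truncates the original in place, similar in spirit to logrotate's copytruncate option. It is not part of the jobs service; the function name, the numbered naming scheme, and the thresholds are assumptions for illustration. It relies on the writer having opened the log in append mode, and lines written between the copy and the truncate can be lost.

```python
import gzip
import os
import shutil
import tempfile


def rotate_if_large(log_path: str, max_bytes: int, keep: int) -> bool:
    """Rotate log_path when it exceeds max_bytes; keep at most `keep` archives.

    Archives are named log_path.1.gz (newest) through log_path.<keep>.gz.
    Assumes keep >= 1. Returns True if a rotation happened.
    """
    if not os.path.exists(log_path) or os.path.getsize(log_path) <= max_bytes:
        return False

    # Shift older archives up by one, dropping the oldest.
    oldest = f"{log_path}.{keep}.gz"
    if os.path.exists(oldest):
        os.remove(oldest)
    for index in range(keep - 1, 0, -1):
        src = f"{log_path}.{index}.gz"
        if os.path.exists(src):
            os.rename(src, f"{log_path}.{index + 1}.gz")

    # Copy the current contents into a compressed archive ...
    with open(log_path, "rb") as current, \
            gzip.open(f"{log_path}.1.gz", "wb") as archive:
        shutil.copyfileobj(current, archive)

    # ... then truncate in place, so a writer holding the file open in
    # append mode simply continues from the new end of the file.
    with open(log_path, "r+b") as current:
        current.truncate(0)
    return True


# Demonstration with a hypothetical 2 KiB log and a 1 KiB threshold.
workdir = tempfile.mkdtemp()
demo_log = os.path.join(workdir, "job.out")
with open(demo_log, "w", encoding="utf-8") as f:
    f.write("x" * 2048)

rotated = rotate_if_large(demo_log, max_bytes=1024, keep=3)
size_after = os.path.getsize(demo_log)
with gzip.open(demo_log + ".1.gz", "rb") as f:
    archived_bytes = len(f.read())
print(rotated, size_after, archived_bytes)
```

A function like this could be called periodically from a separate scheduled job. Date-based naming or a different retention policy would change only the archive-shifting step.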

Per the task description, which says "If you have already migrated this tool, kindly mark this as resolved", I am marking this as resolved.

As I explained above, billsbots only ran on a one-off basis for a few days on Toolforge GridEngine and I have not used GridEngine at all in months and have no intention to.

billsbots are all currently running on Toolforge Kubernetes.

My only remaining issue with running on Toolforge Kubernetes is controlling the log size of my continuous jobs. I will consider creating a new Phabricator task requesting some log rotation functionality in the Toolforge Kubernetes system. In the meantime my solution is to rename my billsbots log files and periodically restart my continuous jobs.

I am struggling to find an image containing logrotate (or an alternative). I have been using /usr/sbin/logrotate to rotate log files daily.

EDIT: I brute-forced and found that mariadb contains it.

Hmm...

My only remaining issue with running on Toolforge Kubernetes is controlling the log size of my continuous jobs. I will consider creating a new Phabricator task requesting some log rotation functionality in the Toolforge Kubernetes system. In the meantime my solution is to rename my billsbots log files and periodically restart my continuous jobs.

I figured someone already did this and it was just a matter of finding it: T327165