Migrate yapperbot from Toolforge GridEngine to Toolforge Kubernetes
Closed, Resolved (Public)

Description

Kindly migrate your tool (https://grid-deprecation.toolforge.org/t/yapperbot) from Toolforge GridEngine to Toolforge Kubernetes.

Toolforge GridEngine is getting deprecated.
See: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/

Please note that a volunteer may perform this migration if this has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
Migrating Web Services from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
Python
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users

Event Timeline

My apologies if this ticket comes as a surprise to you. In order to ensure WMCS can provide a stable, secure and supported platform, it’s important we migrate away from GridEngine. I want to assure you that while it is WMCS’s intention to shut down GridEngine as outlined in the blog post https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/, a shutdown date for GridEngine has not yet been set. The goal of the migration is to move as many tools as possible onto Kubernetes and ensure as smooth a transition as possible for everyone. Once the majority of tools have migrated, discussion of a shutdown date will be more appropriate. See T314664: [infra] Decommission the Grid Engine infrastructure.

As noted in https://techblog.wikimedia.org/2022/03/16/toolforge-gridengine-debian-10-buster-migration/ some use cases are already supported by Kubernetes and should be migrated. If your tool can migrate, please plan a migration. Reach out if you need help or find you are blocked by missing features. Most of all, WMCS is here to support you.

However, it’s possible your tool needs a mixed runtime environment or some other features that aren't yet present in https://techblog.wikimedia.org/2022/03/18/toolforge-jobs-framework/. We’d love to hear of this or any other blocking issues so we can work with you once a migration path is ready. Thanks for your hard work as volunteers and help in this migration!

Just asked on yapperbot's talk if it was down due to gridengine migration; I see it is. @Naypta are you around to respond to the comment there? Does the bot need adoption?

Nice to see you @taavi, hello to you too. That was my question! This should apparently be directed at @Naypta? My mistake. Just trying to avoid a confused-yet-avoidable limbo for this tool, which is still warmly in use at least on en:wp. :)

It seems Naypta is not around. Someone else has tentatively offered to take over bot maintenance and could use guidance; see this wiki discussion.

Luckily these are all statically linked golang binaries, so moving them off the grid to Kubernetes is straightforward:

tools.yapperbot@tools-sgebastion-10:~$ cat crontab.grid_stopped 
# m	h	dom	mon	dow	command
30      *	*	*	*	cd frs && jsub -mem 950m -once -cwd ./frs >/dev/null
0	18	*	*	1	cd pruner && jsub -mem 950m -cwd ./pruner
0	*	*	*	*	cd uncurrenter && jsub -mem 950m -once -cwd ./uncurrenter >/dev/null
0	*/12	*	*	*	cd wikidatable && jsub -mem 2g -cwd ./wikidatable >/dev/null
*/5	*	*	*	*	cd scantag && jsub -mem 950m -once -cwd ./scantag --sandbox >/dev/null

is now:

jobs.yml
---
- name: frs
  command: bash -c "cd frs && ./frs > /dev/null"
  image: bookworm
  schedule: "30 * * * *"
  mem: 950M
- name: pruner
  command: bash -c "cd pruner && ./pruner"
  image: bookworm
  schedule: "0 18 * * 1"
  mem: 950M
- name: uncurrenter
  command: bash -c "cd uncurrenter && ./uncurrenter > /dev/null"
  image: bookworm
  schedule: "0 * * * *"
  mem: 950M
- name: wikidatable
  command: bash -c "cd wikidatable && ./wikidatable > /dev/null"
  image: bookworm
  schedule: "0 */12 * * *"
  mem: 2G
- name: scantag
  command: bash -c "cd scantag && ./scantag --sandbox > /dev/null"
  image: bookworm
  schedule: "*/5 * * * *"
  mem: 950M
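
For anyone doing the same conversion on another tool, the crontab-to-jobs.yml mapping above is mechanical enough to sketch in Python. This is only an illustration under assumptions: `cron_line_to_job` is a hypothetical helper, not part of any Toolforge tooling, and it only understands lines of the form `cd DIR && jsub ...` as quoted in this thread. Like the jobs.yml above, it attaches the `/dev/null` redirect to the binary itself and drops the grid-only `-once` and `-cwd` switches.

```python
import shlex


def cron_line_to_job(line, image="bookworm"):
    """Turn one 'cd DIR && jsub ...' crontab line into a jobs.yml-style dict.

    Hypothetical helper: it only handles the line shape quoted in this
    thread and is not part of any Toolforge tooling.
    """
    fields = line.split(None, 5)  # five cron fields, then the command
    if len(fields) != 6:
        raise ValueError("expected five schedule fields and a command")
    schedule = " ".join(fields[:5])

    cd_part, sep, jsub_part = fields[5].partition("&&")
    cd_tokens = cd_part.split()
    tokens = shlex.split(jsub_part)
    if not sep or len(cd_tokens) != 2 or cd_tokens[0] != "cd" or tokens[:1] != ["jsub"]:
        raise ValueError("expected 'cd DIR && jsub ...'")
    workdir = cd_tokens[1]

    mem = None
    quiet = False
    program = []
    rest = iter(tokens[1:])
    for tok in rest:
        if tok == "-mem":
            mem = next(rest, "").upper() or None  # 950m -> 950M, 2g -> 2G
        elif tok in ("-once", "-cwd"):
            continue  # grid-only switches, no counterpart in the jobs.yml above
        elif tok.startswith(">"):
            quiet = True
            if tok == ">":
                next(rest, None)  # redirect target was a separate token; skip it
        else:
            program.append(tok)  # the binary and its own arguments

    inner = "cd {} && {}".format(workdir, " ".join(program))
    if quiet:
        inner += " > /dev/null"
    job = {
        "name": workdir,
        "command": 'bash -c "{}"'.format(inner),
        "image": image,
        "schedule": schedule,
    }
    if mem:
        job["mem"] = mem
    return job


if __name__ == "__main__":
    crontab = [
        "30 * * * * cd frs && jsub -mem 950m -once -cwd ./frs >/dev/null",
        "0 18 * * 1 cd pruner && jsub -mem 950m -cwd ./pruner",
    ]
    for entry in crontab:
        print(cron_line_to_job(entry))
```

The resulting dicts would still need to be written out as YAML and checked by hand before loading them.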

We'll see how it works in about 15 minutes. I'm just fixing this as an emergency stop-gap measure; we'll still need to do an adoption request for the person who's interested in taking it over.

Hmm, something is wrong:

2024/02/21 06:30:40 Error editing user talk for Compassionate727 meant they couldn't be notified and were ignored. The error was badtoken: Invalid CSRF token.

and then a ton more of these errors. Not sure why exactly, it does look like it would panic if the authentication fails...anyways, something to look into tomorrow.

Legoktm claimed this task.
Legoktm added a subscriber: Naypta.

Seems like it fixed itself after I went to sleep. I'm going to call this resolved; we're working on a plan to have someone properly take this over, see https://en.wikipedia.org/wiki/User_talk:Yapperbot#c-David_Tornheim-20240221203800-Legoktm-20240221190500

In researching the above jobs, I looked up the "uncurrenter", originally approved as:

https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/Yapperbot_2

It appears to have been taken over by ProcBot_10:

https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/ProcBot_10

If that is the case, then should the job described above as:

name: uncurrenter
command: bash -c "cd uncurrenter && ./uncurrenter > /dev/null"
image: bookworm
schedule: "0 * * * *"
mem: 950M

be disabled?