Migrate urbanecmbot from Toolforge GridEngine to Toolforge Kubernetes
Closed, Declined · Public

Description

Kindly migrate your tool (https://grid-deprecation.toolforge.org/t/urbanecmbot) from Toolforge GridEngine to Toolforge Kubernetes.

Toolforge GridEngine is getting deprecated.
See: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/

Please note that a volunteer may perform this migration if this has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
Migrating Web Services from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
Python
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users

Event Timeline

Restricted Application added a subscriber: Aklapper.

My apologies if this ticket comes as a surprise to you. In order to ensure WMCS can provide a stable, secure and supported platform, it’s important we migrate away from GridEngine. I want to assure you that while it is WMCS’s intention to shut down GridEngine as outlined in the blog post https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/, a shutdown date for GridEngine has not yet been set. The goal of the migration is to move as many tools as possible onto Kubernetes and ensure as smooth a transition as possible for everyone. Once the majority of tools have migrated, a discussion of the shutdown date will be more appropriate. See T314664: [infra] Decommission the Grid Engine infrastructure.

As noted in https://techblog.wikimedia.org/2022/03/16/toolforge-gridengine-debian-10-buster-migration/ some use cases are already supported by Kubernetes and should be migrated. If your tool can migrate, please do plan a migration. Reach out if you need help or find you are blocked by missing features. Most of all, WMCS is here to support you.

However, it’s possible your tool needs a mixed runtime environment or some other features that aren't yet present in https://techblog.wikimedia.org/2022/03/18/toolforge-jobs-framework/. We’d love to hear of this or any other blocking issues so we can work with you once a migration path is ready. Thanks for your hard work as volunteers and help in this migration!

Hello all,

urbanecmbot currently uses the recommended Pywikibot approach from Wikitech, i.e. using the shared copy in /data/project/shared. Unfortunately, the usage described on the Wikitech page does not work, because requests is not installed by default:

tools.urbanecmbot@tools-sgebastion-10 ~
$ toolforge-jobs run test-py39-sleep --command 'sleep 3600' --image tf-python39 --no-filelog
tools.urbanecmbot@tools-sgebastion-10 ~
$ kubectl get pods
NAME                    READY   STATUS              RESTARTS   AGE
test-py39-sleep-q26g6   0/1     ContainerCreating   0          3s
tools.urbanecmbot@tools-sgebastion-10 ~
$ kubectl exec -it test-py39-sleep-q26g6 -- bash
tools.urbanecmbot@test-py39-sleep-q26g6:~$ . .profile
tools.urbanecmbot@test-py39-sleep-q26g6 ~
$ echo $PYTHONPATH
/data/project/shared/pywikipedia/core:/data/project/shared/pywikipedia/core/externals/httplib2:/data/project/shared/pywikipedia/core/scripts
tools.urbanecmbot@test-py39-sleep-q26g6 ~
$ python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pywikibot
/data/project/shared/pywikipedia/core/pywikibot/config.py:1031: _ConfigurationDeprecationWarning:
"db_hostname" present in our user-config.py is no longer a supported
configuration variable and should be removed. Please inform the
maintainers if you depend on it.
  warn('\n' + fill(DEPRECATED_VARIABLE.format(name)),
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/project/shared/pywikipedia/core/pywikibot/__init__.py", line 43, in <module>
    from pywikibot.bot import (
  File "/data/project/shared/pywikipedia/core/pywikibot/bot.py", line 112, in <module>
    from pywikibot import config, daemonize, i18n, version
  File "/data/project/shared/pywikipedia/core/pywikibot/version.py", line 26, in <module>
    from pywikibot.comms.http import fetch
  File "/data/project/shared/pywikipedia/core/pywikibot/comms/http.py", line 44, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
>>>

I know that the WMCS team is not fond of installing Python packages by default, but...can we make an exception for package(s) required by Pywikibot in the shared copy setup? It'd make converting my bot much easier. Without the required packages, the shared copy of the source is more or less useless (as it'd be easier to install pywikibot in the venv that'd have to exist anyway). Thanks!

The tf-python39 image contains no explicitly installed Python library packages at all. The prior "recommended" instructions for Pywikibot that you reference have been rewritten and no longer recommend that Grid Engine-centered approach. There may still be things that can be worked on to make the newer recommendations easier for folks, so please do take a look and let us know if there are still strong blockers for your adoption.
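To see which dependencies an image lacks before importing Pywikibot, something like the following could be run inside the container. This is only a minimal sketch: the `REQUIRED_MODULES` list and the `missing_modules` helper are illustrative names, and `requests` is listed because it is the one module the traceback above shows to be missing.

```python
import importlib.util

# Assumption: this list is illustrative. "requests" is the only module that
# the traceback in this task shows to be missing from the image.
REQUIRED_MODULES = ["requests"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any module reported as missing would then need to be installed into the tool's own virtual environment rather than expected from the base image.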

Change 921663 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[labs/tools/urbanecmbot@master] Prepare for k8s migration

https://gerrit.wikimedia.org/r/921663

Change 921669 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[labs/tools/urbanecmbot@master] Add jobs.yaml

https://gerrit.wikimedia.org/r/921669

Change 921670 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[labs/tools/urbanecmbot@master] Migrate continuous jobs

https://gerrit.wikimedia.org/r/921670

Change 921671 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[labs/tools/urbanecmbot@master] Remove cron-related files

https://gerrit.wikimedia.org/r/921671

Change 921663 merged by jenkins-bot:

[labs/tools/urbanecmbot@master] Prepare for k8s migration

https://gerrit.wikimedia.org/r/921663

Change 921669 merged by jenkins-bot:

[labs/tools/urbanecmbot@master] Add jobs.yaml

https://gerrit.wikimedia.org/r/921669

Change 921670 merged by jenkins-bot:

[labs/tools/urbanecmbot@master] Migrate continuous jobs

https://gerrit.wikimedia.org/r/921670

Change 921671 merged by jenkins-bot:

[labs/tools/urbanecmbot@master] Remove cron-related files

https://gerrit.wikimedia.org/r/921671

Change 921675 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[labs/tools/urbanecmbot@master] standardization: load venv earlier

https://gerrit.wikimedia.org/r/921675

Change 921675 merged by jenkins-bot:

[labs/tools/urbanecmbot@master] standardization: load venv earlier

https://gerrit.wikimedia.org/r/921675

Change 921689 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[labs/tools/urbanecmbot@master] patrolTrusted: Make the job work on k8s

https://gerrit.wikimedia.org/r/921689

Change 921690 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[labs/tools/urbanecmbot@master] Install sseclient

https://gerrit.wikimedia.org/r/921690

Change 921691 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[labs/tools/urbanecmbot@master] jobs.yaml: Make jobs run via venv

https://gerrit.wikimedia.org/r/921691

Change 921689 merged by Urbanecm:

[labs/tools/urbanecmbot@master] patrolTrusted: Make the job work on k8s

https://gerrit.wikimedia.org/r/921689

Change 921690 merged by Urbanecm:

[labs/tools/urbanecmbot@master] Install sseclient

https://gerrit.wikimedia.org/r/921690

Change 921691 merged by Urbanecm:

[labs/tools/urbanecmbot@master] jobs.yaml: Make jobs run via venv

https://gerrit.wikimedia.org/r/921691

During the Hackathon, I worked on this task a bit. There are a few outstanding things left:

Change 923528 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[labs/tools/urbanecmbot@master] [k8s] Fix several scripts

https://gerrit.wikimedia.org/r/923528

Change 923528 merged by jenkins-bot:

[labs/tools/urbanecmbot@master] [k8s] Fix several scripts

https://gerrit.wikimedia.org/r/923528

@Urbanecm Hi! I don't see any processes on the grid anymore. If you have finished migrating, can you close the task as resolved?

If not, is there anything we can help with? (Tomorrow we will be turning off all grid tools that did not ask for an extension.)

Hi @dcaro, thanks for asking. I'm having trouble migrating the webservice. I have a mix of Python CGI scripts and generic files to host under the urbanecmbot URL, and for certain reasons (such as the Android app's dependency on the exact URI), I can't really move the Python part to a different tool and host it via the k8s webservice. I've been using the lighttpd grid service, which allowed the use of Python CGI scripts as well as other scripts, but I can't achieve the same effect on k8s.

Currently, the https://urbanecmbot.toolforge.org/ webservice is fully down, and I'm struggling to get it up again. Can you help, please?

@Urbanecm I think there was a typo in your .lighttpd.conf file, an extra =:

tools.urbanecmbot@tools-sgebastion-10 ~ 
$ diff .lighttpd.conf .lighttpd.conf.old 
4c4
< $HTTP["url"] =~ "^/commonsmisc/config.yaml" {
---
> $HTTP["url"] =~ = "^/commonsmisc/config.yaml" {

I fixed that, and then it let me start it on the grid:

tools.urbanecmbot@tools-sgebastion-10 ~ 
$ toolforge webservice --backend=gridengine lighttpd start  
...

tools.urbanecmbot@tools-sgebastion-10 ~ 
$ toolforge webservice status
Your webservice of type lighttpd is running on backend gridengine

You'll have to verify that everything works as you expect though :)
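A typo like the extra = shown in the diff above can be caught before restarting the webservice. The following is a rough sketch, not a lighttpd syntax checker: the `find_stray_equals` helper and its regular expression are hypothetical and only look for a conditional operator followed by a stray =.

```python
import re

# Matches a lighttpd conditional whose operator is followed by a stray "=",
# e.g.  $HTTP["url"] =~ = "^/path" {
# Assumption: this covers only that one typo, not lighttpd syntax in general.
STRAY_EQUALS = re.compile(r'\$\w+\["[^"]*"\]\s*(==|!=|=~|!~)\s*=')


def find_stray_equals(config_text):
    """Return the 1-based numbers of lines that contain a doubled operator."""
    return [
        number
        for number, line in enumerate(config_text.splitlines(), start=1)
        if STRAY_EQUALS.search(line)
    ]
```

Calling `find_stray_equals` on the contents of .lighttpd.conf would return the line numbers worth inspecting by hand.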

Thanks @dcaro! I was figuring out how to make this work in k8s instead, and it was just a typo.

The question still stands though: assuming the grid engine is going to go away, how would I keep the tool working there? Or does this qualify for some sort of extension, as you mentioned in your prior message?

Yes, we can extend it for up to a month, as you are clearly working on moving to k8s :). It's a hard limit though.

As for how to run it on k8s, I see that you have several things mixed together in public_html:

tools.urbanecmbot@tools-sgebastion-10 ~ 
$ ls -la public_html/
total 136
drwxrwsr-x  6 tools.urbanecmbot tools.urbanecmbot  4096 Sep 15 19:45 .
drwxrwsr-x 39 tools.urbanecmbot tools.urbanecmbot 12288 Feb 13 15:32 ..
lrwxrwxrwx  1 tools.urbanecmbot tools.urbanecmbot    37 May 21  2017 commonsmisc -> /data/project/urbanecmbot/commonsmisc
drwxr-sr-x  2 tools.urbanecmbot tools.urbanecmbot 69632 Feb 14 07:00 deleted-everywhere
drwxr-sr-x  2 tools.urbanecmbot tools.urbanecmbot  4096 Feb 10  2020 exps
lrwxrwxrwx  1 tools.urbanecmbot tools.urbanecmbot    18 Aug 25  2018 heritage -> ../heritage/public
-rw-rw----  1 tools.urbanecmbot tools.urbanecmbot 35141 May  9  2016 LICENSE.txt
drwxr-sr-x  2 tools.urbanecmbot tools.urbanecmbot  4096 Oct 26  2021 mark-students
drwxr-sr-x  4 tools.urbanecmbot tools.urbanecmbot  4096 Mar 11  2020 stats
lrwxrwxrwx  1 tools.urbanecmbot tools.urbanecmbot    37 Jan 18  2016 test -> /data/project/urbanecmbot/test/public
lrwxrwxrwx  1 tools.urbanecmbot tools.urbanecmbot    15 Nov 20  2017 uploadsbyuser -> ../commonsmisc/
lrwxrwxrwx  1 tools.urbanecmbot tools.urbanecmbot    17 May 16  2018 wikipediamisc -> ../wikipediamisc/

Some of them are small redirects using PHP (stats/), but most of them seem like independent Python applications (wikipediamisc, uploadsbyuser, heritage, commonsmisc).

It feels like a mix of several tools :). If that's the case, you can try moving the biggest chunks to their own tools, and probably use the build service for that.

Can you give me more details on how the tool is used?
Do all of these things need to be bound together on the same page?
Can some of the subdirectories (especially the ones that are git repositories) be split off into their own tools?

Yes, we can extend it for up to a month, as you are clearly working on moving to k8s :). It's a hard limit though.

I guess that means it's a very good idea to discuss how to do the migration :).

Some of them are small redirects using PHP (stats/), but most of them seem like independent Python applications (wikipediamisc, uploadsbyuser, heritage, commonsmisc).

It feels like a mix of several tools :). If that's the case, you can try moving the biggest chunks to their own tools, and probably use the build service for that.

That's a fairly accurate impression. The issue is that the URL is hardcoded in some places already, so the current URI needs to continue working. This is most pressing for the Wikipedia mobile app, which talks to the commonsmisc "subtool" to calculate certain items. Since it is difficult to get Android users to upgrade their apps, the URI needs to continue working one way or another.

Can you give me more details on how the tool is used?
Do all of these things need to be bound together on the same page?

See above.

Can some of the subdirectories (especially the ones that are git repositories) be split off into their own tools?

Possibly, if we can keep the URI working :).

Hmm, so we could keep commonsmisc on this tool (so it retains the URL) and move the others to other tools then, right?

You can start with any of the others, and we can chop them off bit by bit until only commonsmisc is left. That would make it much easier to migrate.
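One way to keep the old URIs working while subtools move out is to redirect the moved path prefixes and serve everything else locally. This is a minimal sketch of that idea: the target host names are hypothetical placeholders (no new tool names appear in this task), and `redirect_target` is an illustrative helper, not an existing Toolforge feature.

```python
# Assumption: these target hosts are hypothetical placeholders. No such tools
# are named in this task; commonsmisc is deliberately absent so it stays here.
MOVED_PREFIXES = {
    "/heritage/": "https://example-heritage.toolforge.org/",
    "/wikipediamisc/": "https://example-wikipediamisc.toolforge.org/",
}


def redirect_target(path):
    """Return the new URL for a moved path, or None if it stays on this tool."""
    for prefix, new_base in MOVED_PREFIXES.items():
        if path.startswith(prefix):
            # Keep the remainder of the path so deep links continue to work.
            return new_base + path[len(prefix):]
    return None
```

Requests for which `redirect_target` returns None, such as anything under /commonsmisc/, would continue to be handled by urbanecmbot itself, so the URI used by the mobile app stays unchanged.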

Reminder that next week (14th March) the grid will go down for good, and any tools that have not migrated will stop working. Let me know if you need help or guidance.

If you do, it would help greatly to move the code to a public repo (e.g. gitlab.wikimedia.org) so I can better help with MRs/reviews if needed.

taavi subscribed.

The grid engine has been shut down, so I'm closing any remaining migration tasks as Declined. If you're still planning to migrate this tool, please re-open this task and add one or more active project tags to it. (If you need a project tag for your tool, those can be created via the Toolforge admin console.)