Migrate ganfilter from Toolforge GridEngine to Toolforge Kubernetes
Closed, Resolved (Public)

Description

Moving tools running on Toolforge GridEngine to Kubernetes.
See: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
See: https://grid-deprecation.toolforge.org/t/ganfilter

Please note that a volunteer may perform this migration if this has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
Migrating Web Services from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
Python
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users

Event Timeline

Hi @coldchrist! Have you been able to take a look at migrating your tool? Are there any issues you have found that I can help with?

If so, can you make your code public so I can take a look?

Remember that the grid is going to be shut off next week (14th March).

Cheers!

@dcaro, thanks for the ping -- ganfilter doesn't use a webservice, so as far as I can tell the only thing that needs to happen is to change the cron from jsub. Currently the cron command is

jsub -N ChristieBot python3 www/python/src/GANbot.py >> /data/project/ganfilter/ChristieBot.cron.log

so I think this would be

toolforge jobs run ChristieBot --command www/python/src/GANbot.py --image python3 >> /data/project/ganfilter/ChristieBot.cron.log

Does that look right? If so, I'll try that this evening. I don't know if you need to see the Python source code, but if so it's on enwiki at

https://en.wikipedia.org/wiki/User:ChristieBot/Source_code

I think that might not work out of the box; you might need something like https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_Pywikibot_scripts:

toolforge jobs run --image tool-pywikibot/pywikibot-scripts-stable:latest --mount=all --command 'sh -c "exec 1>>$TOOL_DATA_DIR/ChristieBot.out; exec 2>>$TOOL_DATA_DIR/ChristieBot.err; $TOOL_DATA_DIR/www/python/src/GANbot.py"' ChristieBot

Looking at https://en.wikipedia.org/wiki/User:ChristieBot/GANbot.py, you might also need to change:

# Config
HOME=os.environ.get('HOME') #get environment variable $HOME
replica_path=HOME + '/replica.my.cnf'
if os.path.exists(replica_path):          #check that the file is found
    config = configparser.ConfigParser()
    config.read(replica_path)
else:
    print('replica.my.cnf file not found')
    #GAN.log(gan_conn,"GANbot","Looking for replica.my.cnf","File not found")

...
gan_conn = pymysql.connections.Connection(user=config['client']['user'], password=config['client']['password'], database="s55175__ganfilter", host='tools.db.svc.eqiad.wmflabs')

For something like:

# Config from environment variables
toolsdb_user=os.environ.get('TOOL_TOOLSDB_USER')
toolsdb_password=os.environ.get('TOOL_TOOLSDB_PASSWORD')

if not toolsdb_user or not toolsdb_password:
    print('Unable to get toolsdb user and password from environment TOOL_TOOLSDB_USER and TOOL_TOOLSDB_PASSWORD')
    sys.exit(1)  # stop if we did not get user/pass

...
# the toolsdb host is now tools.db.svc.wikimedia.cloud
gan_conn = pymysql.connections.Connection(user=toolsdb_user, password=toolsdb_password, database="s55175__ganfilter", host='tools.db.svc.wikimedia.cloud')

You can see the environment variables with toolforge envvars list (https://wikitech.wikimedia.org/wiki/Help:Toolforge/Envvars_Service).

And

sys.path.append('./www/python/src') # Not needed if I run from that directory

For

from pathlib import Path
...

# using the path relative to the script, not the working directory
sys.path.append(str(Path(__file__).parent.absolute()))  # sys.path entries must be strings
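
A minimal sketch of the same idea wrapped in a helper, assuming the goal is only to make modules that sit next to the script importable. `add_script_dir_to_path` is a hypothetical name, not part of the bot's code. The directory is converted to `str` because non-string entries in `sys.path` are ignored during import.

```python
import sys
from pathlib import Path


def add_script_dir_to_path(script_file: str) -> str:
    """Append the directory containing script_file to sys.path, at most once.

    Hypothetical helper for illustration; returns the directory that was added.
    """
    script_dir = str(Path(script_file).resolve().parent)
    if script_dir not in sys.path:
        sys.path.append(script_dir)
    return script_dir
```

In the bot this would be called as `add_script_dir_to_path(__file__)` before the local imports, so the result no longer depends on the working directory the job starts in.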

That last change will also be needed in https://en.wikipedia.org/wiki/User:ChristieBot/GA_history.py. In GA_history.py you will also have to change every function that uses config so that it uses replica_user and replica_password instead, read from the environment variables TOOL_REPLICA_USER and TOOL_REPLICA_PASSWORD. For example:

def find_incomplete_moves(cls, conn, config):

Would become:

def find_incomplete_moves(cls, conn, replica_user, replica_password):

or similar, and in the same function:

wp_conn = pymysql.connections.Connection(user=config['client']['user'], password=config['client']['password'], database=wpDatabase, host='enwiki.analytics.db.svc.wikimedia.cloud')

would become:

wp_conn = pymysql.connections.Connection(user=replica_user, password=replica_password, database=wpDatabase, host='enwiki.analytics.db.svc.wikimedia.cloud')
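
One possible alternative to adding two parameters to every function is to bundle them in a small object. This is only a sketch: `ReplicaCredentials`, `from_env` and `connect_kwargs` are hypothetical names; only the environment variable names and the pymysql keyword arguments come from the thread.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ReplicaCredentials:
    """Hypothetical container for the wiki replica user and password."""

    user: str
    password: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ReplicaCredentials":
        # Variable names as mentioned in this thread.
        try:
            return cls(env["TOOL_REPLICA_USER"], env["TOOL_REPLICA_PASSWORD"])
        except KeyError as missing:
            raise RuntimeError(f"missing environment variable {missing}") from None

    def connect_kwargs(self, database: str, host: str) -> dict:
        """Keyword arguments for pymysql.connections.Connection(**kwargs)."""
        return {
            "user": self.user,
            "password": self.password,
            "database": database,
            "host": host,
        }
```

With this, a function such as find_incomplete_moves would take a single `creds` argument and open the connection with `pymysql.connections.Connection(**creds.connect_kwargs(wpDatabase, 'enwiki.analytics.db.svc.wikimedia.cloud'))`.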

I'll plan on doing the switchover this weekend. Can I make the code changes first and the job changes second, or do they have to be coordinated? E.g. if I make the code changes as you suggest, and run the code from the shell, will that work, or will it have to be executed via the new job mechanism to work?

It will need the new framework, mainly because the environment variables don't exist in the grid. You could add a bit more logic to handle both cases during the migration and clean up later, for example:

# Config
HOME=os.environ.get('HOME', '/idontexistforsure' )  #get environment variable $HOME
replica_path=HOME + '/replica.my.cnf'
if os.path.exists(replica_path):          #check that the file is found
    config = configparser.ConfigParser()
    config.read(replica_path)
    toolsdb_user=config["client"]["user"]
    toolsdb_password=config["client"]["password"]
else:
    print('replica.my.cnf file not found, checking environment')
    # Config from environment variables
    toolsdb_user=os.environ.get('TOOL_TOOLSDB_USER')
    toolsdb_password=os.environ.get('TOOL_TOOLSDB_PASSWORD')

    if not toolsdb_user or not toolsdb_password:
        print('Unable to get toolsdb user and password from environment TOOL_TOOLSDB_USER and TOOL_TOOLSDB_PASSWORD')
        sys.exit(1)  # stop if we did not get user/pass

The changes that don't directly use environment variables can be done without problems.
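
The fallback logic above could also be written as a function, which makes it easier to try out from a shell before switching the job over. This is a sketch under the same assumptions as the snippet above (a `[client]` section in replica.my.cnf, and the TOOL_TOOLSDB_USER / TOOL_TOOLSDB_PASSWORD variables); `load_toolsdb_credentials` is a hypothetical name, and it raises instead of calling sys.exit so the caller decides what to do.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def load_toolsdb_credentials(
    home: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str]:
    """Return (user, password): replica.my.cnf first, then environment variables."""
    home = home if home is not None else env.get("HOME")
    if home:
        replica_path = Path(home) / "replica.my.cnf"
        if replica_path.exists():
            config = configparser.ConfigParser()
            config.read(replica_path)
            return config["client"]["user"], config["client"]["password"]

    # No file found: fall back to the variables provided by the jobs framework.
    user = env.get("TOOL_TOOLSDB_USER")
    password = env.get("TOOL_TOOLSDB_PASSWORD")
    if not user or not password:
        raise RuntimeError(
            "no replica.my.cnf and TOOL_TOOLSDB_USER/TOOL_TOOLSDB_PASSWORD not set"
        )
    return user, password
```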

@dcaro, I made the code changes (just to GANbot.py so far) and tried running the submit command as you give it:

toolforge jobs run --image tool-pywikibot/pywikibot-scripts-stable:latest --mount=all --command 'sh -c "exec 1>>$TOOL_DATA_DIR/ChristieBot.out; exec 2>>$TOOL_DATA_DIR/ChristieBot.err; $TOOL_DATA_DIR/www/python/src/GANbot.py"' christiebot

I had to change ChristieBot to christiebot -- it apparently doesn't like uppercase letters in the job name.

This gives me

sh: 1: /data/project/ganfilter/www/python/src/GANbot.py: Permission denied

Is it asking for +x on the perms for the .py? That seems odd. And it already has 644 permissions so it doesn't look like a read issue. Do you know what the problem is?

@dcaro, what is much worse is that now I've reverted the code to the old version, it no longer runs. It's complaining about importlib.metadata:

Traceback (most recent call last):
  File "/data/project/shared/pywikibot/stable/pywikibot/backports.py", line 204, in <module>
    import importlib.metadata as importlib_metadata
ModuleNotFoundError: No module named 'importlib.metadata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "www/python/src/GANbot.py", line 20, in <module>
    import pywikibot
  File "/data/project/shared/pywikibot/stable/pywikibot/__init__.py", line 21, in <module>
    from pywikibot import config as _config
  File "/data/project/shared/pywikibot/stable/pywikibot/config.py", line 60, in <module>
    from pywikibot.backports import (
  File "/data/project/shared/pywikibot/stable/pywikibot/backports.py", line 206, in <module>
    import importlib_metadata
ModuleNotFoundError: No module named 'importlib_metadata'

I'm going to post at the enwiki VPT to see if there's someone there who knows what the issue is but any help would be much appreciated on this. This is a bot with high visibility. Thanks.

Yes, you'll need execution permission or add python right before the script like python $TOOL_DATA_DIR/www/python/src/GANbot.py.
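
For context, mode 644 is rw-r--r--: the file is readable, but no execute bit is set, so the shell refuses to run it directly. A small sketch of checking and setting the owner execute bit from Python follows; the function names are hypothetical. A script run directly would also need a `#!` line such as the `#!/usr/bin/env python3` used in test scripts.

```python
import os
import stat


def is_executable_by_owner(path: str) -> bool:
    """True if the owner execute bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def make_executable_by_owner(path: str) -> None:
    """Rough equivalent of `chmod u+x path`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
```

Prefixing the command with python, as suggested above, avoids the need for either change.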

I tried that, and it's now giving me the same importlib.metadata error as the old code which I suppose at least means I have the syntax right and it's trying to run. So it's the same problem either way -- can you tell me what I need to do to fix it? Thanks

I'll have to look at it more closely, but I'm not near a laptop.

Are you sure it's complaining the same exact way? It should not be using the pywikibot installation from /data/project, but from /workspace (in the pywikibot image).

A new version of pywikibot was recently released that requires importlib_metadata for python3.7.

You are modifying PYTHONPATH for the Grid Engine, where you use the shared pywikibot files. You need to remove that for toolforge-jobs.
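
To see which situation a given environment is in, a short diagnostic like the following may help. It is a sketch with hypothetical function names; it relies only on the facts that importlib.metadata is part of the standard library from Python 3.8 onwards and that PYTHONPATH is a list of directories separated by os.pathsep.

```python
import os
import sys


def has_stdlib_importlib_metadata(version_info=sys.version_info) -> bool:
    """importlib.metadata is part of the standard library from Python 3.8 on."""
    return tuple(version_info[:2]) >= (3, 8)


def pythonpath_entries(env=os.environ) -> list:
    """Directories injected through $PYTHONPATH (empty list if unset)."""
    raw = env.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("stdlib importlib.metadata:", has_stdlib_importlib_metadata())
    print("PYTHONPATH entries:", pythonpath_entries())
```

On an interpreter older than 3.8, pywikibot falls back to the separate importlib_metadata package, as the traceback shows, so that package has to be installed there.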

And it appears from the bot's logs that it actually stopped running just a few hours before, not long after midnight Pacific time. So when I began working on the upgrade from the grid engine it was already running into this problem, so it is presumably unrelated to that change.

Thanks, @JJMC89, I thought I'd responded to your post yesterday but apparently not. I removed the PYTHONPATH setting from my bash profile, and checked it is unset, and now it's complaining about pywikibot:

Traceback (most recent call last):
  File "www/python/src/GANbot.py", line 20, in <module>
    import pywikibot
ModuleNotFoundError: No module named 'pywikibot'

PYWIKIBOT_DIR is set to /data/project/ganfilter/.pywikibot -- I tried unsetting it but that didn't help. Is it PYWIKIBOT_DIR that I need to fix? Or is it something else?

Found the issue for the k8s runs: it turns out the pywikibot installation is not in the PYTHONPATH. You have to add this to your script before importing pywikibot (in addition to all the other changes I mentioned before):

import sys
from pathlib import Path

# Add the pywikibot installation to the path
WORKDIR = Path(".").absolute()
sys.path.append(str(WORKDIR))

import pywikibot
...

That allows pywikibot to be imported. Then you'll have to change the command to:

toolforge jobs run --image tool-pywikibot/pywikibot-scripts-stable:latest --mount=all --filelog --command 'sh -c "git config --global --add safe.directory /workspace; python3 $TOOL_DATA_DIR/www/python/src/GANbot.py"' ganbot

Note that the exec 1>>... stuff is not needed anymore, as the support for filelog was added this week :)

I would recommend though putting all the code in a public git repository, and building your own image:

  • Adding a requirements.txt file at the top with pywikibot and any other dependencies you might have in it
  • (optional) Adding a Procfile with an entry like ganbot: ./ganbot.sh and a ganbot.sh script with python3 GANbot.py "$@"
  • Then build the image: toolforge build start https://url.to.git/repo
  • Then run the cron: toolforge jobs run --image tool-ganfilter/tool-ganfilter:latest --command ganbot --filelog --mount=all ganbot

That will remove the need for the extra path setting, and the git config, and you'll have control over the version of pywikibot that's installed in your image.

The test I did:

tools.dcaro-test11@tools-sgebastion-10:~$ toolforge jobs run --image tool-pywikibot/pywikibot-scripts-stable:latest --mount=all --command 'sh -c "git config --global --add safe.directory /workspace; python $TOOL_DATA_DIR/test.py"' --filelog christiebot.test1

tools.dcaro-test11@tools-sgebastion-10:~$ tail -f christiebot.test1.*
==> christiebot.test1.err <==
family and mylang are not set.
Defaulting to family='wikipedia' and mylang='test'.

==> christiebot.test1.out <==
Starting!
['/data/project/dcaro-test11', '/layers/heroku_python/python/lib/python39.zip', '/layers/heroku_python/python/lib/python3.9', '/layers/heroku_python/python/lib/python3.9/lib-dynload', '/layers/heroku_python/dependencies/lib/python3.9/site-packages', '/layers/heroku_python/python/lib/python3.9/site-packages', '/workspace']
/workspace/pywikibot/__init__.py
Finished

tools.dcaro-test11@tools-sgebastion-10:~$ cat test.py 
#!/usr/bin/env python3

import os
import sys
from pathlib import Path

sys.path.append(str(Path(".").absolute()))

print("Starting!")
print(sys.path)
import pywikibot
print(pywikibot.__file__)
print("Finished")
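
A variation on this test, sketched here as an assumption rather than something that was run in the thread: importlib.util.find_spec reports where a top-level module would be loaded from without actually importing it, which can help tell a shared installation apart from one inside the image or a venv. `locate_module` is a hypothetical helper name.

```python
import importlib.util
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None.

    Only meant for top-level names such as "pywikibot"; the module itself
    is not imported.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print(locate_module("pywikibot"))
```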

See https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Bot_that_maintains_GA_nominations_page_is_down -- @SD0001 offered to be a maintainer and has managed to get a run to work. I've added the lines you suggested to the k8s version of the code and tried running it via the toolforge command you give but it failed on pywikibot again. I'm a bit concerned that the advice you're giving me may conflict with what SD0001 has done -- they resolved the issue with importlib.metadata, which was independent of the code changes I was making to switch from the grid. As you can probably tell I'm not very knowledgeable about the infrastructure here so I appreciate the help from both of you.

The version of the code on toolforge at the time this comment was written worked fine on the k8s backend.

I believe none of the code changes suggested above are required. Instead of using the shared pywikibot installation, I installed pywikibot via pip and ran the bot on the standard python3.11 image, not the pywikibot-specific image. This helps avoid all the weird sys path manipulation hacks to get the code to locate pywikibot.

Also, the replica.my.cnf file is accessible in k8s. Using the toolforge envvars may be an improvement, but it is not essential.

OK, I've reverted to the code without dcaro's suggested changes -- that should be what you ran successfully.

Re the venv command per your comment at VPT: I only have bin, include, lib, lib64 and pyvenv.cfg in www/python/venv. I tried exiting ganfilter and become ganfilter again in case there was some initialization taking place but there's still no "activate" file in that directory that I can see. I did run the webservice command as you give it and it succeeded, but I assume that makes no difference to what I can see in the directory?

That should be source ~/www/python/venv/bin/activate, sorry.

The bot is now running under the toolforge command suggested at the VPT thread. I'll work on some other clean up issues such as using the env vars. dcaro, do you see any remaining issues that would prevent this from running once the grid shuts down next week?

If it's currently working under k8s (using toolforge jobs) then that's enough :)

I do recommend the following improvements, though, to make your tool easier to maintain (no rush with any of them; I would start with the version control system, to track changes):

  • Use a version control system. That will help people contribute to your code (it was kind of confusing for me to know what was happening or not) and keep track of the changes.
  • Avoid using NFS if you don't need to. That is, using your own image will put all your code in the image (including dependencies), which helps with:
    • making your tool easier to restart/run, as it will not depend on anything except the image; that same image will always work regardless of whether the OS is old, a python library was deprecated, etc.
    • not depending on external factors (ex. pywikibot being upgraded, python being upgraded on the bastions, NFS misbehaving, ...)
    • not needing to maintain a venv from the bastion (it gets created for you when building the image)
    • less coupling with the flakiest service in toolforge, NFS; that benefits both your tool and NFS, as the load on it will decrease
  • Use the provided environment variables. This helps avoid NFS too, and avoids depending on hardcoded values, so if they ever change, all that's needed is a reload of your tool and no code changes (ex. use $TOOL_DATA_DIR instead of the current working directory/$HOME/$PWD, and likewise for database users and passwords, etc.)
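
As an illustration of the last point, a sketch of resolving file locations from $TOOL_DATA_DIR instead of a hardcoded /data/project/... path. `tool_data_dir` and `log_path` are hypothetical helpers, and the fallback to $HOME is an assumption made here so the same code can still run from a shell where TOOL_DATA_DIR is not set.

```python
import os
from pathlib import Path
from typing import Mapping


def tool_data_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Return the tool's data directory from $TOOL_DATA_DIR, else $HOME."""
    value = env.get("TOOL_DATA_DIR") or env.get("HOME")
    if not value:
        raise RuntimeError("neither TOOL_DATA_DIR nor HOME is set")
    return Path(value)


def log_path(filename: str, env: Mapping[str, str] = os.environ) -> Path:
    """Build a log file path inside the tool's data directory."""
    return tool_data_dir(env) / filename
```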

I might write about that in the wiki; it might help others too.

taavi subscribed.

The grid engine has been shut down, so I'm closing any remaining migration tasks as Declined. If you're still planning to migrate this tool, please re-open this task and add one or more active project tags to it. (If you need a project tag for your tool, those can be created via the Toolforge admin console.)

SD0001 changed the task status from Declined to Resolved. Mar 14 2024, 12:30 PM

This tool was migrated alright.