Migrate dplbot from Toolforge GridEngine to Toolforge Kubernetes
Closed, ResolvedPublic

Description

Kindly migrate your tool (https://grid-deprecation.toolforge.org/t/dplbot) from Toolforge GridEngine to Toolforge Kubernetes.

Toolforge GridEngine is getting deprecated.
See: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/

Please note that a volunteer may perform this migration if this has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
Migrating Web Services from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
Python
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users

Event Timeline

My apologies if this ticket comes as a surprise to you. In order to ensure WMCS can provide a stable, secure and supported platform, it’s important we migrate away from GridEngine. I want to assure you that while it is WMCS’s intention to shut down GridEngine as outlined in the blog post https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/, a shutdown date for GridEngine has not yet been set. The goal is to migrate as many tools as possible onto Kubernetes and ensure as smooth a transition as possible for everyone. Once the majority of tools have migrated, discussion of a shutdown date will be more appropriate. See T314664: [infra] Decommission the Grid Engine infrastructure.

As noted in https://techblog.wikimedia.org/2022/03/16/toolforge-gridengine-debian-10-buster-migration/ some use cases are already supported by Kubernetes and should be migrated. If your tool can migrate, please do plan a migration. Reach out if you need help or find you are blocked by missing features. Most of all, WMCS is here to support you.

However, it’s possible your tool needs a mixed runtime environment or some other features that aren't yet present in https://techblog.wikimedia.org/2022/03/18/toolforge-jobs-framework/. We’d love to hear of this or any other blocking issues so we can work with you once a migration path is ready. Thanks for your hard work as volunteers and help in this migration!

This is a reminder that the tool for which this ticket is created is still running on the Grid.
The grid is deprecated and all remaining tools need to migrate to Toolforge Kubernetes.

We've sent several emails to maintainers as we continue to make the move away from the Grid.
Many of the issues that have held users back from moving away from the Grid have been addressed in
the latest updates to Build Service. See: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Changelog

You might find the following resources helpful in migrating your tool:

  1. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Migrating_an_existing_tool
  2. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Tutorials_for_popular_languages

Don't hesitate to reach out to us using this ticket or via any of our support channels.

If you have already migrated this tool, kindly mark this ticket as 'resolved'.
To do this, click on the 'Add Action' dropdown above the comment text box, select 'Change Status', then 'Resolved'.
Click 'Submit'.

Thank you!

Hi. Please do not shut down this tool in the first stage of the Grid phaseout. (FYI, although this tool has three maintainers, I am the only one who is active.) I am aware of the need to transition to the Build service, but haven't had time yet to address it.

I would appreciate some confirmation that this tool's jobs will not be deleted on 14 December. Working on migration.

russblau added a subscriber: Dispenser.
russblau removed a subscriber: Dispenser.

@russblau Thanks for reaching out! We won't shut down the tool on December 14th, as we've been able to establish contact with at least one maintainer (yourself). It's ok if you're not entirely sure of your plans for the tool yet or the timeline. We want as seamless a transition as possible for you. We are here to help with any blocking issues you find in the migration, as well as talk through any technical challenges you encounter. Don't hesitate to reach out if you have a question or concern. Thanks again for responding!

I do have some questions. For background, this tool consists of a PHP webserver that relies on a Toolsdb database, with a number of jobs (PHP scripts) run at varying intervals to populate the database. But there are also several Python scripts, relying on Pywikibot, that write to Wikipedia. So I believe I need to use the new Build service to generate a container that can run both PHP and Python. In reviewing the documentation at https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service, I have the following questions (and probably will have more as I proceed):

  1. I take it I would need to include both PHP and Python3 in my Aptfile; is this correct?
  2. How do I install pywikibot (or other required Python packages) inside the built image?
  3. From https://wikitech.wikimedia.org/wiki/Help:Toolforge/My_first_Buildpack_PHP_tool, I gather that I can start an Apache webserver by putting "web: heroku-php-apache2" in my Procfile. This server apparently looks for index.php in the root directory of my project's Git repository. How do I tell it to look for webpages in a different directory? (This is so I can keep the web tools in one directory and the database scripts in a different one.)

See the usage of the pywikibot image here

Is your repo public?

I have some questions too :)

  1. Does the php code call the python code?
  2. If not, can they be separated in two different repositories?

If they can be split in different repositories, things are a bit easier as you would not need to have a multi-stack image.
In that case you can deploy the PHP webservice and PHP jobs image using the standard PHP buildpack (toolforge build <php-repo-url>), adding extra entries to the Procfile (one for each job, see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Job).
And then run the python scripts using the pywikibot image (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_Pywikibot_scripts, same as @komla linked).

You should be able to pass the path to the document root to the heroku-php-apache2 command; that is, have an entry like this in your Procfile:

web: heroku-php-apache2 path/to/dir/with/index/file/

That will load the index.php from that path. I'll add this to the docs :)

See the usage of the pywikibot image here

Thanks; I wasn't aware that was available.

Is your repo public?

It doesn't exist yet. This exercise gave me a good reason to finally get around to setting one up.

I have some questions too :)

  1. Does the php code call the python code?
  2. If not, can they be separated in two different repositories?

No, but there are some shell scripts that call php code and then call python code after the php is finished. The order of execution is significant. I suppose I can work around this by checking timestamps and so forth, but that might be a lot easier than trying to build a custom image.
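The timestamp workaround mentioned above could look roughly like the following sketch. It assumes the PHP job touches a marker file when it finishes, and the wrapper for the Python job only proceeds if that marker is recent. The function name `marker_is_fresh`, the marker path, the script names and the one-hour threshold are all hypothetical, and `stat -c` is GNU-specific.

```shell
#!/usr/bin/env bash
# Sketch: gate a follow-up job on a "done" marker written by an earlier job.

# marker_is_fresh MARKER MAX_AGE_SECONDS
# Returns 0 if MARKER exists and was modified within MAX_AGE_SECONDS, else 1.
marker_is_fresh() {
    local marker="$1" max_age="$2"
    local now mtime
    [ -f "$marker" ] || return 1
    now=$(date +%s)
    # GNU stat; prints the modification time as seconds since the epoch.
    mtime=$(stat -c %Y "$marker") || return 1
    [ $(( now - mtime )) -le "$max_age" ]
}

# Hypothetical usage (script and marker names are placeholders):
#   PHP job:    php update_db.php && touch "$HOME/update_db.done"
#   Python job: marker_is_fresh "$HOME/update_db.done" 3600 && python3 post_results.py
```

This keeps the two jobs independent while still refusing to run the Python step on stale data; it does not make the second job wait, so it would need to be scheduled after the first one normally finishes.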

If they can be split in different repositories, things are a bit easier as you would not need to have a multi-stack image.
In that case you can deploy the PHP webservice and PHP jobs image using the standard PHP buildpack (toolforge build <php-repo-url>), adding extra entries to the Procfile (one for each job, see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Job).
And then run the python scripts using the pywikibot image (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_Pywikibot_scripts, same as @komla linked).

If I'm doing that, can't I just use the existing Kubernetes job framework (toolforge-jobs run --image [php7.4 | tool-pywikibot/pywikibot-scripts-stable:latest] ...) instead of using the Build service?

  1. From https://wikitech.wikimedia.org/wiki/Help:Toolforge/My_first_Buildpack_PHP_tool, I gather that I can start an Apache webserver by putting "web: heroku-php-apache2" in my Procfile. This server apparently looks for index.php in the root directory of my project's Git repository. How do I tell it to look for webpages in a different directory? (This is so I can keep the web tools in one directory and the database scripts in a different one.)

You should be able to pass the path to the document root to the heroku-php-apache2 command; that is, have an entry like this in your Procfile:

web: heroku-php-apache2 path/to/dir/with/index/file/

That will load the index.php from that path. I'll add this to the docs :)

Thanks!

If I'm doing that, can't I just use the existing Kubernetes job framework (toolforge-jobs run --image [php7.4 | tool-pywikibot/pywikibot-scripts-stable:latest] ...) instead of using the Build service?

The pywikibot image there was created using the buildservice :)
The difference here would be that you can install any packages you like (using the Aptfile) and any dependencies you might need (python/php), instead of relying on the ones that come by default with the php7.4 image.

Note that the flow would be:

  • Use the buildservice to build the image with your custom dependencies (would be <tool-name>/<tool-name>:latest)
  • Start a php/script job using that image (something like toolforge jobs run --image <tool-name>/<tool-name>:latest ...)
  • Start pywikibot jobs using the pywikibot image (toolforge-jobs run --image tool-pywikibot/pywikibot-scripts-stable:latest ...)
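As a rough illustration of the three steps above, the sketch below prints the commands instead of running them (a dry run). The tool name, repository URL, job names and `--command` values are hypothetical placeholders. The `start` subcommand of `toolforge build` and the `--command` flag are assumptions about the current CLI and should be checked against `toolforge build --help` and `toolforge jobs run --help`. The custom image name follows the `<tool-name>/<tool-name>:latest` pattern given above.

```shell
#!/usr/bin/env bash
# Sketch: dry-run of the build-then-run flow. Set DRY_RUN=0 to execute for real.
tool="mytool"                                  # hypothetical tool name
repo="https://gitlab.example.org/mytool.git"   # hypothetical repository URL
pwb_image="tool-pywikibot/pywikibot-scripts-stable:latest"

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

flow() {
    # 1. Build the custom image from the repository.
    run toolforge build start "$repo"
    # 2. Start a PHP job from the custom image ("update-db" is a placeholder).
    run toolforge jobs run update-db --command update-db --image "$tool/$tool:latest"
    # 3. Start a Pywikibot job from the shared image ("pwb-task" is a placeholder).
    run toolforge jobs run pwb-task --command pwb-task --image "$pwb_image"
}

flow
```

Note that nothing here enforces ordering between steps 2 and 3; the jobs are submitted independently.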

The issue is that the pywikibot image does not include PHP, and using pywikibot might not be as straightforward as just adding it to requirements.txt.

I might look a bit into creating a buildpack for pywikibot-scripts, but that might take some time.

I've got another problem. I'm running my webservice with "--backend kubernetes php7.4". I've rewritten a number of web tools that access ToolsDB to use the envvars service instead of hard-coded paths and reading ~/.my.cnf directly to get the database user name and password. But the environment variables apparently don't propagate into the web server. The odd thing is that if I run webservice shell, the environment variables are accessible both from bash and from php:

tools.dplbot@tools-sgebastion-11:~$ webservice shell
tools.dplbot@shell-1703361825:~$ cat test.sh
#!/usr/bin/bash
echo "bash: " $TOOL_TOOLSDB_USER
php -r 'echo "PHP: ".getenv("TOOL_TOOLSDB_USER")."\n";'
tools.dplbot@shell-1703361825:~$ ./test.sh
bash: s51290
PHP: s51290
tools.dplbot@shell-1703361825:~$

However, if I create a webpage (https://dplbot.toolforge.org/test.php) that calls the same getenv("TOOL_TOOLSDB_USER"), the function returns null. Same thing happens with getenv("TOOL_DATA_DIR").

How do I get my web tools to access these environment variables?
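Where the variables are not visible, one possible stopgap for wrapper scripts is to prefer the environment variable and fall back to the credentials file. The sketch below is only that: the function name `toolsdb_user` is hypothetical, and the file is assumed to contain a `user = value` line in the usual my.cnf style.

```shell
#!/usr/bin/env bash
# Sketch: prefer the envvars-provided value, fall back to a my.cnf-style file.

# toolsdb_user [CNF_FILE]
# Prints the database user name; returns non-zero if none can be found.
toolsdb_user() {
    local cnf="${1:-$HOME/.my.cnf}"
    local user
    if [ -n "${TOOL_TOOLSDB_USER:-}" ]; then
        printf '%s\n' "$TOOL_TOOLSDB_USER"
        return 0
    fi
    [ -r "$cnf" ] || return 1
    # First "user = value" line; optional surrounding quotes are stripped.
    user=$(sed -n "s/^[[:space:]]*user[[:space:]]*=[[:space:]]*['\"]\{0,1\}\([^'\"[:space:]]*\)['\"]\{0,1\}[[:space:]]*\$/\1/p" "$cnf" | head -n 1)
    [ -n "$user" ] && printf '%s\n' "$user"
}
```

The same pattern would apply to the password; inside a web page the equivalent check would be done in PHP rather than shell.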

Yes, I can reproduce this. The vars are set in the container that runs the application:

tools.jupytest@tools-sgebastion-10:~$ kubectl get pods -o yaml | grep -A 2 TOOL
      - name: TOOL_REPLICA_PASSWORD
        valueFrom:
          secretKeyRef:
            key: TOOL_REPLICA_PASSWORD
            name: toolforge.envvar.v1.tool-replica-password
      - name: TOOL_REPLICA_USER
        valueFrom:
          secretKeyRef:
            key: TOOL_REPLICA_USER
            name: toolforge.envvar.v1.tool-replica-user
      - name: TOOL_TOOLSDB_PASSWORD
        valueFrom:
          secretKeyRef:
            key: TOOL_TOOLSDB_PASSWORD
            name: toolforge.envvar.v1.tool-toolsdb-password
      - name: TOOL_TOOLSDB_USER
        valueFrom:
          secretKeyRef:
            key: TOOL_TOOLSDB_USER
            name: toolforge.envvar.v1.tool-toolsdb-user

But it seems they are scrubbed somewhere before running the PHP script. Looking into it.

No jobs running on the grid at this time, and crontab has been blanked.

Awesome!!!

FYI, envvars are now passed correctly to the PHP images; I just rolled out the fix.