Migrate ERANBOT project off of Grid Engine
Closed, ResolvedPublic

Description

Hello,

We are reaching out to you because you are listed as one of the maintainers of the ERANBOT project.

We announced earlier[0] that long term support for Debian Stretch will cease in June, 2022.
We therefore need to shut down all Stretch hosts including Stretch Grid Engine before the end of support date to ensure that Toolforge remains a secure platform.

You should move the ERANBOT project away from Stretch Grid Engine before the deadline[1].
You have two options:

  • migrate from Toolforge Stretch Grid Engine to Toolforge Kubernetes[2].
  • migrate from Toolforge Stretch Grid Engine to Toolforge Buster Grid Engine.[3]

You should be aware that our ultimate goal is to deprecate Grid Engine
entirely and replace it with Kubernetes, so we encourage you to move to Kubernetes if you can.

We have also published a series of blogposts explaining further the reasoning behind this action[4]

If you have any peculiar challenges that prevent you from migrating away from Stretch, kindly share this here.
You can also reach out via any of our communication channels[5]

[0] https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/EPJFISC52T7OOEFH5YYMZNL57O4VGSPR/
[1] https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Timeline
[2] https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
[3] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
[4] https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
[5] https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Communication_and_support


Kindly migrate your tool(https://grid-deprecation.toolforge.org/t/eranbot) from Toolforge GridEngine to Toolforge Kubernetes.

Toolforge GridEngine is getting deprecated.
See: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/

Please note that a volunteer may perform this migration if this has not been done after some time.
If you have already migrated this tool, kindly mark this as resolved.

If you would rather shut down this tool, kindly do so and mark this as resolved.

Useful Resources:
Migrating Jobs from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
Migrating Web Services from GridEngine to Kubernetes
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
Python
https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Rebuild_virtualenv_for_python_users

Event Timeline

Restricted Application added a subscriber: Aklapper.
kostajh added a subscriber: eranroz.

@eranroz do you have plans/capacity to work on this?

I just did this for another Python project and am confident we can take this one on, too. Unless someone beats us to it, CommTech is going to pair on this tomorrow, April 27 :)

Sorry, I forgot about T306391: [jobs-api] Allow Toolforge scheduled jobs to have a maximum runtime, which in the case of EranBot is rather critical if we want to migrate to k8s. For the time being we could stay on the grid engine, I suppose. Note also the bot needs to be reworked to use Python 3 at some point (T293688), but that can wait until after the Toolforge Stretch deprecation. I'm assuming the Python 2.7.13 -> 2.7.16 upgrade won't pose many problems.

After realizing the above, we decided to work on other things today, but I'm happy to do the migration in the coming days unless someone else wants to.

Please be aware that we currently have only limited quota with the upstream service used by the bot (T305318), so try to limit any "experiments" with the bot making actual uploads to iThenticate. We only have it running for English Wikipedia right now. For the Buster migration, I was planning on enabling it for Turkish or some other less-busy wiki to confirm things are working properly, before doing the full migration (and only enabling for English until we acquire more credits). Hopefully that makes sense.

Another thing that just occurred to me is EranBot has several other tasks than just "plagiabot"! @eranroz, I'm happy to copy over the cron jobs (or k8s scheduled jobs equivalents, if we end up going that route) as they are written now, but I'm not sure what all of the hewiki and "report on rewrite articles" scripts are supposed to do. This begs the question... with such broad and multi-team interest in the plagiabot task, would it make sense to move it to the copypatrol Toolforge account? That way everything CopyPatrol-related would live under the same roof. Unless I'm missing anything, there's no technical reason to keep it under eranbot.

Yes, it is a good idea to move the plagiabot activities from the eranbot project to the copypatrol project.

As for the migration to k8s: I haven't spent time on it yet, but I'm planning to migrate the different jobs (in eranbot and other projects I'm working on) at some point. Please feel free to migrate plagiabot to copypatrol.

Thanks! We'll get to work on this and will update here once it's complete or if we run into issues.

MusikAnimal added a subscriber: JJMC89.

I've simply set the cron job to run on buster and all seems to be working fine. I'm going to hold off on moving it to the copypatrol account for now. The current installation seems to have lots of relics and unused code, etc., and it's unclear what we need and what we don't. @JJMC89 has expressed interest in taking on T293688: Write new CopyPatrol backend to replace Plagiabot by rewriting the bot from scratch. I think that would be the better time to move it to the copypatrol account.

Note you may still get emails about the eranbot tool still needing migration to Buster, but this would be for the other cron jobs unrelated to Plagiabot / CopyPatrol. As such I'm not going to close this task, but I will untag CommTech and unassign myself.

Is the python 3 upgrade necessary to move the plagiabot API to Buster? Wiki Education Dashboard and Programs & Events Dashboard both rely on the plagiabot API (see T312790), and it's been down since the Stretch Grid Engine EOL.

If it's possible to get the current version back up and running independently of a Python 3 port, I would really appreciate it.

I've been added to the eranbot tool and I've been poking around, but I can't figure out how to set up the plagiabot API webservice. The code seems to be all in order in terms of requirements, as the scripts that run against the same repo work, but I'm not sure how the webservice was previously configured.

@eranroz any advice for how I should proceed to get the API back up (without breaking any of the other things eranbot is doing)?

I'm going to build a new tool to replace the plagiabot api; it's pretty simple and I can access the copyright diffs database independently of the plagiabot codebase, so this way I can hack on a webservice without messing with all the other things that the eranbot tool is still doing via cron jobs.

I'm in the process of slowly migrating from cron jobs to the Toolforge jobs framework.

For the copyright detection bot (plagiabot), this was already done by @MusikAnimal, who migrated it from the eranbot tool to the copypatrol tool and handles it there.
There are some other jobs not related to copyright, mostly hewiki jobs, which I'm now migrating to the jobs framework while also updating them to Python 3 and generally newer tools.

@Ragesoss For the API part (webservice): it is a self-contained, very simple Python API exposing the database.
The code is in
https://github.com/valhallasw/plagiabot/tree/master/webservice
and I think it would make the most sense either to run it from the copypatrol account, e.g. to have something like https://copypatrol.toolforge.org/plagiabot/api.py,
or to run it in a separate tool, as long as the copyright database is accessible (I'm almost sure I made the database public, so it should be available to all tools).

Thanks @eranroz. I didn't have any trouble connecting to the database from another tool, and I got a replacement for the API up and running.

The API is a little different and I only implemented the parts that Wiki Education Dashboard / Programs & Events Dashboard use, but it would be pretty easy to extend it if there's anything else from that API that others were relying on.

JJMC89 renamed this task from Migrate ERANBOT project off of Strech Grid Engine to Migrate ERANBOT project off of Grid Engine. Oct 6 2022, 7:19 PM

Please do not disable this tool on 2024-02-14. Migrating the remaining jobs is blocked on T333724: Deploy new CopyPatrol bot and update frontend.

Besides the copyright bot (CopyPatrol / plagiabot), all jobs of the bot have been moved to the new toolforge-jobs framework.
I think we can close this ticket and mark it as complete?

It was my intention to migrate to k8s (via toolforge-jobs), but T306391 was the blocker to that. There are hacky workarounds (T306391#8146790), but we're so close to deploying the new CopyPatrol backend that it's probably not worth the hassle.

I'm back from holiday and still catching up on everything, but I will prioritize finishing up T333724 this week, including helping resolve the blockers with respect to Turnitin relations. So we're looking at maybe another week or so, hopefully. Thanks for working with us on this, @komla!

It was my intention to migrate to k8s (via toolforge-jobs), but T306391 was the blocker to that.

Also apparently there's no Python 2 container available anymore, so I suppose we'd have to use the Toolforge Build Service to get around that, if that's even permissible.

I would be quite surprised if the Python buildpack allowed choosing a Python2 version. The Python2 upstream sunset date was 2020-01-01.

@komla Going by the timeline, I understand you're aiming to shut down the Grid entirely on March 14.

Unfortunately, CopyPatrol can't be migrated until a legal agreement with Turnitin (the copyvio detection service we use) is signed. We are currently waiting on their legal team to write out the agreement. I certainly hope this is done come March 14, but if not, what are the chances we could delay this a little bit longer? I'm just asking hypothetically for right now :)

Unfortunately, CopyPatrol can't be migrated until a legal agreement with Turnitin (the copyvio detection service we use) is signed.

Does this mean that somehow the existing application is subject to a legal agreement requiring that the software run on Son of Grid Engine?

Never mind. I see in T333724: Deploy new CopyPatrol bot and update frontend that you have coupled a rewrite with the platform switch and now have a licensing complication with the rewrite.

@MusikAnimal we will work together to ensure the community does not lose CopyPatrol.
Has the team given you any timeline on this?

We briefly discussed this in our team meeting today. From our perspective the original deadline was February 14th and the current one month period was supposed to be the buffer available for issues like you're facing, so we'd really rather not delay it any further. (Any delays will at least ruin my day on the 14th, but thankfully that's pretty much the worst issue it would cause :-)

Also apparently there's no Python 2 container available anymore, so I suppose we'd have to use the Toolforge Build Service to get around that, if that's even permissible.

There actually is a deprecated, unlisted python2 container available for the jobs framework. It's ancient and I have no clue if it still works, but that's still our preference over extending the lifetime of the grid.

@MusikAnimal Do you need any help getting this working? Would providing the python2 image work for you while you rewrite the tool?

Note that the grid is being shut off on the 14th of March (next week).

TheresNoTime added a subscriber: JWheeler-WMF.

FYI @JWheeler-WMF: this may need new prioritisation. Our project CopyPatrol ~depends on this.

Has the team given you any timeline on this?

We haven't been given a timeline. We did receive the first draft of the agreement, and there were concerns. I foresee this easily lasting through March 14 :(

(Any delays will at least ruin my day on the 14th, but thankfully that's pretty much the worst issue it would cause :-)

Relieving yet still makes me sad. You deserve to enjoy your March 14!

There actually is a deprecated, unlisted python2 container available for the jobs framework. It's ancient and I have no clue if it still works, but that's still our preference over extending the lifetime of the grid.

Do you need any help getting this working? Would providing the python2 image work for you while you rewrite the tool?

We will give the python2 image a try. I hope to work on this maybe tomorrow or early next week, unless @JJMC89 is interested and beats me to it.


Thank you all for working with us on this! It has been an unfortunate sequence of events that led us here. We had hoped to ship the new CopyPatrol last December, but such is life.

FYI: using toolforge jobs run --image python2 ... should work (I just tested it with a trivial script), even though it does not show in the list of images.

Okay, bear with me as I'm not very Python fluent.

I'm starting by trying out the eswiki job using the k8s python2 container. Here's the bash script:

eswiki.sh
#!/bin/bash
python ./gitPlagiabot/plagiabot/plagiabot.py -lang:es -blacklist:User:EranBot/Copyright/Blacklist -live:on -reportlogger

I first tried just running it as-is:

toolforge-jobs run --image python2 --mem 500Mi --continuous --command=./eswiki.sh eswiki

This produced the error ImportError: No module named MySQLdb. So next, I tried going by the docs to first create a virtual environment (note we intentionally use the name pyvenv2 so as not to overwrite the pyvenv used by the Python 3 jobs):

bootstrap_venv_python2.sh
#!/bin/bash

# Python2 apparently doesn't like strict mode (?)
# set -euo pipefail

# delete the venv, if it already exists
rm -rf pyvenv2

# create the virtualenv (venv is for Python 3)
python -m virtualenv pyvenv2

# activate it
source pyvenv2/bin/activate

# upgrade pip inside the venv and add support for the wheel package format
pip install -U pip wheel

# Install MySQLdb
pip install mysqlclient

After running this:

toolforge jobs run bootstrap-venv-python2 --command "cd $PWD && ./bootstrap_venv_python2.sh" --image python2 --wait

It errors out with:

New python executable in pyvenv2/bin/python2
Also creating executable in pyvenv2/bin/python
Installing setuptools, pip...done.
Running virtualenv with interpreter /usr/bin/python2
Downloading/unpacking pip from https://files.pythonhosted.org/packages/94/59/6638090c25e9bc4ce0c42817b5a234e183872a1129735a9330c472cc2056/pip-24.0.tar.gz#sha256=ea9bd1a847e8c5774a5777bb398c19e80bcd4e2aa16a4b301b718fe6f593aba2
  Running setup.py (path:/tmp/pip-build-g6C2fN/pip/setup.py) egg_info for package pip
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/tmp/pip-build-g6C2fN/pip/setup.py", line 9
        f"pip{sys.version_info[0]}=pip._internal.cli.main:main",
                                                              ^
    SyntaxError: invalid syntax
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/tmp/pip-build-g6C2fN/pip/setup.py", line 9
        f"pip{sys.version_info[0]}=pip._internal.cli.main:main",
                                                              ^
    SyntaxError: invalid syntax

Any ideas how to solve this? Also pinging @eranroz in case you are around to assist.

That sounds like pip is trying to upgrade itself to a version that no longer supports a Python version this ancient. Using the pip version that virtualenv installs initially seems to be fine, so you need to remove the pip install -U pip wheel line from the init script.

Similarly, for mysqlclient, by default it will try to install a version that no longer supports Python 2. Based on https://github.com/PyMySQL/mysqlclient/blob/main/HISTORY.rst, version 1.4.6 was the last one with Python 2 support, so you need to replace pip install mysqlclient with pip install mysqlclient==1.4.6.

And last, the eswiki.sh script needs to use the venv-managed Python somehow, either by referencing it directly (replace python with pyvenv2/bin/python) or by adding source pyvenv2/bin/activate before the Python command.

I ended up installing packages manually inside webservice python2 shell. Things were looking promising, but ultimately I'm stuck on a pywikibot / sseclient dependency.

Pywikibot complains with:

ImportError: sseclient is required for EventStreams;
install it with "pip install sseclient"

yet I have it installed:

(pyvenv2)tools.eranbot@shell-1709943357:~$ pip2 list
argparse (1.2.1)
certifi (2018.8.24)
chardet (3.0.4)
idna (2.8)
ipaddr (2.2.0)
mysqlclient (1.3.10)
pip (1.5.6)
pywikibot (3.0-dev)
requests (2.21.0)
setuptools (5.5.1)
six (1.9.0)
sseclient (0.0.18)
urllib3 (1.24.3)
wsgiref (0.1.2)

I tried different versions of both pywikibot and sseclient, but kept running into the same error.

I've spent a whole day working on this and am now forced to give up. I'll revisit later. If anyone has any ideas on what might be causing the above issue, please enlighten :)

PYTHONPATH shenanigans

tools.eranbot@tools-sgebastion-11:~$ webservice python2 shell
tools.eranbot@shell-1709950645:~$ echo $PYTHONPATH
/data/project/eranbot/six:/data/project/eranbot/pywikibot/core/:/data/project/eranbot/pywikibot/core/externals/httplib2:/data/project/eranbot/pywikibot/core/scripts
tools.eranbot@shell-1709950645:~$ . pyvenv2/bin/activate
(pyvenv2)tools.eranbot@shell-1709950645:~$ pip2 freeze > gitPlagiabot/plagiabot/requirements.txt
(pyvenv2)tools.eranbot@shell-1709950645:~$ cat gitPlagiabot/plagiabot/requirements.txt
argparse==1.2.1
certifi==2018.8.24
chardet==3.0.4
idna==2.8
ipaddr==2.2.0
mysqlclient==1.3.10
pywikibot==3.0-dev
requests==2.21.0
six==1.9.0
sseclient==0.0.18
urllib3==1.24.3
wsgiref==0.1.2
(pyvenv2)tools.eranbot@shell-1709950645:~$ vi gitPlagiabot/plagiabot/requirements.txt  # removed pywikibot
(pyvenv2)tools.eranbot@shell-1709950645:~$ unset PYTHONPATH
(pyvenv2)tools.eranbot@shell-1709950645:~$ pip2 install -U -r gitPlagiabot/plagiabot/requirements.txt -e pywikibot/core
Obtaining file:///data/project/eranbot/pywikibot/core
  Running setup.py (path:/data/project/eranbot/pywikibot/core/setup.py) egg_info for package from file:///data/project/eranbot/pywikibot/core
    /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
      warnings.warn(msg)

Requirement already up-to-date: argparse==1.2.1 in /usr/lib/python2.7 (from -r gitPlagiabot/plagiabot/requirements.txt (line 1))
Requirement already up-to-date: certifi==2018.8.24 in ./pyvenv2/lib/python2.7/site-packages (from -r gitPlagiabot/plagiabot/requirements.txt (line 2))
Requirement already up-to-date: chardet==3.0.4 in ./pyvenv2/lib/python2.7/site-packages (from -r gitPlagiabot/plagiabot/requirements.txt (line 3))
Requirement already up-to-date: idna==2.8 in ./pyvenv2/lib/python2.7/site-packages (from -r gitPlagiabot/plagiabot/requirements.txt (line 4))
Requirement already up-to-date: ipaddr==2.2.0 in ./pyvenv2/lib/python2.7/site-packages (from -r gitPlagiabot/plagiabot/requirements.txt (line 5))
Requirement already up-to-date: mysqlclient==1.3.10 in ./pyvenv2/lib/python2.7/site-packages (from -r gitPlagiabot/plagiabot/requirements.txt (line 6))
Requirement already up-to-date: requests==2.21.0 in ./pyvenv2/lib/python2.7/site-packages (from -r gitPlagiabot/plagiabot/requirements.txt (line 7))
Downloading/unpacking six==1.9.0 (from -r gitPlagiabot/plagiabot/requirements.txt (line 8))
  Downloading six-1.9.0-py2.py3-none-any.whl
Requirement already up-to-date: sseclient==0.0.18 in ./pyvenv2/lib/python2.7/site-packages (from -r gitPlagiabot/plagiabot/requirements.txt (line 9))
Requirement already up-to-date: urllib3==1.24.3 in ./pyvenv2/lib/python2.7/site-packages (from -r gitPlagiabot/plagiabot/requirements.txt (line 10))
Requirement already up-to-date: wsgiref==0.1.2 in /usr/lib/python2.7 (from -r gitPlagiabot/plagiabot/requirements.txt (line 11))
Installing collected packages: six, pywikibot
  Running setup.py develop for pywikibot
    /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
      warnings.warn(msg)

    Creating /data/project/eranbot/pyvenv2/lib/python2.7/site-packages/pywikibot.egg-link (link to .)
    Adding pywikibot 3.0.20211020.dev0 to easy-install.pth file

    Installed /data/project/eranbot/pywikibot/core
Successfully installed six pywikibot
Cleaning up...
(pyvenv2)tools.eranbot@shell-1709950645:~$ pip2 list
argparse (1.2.1)
certifi (2018.8.24)
chardet (3.0.4)
idna (2.8)
ipaddr (2.2.0)
mysqlclient (1.3.10)
pip (1.5.6)
pywikibot (3.0.20211020.dev0, /data/project/eranbot/pywikibot/core)
requests (2.21.0)
setuptools (5.5.1)
six (1.9.0)
sseclient (0.0.18)
urllib3 (1.24.3)
wsgiref (0.1.2)
(pyvenv2)tools.eranbot@shell-1709950645:~$ vi eswiki.sh
(pyvenv2)tools.eranbot@shell-1709950645:~$ cat eswiki.sh
#!/bin/bash
unset PYTHONPATH

pyvenv2/bin/python ./gitPlagiabot/plagiabot/plagiabot.py -lang:es -blacklist:User:EranBot/Copyright/Blacklist -live:on -reportlogger
tools.eranbot@tools-sgebastion-11:~$ toolforge-jobs run --image python2 --mem 500Mi --continuous --command=./eswiki.sh eswiki --emails onfailure
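
The transcript above gets past the import failure by unsetting PYTHONPATH, which suggests that the paths inherited from the tool's environment were taking precedence over the packages installed in pyvenv2. Below is a minimal diagnostic sketch, not taken from the thread, that lists the inherited entries and reports which file a module is actually loaded from. The helper names are made up for illustration, and it sticks to importlib.import_module so that it should also run under Python 2.7:

```python
import importlib
import os


def pythonpath_entries(environ=None):
    """Return the PYTHONPATH entries of the given environment, in search order."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def module_origin(name):
    """Return the file a module is imported from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Built-in modules have no __file__, so fall back to None for them too.
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    for entry in pythonpath_entries():
        print("PYTHONPATH entry: {}".format(entry))
    # Modules that failed to import, or were shadowed, in this thread.
    for name in ("sseclient", "MySQLdb", "six"):
        print("{} -> {}".format(name, module_origin(name)))
```

Running it once with PYTHONPATH set and once after unset PYTHONPATH, both times with pyvenv2/bin/python, would show whether a module resolves to /data/project/eranbot/... or to pyvenv2/lib/python2.7/site-packages.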

Ping - is there anything blocking the migration of the remaining tasks/wikis?

@MusikAnimal @JJMC89 One assumes that, following the above template, we could create (eg) enwiki.sh with

#!/bin/bash
unset PYTHONPATH

pyvenv2/bin/python ./gitPlagiabot/plagiabot/plagiabot.py -lang:en -blacklist:User:EranBot/Copyright/Blacklist -live:on -reportlogger -pagetriagetag

and then run

toolforge-jobs run --image python2 --mem 500Mi --continuous --command=./enwiki.sh enwiki --emails onfailure

and then remove/comment

# run on enwiki  - replace appear above /data/project/eranbot/gitPlagiabot/plagiabot/plagiabotOnline.sh
*/10 * * * * jsub -release buster -N enwiki -mem 500m -l h_rt=4:05:00 -once -quiet -o /data/project/eranbot/outs python /data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py -lang:en -blacklist:User:EranBot/Copyright/Blacklist -live:on -reportlogger -pagetriagetag

from the crontab?
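
Since the wrapper scripts in this template differ only in the language code and in optional flags such as -pagetriagetag, they could be generated rather than copied by hand. This is a rough sketch under that assumption; the function names and the per-wiki flag table are hypothetical and do not come from the tool's repository:

```python
import os
import stat

# Command line copied from the eswiki.sh template in this thread.
TEMPLATE = (
    "#!/bin/bash\n"
    "unset PYTHONPATH\n"
    "\n"
    "pyvenv2/bin/python ./gitPlagiabot/plagiabot/plagiabot.py -lang:{lang} "
    "-blacklist:User:EranBot/Copyright/Blacklist -live:on -reportlogger{extra}\n"
)

# Hypothetical table of extra flags; only the enwiki flag is known from the crontab.
EXTRA_FLAGS = {"en": ["-pagetriagetag"]}


def wrapper_script(lang):
    """Return the text of the wrapper script for one language code."""
    extra = "".join(" " + flag for flag in EXTRA_FLAGS.get(lang, []))
    return TEMPLATE.format(lang=lang, extra=extra)


def write_wrapper(lang, directory="."):
    """Write <lang>wiki.sh into directory, mark it executable, and return its path."""
    path = os.path.join(directory, "{}wiki.sh".format(lang))
    with open(path, "w") as handle:
        handle.write(wrapper_script(lang))
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path
```

Calling write_wrapper("fr") would produce a frwiki.sh that can be passed to toolforge-jobs run --command=./frwiki.sh as in the commands above.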

Ideally we'd add these jobs to jobs.yaml, right?

I've done the above (bar adding anything to jobs.yaml) for frwiki and will check if it's working as expected. If so, and notwithstanding any comments here, I will create and prepare the rest of the enabled projects.

Question for @MusikAnimal and/or @JJMC89: I note that these jobs in the crontab are running every 10 minutes (*/10 * * * *) but are set as --continuous above. I've copied that for frwiki, but am now wondering if that is correct.

If so, we only have a quota of 3 continuous jobs:

tools.eranbot@tools-sgebastion-10:~$ toolforge-jobs quota
Running jobs                                  Used    Limit
--------------------------------------------  ------  -------
Total running jobs at once (Kubernetes pods)  2       16
Running one-off and cron jobs                 0       15
CPU                                           1       8
Memory                                        1000Mi  8Gi

Per-job limits    Limit
----------------  -------
CPU               3
Memory            6Gi

Job definitions                             Used    Limit
----------------------------------------  ------  -------
Cron jobs                                      8       50
Continuous jobs (including web services)       2        3

Request for @taavi: to save time if it turns out these should be continuous jobs, could you do us a favour and facilitate a quota increase to 10?
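
To make the arithmetic behind this request explicit, here is a small sketch that reads the "Continuous jobs" row from the quota output pasted above and checks how many more jobs would fit. The parsing is an assumption based only on the layout shown here, and the helper names are illustrative:

```python
import re

# Excerpt of the `toolforge-jobs quota` output pasted above.
QUOTA_OUTPUT = """\
Job definitions                             Used    Limit
----------------------------------------  ------  -------
Cron jobs                                      8       50
Continuous jobs (including web services)       2        3
"""


def quota_row(output, label):
    """Return (used, limit) for the row whose text starts with label."""
    for line in output.splitlines():
        match = re.match(r"^(.*?)\s+(\d+)\s+(\d+)\s*$", line)
        if match and match.group(1).startswith(label):
            return int(match.group(2)), int(match.group(3))
    raise ValueError("no row starting with {!r}".format(label))


def fits(used, limit, additional):
    """True if `additional` more jobs stay within the limit."""
    return used + additional <= limit


if __name__ == "__main__":
    used, limit = quota_row(QUOTA_OUTPUT, "Continuous jobs")
    print("continuous jobs: {} used of {}".format(used, limit))
    print("room for 3 more at the current limit:", fits(used, limit, 3))
    print("room for 3 more at a limit of 10:", fits(used, 10, 3))
```

With 2 of 3 continuous jobs in use, only one more fits at the current limit, whereas a limit of 10 would leave room for eight more.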

Question for @MusikAnimal and/or @JJMC89: I note that these jobs in the crontab are running every 10 minutes (*/10 * * * *) but are set as --continuous above. I've copied that for frwiki, but am now wondering if that is correct.

Ah, I figured that the every-ten-minutes crontab entry with -once is akin to a --continuous job, so we will definitely need a quota increase. I will file a request in Toolforge (Quota-requests).

I have moved arwiki, enwiki, eswiki, frwiki, and simplewiki to jobs following the template above.

tools.eranbot@tools-sgebastion-10:~$ toolforge-jobs list
Job name:            Job type:                Status:
-------------------  -----------------------  ----------------------------------------
draft-watch          schedule: 45 1 * * *     Last schedule time: 2024-03-12T01:45:00Z
he-ancient-pages     schedule: 15 21 * * 2    Last schedule time: 2024-03-05T21:15:00Z
he-missing-labels    schedule: 15 21 * * 2    Last schedule time: 2024-03-05T21:15:00Z
rewrite-easy         schedule: 15 16 * * 2    Last schedule time: 2024-03-05T16:15:00Z
rewrite-importants   schedule: 15 16 * * 2    Last schedule time: 2024-03-05T16:15:00Z
sandbox-move         schedule: 15 1,13 * * *  Last schedule time: 2024-03-12T01:15:00Z
weekly-collabration  schedule: 15 21 * * 6    Last schedule time: 2024-03-09T21:15:00Z
work-template        schedule: 30 1,13 * * *  Last schedule time: 2024-03-12T01:30:00Z
arwiki               continuous               Running
enwiki               continuous               Running
eswiki               continuous               Running
frwiki               continuous               Running
simplewiki           continuous               Running

Of possible concern is:

==> arwiki.err <==
  File "/data/project/eranbot/pywikibot/core/pywikibot/page.py", line 2265, in __init__
    super(Page, self).__init__(source, title, ns)
  File "/data/project/eranbot/pywikibot/core/pywikibot/page.py", line 192, in __init__
    self._link = Link(title, source=source, default_namespace=ns)
  File "/data/project/eranbot/pywikibot/core/pywikibot/tools/__init__.py", line 1488, in wrapper
    return obj(*__args, **__kw)
  File "/data/project/eranbot/pywikibot/core/pywikibot/page.py", line 5517, in __init__
    "Title contains illegal char (\\uFFFD 'REPLACEMENT CHARACTER')")
Error: Title contains illegal char (\uFFFD 'REPLACEMENT CHARACTER')
WARNING: Empty message found.

and

==> simplewiki.err <==
Checking uploads (4). Traceback (most recent call last):
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 864, in <module>
    main()
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 859, in main
    bot.run()
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 636, in run
    while not self.uploads_ready(): continue
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 200, in uploads_ready
    raise Exception('Error retreving document {}. Response status: {}'.format(upload_id, document_get_response['status']))
Exception: Error retreving document 107507947. Response status: 401

Grabbing this to get it across the line hopefully :-)

Backed up jobs.yaml to jobs.backup.20240312.yaml and made the following change to jobs.yaml:

diff --git a/./jobs.old.yaml b/./jobs.yaml
index 5046b2c..e80dfb2 100644
--- a/./jobs.old.yaml
+++ b/./jobs.yaml
@@ -47,4 +47,35 @@
   image: tf-python39
   command: pyvenv/bin/python reports/reporter.py ויקיפדיה:תחזוקה/שכתוב/קל /data/project/eranbot/reports/rewrite/rewrite_easy.sql
   emails: onfailure
-  schedule: "15 16 * * 2"
\ No newline at end of file
+  schedule: "15 16 * * 2"
+# plagiabot jobs
+- name: enwiki
+  image: python2
+  mem: 500Mi
+  continuous: true
+  command: ./enwiki.sh
+  emails: onfailure
+- name: eswiki
+  image: python2
+  mem: 500Mi
+  continuous: true
+  command: ./eswiki.sh
+  emails: onfailure
+- name: arwiki
+  image: python2
+  mem: 500Mi
+  continuous: true
+  command: ./arwiki.sh
+  emails: onfailure
+- name: frwiki
+  image: python2
+  mem: 500Mi
+  continuous: true
+  command: ./frwiki.sh
+  emails: onfailure
+- name: simplewiki
+  image: python2
+  mem: 500Mi
+  continuous: true
+  command: ./simplewiki.sh
+  emails: onfailure
\ No newline at end of file
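
The five entries added in this diff are identical apart from the job name. The sketch below renders the same block for a list of wikis, assuming plain string formatting is good enough here (no YAML library is involved, and the function name is invented for this example):

```python
import sys

# One jobs.yaml entry, with the fields used in the diff above.
ENTRY = (
    "- name: {name}\n"
    "  image: python2\n"
    "  mem: 500Mi\n"
    "  continuous: true\n"
    "  command: ./{name}.sh\n"
    "  emails: onfailure\n"
)


def plagiabot_jobs_yaml(wikis):
    """Render jobs.yaml entries matching the ones added in the diff above."""
    return "# plagiabot jobs\n" + "".join(ENTRY.format(name=wiki) for wiki in wikis)


if __name__ == "__main__":
    wikis = ["enwiki", "eswiki", "arwiki", "frwiki", "simplewiki"]
    sys.stdout.write(plagiabot_jobs_yaml(wikis))
```

The output would still have to be appended to the existing jobs.yaml by hand, after the last scheduled job.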

The simplewiki/arwiki errors seem to have disappeared, all jobs are running, and nothing is on the grid engine. I would appreciate confirmation that a) everything is working as expected from a bot/CopyPatrol point of view and b) that nothing remains on the grid engine :-)

Thank youuuu!

All seems to be working great! Thank you @JJMC89 and @TheresNoTime for tending to this, and to Cloud Services for their assistance and patience.

I think the only issue we may see near-term are the jobs dying and needing a restart, as they typically do. T306391 isn't a thing yet, so if it becomes a problem we can look into a shell script (i.e. T306391#8146790). But, I'm hoping it won't be too long before we retire the Python 2 bot entirely. I'll update over at T333724 whenever we hear back from Turnitin.
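
Until jobs can have a maximum runtime (T306391), a wrapper could impose one. What follows is only a sketch of that idea and not the workaround linked above, whose contents are not reproduced in this thread. It assumes a Python 3 interpreter is available to run the wrapper, uses hypothetical helper names, and borrows the h_rt=4:05:00 limit from the old crontab entry as an example value:

```python
import subprocess
import sys


def parse_h_rt(value):
    """Convert a Grid Engine style h_rt value (H:MM:SS) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def run_with_limit(command, limit_seconds):
    """Run command and kill it after limit_seconds.

    Returns the exit code, or None if the limit was reached.
    """
    try:
        return subprocess.run(command, timeout=limit_seconds).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 limit.py pyvenv2/bin/python plagiabot.py ...
    code = run_with_limit(sys.argv[1:], parse_h_rt("4:05:00"))
    sys.exit(1 if code is None else code)
```

One caveat: subprocess.run only kills the direct child on timeout, so the limit is best applied to the Python command itself rather than to a shell script that launches it.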

b) that nothing remains on the grid engine :-)

As far as I can tell, eranbot is no longer running anything on the grid engine. Thank you!!!!