Migrate ERANBOT project off of Stretch Grid Engine
Open, Needs Triage · Public

Description

Hello,

We are reaching out to you because you are listed as one of the maintainers of ERANBOT project.

We announced earlier[0] that long-term support for Debian Stretch will cease in June 2022.
We therefore need to shut down all Stretch hosts including Stretch Grid Engine before the end of support date to ensure that Toolforge remains a secure platform.

You should move the ERANBOT project away from Stretch Grid Engine before the deadline[1].
You have two options:
* migrate from Toolforge Stretch Grid Engine to Toolforge Kubernetes[2];
* migrate from Toolforge Stretch Grid Engine to Toolforge Buster Grid Engine[3].

You should be aware that our ultimate goal is to deprecate Grid Engine
entirely and replace it with Kubernetes, so we encourage you to move to Kubernetes if you can.

We have also published a series of blog posts that further explain the reasoning behind this action[4].

If you face any particular challenges that prevent you from migrating away from Stretch, please share them here.
You can also reach out via any of our communication channels[5].

[0] https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/EPJFISC52T7OOEFH5YYMZNL57O4VGSPR/
[1] https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Timeline
[2] https://wikitech.wikimedia.org/wiki/News/Toolforge_Stretch_deprecation#Move_a_grid_engine_webservice
[3] https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Grid_Engine_migration
[4] https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
[5] https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Communication_and_support

Event Timeline

Restricted Application added a subscriber: Aklapper.
kostajh added a subscriber: eranroz.

@eranroz do you have plans/capacity to work on this?

I just did this for another Python project and am confident we can take this one on, too. Unless someone beats us to it, CommTech is going to pair on this tomorrow, April 27 :)

Sorry, I forgot about T306391: Allow Toolforge scheduled jobs to have a maximum runtime, which in the case of EranBot is rather critical if we want to migrate to k8s. For the time being we could stay on the grid engine, I suppose. Note also the bot needs to be reworked to use Python 3 at some point (T293688), but that can wait until after the Toolforge Stretch deprecation. I'm assuming the Python 2.7.13 -> 2.7.16 upgrade won't pose many problems.

After realizing the above, we decided to work on other things today, but I'm happy to do the migration in the coming days unless someone else wants to.

Please be aware that we currently have only limited quota with the upstream service used by the bot (T305318), so try to limit any "experiments" with the bot making actual uploads to iThenticate. We only have it running for English Wikipedia right now. For the Buster migration, I was planning on enabling it for Turkish or some other less-busy wiki to confirm things are working properly, before doing the full migration (and only enabling for English until we acquire more credits). Hopefully that makes sense.

Another thing that just occurred to me is EranBot has several other tasks than just "plagiabot"! @eranroz, I'm happy to copy over the cron jobs (or k8s scheduled jobs equivalents, if we end up going that route) as they are written now, but I'm not sure what all of the hewiki and "report on rewrite articles" scripts are supposed to do. This begs the question... with such broad and multi-team interest in the plagiabot task, would it make sense to move it to the copypatrol Toolforge account? That way everything CopyPatrol-related would live under the same roof. Unless I'm missing anything, there's no technical reason to keep it under eranbot.

Yes, it is a good idea to move the plagiabot activities from the eranbot project to the copypatrol project.

As for the migration to k8s: I haven't spent time on it yet, but I'm planning to migrate different jobs (in eranbot and other projects I'm working on) at some point. Please feel free to migrate plagiabot to copypatrol.

Thanks! We'll get to work on this and will update here once it's complete or if we run into issues.

MusikAnimal added a subscriber: JJMC89.

I've simply set the cron job to run on buster and all seems to be working fine. I'm going to hold off on moving it to the copypatrol account for now. The current installation seems to have lots of relics and unused code, etc., and it's unclear what we need and what we don't. @JJMC89 has expressed interest in taking on T293688: Write new CopyPatrol backend to replace Plagiabot by rewriting the bot from scratch. I think that would be the better time to move it to the copypatrol account.

Note you may still get emails about the eranbot tool still needing migration to Buster, but this would be for the other cron jobs unrelated to Plagiabot / CopyPatrol. As such I'm not going to close this task, but I will untag CommTech and unassign myself.

Is the Python 3 upgrade necessary to move the plagiabot API to Buster? Wiki Education Dashboard and Programs & Events Dashboard both rely on the plagiabot API (see T312790), and it's been down since the Stretch Grid Engine EOL.

If it's possible to get the current version back up and running independently of a Python 3 port, I would really appreciate it.

I've been added to the eranbot tool and I've been poking around, but I can't figure out how to set up the plagiabot API webservice. The code seems to be all in order in terms of requirements, as the scripts that run against the same repo work, but I'm not sure how the webservice was previously configured.

@eranroz any advice for how I should proceed to get the API back up (without breaking any of the other things eranbot is doing)?

I'm going to build a new tool to replace the plagiabot api; it's pretty simple and I can access the copyright diffs database independently of the plagiabot codebase, so this way I can hack on a webservice without messing with all the other things that the eranbot tool is still doing via cron jobs.

I'm in the process of slowly migrating from cron jobs to the Toolforge jobs framework.

For the copyright detection bot (plagiabot), this was already done by @MusikAnimal, who migrated it from the eranbot tool to the copypatrol tool and handles it there.
There are some other jobs not related to copyright, mostly hewiki jobs, which I'm now migrating to the jobs framework and also updating to Python 3 and generally newer tools.

@Ragesoss For the API part (webservice): it is a very simple, self-contained Python API exposing the database.
The code is in
https://github.com/valhallasw/plagiabot/tree/master/webservice
and I think it would make the most sense either to run it from the copypatrol account, e.g. to have something like https://copypatrol.toolforge.org/plagiabot/api.py
or to run it in a separate tool, as long as the copyright database is accessible (I'm almost sure I made the database public, so it should be available to all tools).
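
For illustration only, here is a minimal sketch of what a small, self-contained, read-only Python API over a database can look like. It is not the plagiabot webservice code linked above. The table name `copyright_diffs`, its columns, the `page_title` query parameter, and the in-memory SQLite database (standing in for the real database) are all assumptions made up for the example.

```python
import json
import sqlite3
from urllib.parse import parse_qs


def make_demo_db():
    """Build an in-memory stand-in database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE copyright_diffs (project TEXT, page_title TEXT, diff INTEGER)"
    )
    conn.executemany(
        "INSERT INTO copyright_diffs VALUES (?, ?, ?)",
        [
            ("wikipedia", "Example", 101),
            ("wikipedia", "Other", 102),
            ("wikibooks", "Example", 103),
        ],
    )
    return conn


def query_diffs(conn, page_title):
    """Return the rows for one page title as a list of dicts, ordered by diff."""
    rows = conn.execute(
        "SELECT project, page_title, diff FROM copyright_diffs "
        "WHERE page_title = ? ORDER BY diff",
        (page_title,),  # parameterized to avoid SQL injection
    ).fetchall()
    return [{"project": p, "page_title": t, "diff": d} for p, t, d in rows]


def make_app(conn):
    """Create a WSGI application that exposes query_diffs as JSON."""

    def respond(start_response, status, payload):
        body = json.dumps(payload).encode("utf-8")
        start_response(
            status,
            [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        titles = params.get("page_title")
        if not titles:
            return respond(
                start_response,
                "400 Bad Request",
                {"error": "page_title is required"},
            )
        return respond(start_response, "200 OK", query_diffs(conn, titles[0]))

    return app
```

To try it locally, the app returned by `make_app(make_demo_db())` can be served with the standard library's `wsgiref.simple_server.make_server`. A real deployment would connect to the actual database and use its real schema instead of the demo data.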

Thanks @eranroz. I didn't have any trouble connecting to the database from another tool, and I got a replacement for the API up and running.

The API is a little different and I only implemented the parts that Wiki Education Dashboard / Programs & Events Dashboard use, but it would be pretty easy to extend it if there's anything else from that API that others were relying on.