
Switch OAbot to Python3
Closed, Resolved | Public | BUG REPORT

Description

OAbot last made edits in October 2021, and I suspect the cause is a Python compatibility issue. Let's switch to Python 3; there isn't much to do: https://github.com/dissemin/oabot/pull/82

The only thing to replace is probably multiprocessing.

At the moment I can't test the changes on bastion (which is on python 2.7 or 3.5), because I can't recreate a virtual environment:

  • with python2, pywikibot 3 fails to install;
  • with python3, pywikibot installation complains about setuptools incompatibility in all versions I tried from 5 to 6.

Event Timeline

Nemo_bis created this task.

It helped to follow https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Python#Virtual_Environments_and_Packages, which recommends opening a shell pod and creating the virtualenv from there. The default venv directory is now on Python 3.9, so this is fixed. The web tool automatically loads on Python 3.

So far I've only made manual prefill/bot runs, because a full run takes over a week when single-threaded (it goes through about 550k pages with a DOI).

The cronjob had not run for a long time (last scheduled in May 2022), I believe because of a stuck job. Now that I've deleted that old job, it should work again:

Last Schedule Time:  Sat, 07 May 2022 02:17:00 +0000
Active Jobs:         <none>
Events:
  Type    Reason      Age   From                Message
  ----    ------      ----  ----                -------
  Normal  MissingJob  20m   cronjob-controller  Active job went missing: oabotrefresh-1651889820

I'm not sure why I thought multiprocessing wasn't available. I've restored it in https://github.com/dissemin/oabot/pull/88 with slightly different settings.
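For readers unfamiliar with the module, here is a minimal sketch of a parallel prefill using Python 3's standard multiprocessing package. It is not the code from the pull request: `check_page` and `prefill` are hypothetical names, the per-page work is a placeholder, and the thread-backed `ThreadPool` is an assumption (the process-backed `multiprocessing.Pool` exposes the same interface).

```python
from multiprocessing.pool import ThreadPool


def check_page(title):
    """Hypothetical stand-in for the per-page prefill work."""
    # Placeholder computation; the real bot would look up the page's DOIs.
    return title, len(title)


def prefill(titles, workers=10):
    """Run check_page over all titles with a pool of `workers` workers."""
    with ThreadPool(processes=workers) as pool:
        # imap_unordered yields each result as soon as a worker finishes it,
        # so one slow page does not hold back the others.
        return dict(pool.imap_unordered(check_page, titles))


if __name__ == "__main__":
    print(prefill(["Page A", "Page B", "Page C"], workers=2))
```

The worker count is the main setting to tune; everything else stays the same when switching between one worker and many.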

The 400+ edits today (with about 1300 more cached) were the outcome of a regularly scheduled oabot refresh which took about 57 hours to prefill at 10 parallel threads. With one thread it would presumably take at least 3 weeks, but perhaps a monthly update is enough. I'll revisit the multiprocessing after the next run.
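The "at least 3 weeks" figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes perfectly linear scaling (one worker does the same total work in 10 times the wall-clock time), which is only an approximation; `single_worker_estimate` is a hypothetical helper name.

```python
def single_worker_estimate(wall_hours, workers):
    """Estimate single-worker duration from a parallel run, assuming linear scaling."""
    worker_hours = wall_hours * workers  # total work done across all workers
    days = worker_hours / 24
    weeks = days / 7
    return worker_hours, days, weeks


hours, days, weeks = single_worker_estimate(57, 10)
print(f"{hours} worker-hours = {days:.2f} days = {weeks:.1f} weeks")
```

Under that assumption, 57 hours on 10 workers corresponds to 570 worker-hours, or a little under 24 days, which fits inside a monthly schedule with only a few days to spare.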