
Set up a regular job to refresh suggestions and run the bot
Closed, Resolved (Public)

Description

Now that we add doi-access=free, there are many edits to do regularly: the latest run made about 14k edits only 3 months after the previous one. It has also become more important to have fresh suggestions for users of the tool.

I think it's time to just automate the regular runs. While we're at it, we can use the Kubernetes cronjobs:
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Kubernetes_cronjobs

Usually I would check the queue manually and move away suggestions I considered lower priority at that moment, depending on the circumstances, while making sure there were always a few hundred suggestions. For the moment we have over 2000 from various "safe" repositories, so we can just move away a couple of big domains like SemanticScholar. I prefer to keep producing those suggestions in case we run low on the others and need to replenish the queue quickly.
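
As a rough illustration of the rule described above (set aside suggestions from a few big domains, but never let the active queue drop below a few hundred), here is a minimal sketch. It is not OAbot's actual code: the function name `split_queue`, the `min_queue` threshold of 300, the dictionary layout and the sample domains are all assumptions made for the example.

```python
def split_queue(suggestions, deferred_domains, min_queue=300):
    """Split suggestions into (active, deferred) lists.

    Suggestions from deferred_domains are set aside, unless that would leave
    fewer than min_queue active suggestions; in that case just enough of the
    deferred ones are moved back to reach the floor.
    """
    active = [s for s in suggestions if s["domain"] not in deferred_domains]
    deferred = [s for s in suggestions if s["domain"] in deferred_domains]

    # Replenish the active queue from the deferred pool if it runs low.
    shortfall = max(0, min_queue - len(active))
    active.extend(deferred[:shortfall])
    deferred = deferred[shortfall:]
    return active, deferred


# Hypothetical queue: one big domain plus a smaller set of "safe" repositories.
queue = (
    [{"domain": "semanticscholar.org", "page": f"A{i}"} for i in range(5000)]
    + [{"domain": "repository.example", "page": f"B{i}"} for i in range(2000)]
)

active, deferred = split_queue(queue, {"semanticscholar.org"})
print(len(active), len(deferred))  # 2000 5000
```

Keeping the deferred suggestions in a separate pool, rather than discarding them, matches the intent of continuing to produce them so that the queue can be refilled quickly.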

I have a pull request at https://github.com/dissemin/oabot/pull/75, already being tested in production.

Wasn't that a nice fast service? :-)
https://en.wikipedia.org/w/index.php?title=Controlled_digital_lending&diff=next&oldid=997929255

Event Timeline

Nemo_bis triaged this task as Medium priority. Jan 4 2021, 7:00 AM
Nemo_bis created this task.
Nemo_bis updated the task description.

The cronjob is set for every Saturday (should take at least 2-3 days to complete though) and the current test seems to be going fine, so with 56164d1ea7a7023886cd0330d2eaec2a111ebedd this should be fixed.
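
For readers who want to picture such a weekly job, here is a minimal sketch that builds a Kubernetes CronJob manifest as JSON (which `kubectl apply -f` accepts as well as YAML). It is not the configuration from the commit above. The image, the command and the exact hour are placeholders, and `apiVersion` is the current `batch/v1`, whereas clusters at the time of this task used `batch/v1beta1`. Only the name `oabotrefresh` and the Saturday schedule come from this task.

```python
import json


def cronjob_manifest(name, image, command, schedule):
    """Return a CronJob manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # A run lasts days, so skip a new run if the previous one is still going.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    # Without this, the Job controller may create replacement
                    # pods after a failure (the default backoffLimit is 6).
                    "backoffLimit": 0,
                    "template": {
                        "spec": {
                            # Only prevents in-place container restarts.
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                        }
                    },
                }
            },
        },
    }


manifest = cronjob_manifest(
    name="oabotrefresh",
    image="registry.example/oabot:latest",       # placeholder image
    command=["/bin/sh", "-c", "echo refresh"],   # placeholder command
    schedule="0 0 * * 6",                        # Saturdays at 00:00 (hour assumed)
)
print(json.dumps(manifest, indent=2))
```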

Concurrency was reduced so that the run now takes approximately 60 hours. This is the first set of User:OAbot edits made automatically without any manual intervention, as part of the cronjob:
https://en.wikipedia.org/w/index.php?title=Special:Contributions&dir=prev&offset=20210106153526&limit=100&target=OAbot

Hmpf, the job was marked as failed and a new pod appeared even though the cronjob was marked as never to be restarted. I'll have to check the syntax.

$ kubectl get pod
NAME                            READY   STATUS    RESTARTS   AGE
oabot-5d788d549b-m5gp7          1/1     Running   0          32d
oabotrefresh-1610158620-4pj7t   1/1     Running   0          12h
oabotrefresh-1610158620-jf5pw   0/1     Error     0          2d18h

Note: work is continuing at T272006. I'm in talks with Unpaywall/Our Research about our API usage.