
Create Docker image for Toolforge that is purpose built to run pywikibot scripts
Open, Medium, Public

Description

Many grid engine jobs are executions of pywikibot scripts. An unknown but presumably large number of these are using some variation of the recommended pywikibot process.

The WMCS team would like to find ways to reduce folks' dependence on the grid engine. Providing a simple way for folks to run pywikibot on the Kubernetes cluster seems like a good place to start on this larger goal.

In response to a random musing on IRC, @JJMC89 reported that pywikibot makes a stable release ~4-7 times per year [1]. This seems like an easy pace to keep up with, even without a CI/CD system for updating our Docker images.

I'm thinking a first attempt at this could be a Docker image, based on whatever our latest Python 3 base container is, that includes pywikibot, its direct dependencies, and most or all of the Python 3 packages that are currently installed globally. This would be similar to the upstream Dockerfile, with a few other bells and whistles.

This image should also favor convention over configuration wherever possible. The ultimate goal is for running a pywikibot workload on the Kubernetes cluster to look something like this (fake commands ahead!):

$ become my-cool-pwb-tool
$ pwb-k8s init
Checking for local scripts directory...
Local scripts directory not found. Creating $HOME/pwb
Checking for user files...
User files not found. Running generate_user_files.py
[whatever generate_user_files.py does happens here]
$ pwb-k8s run version.py
Pywikibot: [https] r-pywikibot-core.git (df69134, g1, 2020/03/30, 11:17:54, OUTDATED)
Release version: 3.1.dev0
requests version: 2.12.4
  cacerts: /etc/ssl/certs/ca-certificates.crt
    certificate test: ok
Python: 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516]
$ vim pwb/my_cool_script.py
$ pwb-k8s cron --hour 3 --minute 17 my_cool_script.py
CronJob created
$ kubectl get cronjob
NAME                    SCHEDULE     SUSPEND   ACTIVE   LAST SCHEDULE   AGE
my_cool_script.py       17 3 * * *   False     0        0m              0m
$ kubectl describe cronjob my_cool_script.py
Name:                          my_cool_script.py
Namespace:                     tool-my-cool-pwb-tool
Labels:                        name=my-cool-pwb-tool.my_cool_script.py
                               toolforge=tool
Annotations:                   <none>
Schedule:                      17 3 * * *
Concurrency Policy:            Allow
Suspend:                       False
Successful Job History Limit:  3
Failed Job History Limit:      1
Starting Deadline Seconds:     <unset>
Selector:                      <unset>
Parallelism:                   <unset>
Completions:                   <unset>
Pod Template:
  Labels:  toolforge=tool
  Containers:
   bot:
    Image:      docker-registry.tools.wmflabs.org/toolforge-python37-sssd-pwb:latest
    Port:       <none>
    Host Port:  <none>
    Args:
      python3
      /data/project/my-cool-pwb-tool/pwb/my_cool_script.py
    Environment:
      PYWIKIBOT_DIR:  /data/project/my-cool-pwb-tool/pwb
      HOME:           /data/project/my-cool-pwb-tool
    Mounts:           <none>
  Volumes:            <none>
Last Schedule Time:   <none>
Active Jobs:          <none>
Events:               <none>
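For illustration, here is a minimal Python sketch of the CronJob that a hypothetical `pwb-k8s cron` command might generate from its `--hour`/`--minute` options, mirroring the `kubectl describe` output in the session. Everything here is an assumption: `pwb-k8s` does not exist, the image name is the placeholder from the session, and `batch/v1` assumes Kubernetes 1.21 or later (older clusters used `batch/v1beta1`). One real constraint it does handle: Kubernetes object names may only contain lowercase alphanumerics, `-` and `.`, so `my_cool_script.py` is not a valid CronJob name as written and is mapped to `my-cool-script.py`.

```python
import json
import re


def cron_schedule(minute: int, hour: int) -> str:
    """Turn --minute/--hour options into a daily five-field cron expression."""
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError("minute must be 0-59 and hour must be 0-23")
    return f"{minute} {hour} * * *"


def k8s_name(script: str) -> str:
    """Map a script filename onto a valid CronJob name.

    Names may only contain lowercase alphanumerics, '-' and '.', must start
    and end with an alphanumeric, and CronJob names are capped at 52 chars.
    """
    name = re.sub(r"[^a-z0-9.-]", "-", script.lower())
    return name[:52].strip("-.")


def cronjob_manifest(tool: str, script: str, minute: int, hour: int,
                     image: str) -> dict:
    """Build a CronJob shaped like the one in the example session (hypothetical)."""
    home = f"/data/project/{tool}"
    pwb_dir = f"{home}/pwb"
    name = k8s_name(script)
    container = {
        "name": "bot",
        "image": image,
        "args": ["python3", f"{pwb_dir}/{script}"],
        "env": [
            {"name": "PYWIKIBOT_DIR", "value": pwb_dir},
            {"name": "HOME", "value": home},
        ],
    }
    # Pods created by Jobs must not use restartPolicy "Always".
    pod_spec = {"restartPolicy": "Never", "containers": [container]}
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {
            "name": name,
            "namespace": f"tool-{tool}",
            "labels": {"name": f"{tool}.{name}", "toolforge": "tool"},
        },
        "spec": {
            "schedule": cron_schedule(minute, hour),
            "jobTemplate": {"spec": {"template": {
                "metadata": {"labels": {"toolforge": "tool"}},
                "spec": pod_spec,
            }}},
        },
    }


if __name__ == "__main__":
    manifest = cronjob_manifest(
        "my-cool-pwb-tool", "my_cool_script.py", minute=17, hour=3,
        image="docker-registry.tools.wmflabs.org/toolforge-python37-sssd-pwb:latest",
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`; generating a plain manifest keeps the wrapper thin and lets users inspect exactly what will be created.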

Event Timeline

bd808 created this task. Apr 9 2020, 5:13 AM
bd808 updated the task description. Apr 9 2020, 5:27 AM
JHedden triaged this task as Medium priority. Apr 21 2020, 4:18 PM
Legoktm added a subscriber: Legoktm. May 9 2020, 6:29 PM

That all looks awesome, especially the cron part.

generate_user_files.py will generate a user-config.py file, which needs to be in $PYWIKIBOT_DIR (called $PYWIKIBOT2_DIR in older pywikibot releases).
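A minimal Python sketch of the `pwb-k8s init` steps from the example session, assuming the layout shown there ($HOME/pwb serves as both the scripts directory and the pywikibot directory). Current pywikibot reads that directory from PYWIKIBOT_DIR; PYWIKIBOT2_DIR is the older name. `pwb-k8s`, the function names, and the command used to run generate_user_files are hypothetical, and the sketch assumes generate_user_files writes into the directory PYWIKIBOT_DIR points to.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List


def ensure_scripts_dir(home: Path) -> Path:
    """Create $HOME/pwb if it is missing and return its path."""
    pwb_dir = home / "pwb"
    if not pwb_dir.is_dir():
        print(f"Local scripts directory not found. Creating {pwb_dir}")
        pwb_dir.mkdir(parents=True)
    return pwb_dir


def needs_user_files(pwb_dir: Path) -> bool:
    """True if user-config.py is not yet in the pywikibot directory."""
    return not (pwb_dir / "user-config.py").is_file()


def pwb_env(pwb_dir: Path) -> Dict[str, str]:
    """Copy of the current environment with PYWIKIBOT_DIR pointing at pwb_dir."""
    return dict(os.environ, PYWIKIBOT_DIR=str(pwb_dir))


def init(home: Path, generate_cmd: List[str]) -> None:
    """Hypothetical `pwb-k8s init`; generate_cmd is whatever runs generate_user_files."""
    pwb_dir = ensure_scripts_dir(home)
    if needs_user_files(pwb_dir):
        print("User files not found. Running generate_user_files.py")
        subprocess.run(generate_cmd, env=pwb_env(pwb_dir), check=True)
```

For a git checkout of pywikibot, the command might be something like `["python3", "pwb.py", "generate_user_files"]`; passing it in keeps the sketch independent of how the image actually installs pywikibot.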

Also, if we could autogenerate the https://www.mediawiki.org/wiki/Manual:Pywikibot/OAuth/Wikimedia part that would be nice.

And the cron thing could also automatically redirect logs (both stdout and stderr) somewhere and rotate them.
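A minimal Python sketch of the log handling suggested above, assuming the job runs through a small wrapper rather than calling the script directly. It merges stderr into stdout, writes every line through the standard library's `logging.handlers.RotatingFileHandler` (size-based rotation; `TimedRotatingFileHandler` would rotate by time instead), and returns the script's exit status. The function name, the 10 MiB limit and the five backups are arbitrary choices for illustration.

```python
import logging
import logging.handlers
import subprocess
from pathlib import Path
from typing import List


def run_with_rotating_log(cmd: List[str], log_file: Path,
                          max_bytes: int = 10 * 1024 * 1024,
                          backups: int = 5) -> int:
    """Run cmd, append its stdout and stderr to log_file, return its exit code.

    The log rolls over once it reaches max_bytes, keeping `backups` old
    files named log_file.1, log_file.2, and so on.
    """
    handler = logging.handlers.RotatingFileHandler(
        str(log_file), maxBytes=max_bytes, backupCount=backups, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    # Dedicated, non-propagating logger so job output stays out of the root logger.
    logger = logging.getLogger("pwb-job")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        # stderr is merged into stdout so both streams end up in the same file.
        with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              text=True, errors="replace") as proc:
            for line in proc.stdout:
                logger.info(line.rstrip("\n"))
        # Leaving the with-block waits for the process, so returncode is set.
        return proc.returncode
    finally:
        logger.removeHandler(handler)
        handler.close()
```

Kubernetes also exposes container output through `kubectl logs`, so file logs on the tool's home directory are mainly useful for keeping a longer history.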

Xqt added a subscriber: Xqt. May 9 2020, 7:47 PM

Just for the record: a stable release is published monthly after tests pass. Sometimes a release needs some additional time to fix all known issues. Every stable release is marked with the "stable" tag. This is useful if the local repository is checked out from the remote repository (which also includes common scripts to be used) instead of being installed as a site package. The last stable release was published yesterday.
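Following the note about the "stable" tag, an image build step could check out pywikibot from its Gerrit repository at that tag instead of installing it as a package. The sketch below only builds the git command; the destination path `/srv/pywikibot` is hypothetical, and the shallow clone is an assumed choice to keep the image small.

```python
import shlex
from typing import List

# Pywikibot core repository on Gerrit.
PYWIKIBOT_REPO = "https://gerrit.wikimedia.org/r/pywikibot/core.git"


def clone_stable_cmd(dest: str, ref: str = "stable") -> List[str]:
    """git command for a shallow checkout of `ref` (a tag works with --branch)."""
    return ["git", "clone", "--depth", "1", "--recurse-submodules",
            "--branch", ref, PYWIKIBOT_REPO, dest]


if __name__ == "__main__":
    # Printed rather than executed, e.g. to paste into a Dockerfile RUN line.
    print(" ".join(shlex.quote(part) for part in clone_stable_cmd("/srv/pywikibot")))
```

Cloning by tag leaves the checkout on a detached HEAD at the tagged commit, which is what a reproducible image build wants.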

bd808 added a comment. May 13 2020, 8:11 PM

> Also, if we could autogenerate the https://www.mediawiki.org/wiki/Manual:Pywikibot/OAuth/Wikimedia part that would be nice.

I think we could easily output a link to those instructions. I'm not sure there is any reasonable way to actually automate submitting the grant request. If there were, I think that would be an XSS attack vector that we would need to fix on the MediaWiki side.
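A minimal sketch of the "output a link" idea, assuming the init step can read the generated user-config.py: if there is no uncommented `authenticate[...]` entry (pywikibot's setting for OAuth credentials), print a pointer to the instructions rather than trying to automate the grant request. The function name and message wording are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

OAUTH_DOCS = "https://www.mediawiki.org/wiki/Manual:Pywikibot/OAuth/Wikimedia"

# An uncommented assignment such as: authenticate['*.wikipedia.org'] = (...)
_AUTHENTICATE = re.compile(r"^\s*authenticate\s*\[", re.MULTILINE)


def oauth_hint(user_config: Path) -> Optional[str]:
    """Return a pointer to the OAuth instructions, or None if OAuth is configured."""
    text = ""
    if user_config.is_file():
        text = user_config.read_text(encoding="utf-8")
    if _AUTHENTICATE.search(text):
        return None
    return ("No OAuth credentials found in user-config.py. "
            "To set them up, follow " + OAUTH_DOCS)
```

A plain text check like this never executes user-config.py, so the init step stays safe even if the file contains mistakes.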

Change 603652 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/docker-images/toollabs-images@master] Pywikibot container

https://gerrit.wikimedia.org/r/603652