
bd808's big pile of refactoring ideas
Closed, Resolved | Public | Feature

Description

I would like to refactor and rearrange the code base so that the bot presents as a single Python program that uses a sub-command pattern to expose each major task (gerrit watcher, phab watcher, irc emitter, etc). I used this general pattern when creating the gitlab account approval bot, where it has proven rather nice for local testing and for exposing actions to the Toolforge Jobs framework via a Toolforge Build Service managed image.

Changes envisioned:

  • Move all python code into a module (src/wikibugs2)
  • Create a top level entrypoint at wikibugs2.__main__ so that it can be executed as python3 -m wikibugs2 ...
  • Move the entrypoint of grrrrit.py to python3 -m wikibugs2 gerrit
  • Move the entrypoint of wikibugs.py to python3 -m wikibugs2 phorge
  • Move the entrypoint of redis2irc.py to python3 -m wikibugs2 irc
  • Move the entrypoint of tools/update_contributors.py to python3 -m wikibugs2 update-credits
  • Remove manage.py script
  • Remove log_to_irc.py script
  • Remove tools/test_tags.py script
  • Convert config.json / configfetcher.py configuration system to envvars
  • Convert gerrit ssh key to envvar setting
  • Introduce common Python linters & black for code formatting
  • Update k8s-jobs.yaml for new entrypoints
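
The sub-command layout in the list above can be sketched roughly as follows. This is a minimal illustration using the standard library's argparse, not the code that was merged; the sub-command names come from the list, while the help strings and the `build_parser` and `main` function names are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one sub-command per major bot task."""
    parser = argparse.ArgumentParser(prog="wikibugs2")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Sub-command names are taken from the task description.
    # The help strings are illustrative assumptions.
    for name, help_text in [
        ("gerrit", "watch Gerrit events"),
        ("phorge", "watch Phorge events"),
        ("irc", "emit queued messages to IRC"),
        ("update-credits", "refresh the contributors list"),
    ]:
        subparsers.add_parser(name, help=help_text)
    return parser


def main(argv=None) -> str:
    """Parse argv and return the chosen sub-command name.

    A real entrypoint would dispatch to the matching task here.
    """
    args = build_parser().parse_args(argv)
    return args.command
```

With something like this in src/wikibugs2/__main__.py calling main(), python3 -m wikibugs2 gerrit would select the gerrit task, and each task can be run locally the same way the jobs framework runs it.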

Remaining blockers to Toolforge Build Service conversion:

Details

Title: bd808's move all the things refactoring and fixup MR
Reference: toolforge-repos/wikibugs2!6
Author: bd808
Source Branch: work/bd808/big-pile-o-refactoring
Dest Branch: main

Event Timeline

@Legoktm and @valhallasw: do either of you have concerns or fears about my proposed changes being implemented? If you do not have major objections I think I would kick this off by doing the repo creation and history import steps for T357850: Move wikibugs git hosting from Gerrit to GitLab and then working in a feature branch in the new gitlab repo to implement the giant pile of refactoring changes.

bd808 triaged this task as Medium priority.Feb 17 2024, 10:29 PM
bd808 changed the subtype of this task from "Task" to "Feature Request".

Sounds like a good way to bring wikibugs up to date with what has been learned in the software engineering world over the last 10 years :-)

There is some benefit of reading the current config from NFS vs requiring a restart to apply, primarily that some messages might get lost in the restart period otherwise. But there is also value in reducing/removing the NFS dependency.

One option could be to store the config in redis? That gets rid of NFS but keeps the live-update logic.

> There is some benefit of reading the current config from NFS vs requiring a restart to apply, primarily that some messages might get lost in the restart period otherwise. But there is also value in reducing/removing the NFS dependency.
>
> One option could be to store the config in redis? That gets rid of NFS but keeps the live-update logic.

Yeah, we could look at either redis or ToolsDB for storing the channel mapping config. I like that we have a "free" audit log for those config changes in the git repo though too. I think we can design an update process such that git is the canonical location but a web hook can trigger updating the running config without requiring a bot restart too. Pushing an updated structure into Redis could be one way to do that.
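
One possible shape for "push an updated structure into Redis" is sketched below. Everything specific here is an assumption for illustration: the key name, the mapping layout, and the in-memory `FakeRedis` stand-in, which only mimics the `get`/`set` calls that a real Redis client also provides. The hook that would call `publish_channel_config` after a git change is not shown.

```python
import json

CONFIG_KEY = "wikibugs2:channels"  # hypothetical key name


class FakeRedis:
    """In-memory stand-in exposing the get/set subset of a Redis client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def publish_channel_config(client, mapping: dict) -> None:
    """Store the channel mapping as JSON; a web hook would call this."""
    client.set(CONFIG_KEY, json.dumps(mapping))


def current_channel_config(client) -> dict:
    """Read the live mapping; an empty dict means nothing was published yet."""
    raw = client.get(CONFIG_KEY)
    return json.loads(raw) if raw else {}


# Hypothetical mapping layout: channel name -> list of project patterns.
client = FakeRedis()
publish_channel_config(client, {"#wikimedia-dev": ["MediaWiki-*"]})
live = current_channel_config(client)
```

Because the bot would read the mapping at message time rather than at startup, a config change would not need a restart, while git stays the canonical source and audit log.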

A ConfigMap would be the most Kubernetes native way to deal with the channel mappings, but right now Toolforge jobs & webservices don't have any built-in magic for using ConfigMaps so that would mean using Kubernetes directly to manage our Deployments. This is certainly possible (I run several direct Kubernetes tools), but it makes things a bit more involved for long term maintenance.

All sounds good to me, agreed that there are better ways to do automatic deploys without doing git pulls and all the internal detection logic. I am not sure whether this is in your scope or not, but from what I recall with your usage of ZNC to front other IRC bots, that would be nice to have for wikibugs too.

Otherwise I'm happy to consider myself "maintainer emeritus" - no need to consult with me before mass changing things :)

> I am not sure whether this is in your scope or not, but from what I recall with your usage of ZNC to front other IRC bots, that would be nice to have for wikibugs too.

ZNC does do a really nice job of taking care of the hard parts of staying connected to an IRC network. That could be a "cheap" way to try to close T357729: wikibugs having a hard time staying connected to libera.chat IRC network. I've been hoping to make the ZNC setup that Tool-bridgebot uses easier to maintain by setting up a Toolforge Build Service maintained container to replace NFS loading of a hand compiled binary, so this could have some synergy with that work.

> Otherwise I'm happy to consider myself "maintainer emeritus" - no need to consult with me before mass changing things :)

I'd love to have you around for helping think about hard bits, but I very much appreciate this vote of confidence in my stewardship of the project. You, @valhallasw, and @yuvipanda did good things with the project that have made following bugs and patches easier for a whole lot of us in the Wikimedia technical community over the last ~10 years. Thank you!

bd808 changed the task status from Open to In Progress.Feb 29 2024, 3:26 PM
bd808 claimed this task.
bd808 moved this task from Ready to Go to Doing on the Wikibugs board.

Code has been merged, but the deploy did not go as hoped. The working copy on Toolforge is rolled back to its pre-refactor state for now.

The test deploy in the wikibugs-testing tool's namespace seems to be working now, including with a znc between the bot and libera.chat. I'm going to leave it running for a while before I try to cut the main service over again.

Mentioned in SAL (#wikimedia-cloud) [2024-03-03T14:00:51Z] <bd808> Preping for another attempt at migrating to newer codebase (T357851)

Clone the repo and create a venv for use with the python3.9 runtime container:

$ git clone https://gitlab.wikimedia.org/toolforge-repos/wikibugs2.git
$ webservice --backend=kubernetes python3.9 shell
$ python3 -m venv venv-wikibugs2-39
$ venv-wikibugs2-39/bin/python3 -m pip install -U pip wheel
$ cd wikibugs2
$ ../venv-wikibugs2-39/bin/python3 -m pip install -U -r requirements.txt
$ ../venv-wikibugs2-39/bin/python3 -m pip freeze
asyncio-redis==0.16.0
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
coloredlogs==15.0.1
django-environ==0.11.2
docopt==0.6.2
fab==3.1.0
humanfriendly==10.0
idna==3.6
irc3==1.1.10
pkg_resources==0.0.0
PyYAML==6.0.1
redis==2.10.3
requests==2.31.0
urllib3==2.2.1
venusian==3.1.0
-e git+https://gitlab.wikimedia.org/toolforge-repos/wikibugs2.git@085fa909fb1d98526aad176ac45e2ad6921d6567#egg=wikibugs2
$ exit

Set up the envvars needed to connect via znc:

$ toolforge envvars create IRC_SERVER znc
$ toolforge envvars create IRC_PORT 6667
$ toolforge envvars create IRC_USE_SSL false
$ toolforge envvars create IRC_USE_PASSWORD true
$ toolforge envvars create IRC_USER wikibugs/libera
$ toolforge envvars list
name                     value
GERRIT_SSH_KEY           ...REDACTED...
IRC_PASSWORD             ...REDACTED...
IRC_PORT                 6667
IRC_SERVER               znc
IRC_USE_PASSWORD         true
IRC_USE_SSL              false
IRC_USER                 wikibugs/libera
PHAB_TOKEN               ...REDACTED...
PYTHONDONTWRITEBYTECODE  1
PYTHONIOENCODING         utf8:backslashreplace
PYTHONUNBUFFERED         1
TOOL_REPLICA_PASSWORD    ...REDACTED...
TOOL_REPLICA_USER        s51894
TOOL_TOOLSDB_PASSWORD    ...REDACTED...
TOOL_TOOLSDB_USER        s51894
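
For readers following along, here is a rough sketch of how a program could turn the IRC_* envvars above into typed settings. It uses only the standard library and is not necessarily how wikibugs2 reads its configuration; the `IrcSettings`, `env_bool`, and `load_irc_settings` names and the set of accepted truthy strings are assumptions for the example.

```python
import os
from dataclasses import dataclass


def env_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class IrcSettings:
    server: str
    port: int
    use_ssl: bool
    use_password: bool
    user: str
    password: str


def load_irc_settings(env=None) -> IrcSettings:
    """Build settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return IrcSettings(
        server=env["IRC_SERVER"],
        port=int(env["IRC_PORT"]),
        use_ssl=env_bool(env.get("IRC_USE_SSL", "false")),
        use_password=env_bool(env.get("IRC_USE_PASSWORD", "false")),
        user=env["IRC_USER"],
        password=env.get("IRC_PASSWORD", ""),
    )


# Values mirror the envvars created above; the password is left out.
settings = load_irc_settings(
    {
        "IRC_SERVER": "znc",
        "IRC_PORT": "6667",
        "IRC_USE_SSL": "false",
        "IRC_USE_PASSWORD": "true",
        "IRC_USER": "wikibugs/libera",
    }
)
```

Passing a plain dict instead of os.environ keeps the loader easy to exercise in local tests.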

Set up a Service to expose znc to the namespace:

$ cat <<EOF >znc-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: znc
  labels:
    app.kubernetes.io/name: znc
spec:
  selector:
    app.kubernetes.io/name: znc
  ports:
    - name: tcp-irc-svc
      protocol: TCP
      port: 6667
      targetPort: 6667
EOF
$ kubectl apply --validate=true -f znc-service.yaml
service/znc created
$ kubectl describe service znc
Name:              znc
Namespace:         tool-wikibugs
Labels:            app.kubernetes.io/name=znc
Annotations:       <none>
Selector:          app.kubernetes.io/name=znc
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.106.117.253
IPs:               10.106.117.253
Port:              tcp-irc-svc  6667/TCP
TargetPort:        6667/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>
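
The "Endpoints: <none>" line is consistent with no pod carrying the app.kubernetes.io/name=znc label at that moment, assuming the znc job had not been loaded yet. The sketch below illustrates the equality-based matching rule a Service selector applies: a pod is selected only if its labels contain every key/value pair in the selector. The `selects` helper and the extra pod label are made up for the example.

```python
def selects(selector: dict, pod_labels: dict) -> bool:
    """Return True if the pod's labels satisfy every selector pair."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Selector copied from the Service definition above.
service_selector = {"app.kubernetes.io/name": "znc"}

# Hypothetical pod label sets; the "example/extra" label is invented.
znc_pod = {"app.kubernetes.io/name": "znc", "example/extra": "anything"}
irc_pod = {"app.kubernetes.io/name": "irc"}

znc_selected = selects(service_selector, znc_pod)
irc_selected = selects(service_selector, irc_pod)
```

Extra labels on a pod do not prevent a match; only the pairs named in the selector are compared.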

Mentioned in SAL (#wikimedia-cloud) [2024-03-03T14:40:13Z] <bd808> Replace redis2irc job with znc + irc jobs (T357851)

$ toolforge jobs delete redis2irc
$ toolforge jobs load --job znc toolforge-jobs.yaml
$ toolforge jobs load --job irc toolforge-jobs.yaml
$ toolforge jobs list
Job name:      Job type:    Status:
-------------  -----------  ---------
grrrrit        continuous   Running
irc            continuous   Running
wikibugs-phab  continuous   Running
znc            continuous   Running
$ kubectl logs --all-containers=true --ignore-errors --since=10m -f --prefix=true --selector 'app.kubernetes.io/name in (znc, irc)'

The pods seem to be running as expected, but the irc bot is not picking up messages from the redis queue. Rolling back and then will inspect configuration again.

Mentioned in SAL (#wikimedia-cloud) [2024-03-03T14:49:43Z] <bd808> Rolled back to redis2irc (T357851)

> The pods seem to be running as expected, but the irc bot is not picking up messages from the redis queue. Rolling back and then will inspect configuration again.

gah. The legacy group of services is using REDIS_QUEUE_NAME=libera and not REDIS_QUEUE_NAME=wikibugs2 as I expected. That's why the new irc bot didn't see anything in the queue from the older feed sources.

ok. new plan: I've set a secret queue name for the new code to use via toolforge envvars create REDIS_QUEUE_NAME. Now I will switch the gerrit and phorge feeds to the new code and that new queue. I can use redis-cli to verify things being added to the queue. The last thing I will switch over is the irc bot + znc.
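
The failure mode described above can be reproduced in miniature. The sketch below uses an in-memory stand-in for named Redis lists, so the `FakeQueues` class and its `push`/`pop` methods are assumptions for illustration and not the commands the bot actually issues. Only the two queue names come from the comment above.

```python
from collections import defaultdict, deque


class FakeQueues:
    """In-memory stand-in for a set of named Redis lists."""

    def __init__(self):
        self._lists = defaultdict(deque)

    def push(self, name, item):
        self._lists[name].append(item)

    def pop(self, name):
        """Return the oldest item in the named list, or None if it is empty."""
        queue = self._lists[name]
        return queue.popleft() if queue else None


queues = FakeQueues()

# Legacy feed sources write to the queue named "libera".
queues.push("libera", "message from a legacy feed")

# The new irc bot was reading from "wikibugs2", so it saw nothing.
missed = queues.pop("wikibugs2")

# Reading the name the producers actually use finds the message.
seen = queues.pop("libera")
```

Producers and consumers only meet if they agree on the queue name, which is why moving all of the new jobs to one shared REDIS_QUEUE_NAME value before switching the irc bot avoids the problem.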

Mentioned in SAL (#wikimedia-cloud) [2024-03-03T15:28:03Z] <bd808> Replacing legacy grrrrit job with new gerrit job (T357851)

$ toolforge jobs delete grrrrit
$ toolforge jobs load --job gerrit wikibugs2/toolforge-jobs.yaml
$ kubectl logs --all-containers=true --ignore-errors --since=10m -f --prefix=true --selector app.kubernetes.io/name=gerrit
redis.svc.tools.eqiad1.wikimedia.cloud:6379> LLEN ...REDACTED...
(integer) 2
redis.svc.tools.eqiad1.wikimedia.cloud:6379> LRANGE ...REDACTED... 0 -1
1) "{\"raw\": true, \"msg\": \"(\\u000303PS1\\u000f) \\u000310\\u0002Rockingpenny4\\u000f: Uses rawElement for class names [extensions/AdminLinks] - \\u000310https://gerrit.wikimedia.org/r/1008105\\u000f\", \"channels\": [\"#wikimedia-dev\", \"#mediawiki-feed\"]}"
2) "{\"raw\": true, \"msg\": \"(\\u000303CR\\u000f) \\u000310\\u0002BryanDavis\\u000f: [C: \\u000304-1\\u000f] \\\"Test\\\" [labs/tools/wikibugs2] - \\u000310https://gerrit.wikimedia.org/r/1008016\\u000f (owner: \\u000310\\u0002BryanDavis\\u000f)\", \"channels\": [\"#wikimedia-cloud-feed\", \"#mediawiki-feed\"]}"
redis.svc.tools.eqiad1.wikimedia.cloud:6379>

Mentioned in SAL (#wikimedia-cloud) [2024-03-03T15:32:32Z] <bd808> Replacing legacy wikibugs-phab job with new phorge job (T357851)

$ toolforge jobs delete wikibugs-phab
$ toolforge jobs load --job phorge wikibugs2/toolforge-jobs.yaml
$ kubectl logs --all-containers=true --ignore-errors --since=10m -f --prefix=true --selector app.kubernetes.io/name=phorge
redis.svc.tools.eqiad1.wikimedia.cloud:6379> LLEN ...REDACTED...
(integer) 5
redis.svc.tools.eqiad1.wikimedia.cloud:6379> LRANGE ...REDACTED... 0 -1
1) "{\"raw\": true, \"msg\": \"(\\u000303PS1\\u000f) \\u000310\\u0002Rockingpenny4\\u000f: Uses rawElement for class names [extensions/AdminLinks] - \\u000310https://gerrit.wikimedia.org/r/1008105\\u000f\", \"channels\": [\"#wikimedia-dev\", \"#mediawiki-feed\"]}"
2) "{\"raw\": true, \"msg\": \"(\\u000303CR\\u000f) \\u000310\\u0002BryanDavis\\u000f: [C: \\u000304-1\\u000f] \\\"Test\\\" [labs/tools/wikibugs2] - \\u000310https://gerrit.wikimedia.org/r/1008016\\u000f (owner: \\u000310\\u0002BryanDavis\\u000f)\", \"channels\": [\"#wikimedia-cloud-feed\", \"#mediawiki-feed\"]}"
3) "{\"raw\": true, \"msg\": \"(\\u000303PS4\\u000f) \\u000310\\u0002TheDJ\\u000f: [DNM] Set a CSP header to sandbox uploaded files [core] - \\u000310https://gerrit.wikimedia.org/r/547930\\u000f (https://phabricator.wikimedia.org/T239069) (owner: \\u000310\\u0002Brian Wolff\\u000f)\", \"channels\": [\"#wikimedia-dev\", \"#mediawiki-feed\"]}"
4) "{\"url\": \"https://phabricator.wikimedia.org/T90594#9594178\", \"projects\": {\"User-bd808\": {\"shade\": \"checkered\", \"disabled\": false, \"tagtype\": \"user\", \"uri\": \"https://phabricator.wikimedia.org/tag/user-bd808/\"}, \"Wikibugs\": {\"shade\": \"blue\", \"disabled\": false, \"tagtype\": \"briefcase\", \"uri\": \"https://phabricator.wikimedia.org/tag/wikibugs/\"}}, \"user\": \"bd808\", \"title\": \"wikibugs test bug part II\", \"comment\": \"test\", \"status\": {\"old\": \"stalled\", \"new\": \"progress\", \"anchor\": \"9594178\"}}"
5) "{\"url\": \"https://phabricator.wikimedia.org/T1152#9594180\", \"projects\": {\"Wikibugs\": {\"shade\": \"blue\", \"disabled\": false, \"tagtype\": \"briefcase\", \"uri\": \"https://phabricator.wikimedia.org/tag/wikibugs/\"}}, \"user\": \"bd808\", \"title\": \"wikibugs test bug\"}"
redis.svc.tools.eqiad1.wikimedia.cloud:6379>
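
The "raw" entries in the queue are JSON objects whose msg field carries IRC formatting control characters (colour, bold, reset). As a reading aid, the sketch below decodes the first entry shown above and strips those control codes. The regular expression and the `plain_text` helper are assumptions written for this example, not code from wikibugs2, and the helper only handles entries that have msg and channels fields.

```python
import json
import re

# \x03 starts a colour code (optionally followed by fg[,bg] digits);
# \x02 is bold, \x0f is reset, \x16, \x1d and \x1f are other toggles.
IRC_FORMATTING = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")


def plain_text(entry: str):
    """Decode one raw queue entry into (unformatted message, channels)."""
    payload = json.loads(entry)
    return IRC_FORMATTING.sub("", payload["msg"]), payload["channels"]


# First queue entry from the redis-cli output, without the CLI's outer quoting.
entry = (
    r'{"raw": true, "msg": "(\u000303PS1\u000f) \u000310\u0002Rockingpenny4\u000f: '
    r"Uses rawElement for class names [extensions/AdminLinks] - "
    r'\u000310https://gerrit.wikimedia.org/r/1008105\u000f", '
    r'"channels": ["#wikimedia-dev", "#mediawiki-feed"]}'
)

text, channels = plain_text(entry)
```

The Phorge entries in the same queue use a different shape (url, projects, user, title) and would need their own handling.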

Mentioned in SAL (#wikimedia-cloud) [2024-03-03T15:35:59Z] <bd808> Replace redis2irc job with znc + irc jobs take 2 (T357851)

$ toolforge jobs delete redis2irc
$ toolforge jobs load --job znc wikibugs2/toolforge-jobs.yaml
$ toolforge jobs load --job irc wikibugs2/toolforge-jobs.yaml
$ kubectl logs --all-containers=true --ignore-errors --since=10m -f --prefix=true --selector 'app.kubernetes.io/name in (znc, irc, gerrit, phorge)'

Things are looking good so far.

Docs have been updated at https://www.mediawiki.org/wiki/Wikibugs for the new deployment. I think we can call this {{done}}.