
Set up CI for mwcli
Closed, Resolved, Public

Description

Currently there is no way to automatically validate changes to docker-compose.yml or the images it uses upstream.

We should figure out some way to implement tests for the configuration YAML and the relevant dev images so we can catch regressions.

One idea would be to have a job in CI that looks to see if a patch is touching docker-compose.yml. If so, the job could ping a tool on a Cloud-VPS instance, and that tool would pull the patch, run docker-compose up -d, and generally run through the install steps and assert that a MediaWiki site instance is running correctly at the end.
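A rough sketch of what such a job could do (untested; the port, URL and the grep marker are assumptions, not settled details):

#!/usr/bin/env bash
# Rough sketch only: run the heavy check when docker-compose.yml is touched,
# bring the stack up, and assert that the wiki responds.
set -euo pipefail

if git diff --name-only HEAD~1..HEAD | grep -qx 'docker-compose.yml'; then
    docker-compose up -d
    # ...run through the documented install steps here...
    # Assumed URL/port; whatever the dev stack actually exposes would go here.
    curl --fail --silent --location http://localhost:8080 | grep -qi 'MediaWiki' \
        || { echo 'MediaWiki instance is not running correctly'; exit 1; }
fi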

Event Timeline

Maybe this is more trouble than it's worth though, and an alternative would be to provide the tooling so that MediaWiki-Docker can easily be set up in the various ways that Quibble wants (e.g. swap PHP version, database type, etc), and then reuse the same docker-compose stack for Quibble (see also T234902)

@Krinkle would you be open to using core's docker-compose.yml to provide the PHP and MySQL environment for core's Travis CI tests? We would first need to do T245444: Create PHP 7.3 and PHP 7.4 variants of docker-registry.wikimedia.org/dev/stretch-php72-fpm-apache2-xdebug, but after that it would be pretty straightforward to drop in as a replacement to the Travis PHP/MySQL environment.

I don't think Travis CI should be the main point of testing for MediaWiki-Docker (because external, and only post-merge). Do we have a Jenkins job on our end for it already and/or is there one in the making?

Apart from that, I don't really mind one way or the other what we use in the Travis CI builds. I mildly prefer it to be as minimal as possible, but if there is some value or preference for using it there, that seems fine :)

CI for dev environments is one of the reasons that I use my own fork of mwcli on GitHub for mwcli development, for example.
GitHub Actions allows pretty easy testing of things that make use of docker-compose and docker, which would be much harder in Jenkins.
https://github.com/addshore/mwcli/blob/775ff0ad3a23d8766d91d8ba9770ba42098f2c26/.github/workflows/go-ci.yml#L117-L159
It should be possible to set up some CI connection between Gerrit and GitHub Actions that would work pre-merge and could cover this sort of case?

Gerrit change -> Jenkins job (triggers webhook) & waits for result? -> Github (runs a job)

I haven't tried this, but all of the moving parts exist and it should be possible, as long as the GitHub webhooks can take data with them that can be used in jobs:
https://blog.s1h.org/github-actions-webhook/
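As a concrete sketch of that flow (untested; OWNER/REPO, the GITHUB_TOKEN variable and the gerrit-ref input name are assumptions, and the workflow would need a workflow_dispatch trigger declaring that input), the Jenkins side could look roughly like:

# Trigger a workflow_dispatch run on GitHub for the patch under test.
curl -s -X POST \
  -H "Authorization: token ${GITHUB_TOKEN}" \
  -H "Accept: application/vnd.github.v3+json" \
  https://api.github.com/repos/OWNER/REPO/actions/workflows/go-ci.yml/dispatches \
  -d '{"ref": "main", "inputs": {"gerrit-ref": "refs/changes/58/691258/1"}}'

# Then poll the most recent workflow_dispatch run until it has a conclusion (needs jq).
while true; do
  conclusion=$(curl -s -H "Authorization: token ${GITHUB_TOKEN}" \
    "https://api.github.com/repos/OWNER/REPO/actions/runs?event=workflow_dispatch&per_page=1" \
    | jq -r '.workflow_runs[0].conclusion')
  [ "${conclusion}" != "null" ] && break
  sleep 30
done
echo "GitHub Actions run finished with: ${conclusion}"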

@Addshore since this task was filed, we started using Qemu for testing for Quibble's end-to-end tests and for Fresh (see T250808: Decide how to run a test involving docker inside WMF CI). I think we could do that for mwcli and its docker sub-commands. cc @hashar

I've started working on this a couple of weeks ago but haven't pushed anything to Gerrit yet. I'll do it this week.

Nice.

If it helps I can make a WIP gerrit change with some of the integration tests that I currently use on GitHub Actions for the mwdd stuff, which should be easy enough to then run in such an environment?

The current HEAD of my dev branch has 2 integration tests, which run as separate jobs on GitHub Actions.
They each also need a clone of mediawiki that has been composer installed, and a built mwcli binary.
https://github.com/addshore/mwcli/tree/da49f72fd27af07a86e7dcb56e9b1f2200e6d406/.github/workflows/go-ci-integration
The step they run in can be seen at https://github.com/addshore/mwcli/blob/da49f72fd27af07a86e7dcb56e9b1f2200e6d406/.github/workflows/go-ci.yml#L112-L156 but there isn't all that much going on in it really, just fetching the needed things and then running the script.
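Roughly, each of those jobs boils down to something like this before the test script runs (the build command and the script name are placeholders, not the real workflow steps):

# Build the mwcli binary (exact build command is an assumption).
go build -o bin/mw .

# Fetch MediaWiki core and its composer dependencies.
git clone --depth=1 https://gerrit.wikimedia.org/r/mediawiki/core mediawiki
(cd mediawiki && composer install)

# Run one of the integration test scripts against the built binary (placeholder name).
./tests/integration/some-test.sh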

Change 683750 had a related patch set uploaded (by Jeena Huneidi; author: Jeena Huneidi):

[mediawiki/tools/cli@master] Add system test for CI

https://gerrit.wikimedia.org/r/683750

If it helps I can make a WIP gerrit change with some of the integration tests that I currently use on GitHub Actions for the mwdd stuff, which should be easy enough to then run in such an environment?

The current HEAD of my dev branch has 2 integration tests, which run as separate jobs on GitHub Actions.
They each also need a clone of mediawiki that has been composer installed, and a built mwcli binary.
https://github.com/addshore/mwcli/tree/da49f72fd27af07a86e7dcb56e9b1f2200e6d406/.github/workflows/go-ci-integration
The step they run in can be seen at https://github.com/addshore/mwcli/blob/da49f72fd27af07a86e7dcb56e9b1f2200e6d406/.github/workflows/go-ci.yml#L112-L156 but there isn't all that much going on in it really, just fetching the needed things and then running the script.

I think we could add it to the patch I uploaded (https://gerrit.wikimedia.org/r/683750)

Change 683753 had a related patch set uploaded (by Jeena Huneidi; author: Jeena Huneidi):

[integration/config@master] Add tests for mw-cli

https://gerrit.wikimedia.org/r/683753

Addshore renamed this task from "Set up CI for MediaWiki-Docker" to "Set up CI for mwcli". May 9 2021, 10:13 AM
Addshore added a project: mwcli.

I changed the name to mwcli for now, as it seems that's what the patches are aiming to test.
Though there is perhaps also a case to be made for testing the docker-compose file in core, not only the CLI.

Change 683750 merged by jenkins-bot:

[mediawiki/tools/cli@master] Add system test for CI

https://gerrit.wikimedia.org/r/683750

@jeena and I were doing some debugging to get this working in https://gerrit.wikimedia.org/r/c/mediawiki/tools/cli/+/690012, which included things like setting up .env but also installing docker-compose.

Most recent error is:

STDERR: Creating network "mw-core_default" with the default driver Pulling mediawiki (docker-registry.wikimedia.org/dev/stretch-php72-fpm:2.0.0)... failed to register layer: Error processing tar file (exit status 1): write /usr/bin/mbstream: no space left on device

It looks like the image doesn't have enough disk to store all of the things?

/dev/sda1       3.9G

I think we will end up needing something a bit bigger.
~10GB should be safe for docker integration tests
~20GB should be safer for mwdd integration tests

I'm not sure how to make a change like this to the qemu image being used.

Reading guides etc., I think I managed to figure out the process for resizing:

# Install the needed tool on the VM
apt-get install libguestfs-tools

# Create a new disk, and resize the old one into the new one
cp /srv/vm-images/qemu-debian10buster-2020_05_04b.img vm.img
truncate -s 20G ./out.img
virt-resize --expand /dev/sda1 ./vm.img ./out.img

# Verify the new disk size
virt-filesystems --long --parts --blkdevs -h -a ./out.img

I'll look into trying out this new image for the mwcli tests today.
Either I could make qemu-run.bash use an env var for the image to be used, so the larger image is only used for the mwcli jobs, or I could just use the new image for all jobs?
I'm not sure if this VM and image are used for anything other than mwcli at this point.
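For the env-var option, a minimal sketch of what qemu-run.bash could do (the variable name is made up; the default path is the existing image):

# Pick the disk image from an env var, falling back to the current image,
# so only the mwcli jobs need to opt in to the larger one.
QEMU_IMAGE="${QEMU_IMAGE:-/srv/vm-images/qemu-debian10buster-2020_05_04b.img}"
# ...and pass "$QEMU_IMAGE" to the qemu invocation instead of the hard-coded path.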

Change 690407 had a related patch set uploaded (by Addshore; author: Addshore):

[integration/config@master] qemu-run: use new 20GB root disk image

https://gerrit.wikimedia.org/r/690407

Change 690407 merged by jenkins-bot:

[integration/config@master] qemu-run: use new 20GB root disk image

https://gerrit.wikimedia.org/r/690407

I re-ran the job now using the newer image.
It no longer complains about running out of disk space, but it does just time out unhelpfully.

https://integration.wikimedia.org/ci/job/mw-cli-test/22/consoleFull

12:51:38 /usr/local/bin/docker-compose --project-directory /root/src/core --project-name mw-core ps
12:51:38 
12:51:38 STDOUT:
12:51:38 Name   Command   State   Ports
12:51:38 ------------------------------
12:51:38 
12:51:38 
12:51:38 test: Start Mediawiki...
12:51:38 + title 'Start Mediawiki'
12:51:38 + echo
12:51:38 + echo 'test: Start Mediawiki...'
12:51:38 ++ /root/src/bin/mw docker -v2 start -y
13:08:52 Build timed out (after 30 minutes). Marking the build as failed.
13:08:52 Build was aborted
13:08:52 Archiving artifacts
13:08:52 Connection to localhost closed by remote host.
13:08:52 [WS-CLEANUP] Deleting project workspace...
13:08:52 [WS-CLEANUP] Deferred wipeout is used...
13:08:52 [WS-CLEANUP] done
13:08:52 Finished: FAILURE

I tweaked the test so that we can see the output of the start command (as before it was just being assigned to a var)
https://gerrit.wikimedia.org/r/c/mediawiki/tools/cli/+/690012/5..6/test

It looks like it does continue to run, but takes quite some time.
I guess we are going to need to increase the timeout of the job as it's slow.

The last job I ran @ https://integration.wikimedia.org/ci/job/mw-cli-test/23/consoleFull started running things relating to the tests after 13:28, then timed out at 13:46 (18 mins later).

Installing Composer dependencies (this may take a few minutes) / 
Installing Composer dependencies (this may take a few minutes) - 
Installing Composer dependencies (this may take a few minutes) \ 
Installing Composer dependencies (this may take a few minutes) | Build timed out (after 30 minutes). Marking the build as failed.
13:46:21 Connection to localhost closed by remote host.
13:46:21 Build was aborted
13:46:21 Archiving artifacts

13:46:21 [WS-CLEANUP] Deleting project workspace...
13:46:21 [WS-CLEANUP] Deferred wipeout is used...
13:46:22 [WS-CLEANUP] done
13:46:22 Finished: FAILURE

I still think using github actions would make this project much more appealing to work with.
On my fork currently the whole test suite only takes 4 minutes: https://github.com/addshore/mwcli/actions/runs/826066548
This includes 3 end to end tests of the mwdd command, including fetching core, composer dependencies etc, and it is also able to run on a matrix of docker and docker-compose versions.
I think it is unlikely that we will be able to get to that level of assurance within the Jenkins / Cloud VPS CI infrastructure right now.

Change 690012 had a related patch set uploaded (by Addshore; author: Addshore):

[mediawiki/tools/cli@master] Make the "test" script pass

https://gerrit.wikimedia.org/r/690012

First "success"
https://integration.wikimedia.org/ci/job/mw-cli-test/24/consoleFull
With https://gerrit.wikimedia.org/r/690012 applied, more disk space in the image and a 60 min timeout
The test took 10 mins to install dependencies (docker-compose, go etc), 1 min to build the binary, 1 min to clone core, 10 mins to run the start command (which includes pulling docker images), 7 mins to install composer dependencies, 2 mins to run the rest of the tests?

But right now it seems there is still some issue:

00:31:30.088 Success! View MediaWiki-Docker at http://
00:31:30.116 ++ curl -s -L -N http://localhost:8080
00:31:36.324 + CHECK_RESULT='<!DOCTYPE html>
00:31:36.324 <html><head><title>MediaWiki</title><style>body { font-family: sans-serif; margin: 0; padding: 0.5em 2em; }</style></head><body><h1>Sorry! This site is experiencing technical difficulties.</h1><p>Try waiting a few minutes and reloading.</p><p><small>(<span dir="ltr">Cannot access the database: Cannot return last error, no db connection ()</span>)</small></p><p>Backtrace:</p><pre>#0 /var/www/html/w/includes/libs/rdbms/loadbalancer/LoadBalancer.php(999): Wikimedia\Rdbms\LoadBalancer-&gt;reportConnectionError()
00:31:36.324 #1 /var/www/html/w/includes/libs/rdbms/loadbalancer/LoadBalancer.php(964): Wikimedia\Rdbms\LoadBalancer-&gt;getServerConnection(0, '\''my_wiki'\'', 0)

The github actions run of the same file on my fork looked much happier

Success! View MediaWiki-Docker at http://
++ curl -s -L -N http://localhost:8080
+ CHECK_RESULT='<!DOCTYPE html>
<html class="client-nojs" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>MediaWiki</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"08d2f492bf6b47882e17dfe4","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Main_Page","wgTitle":"Main Page","wgCurRevisionId":1,"wgRevisionId":1,"wgArticleId":1,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":[],"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgRelevantPageName":"Main_Page","wgRelevantArticleId":1,"wgIsProbablyEditable":!0,"wgRelevantPageIsProbablyEditable":!0,"wgRestrictionEdit":[],"wgRestrictionMove":[],"wgIsMainPage":!0};RLSTATE={"site.styles":"ready","noscript":"ready","user.styles":"ready","user":"ready"

I shaved off about 10 minutes by adding the relevant Debian packages to the preset and preloading the used docker image layers. (docs)
Next run was 22 min instead of 32min: https://integration.wikimedia.org/ci/job/mw-cli-test/26/consoleFull.

There's probably more we can do. Right now we don't yet use Castor in this job, for example, which you could use to do additional caching of any artefact directories.

Our job didn't specify a CPU count before, so despite the Jenkins agent having 8 cores and running only one job at a time, the VM was only using the default of 1 core. I shaved off another 1-2 minutes by enabling multiple cores on qemu:

https://integration.wikimedia.org/ci/job/mw-cli-test/28/console

20min instead of 22min. Not as much as I'd hoped, but also not that surprising given most steps are serial. It does eliminate one difference with GitHub Actions (which provides 2 cores).
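For reference, the relevant bit is just qemu's -smp flag (illustrative invocation only; the real qemu-run.bash passes more flags than shown, and the memory size and image format here are assumptions):

# -smp sets the number of virtual CPUs (the default is 1).
qemu-system-x86_64 \
  -smp 4 \
  -m 2048 \
  -drive file=./out.img,format=raw \
  -nographic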

And of that 20 mins:

  • 1:30 - Waiting for the VM to start (not done on GHA)
  • 5:30 - Installing more packages (not done on GHA)
  • 1:15 - Installing go and docker-compose (not done on GHA)
  • 1:00 - Checkout MediaWiki (0:13 on GHA)
  • 1:00 - Starting Dev Env (0:17 on GHA)
  • 6:00 - Installing composer dependencies (0:30 on GHA)
  • 0:10 - Clone Vector
  • 0:35 - MW Install (0:02 on GHA)

I guess we could save quite some time still by including a composer cache somehow, and also installing the other needed package dependencies?
This could shave off another 10 mins?

I just spent some time looking into how easy it would be to get CI for mwcli to run on Github Actions and report back to gerrit.

I have a working action that can be triggered from a webhook, and will check out the correct patch and run specified commands, in a job matrix, reporting overall success back to Gerrit & voting.
https://github.com/addshore/testing/blob/4a939317175098a066baf52f036ebf1426585442/.github/workflows/mwcli_gerrit.yml
All this would need on the Jenkins side is for the webhook to be triggered on new commit / patchset.
This can be done easily, and using the github cli it looks like:

gh workflow run mwcli_gerrit.yml -f fetch-repo="https://gerrit.wikimedia.org/r/mediawiki/extensions/WikibaseManifest" -f fetch-ref="refs/changes/58/691258/1" -f report-change="691258,1" -f run-scripts='["echo success1", "echo success2"]'

An example run can be seen @ https://github.com/addshore/testing/actions/runs/844927340

As mwcli is primarily a non critical path piece of software I'd like us to consider this as an option to improve developer experience while working on the CLI & also to allow us to maximise test coverage.
If feedback is generally positive I'll work on adding this to the real repo.

Change 691752 had a related patch set uploaded (by Jforrester; author: Jeena Huneidi):

[integration/config@master] Zuul: [mediawiki/tools/cli] Add new bespoke job

https://gerrit.wikimedia.org/r/691752

Change 683753 merged by jenkins-bot:

[integration/config@master] jjb: Provide mw-cli-test to test docker commands in a VM

https://gerrit.wikimedia.org/r/683753

Change 691752 merged by jenkins-bot:

[integration/config@master] Zuul: [mediawiki/tools/cli] Add new bespoke job

https://gerrit.wikimedia.org/r/691752

Mentioned in SAL (#wikimedia-releng) [2021-05-15T21:40:14Z] <James_F> Zuul: [mediawiki/tools/cli] Add new bespoke job T248779

Change 691753 had a related patch set uploaded (by Jforrester; author: Jforrester):

[integration/config@master] Zuul: [mediawiki/tools/cli] Make mw-cli-test experimental for now

https://gerrit.wikimedia.org/r/691753

Change 691753 merged by jenkins-bot:

[integration/config@master] Zuul: [mediawiki/tools/cli] Make mw-cli-test experimental for now

https://gerrit.wikimedia.org/r/691753

Mentioned in SAL (#wikimedia-releng) [2021-05-15T21:51:22Z] <James_F> Zuul: [mediawiki/tools/cli] Make mw-cli-test experimental for now T248779

I just spent some time looking into how easy it would be to get CI for mwcli to run on Github Actions and report back to gerrit.

I don't want to have GitHub report back to gerrit:

  1. This adds complexity to the mental model of our already tangled gerrit and CI system
  2. Release-Engineering-Team would be called on to fix it when it breaks/maintain it. Even if we had permission, we can't.
  3. This is a strange developer experience for a project on Gerrit: (1) submit to gerrit (2) nothing happens in zuul (3) an unknown bot comments/possibly votes on this patchset (4) link to test results on github

As mwcli is primarily a non critical path piece of software I'd like us to consider this as an option to improve developer experience while working on the CLI & also to allow us to maximise test coverage.
If feedback is generally positive I'll work on adding this to the real repo.

GitHub actions are amazing and their presentation is nice. I don't have opinions about how well this works (it seems to work well). I do have opinions about setting a new precedent for gerrit/CI: I don't want to do that ad hoc.

  • 1:30 - Waiting for the VM to start (not done on GHA)
  • 5:30 - Installing more packages (not done on GHA)
  • 1:15 - Installing go and docker-compose (not done on GHA)

Possible to add those to VM or container base (unclear from my remove) so these steps are not done here?

  • 1:00 - Checkout MediaWiki (0:13 on GHA)
  • 1:00 - Starting Dev Env (0:17 on GHA)
  • 6:00 - Installing composer dependencies (0:30 on GHA)
  • 0:10 - Clone Vector
  • 0:35 - MW Install (0:02 on GHA)

I guess we could save quite some time still by including a composer cache somehow, and also installing the other needed package dependencies?
This could shave off another 10 mins?

That comparison with GHA is no good :( Is it realistic that we'll approach their performance? Is it acceptable to have a smaller targeted subset of gating tests at the commit stage and a larger acceptance suite run post-merge? Does that change the performance picture?

Some thoughts

I don't want to have GitHub report back to gerrit:

  1. This adds complexity to the mental model of our already tangled gerrit and CI system
  2. Release-Engineering-Team would be called on to fix it when it breaks/maintain it. Even if we had permission, we can't.
  3. This is a strange developer experience for a project on Gerrit: (1) submit to gerrit (2) nothing happens in zuul (3) an unknown bot comments/possibly votes on this patchset (4) link to test results on github

It would be possible to have this appear to be running in zuul, and have a jenkins job fire on gerrit change and wait for the result of a github actions run.
I didn't try doing this as IMO this adds extra complexity (in terms of code for the solution to maintain) and a more complex interaction.
In terms of not appearing in zuul, this is also the pattern that the sonarqube stuff uses currently.

This could be done entirely on a per project basis, and I think that mwcli has a strong case for doing something like this.
I also don't think this is something that the releng team should have to "own" or be responsible for when it breaks.
I would say it could easily be owned by the people that own the code repository that it runs in
(That's actually more complicated here, as perhaps for mwcli that is infect releng.)

Anyway, in terms of a maintenance cost I personally believe the thing is not really that much of a maintenance burden.
It's probably less of a maintenance burden than maintaining, scaling, updating, and poking the qemu setup for efficiency.
In fact, I have already spent much more time trying to make the qemu setup more performant, and tweaking things so that it even works for the use case, than the couple of hours I spent writing up this action with fairly standard bash steps that we ultimately use and support in the general day to day (SSHing to Gerrit, setting up a Gerrit key, using the Gerrit CLI, fetching code from Gerrit).

The TLDR of the whole setup is a single GitHub action file that is 72 lines long. Roughly split up, this action file ends up being:

<edit>
The one additional part of this puzzle that I didn't write yet would be the trigger on jenkins.
This should end up being a single shell command (using the github CLI) and a github token.
This could also be done without the github cli, just sending a POST

This is also the only part of the puzzle that with the current implementation would / could need to be maintained by releng IMO vs the owners of the project that the action gets used for (which in this case as noted perhaps is also releng, but not for other projects)
</edit>

As mwcli is primarily a non critical path piece of software I'd like us to consider this as an option to improve developer experience while working on the CLI & also to allow us to maximise test coverage.
If feedback is generally positive I'll work on adding this to the real repo.

GitHub actions are amazing and their presentation is nice. I don't have opinions about how well this works (it seems to work well). I do have opinions about setting a new precedent for gerrit/CI: I don't want to do that ad hoc.

+1 If you think this would set a new precedent then I agree we shouldn't rush into it.
I'm obviously just seeing the shiny mwcli-related side of this, which would:

  • Cut down CI maintenance
  • Enable us to write more tests for the mwcli project
  • Get CI feedback faster

  • 1:30 - Waiting for the VM to start (not done on GHA)
  • 5:30 - Installing more packages (not done on GHA)
  • 1:15 - Installing go and docker-compose (not done on GHA)

Possible to add those to VM or container base (unclear from my remove) so these steps are not done here?

Yes this can be done, but certainly adds a maintenance burden to the whole thing for the mwcli project maintainers.

  • 1:00 - Checkout MediaWiki (0:13 on GHA)
  • 1:00 - Starting Dev Env (0:17 on GHA)
  • 6:00 - Installing composer dependencies (0:30 on GHA)
  • 0:10 - Clone Vector
  • 0:35 - MW Install (0:02 on GHA)

I guess we could save quite some time still by including a composer cache somehow, and also installing the other needed package dependencies?
This could shave off another 10 mins?

Indeed, a composer cache thing for qemu could also save around 5 mins.
The dream here would be something like being able to mount a cache directory off the host so that it doesn't need to be built into the image and manually maintained?
The same also could be said for the docker image storage.
Maintaining this manually would be a pain: every time MediaWiki core (or some extension that is used in tests) has an updated composer dependency, you would need to update the qemu image again in the current setup.
The same goes for every time a new docker image version is used by any part of the mwcli system. The mwdd command uses far more docker images than the current tests for mw docker.
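One possible shape for that host-mounted cache (an untested sketch; the paths, mount tag and guest cache location are assumptions) would be a 9p/virtfs share, so the cache lives on the host rather than being baked into the image:

# Host side: add a shared directory to the existing qemu invocation (other flags omitted).
qemu-system-x86_64 -virtfs local,path=/srv/cache/composer,mount_tag=composercache,security_model=none

# Guest side: mount the share where composer looks for its cache.
mount -t 9p -o trans=virtio,version=9p2000.L composercache /root/.cache/composer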

That comparison with GHA is no good :( Is it realistic that we'll approach their performance?

I remember talking at some point about our CI nodes being able to use VMs from outside of the WMF cloud infrastructure.
I'm 90% sure if we did that we would see a big performance gain across the board, including for this specific use case.
I'd be very pro this happening, and perhaps this ticket is a good starting point for that conversation again.

But as well as the performance there is the maintenance burden.
Arguably a lot of that probably goes away if we do not need to manually maintain a whole bunch of tweaks etc. within a qemu image to try and get around a performance problem that may not exist if the thing runs somewhere else.
In this case a generic qemu image would just contain go, docker, and docker-compose, but would not need to concern itself with docker image caching or composer caching etc.

Is it acceptable to have a smaller targeted subset of gating tests at the commit stage and a larger acceptance suite run post-merge? Does that change the performance picture?

I did think about this.
This is actually already a pattern that we use for Wikibase development.
We have jobs that run in Github Actions that provide us with a "secondary CI".
It is however a pain that we currently only get the feedback for those after we have merged the thing on gerrit.
On occasion in the past, jobs failing there have led to us then going back and reverting things on gerrit etc., which all ends up being painful. I imagine the mwcli situation would be the same.

If we didn't manage to get to a point where we could have a nice large test suite for mwcli on gerrit, then I'd probably just do the bulk of my development on Github, slowly cherry picking changes to Gerrit for review.

That would allow me to quickly develop with quick CI feedback, and not be blocked so much, either waiting or feeling that I can't write the tests that I want to for the project.
(Happy to have a call etc. too, but I figured there was quite some content, numbers and thoughts that would be beneficial to get down on paper.)

Some thoughts

I don't want to have GitHub report back to gerrit:

  1. This adds complexity to the mental model of our already tangled gerrit and CI system
  2. Release-Engineering-Team would be called on to fix it when it breaks/maintain it. Even if we had permission, we can't.
  3. This is a strange developer experience for a project on Gerrit: (1) submit to gerrit (2) nothing happens in zuul (3) an unknown bot comments/possibly votes on this patchset (4) link to test results on github

It would be possible to have this appear to be running in zuul, and have a jenkins job fire on gerrit change and wait for the result of a github actions run.
I didn't try doing this as IMO this adds extra complexity (in terms of code for the solution to maintain) and a more complex interaction.
In terms of not appearing in zuul, this is also the pattern that the sonarqube stuff uses currently.

Indeed. I'm also not sure if it's actually useful to see it in zuul. Sonarqube was a project I really wanted us to have to gain insights into our code, but I don't necessarily think it's a pattern I want to copy—strategic one-off.

This could be done entirely on a per project basis, and I think that mwcli has a strong case for doing something like this.
I also don't think this is something that the releng team should have to "own" or be responsible for when it breaks.
I would say it could easily be owned by the people that own the code repository that it runs in
(That's actually more complicated here, as perhaps for mwcli that is infect releng.)

Hopefully this will infect us—mwcli is a project we're interested in continuing to work on :)

Anyway, in terms of a maintenance cost I personally believe the thing is not really that much of a maintenance burden.
It's probably less of a maintenance burden than maintaining, scaling, updating, and poking the qemu setup for efficiency.
In fact, I have already spent much more time trying to make the qemu setup more performant, and tweaking things so that it even works for the use case, than the couple of hours I spent writing up this action with fairly standard bash steps that we ultimately use and support in the general day to day (SSHing to Gerrit, setting up a Gerrit key, using the Gerrit CLI, fetching code from Gerrit).

The TLDR of the whole setup is a single GitHub action file that is 72 lines long. Roughly split up, this action file ends up being:

<edit>
The one additional part of this puzzle that I didn't write yet would be the trigger on jenkins.
This should end up being a single shell command (using the github CLI) and a github token.
This could also be done without the github cli, just sending a POST

This is also the only part of the puzzle that with the current implementation would / could need to be maintained by releng IMO vs the owners of the project that the action gets used for (which in this case as noted perhaps is also releng, but not for other projects)
</edit>

This ^ sounds right. There is maintenance and it's about as much as other jobs, but it's different from other jobs: that's my concern. Zuul doesn't really fit this model very well: i.e., a central place to manage your CI jobs does not adapt well to having every job be slightly different.

As mwcli is primarily a non critical path piece of software I'd like us to consider this as an option to improve developer experience while working on the CLI & also to allow us to maximise test coverage.
If feedback is generally positive I'll work on adding this to the real repo.

GitHub actions are amazing and their presentation is nice. I don't have opinions about how well this works (it seems to work well). I do have opinions about setting a new precedent for gerrit/CI: I don't want to do that ad hoc.

+1 If you think this would set a new precedent then I agree we shouldn't rush into it.
I'm obviously just seeing the shiny mwcli-related side of this, which would:

  • Cut down CI maintenance
  • Enable us to write more tests for the mwcli project
  • Get CI feedback faster

I do want to support a self-service (rather than a centralized) CI model that would allow things like this most likely. Our plan for The Future™ in CI is GitLab (cf: T282842). Do you have thoughts about how that factors into all of this 🤔 ? (it may not, but I'm curious).

(/me cuts response short, will reply to remainder post-meetings, wanted to get ^ posted)

This ^ sounds right. There is maintenance and it's about as much as other jobs, but it's different from other jobs:

Arguably the bit in this case that would land in releng's lap to maintain would be a call to a webhook firing off a job in github actions (if such a job is configured).
This could be generic for any repo, and also the payload could probably be generic.

But I also get the idea of wanting to not have to use github actions, and I'm all for that!

I do want to support a self-service (rather than a centralized) CI model that would allow things like this most likely. Our plan for The Future™ in CI is GitLab (cf: T282842). Do you have thoughts about how that factors into all of this 🤔 ? (it may not, but I'm curious).

I think mwcli could be a good candidate for whatever the self serve CI might look like in T282842

Right now I'm just going to continue developing on github, cherry-picking changes to gerrit and linking to the full CI run on github actions, so I'm not really blocked on this in terms of developing the thing.
I can't say the same for the folks reviewing most of this code right now though, @kostajh & @jeena.

I did some testing on gitlab CI today and it looks and feels promising https://gitlab.wikimedia.org/addshore/test

But most recently my gitlab CI tests ran into some issues due to rate limits

(screenshot of the rate limit error: image.png, 58 KB)

I guess I could add a secret so that the CI runner / jobs can pull the images more frequently.
Or perhaps this is a reason to move even further away from Docker Hub; right now, though, mwcli still makes use of some Docker Hub hosted images.
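One way around the rate limit (a sketch, with DOCKERHUB_USER / DOCKERHUB_TOKEN as assumed CI variable names) would be to log in before pulling, so pulls count against an authenticated account rather than the anonymous per-IP limit:

# e.g. in the job's before_script, using masked GitLab CI variables:
echo "${DOCKERHUB_TOKEN}" | docker login --username "${DOCKERHUB_USER}" --password-stdin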