Had some time to play with this today.
I added the small blurb of documentation that I wish I had when attempting to resolve T203246.
I was able to generate the image docker-registry.wikimedia.org/wikimedia/mediawiki-services-zotero:20181019165254-production, which runs and listens on port 1969. I'm not sure how to exercise the image yet. We'll also need to add a .pipeline/helm.yaml file to point the pipeline at the correct helm chart once that is finalized. For now, I've triggered a pipeline build that skips the helm test to create the above image.
Wed, Oct 17
I don't see a service_name in /srv/deployment/proton/deploy/.git/DEPLOY_HEAD. Currently service_name is commented out in /srv/deployment/proton/deploy/scap/scap.cfg.
Tue, Oct 16
Seems like there are a few different ways we could do this: we could probably work out a way to do it in blubber using a variant, or we could create an intermediate docker-pkg image.
Mon, Oct 15
Fri, Oct 12
Thu, Oct 11
Wed, Oct 10
Tue, Oct 9
Thu, Oct 4
The pthread_create failure sounds like we ran out of memory on that machine. @dduvall killed a whole bunch of left-over containers today that CI hadn't cleaned recently.
Can we use docker kill and docker run --name to fix this? That is, could we name each container after $JOB_NAME and $BUILD_NUMBER, and then create a post-build step for all docker jobs that runs docker kill on the container for that particular job?
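A minimal sketch of that naming scheme (JOB_NAME and BUILD_NUMBER are set by Jenkins; the fallback values below exist only so the sketch runs outside Jenkins, and the docker commands are shown commented since they depend on the job's image):

```shell
#!/bin/sh
# Name the container after the Jenkins job and build so a post-build step
# can always find it. JOB_NAME / BUILD_NUMBER come from Jenkins; the
# fallbacks are only for running this sketch standalone.
JOB_NAME="${JOB_NAME:-example-job}"
BUILD_NUMBER="${BUILD_NUMBER:-1}"
CONTAINER_NAME="${JOB_NAME}-${BUILD_NUMBER}"

# Build step: run the job's container under the predictable name.
#   docker run --name "$CONTAINER_NAME" <image> ...

# Post-build step (runs even after an abort): kill any leftover container.
#   docker kill "$CONTAINER_NAME" 2>/dev/null || true

echo "$CONTAINER_NAME"
```

Since docker kill on a missing container is a harmless failure when guarded with `|| true`, the post-build step can run unconditionally for every docker job.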
Tue, Oct 2
Added test .pipeline files for zotero.
Since none of the databases on the affected boxes had any schema, I backed up the ibdata1 file and restarted mysql; there were no hung transactions that were going to be able to finish without their underlying database.
Concept looks good to me. A few problems and questions inline.
Mon, Oct 1
Wed, Sep 26
Tue, Sep 25
Mon, Sep 24
That'll happen when you do a --delete, i.e. the branch directory will sit there empty until the next scap sync is run.
Fri, Sep 21
lgtm! Let's land this and try it in beta!
Looks good! I did some basic testing in beta and it seems like it'll work. A couple of nitpicks inline. Nice quick work on this :)
I've got a stretch instance called deployment-mwmaint01 running in beta with role::mediawiki_maintenance. I made a couple of patches to make this happen: one because we don't have conftool in beta and another because we don't have the ldap-admins group in beta (and openldap::maintenance probably isn't needed on this machine).
https://integration.wikimedia.org/ci/job/maintenance-disconnect-full-disks/4319/console got stuck in the same way I observed between first closing this task and deploying the second patch, but this time it aborted on its own.
Sep 20 2018
I ran a no-op scap sync using the patch above to use php7.0
Sep 18 2018
Takes a while on beta because of all the extensions (plus disks are slower than in production, where it takes about 20 seconds). IIRC we haven't done much to parallelize any of this; it serially walks the extensions directory. There are probably some easy wins here, but, again, it's been a beta-only problem, so it hasn't been a high priority.
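To illustrate the kind of easy win available for a serial directory walk: the per-extension work can be fanned out with xargs -P. Everything below is a standalone sketch, not the real rebuild code; the demo directories stand in for the extensions tree, and basename stands in for the per-extension work.

```shell
#!/bin/sh
# Hypothetical sketch: parallelize a serial walk over per-extension
# directories using xargs -P (GNU/BSD extension).
demo="$(mktemp -d)"
mkdir -p "$demo/ExtA" "$demo/ExtB" "$demo/ExtC"

# Serial version (one extension at a time):
#   for d in "$demo"/*/; do process "$d"; done

# Parallel version: hand each extension directory to a worker, 4 at a time.
# sort restores a stable order, since workers finish in arbitrary order.
result="$(find "$demo" -mindepth 1 -maxdepth 1 -type d \
    | xargs -P 4 -n 1 basename | sort | xargs)"
echo "$result"

rm -rf "$demo"
```

Whether this helps in practice depends on whether the per-extension work is I/O-bound (likely on beta's slower disks) and safe to run concurrently.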
Sep 17 2018
Sep 14 2018
Deployed this Thursday, and we actually did have a build timeout yesterday and nobody had to abort \o/
Sep 13 2018
Sep 12 2018
FYI, I've upstreamed my commit to git-fat: https://github.com/jedbrown/git-fat/commit/2fad58fd0631ed8dcb77358bb9b80e4cd091d3fe
Will try to move to php7.0 per discussion on T191921
Sep 11 2018
I've merged a patch to scap to allow setting php_version in scap.cfg that will be used in any call to mwscript.
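A sketch of what that setting might look like in a repo's scap.cfg. The key name php_version comes from the patch description; the section header and value here are assumptions for illustration:

```ini
# scap/scap.cfg (illustrative fragment)
[global]
# PHP binary scap will use for any call to mwscript
# (key name per the merged patch; value is an example)
php_version: php7.0
```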
Poking this for ETA.
Sep 10 2018
Friday I removed the integration-slave-docker-1026 node since it was constantly running out of disk space and then self-recovering (concurrently running containers eating into /).