Session
- Track: Deploying and Hosting
- Topic: Continuous Delivery/Deployment in Wikimedia: The Future of the Deployment Pipeline
Description
A quick check-in on the state of affairs of the Deployment Pipeline work and, more interestingly, the plans, upcoming challenges, and expected timelines.
Post-event summary:
Important set of requirements:
- Should be fast
- Make the various tests configurable/gateable so that only a subset can be run if required.
- MediaWiki support is missing and is needed.
Post-event action items:
- Investigate how to approach MediaWiki support in the pipeline.
- Identify which parts of the Add a wiki process are related to the deployment pipeline
- Integration tests support should be added.
Session Attendees
Piotr, James, Amir A., Lars, Jeena, Brennen, Nick, Florian, Giuseppe, [name], [name], ...
Notes:
- T: Check-in on the state of affairs of the deployment pipeline work; a walk-through of what exists, and we want your feedback on what we haven't considered.
- [1] What it is.
- Repeatable way to build, test, promote, release software - currently implemented using Jenkins, Groovy, and a lot of duct-tape®
- Insufficient for self-serve on its own, but a component of self-serve CI and continuous delivery/deployment
- Goals: Get people familiar with current state. Get feedback
- Stats: 15 projects (services) use the pipeline in production (and for testing); 4 more (19 total) use it for testing only
- Encouraging, but since almost all are in a single language / environment, a lot remains to be done.
- Kask (session management service) is written in Golang.
- Blubber itself is written in Golang and tested in its own pipeline.
- So, overall, we're using it for projects in Node and Golang.
- CI Abstractions:
- .pipeline/blubber.yaml - requirements, tests, artifacts
- .pipeline/config.yaml - how tests run
- e.g. run linting stage in parallel with testing
- define the tests you want to run and how they are run
- helm
- deployment-charts
- A: Based on k8s. Raw manifests created a lot of "YAML engineering", though, so instead we've adopted Helm: template things only once, using if/else/for to make them somewhat re-usable
- deployment-charts is in gerrit, releases are at https://releases.wikimedia.org/charts
- helmfile.d contains things like non-secret configuration values and API keys to external services, e.g. Google Translate (a hypothetical helmfile sketch follows below)
- The glue between what is being built and what is being deployed does not exist yet, and that's what we want your input on.
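- To make the deployment-charts/helmfile abstraction a little more concrete, here is a minimal, hypothetical helmfile.d entry for an imaginary service; the service name, chart reference, and value-file paths are assumptions for illustration, not actual deployment-charts content.

```yaml
# Hypothetical helmfile.d entry for an imaginary "example-svc" service.
# Service name, chart reference, and value-file paths are illustrative only.
releases:
  - name: example-svc
    namespace: example-svc
    chart: wmf-stable/example-svc        # assumed chart name in an assumed chart repo
    values:
      - values.yaml                      # non-secret config, e.g. endpoints of external services
      - /etc/helmfile-defaults/private/example-svc.yaml   # secrets kept out of the git repo (assumed path)
```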
- T: Wanted to give an overview of moving a service to the pipeline
- [SEE SLIDES OF CODE EXAMPLES]
- 1. define a test entrypoint - .pipeline/blubber.yml
- take our base Node.js image, run npm install against package.json in place, and finally run npm test
- similar to how Travis CI works, etc.
- 2. tell the pipeline to test - .pipeline/config.yml
- 3. Let's add linting as well
- 4. Execution graph - run in parallel, for example. Could run a directed graph of dependencies to build, test, and publish your artifacts. (Rough sketches of both files appear just before the group discussion notes below.)
- Today:
- Everything in this example can be done right now
- Getting it into CI involves poking Service Ops
- Future: the Future of Continuous Integration WG has picked Argo
- Shortcomings / Known unknowns:
- Integration tests
- Language support
- Security embargoes/patches - known issue
- MediaWiki support
- What's needed? What are the unknown unknowns? "You have a project, what's missing in the pipeline to make it happen?"
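- Since the slides are not reproduced in these notes, here is a rough sketch of what the two files in the walkthrough above could look like for a Node.js service. The base image name, variant names, stage names, and execution-graph layout are assumptions based on the discussion, not a copy of the slides; double-check field names against the Blubber and PipelineLib docs.

```yaml
# .pipeline/blubber.yaml -- illustrative sketch, not the slide content.
version: v4
base: docker-registry.wikimedia.org/nodejs10-slim     # assumed base Node.js image
variants:
  test:
    node:
      requirements: [package.json, package-lock.json] # npm install runs against these
    entrypoint: [npm, test]                           # step 1: the test entrypoint
  lint:
    node:
      requirements: [package.json, package-lock.json]
    entrypoint: [npm, run, lint]                      # step 3: assumes a "lint" npm script exists
```

```yaml
# .pipeline/config.yml -- illustrative sketch; stage fields and the execution graph
# follow the PipelineLib conventions as we understand them.
pipelines:
  test:
    blubberfile: blubber.yaml
    stages:
      - name: lint
        build: lint        # build the "lint" variant above...
        run: true          # ...and run its entrypoint
      - name: test
        build: test
        run: true
    # step 4: the execution graph -- each inner list is a branch, and branches run in
    # parallel, so lint and test here run side by side.
    execution:
      - [lint]
      - [test]
```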
Group Stage Left
- Post-its:
- LZ: SSH into build
- AS: Build images from base images other than those in the WMF registry? -- *No*. It's a security issue
- AS: Can I publish to other docker registries (than the WMF one)? -- Probably
- Does this affect new wiki creation? -- because someone said it's related, and I don't know how... Considerations about sharding perhaps?
- Does this affect configuration changes such as CommonSettings.php? -- yes, it'll affect how they're done and deployed. Ask James.
- Can this help restore daily localisation updates? -- Yes!
- Great documentation
- Comprehensibility
- Speed
- Can the pipelines be triggered from places/events other than Gerrit merges?
- Is the pattern one pipeline per Git repo (with multiple artifacts)? -- No
- How does a non-English speaker get an idea deployed? -- That's more about the social/political decision of whether or not to deploy something, but it's somewhat relevant because this work makes the process better known.
- How do we define a simple way to understand what is needed for a new feature/tool to be deployed? -- a human-readable page, translated, would help.
- New languages and Wikimedia projects: we need a much simpler path to going live.
- I want it to deploy when I merge to master.
- I want it to deploy my change to a test environment after running tests.
- Is the pipeline multi-branch, only master, or configurable?
- How are we going to deploy faster and more efficiently?
- How do we roll back automatically?
- Are there one or multiple pipelines per git repo? -- multiple
- Can we build our own pipeline? -- ... in theory?
- Pipeline for stuff to Toolforge would be nice. -- But doesn't use the same images. :-/
- AS: At the moment the production docker registry only holds images that are used in production (localcharts etc.), whereas labs has images for multiple PHP versions, etc.
- Liw: [clarifications?]
- AS: either the pipeline has to open up to using those other images, or something else has to open up to using the same process
- JH: blubber in the mediawiki docker registry ...
- How many docker registries are there? -- just 2.
- ISSUE: There's no CI build system for Toolforge
- we tried to do that for the Query Service UI, but had to use images in the registry; we tried to make an nginx image and got a bit stuck making it for our pipeline. But maybe we could do the nginx part within Blubber, etc.?
Group Stage Right
- "MediaWiki is the big one"
- config
- define expectations or requirements
- Documentation for how the pipeline works
- Integration tests vs. deployment?
- Ability to choose which tests run
- Selective test running based on patch
- Selective extensions
- Speed in general
- Config injection without rebuilding containers / deploying config
- Build steps in MediaWiki-land code. Built assets that are fed to ResourceLoader (via WebPack), removing them from the repo (so easier cherry-picks, easier development, easier rebasing).
- Proper canary deployments - not just blue/green testing.
- To be able to easily build with different containers (i.e. for different language versions).
- A group here:
- Easy possibility to run pipeline locally
- ...and/or ability to SSH into the container and inspect the situation
- Ability to run the tests in the same way that CI is running them
- "If something works on my machine but fails in CI / the pipeline, I need to be able to figure out why"
- Some form of end-to-end testing environment
- Temporary environments that can be shared to QA / testers / etc. A link you can send to, for example, a designer to show them you've implemented their design.
- Ability to create a test environment before merge
- nginx can be done in blubber -- Q: if it's in the prod image registry, does Ops maintain it?
- (Discussion of use of pipeline just for testing.)
- Being able to test several parts of one Git repo using the pipeline.
- Pipeline defs inside a single repo where some aren't published
- Several pipelines per repo for different purposes
- Exercise: Divide post-its between "nice to have" and "must have".
- [Group discussion]
- GG: Speed came up. Localization. Speed of builds.
- Lars: All the tests need to be run before it goes into production. But when just *trying* something as a dev, you might want to only test a single aspect, quickly.
- GL: Average test run time?
- JF: ::heavy sigh:: about 13 minutes when testing just MW and 30 selected repos. But complicated. Some integration tests actively break each other. We need to test all 200 Wikimedia production repos together, but that is slow and they break each other.
- Piotr: size of build artefact file?
- … what is the concern? Pulling locally? Too much network bandwidth?
- R: auditing of package-lock.json
- GL: would it be OK if the pipeline built something and *submitted* a patch?
- P: log the node version and commit patch
- What can we prioritize on the Must Haves board?
- GG: integration tests?
- GL: the ability to test more than one repo together?
- JF: the gate needs to cover all of production
- Leszek: the whole MediaWiki thing is kinda missing. From the WMDE perspective, it'd be nice to have a planned strategy for how to get there. If we just stop at this point we're ditching the whole pipeline idea.
- GL: Our plan for the year was to go on with that, but it was removed from the annual plan.
- JF: we kind of know the rough steps (loosely sketched at the very end of these notes), but it's a lot of experimental trial and error. It needs resourcing to actually do it; otherwise we're blocked.
- 1. Build containers (configured with the 'right' extensions)
- 2. Inject config into them
- 3. Deploy them [somehow] to a k8s cluster
- 4. Point prod traffic at the new cluster
- AA: Identify which parts of the wiki-adding procedure are related to the pipeline [None of them. That's just a special case of config and deployment dependency management.]
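- Purely to illustrate the shape of steps 1, 2, and 4 above (build a container with the "right" extensions, inject config at deploy time rather than baking it into the image, then shift production traffic gradually), a hypothetical Helm values snippet; none of these keys, charts, or settings exist today.

```yaml
# Hypothetical Helm values for a would-be MediaWiki chart -- every key here is made up.
mediawiki:
  extensions:            # step 1: which extensions the image build should include
    - Echo
    - Wikibase
  config:                # step 2: settings rendered into a ConfigMap at deploy time,
    wgServer: "https://example.wikipedia.org"   # so changing them means a redeploy, not a rebuild
    wgDefaultSkin: "vector"
canary:
  trafficPercent: 5      # step 4: send a small share of production traffic to the new release first
```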