
Set up Pipeline Configuration in WDQS repo
Closed, Resolved (Public)

Description

We need to define the pipeline configuration for WDQS. This is done by creating a config file that tells the pipeline which Blubber file to use, which tests to run, and which container image to use in production (a minimal sketch is included below). The pipeline library does not explicitly support Java, so it might be necessary to reach out to Release Engineering to get things working with Java.

Acceptance Criteria:
The WDQS repo has a config.yaml file that defines the pipeline
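
For orientation only, a minimal sketch of what such a config might contain (variant and file names here are illustrative; the actual configuration is worked out in the discussion below):

# .pipeline/config.yaml -- minimal illustrative sketch, not the final config
pipelines:
  publish:
    blubberfile: blubber.yaml   # Blubber file describing how the image is built
    stages:
      - name: publish
        build: production       # Blubber variant to build
        publish:
          image: true           # push the resulting image to the WMF registry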

Event Timeline

Gehel triaged this task as High priority. Oct 28 2020, 1:28 PM

Change 642599 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[wikidata/query/rdf@master] add pipeline config

https://gerrit.wikimedia.org/r/642599

Mstyles added a subscriber: akosiaris.

@akosiaris it was unclear to me whether we need the promote section in the pipeline config. I'm referring to this: https://wikitech.wikimedia.org/wiki/PipelineLib/Reference#Promote and I saw it in a couple of configs here: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/mathoid/+/refs/heads/master/.pipeline/config.yaml#34. Additionally, just so I'm clear, we don't need to do the Jenkins configuration unless we want this to run on every commit (we do not want that). I'm referring to what I saw in the docs: https://wikitech.wikimedia.org/wiki/PipelineLib/Guides/How_to_configure_CI_for_your_project. We just want to be able to rebuild the image whenever we have a new release of the service (on average, once a week)

@akosiaris it was unclear to me whether we need the promote section in the pipeline config. I'm referring to this: https://wikitech.wikimedia.org/wiki/PipelineLib/Reference#Promote and I saw it in a couple of configs here: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/mathoid/+/refs/heads/master/.pipeline/config.yaml#34.

Promote just creates a Gerrit change for review after the image has been built; essentially, it automates away the step of pushing the version bump to Gerrit. It's up to you whether you want it or not. You can always start without it, of course.
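
For reference, a publish stage with a promote section looks roughly like the mathoid example linked above (the chart name below is a placeholder; check the Promote reference for the exact fields):

stages:
  - name: publish
    build: production
    publish:
      image: true
    promote:
      - chart: rdf-streaming-updater   # chart in operations/deployment-charts to bump; placeholder name
        environments: []               # helmfile environments to update; see the Promote reference above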

Additionally, just so I'm clear, we don't need to do the Jenkins configuration unless we want this to run on every commit (we do not want that). I'm referring to what I saw in the docs: https://wikitech.wikimedia.org/wiki/PipelineLib/Guides/How_to_configure_CI_for_your_project. We just want to be able to rebuild the image whenever we have a new release of the service (on average, once a week)

Hm, that's a first; I am not sure we fully support it. We definitely support triggering builds by pushing a new tag, but I'm not sure we support skipping CI on every commit. @jeena, @dduvall, input please.

Instead of trying to skip CI for every commit, the easiest thing to do would be to move the pipeline directory into its own repo. It's not using any of the code in the current repo anyway.

Change 642599 abandoned by Mstyles:
[wikidata/query/rdf@master] add pipeline config

Reason:
going to use a different repo instead, see 643090

https://gerrit.wikimedia.org/r/642599

Just now getting up to speed on this, so please bear with me if I'm not fully understanding the problem.

PipelineLib and Blubber are primarily meant for managing CI workflows (building images from project source, executing tests, etc.) that respond to changes (Gerrit events) in the project repo where their configuration files live, and that result in production-deployable images. So this use case may be a bit incongruent with that model. It could still be workable, albeit slightly against the grain. Just something to keep in mind.

In general, it seems like you're wanting to:

  1. Keep a published base image for Flink up to date, tracking some stable version from upstream.
  2. Build and verify streaming-updater-producer from source, tracking changes to your project repo.
  3. Build and publish a production deployable image that has Flink installed via the base image and has streaming-updater-producer installed into /opt/flink/usrlib.

Is that right? If so, I suggest:

Managing the Flink base image some other way

Since it seems like you want to bump the version only periodically and based on a version number, operations/docker-images/production-images might be the right place for that, though I see that @akosiaris already mentioned it as a possibility, with the downside that every change would require SRE review.

Alternatively, you could move the parts related to building the Flink base image to a separate repo (like you mentioned) and continue to use PipelineLib/Blubber. Again, Blubber is currently very opinionated and geared toward runnable applications built from local source, not toward base images that rely on a lot of externally downloaded stuff, but it could work. If you go this route, you'd probably want to create a periodic or manual CI job that invokes the PipelineLib pipeline, since it wouldn't be responding to changes in a particular repo in Gerrit.

Building and testing streaming-updater-producer via Blubber/PipelineLib

It looks like you're currently fetching the jar from Archiva, where it was published by some previous CI job. This is actually the part I would suggest using Blubber for.

Something like:

version: v4
base: docker-registry.wikimedia.org/releng/maven-java8

variants:
  build:
    builder:
      command: [mvn, clean, compile]
      requirements: [.]
    entrypoint: [mvn] # allow for generic use of this variant in running tests, etc.

A .pipeline/config.yaml that defines a test pipeline for this would look something like:

pipelines:
  test: # scheduled to run during test (CR submission) and gate-and-submit (CR+2)
    blubberfile: blubber.yaml
    stages:
      - name: build-and-verify
        build: build
        run:
          arguments: [verify] # mvn verify

Integrating and verifying the production deployable image using Blubber/PipelineLib

Expand your .pipeline/blubber.yaml to include a production variant that integrates the Flink base image and your packaged JAR.

version: v4
base: docker-registry.wikimedia.org/releng/maven-java8

variants:
  # [build variant from before]
  prep:
    includes: [build]
    builder:
      command: [mvn, clean, package]
  production:
    base: docker-registry.wikimedia.org/flink:3.2.1 # the base image for flink that you've published elsewhere
    copies:
      - from: prep
        source: /srv/app/path/to/your/jar
        destination: /opt/flink/usrlib/

And expand the .pipeline/config.yaml to include publishing:

pipelines:
  # [omitted test pipeline from before]
  publish: # scheduled to run post-merge
    blubberfile: blubber.yaml
    stages:
      - name: publish
        build: production
        publish:
          image: true

This is where you could also include a test deployment to staging with end-to-end tests and/or an automated version bump in deployment-charts via promote.

For #1, that's correct. We are currently downloading Flink from the internet. I don't think that's the best idea long term, but it seems fine for now.

Currently for #2, "Build and verify streaming-updater-producer from source, tracking changes to your project repo."
That already happens in a different process, via a weekly deploy to Archiva. I don't think we need to replicate that process in the pipeline. We can include it here, but that would mean having to separate the particular streaming-updater-producer project from the rest of the rdf projects. I think that's possible, but not necessary at this time.

In regards to #3, "Build and publish a production deployable image that has Flink installed via the base image and has streaming-updater-producer installed into /opt/flink/usrlib.", that's correct, this is the image we want to use in the helm chart.

I think the separate-repository approach should be fine. We would still need to update the streaming-updater-producer jar version number in the Blubber file, and that alone should be enough to trigger a change and produce a new image. Ideally, we wouldn't change any of the current build processes unless we really need to, and I don't think we do. It seems that the solutions you are recommending hinge on having a Flink base image, either via a separate repository that we (the search team) create or via the official SRE base images.
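
For illustration, pinning the jar version in the Blubber file could look something like the sketch below. The base image tag is the hypothetical one from the earlier sketch, the Archiva path, classifier, and X.Y.Z version are placeholders, and this assumes curl is available in the image and that the build user can write to /opt/flink/usrlib:

version: v4
base: docker-registry.wikimedia.org/flink:3.2.1   # hypothetical Flink base image from the earlier sketch
variants:
  production:
    builder:
      # Bumping the version in this URL is the change that triggers a rebuild and a new image.
      command:
        - curl
        - -fsSL
        - -o
        - /opt/flink/usrlib/streaming-updater-producer.jar
        - https://archiva.wikimedia.org/repository/releases/org/wikidata/query/rdf/streaming-updater-producer/X.Y.Z/streaming-updater-producer-X.Y.Z-jar-with-dependencies.jar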

The part that is really important to our team is getting our docker image into the WMF docker registry so that we can use it to deploy our Flink cluster to the WMF Kubernetes cluster. Including the Java development process is a nice to have, but there are many Java projects in the same RDF repository that will still be using our established Java build and deploy process. Changing our Java process just for this particular Java project (the streaming-updater-producer) could lead to more complications down the line.

Currently for #2, "Build and verify streaming-updater-producer from source, tracking changes to your project repo."
That already happens in a different process, via a weekly deploy to Archiva. I don't think we need to replicate that process in the pipeline. We can include it here, but that would mean having to separate the particular streaming-updater-producer project from the rest of the rdf projects. I think that's possible, but not necessary at this time.

What makes it problematic to separate the streaming-updater-producer part from the other rdf projects? As I understand it, they are all distinct projects without inter-dependencies, and this project is only to be deployed via Docker, so I don't think the current build process for the other wikidata projects is relevant, but please correct me if I'm wrong here. I also don't think that the weekly deployment cadence is an issue. Ideally, every change to the code on master should trigger a build so that we have a deployable artifact, whether or not we choose to deploy it.

We'd not be replicating the Archiva process because we wouldn't be uploading anything to Archiva.

Besides the reasons for using the pipeline that @dduvall mentioned above, having this manually triggered(?) build step to upload to Archiva means going out of our way to avoid automation that we already have the tools for, and then adding extra manual steps to get the actual deployable artifact.

Finally, the jar file needs to end up in the Docker image anyway, so instead of building it, uploading it, and then downloading it into the Docker image, which is resource-consuming, we should simply build it at image build time. This will also let us use image layer caching to our advantage.

The part that is really important to our team is getting our docker image into the WMF docker registry so that we can use it to deploy our Flink cluster to the WMF Kubernetes cluster. Including the Java development process is a nice to have, but there are many Java projects in the same RDF repository that will still be using our established Java build and deploy process. Changing our Java process just for this particular Java project (the streaming-updater-producer) could lead to more complications down the line.

What complications are you thinking of?

Change 643297 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[integration/config@master] add pipeline config to ci

https://gerrit.wikimedia.org/r/643297

Change 643297 merged by jenkins-bot:
[integration/config@master] [rdf-streaming-updater] Add pipeline CI

https://gerrit.wikimedia.org/r/643297

Mentioned in SAL (#wikimedia-releng) [2020-11-24T17:11:35Z] <James_F> Zuul: Install pipeline CI for rdf-streaming-updater T265512

Change 643312 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] Follow-up 823287e4f: Fix name of repo to flink-rdf-streaming-updater

https://gerrit.wikimedia.org/r/643312

Change 643312 merged by jenkins-bot:
[integration/config@master] Follow-up 823287e4f: Fix name of repo to flink-rdf-streaming-updater

https://gerrit.wikimedia.org/r/643312

The projects are all related and there are dependencies. Separating the streaming-updater-producer would actually be a significant task. I think that would be a separate project to possibly consider in the future.

A few additional notes / context in no particular order:

streaming-updater-producer is strongly coupled with the rest of the wikidata/query/rdf project. It might be possible to externalize it to its own repo, but it would still have dependencies on the rest of the project, so for most changes in the wikidata/query/rdf project we would need to update the streaming-updater project as well. It seems to me that this would be more cumbersome than helpful.

The streaming updater is likely going to require multiple independent deployments, for WDQS, but also for WCQS and potentially other future query services. My (limited) understanding of the usual pipelines is that we expect a 1-to-1 mapping between project source code, packaged artifact and deployment. This assumption is broken in our case, where the project source code generates multiple packaged artifacts, and each artifact generates potentially multiple independent deployments.

We will need to continue to publish jars to Archiva anyway, as those jars are also consumed outside of WMF, by people running their own query service.

Our use case seems to be different enough from the expectations that we probably need to discuss this more synchronously, so that we understand the various constraints and design principles better.

My (limited) understanding of the usual pipelines is that we expect a 1-to-1 mapping between project source code, packaged artifact and deployment.

That is true for the usual pipelines, but having already run into this for the deployment pipeline, there has been work to move away from that 1-to-1 mapping. An example of not having a 1-to-1 mapping between packaged artifact and deployment is eventgate, which has 4 different installations[1]. Also, depending on the support from the framework used, there isn't necessarily a 1-to-1 mapping between repo and packaged artifact. It is true that one repo needs to be the umbrella one for which the configuration is set up, but otherwise that repo can fetch in dependencies. Finally, there has been work on the possibility of creating more than one Docker container (the artifact part) from a single repo. However, our testbed team (ORES) does not have the capacity to test that out, and the ORES migration to Kubernetes has been declined (for reasons mostly unrelated to the pipeline).

This assumption is broken in our case, where the project source code generates multiple packaged artifacts, and each artifact generates potentially multiple independent deployments.

I think not, at least for the latter, as the eventgate example shows. Depending on what exactly the multiple packaged artifacts are, that might work fine as well.

[1] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/helmfile.d/services/
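
To make the eventgate example concrete: each installation under helmfile.d/services/ can point at the same published image, e.g. with a values fragment along these lines (field names follow the common WMF service charts; the image name and tag are placeholders):

# hypothetical helmfile.d/services/<installation>/values.yaml fragment
main_app:
  image: wikimedia/wikidata-query-flink-rdf-streaming-updater   # same published image for every installation
  version: 2020-11-24-000000-publish                            # tag produced by the publish pipeline (placeholder)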

The part that is really important to our team is getting our docker image into the WMF docker registry so that we can use it to deploy our Flink cluster to the WMF Kubernetes cluster. Including the Java development process is a nice to have, but there are many Java projects in the same RDF repository that will still be using our established Java build and deploy process. Changing our Java process just for this particular Java project (the streaming-updater-producer) could lead to more complications down the line.

That's important to us, too. We want a smooth process for you all and a setup that's maintainable. In our experience getting other projects into the pipeline, having everything in-band is an important requirement for a reliable delivery pipeline. I understand the hesitancy, though, and we definitely don't want to complicate the Java processes themselves, just figure out how to containerize them in a way that fits our pipeline model.

Is losing the publishing of jars one of the concerns? We should be able to containerize the build and publish processes and move them into the pipeline as well. In other words, you can define a Blubber variant for the publishing step, and we can configure Jenkins to expose the Archiva credentials to your project's pipeline job. You'd no longer have to manage that process manually, and the build / publish-jar / publish-image processes would all be managed in one place, under your team's control and discretion, and contiguous.
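
As a rough sketch of that idea (the variant name and credential handling below are assumptions, not an agreed design):

version: v4
base: docker-registry.wikimedia.org/releng/maven-java8
variants:
  # [build variant from before]
  publish-jar:
    includes: [build]
    # Runs the existing Maven deploy to Archiva from inside the pipeline; Archiva credentials
    # would be injected by the CI job (e.g. a mounted settings.xml), not baked into the image.
    entrypoint: [mvn, deploy]

The corresponding .pipeline/config.yaml stage would then just build that variant and run it.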

My (limited) understanding of the usual pipelines is that we expect a 1-to-1 mapping between project source code, packaged artifact and deployment.

That is true for the usual pipelines, but having already run into this for the deployment pipeline, there has been work to move away from that 1-to-1 mapping. An example of not having a 1-to-1 mapping between packaged artifact and deployment is eventgate, which has 4 different installations[1]. Also, depending on the support from the framework used, there isn't necessarily a 1-to-1 mapping between repo and packaged artifact. It is true that one repo needs to be the umbrella one for which the configuration is set up, but otherwise that repo can fetch in dependencies. Finally, there has been work on the possibility of creating more than one Docker container (the artifact part) from a single repo. However, our testbed team (ORES) does not have the capacity to test that out, and the ORES migration to Kubernetes has been declined (for reasons mostly unrelated to the pipeline).

As @akosiaris mentions, the .pipeline/config.yaml allows for handling multiple projects in the same repo. The actual config for this would look something like:

pipelines:
  foo-test:
    directory: foo
    # stages for testing/building/publishing foo
  bar-test:
    directory: bar
    # stages for testing/building/publishing bar

We should probably add a guide for this.

Also, since the directory is actually traversed per stage in the current implementation, we could easily modify pipelinelib's configuration to allow the directory: directive to be given for different stages in the same pipeline, if that would be helpful in this case. It might be more generally useful. Something like:

pipelines:
  test:
    stages:
      - name: test-foo
        directory: foo
        build: test
        run: true
      - name: test-bar
        directory: bar
        build: test
        run: true
      # ...

Anyway, that's just to say we're up for making changes that would make a pipelinelib migration easier, smoother, and more maintainable. :)

I definitely agree that PipelineLib could be used for Java projects, but adding that functionality as part of this project is out of scope. We are moving forward with downloading the jar from Archiva, as discussed in the meeting on 11/2/2020, but if a Java process for the pipeline library is created, we would be open to moving towards it in the future.

To summarize the rest of our meeting on 11/2/2020: we also briefly talked about how to manage the Flink part of the image, which is currently downloaded when building with Blubber. My and @dduvall's suggestion was to build Flink into a separate image to use as a base image, but the downside was that SRE might not have the resources to maintain that image.