Create Blubberfile in WDQS repo
Closed, Resolved; Public

Description

We need to create a Blubberfile in order to use the WMF pipeline to deploy to Kubernetes. Blubberfiles are higher-level YAML files that are used to generate Dockerfiles. The output Dockerfile will be based on the Flink Dockerfile found here

Acceptance Criteria:
Base image is from the list of WMF base images (found here)
The Blubberfile generates a Dockerfile that can run the Flink Job manager with the streaming updater jar

Event Timeline

Mstyles created this task. Oct 14 2020, 4:54 PM

Change 635074 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[wikidata/query/rdf@master] add pipeline directory

https://gerrit.wikimedia.org/r/635074

Mstyles added a comment (edited). Oct 19 2020, 8:57 PM

In order to test the image created by the Blubberfile, do the following in the rdf repo:
cd .pipeline
blubber blubber.yaml production | docker build --tag blubber-flink-test-<version> --file - .   # builds the image and tags it
docker run blubber-flink-test-<version>   # runs the image
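
The same steps can be wrapped in a small shell sketch. It assumes `blubber` and `docker` are on the PATH and that it is run from the root of the rdf repo; the function names `image_tag`, `build_image` and `run_image` are made up for this illustration and are not part of the change under review.

```shell
# Image tag used in the steps above; the version suffix is supplied by the caller.
image_tag() {
    printf 'blubber-flink-test-%s\n' "$1"
}

# Build the "production" variant defined in .pipeline/blubber.yaml.
# Blubber prints the generated Dockerfile on stdout and docker reads it
# from stdin via "--file -". The function body is a subshell, so the
# "cd" does not change the caller's working directory.
build_image() (
    cd .pipeline || exit 1
    blubber blubber.yaml production |
        docker build --tag "$(image_tag "$1")" --file - .
)

# Run the image; any further arguments are passed on to the container.
run_image() {
    version=$1
    shift
    docker run --rm "$(image_tag "$version")" "$@"
}
```

For example, `build_image dev && run_image dev` builds and starts an image tagged `blubber-flink-test-dev`.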

container arguments are here: https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/docker.html#start-a-job-cluster

The original Flink Dockerfile exposes ports, which is not an option in Blubber. I'm hoping that any networking/port issues can be resolved in the Helm chart.
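
For local testing, the missing EXPOSE instruction should not be a blocker: EXPOSE only records metadata in the image, and ports can still be published when the container is started. A minimal sketch, assuming Flink's default REST/web UI port 8081 has not been changed in the image's configuration; `run_with_ui` is a hypothetical helper name.

```shell
# run_with_ui IMAGE [ARGS...]
# Start the container with the Flink web UI port published on the host.
# Publishing works whether or not the image declares EXPOSE.
run_with_ui() {
    image=$1
    shift
    docker run --rm --publish 8081:8081 "$image" "$@"
}
```

With this, `run_with_ui blubber-flink-test-<version>` would make the UI reachable on http://localhost:8081, provided the container actually starts the Job manager.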

Change 635743 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/docker-images/production-images@master] Add new java images

https://gerrit.wikimedia.org/r/635743

@akosiaris Can we base a Blubber-enabled project on a 3rd party Docker image provided on Docker Hub? I was wondering if we have to replicate the original Dockerfile here (I'd rather base it off their image to reduce future maintenance).

No, we don't want to base anything that's running in production on 3rd party images due to a variety of issues with them, ranging from security issues to supply chain attacks and integration with our auditing toolset. That's a decision taken long ago, but you can refer to https://www.mediawiki.org/wiki/Wikimedia_Technical_Talks#Episode_6:_A_Deployment_Pipeline_Overview for an overview of the whys and hows.

@akosiaris I see, makes sense. I would still like to solve the issue with replicating the original Dockerfile - can we deploy Flink images to our registry - even if we'd need to fork the Flink docker repo?

Could you elaborate on that a bit? I am not sure I have understood the question. Specifically, what do you mean by "deploy Flink images to our registry"? Whose Flink images? Ours? Sure. 3rd party ones? No, for the aforementioned reasons.

By the way, I'd strongly suggest NOT trying to replicate the original Dockerfile. We've consciously and on purpose built Blubber for our infrastructure, to spare everyone from having to deal with Dockerfiles, as they are very easy to get wrong in a multitude of ways and end up creating insecure, misbehaving or non-optimized images.

Could you elaborate on that a bit?

Sure, here goes: We are using Apache Flink[1] as a platform for the event processing that feeds Wikidata Query Service. We want to move the Flink deployment to Kubernetes, hence this ticket. Apache Flink provides its own Docker image[2] which, in other circumstances, we would build upon. What @Mstyles is doing now is basically redoing the work the original Flink contributors did for their Docker image, which, according to our current knowledge, is what we must do.
The actual Dockerfile (with an additional entry script) is here [3] - it would be great if we didn't need to make sure, with each Flink update, that we have covered everything that is handled there.

I hope that clears it up, I'm terrible at explaining things via text. If you need more context, we could connect over Meet.

[1] https://flink.apache.org/
[2] https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/docker.html
[3] https://github.com/apache/flink-docker/tree/master/1.11/scala_2.11-java8-debian

Yeah, that's not needed. What I proposed in the meeting back then (and I may have failed to communicate it clearly) is to use it as inspiration for solving issues, not to try to replicate it. Replicating it would be a waste of time and resources and just wouldn't work, as it is written with a different mindset: the use of gosu doesn't make sense in our environment, we don't use EXPOSE, user/group creation is handled by Blubber, docker-entrypoint.sh tries to write to what should be an immutable image, etc.

Actually, that helped a lot. Thanks for taking the time to explain it, I hope my answer helps as well.

Zbyszko added a comment (edited). Oct 27 2020, 1:01 PM

I hope my answer helps as well.

Yes it did, thank you! It might all have come from the fact that I wasn't present during your first meeting, but now I have a much better perspective on how to review the code.

@akosiaris when you get some time, can you please take another look at https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/635074

Yes, I will. This hasn't fallen through the cracks, it's just RL catching up with me.

Change 635743 merged by Alexandros Kosiaris:
[operations/docker-images/production-images@master] Add new java images

https://gerrit.wikimedia.org/r/635743

@akosiaris I started using the new Java images that you uploaded. I wasn't able to install gpg in the build process. There are some conflicts. We can skip gpg verification of the Flink tar, but I don't think that's a good idea. I will continue to do some debugging.

Error message:

The following packages have unmet dependencies:
 gpg : Depends: gpgconf (= 2.2.12-1+deb10u1~bpo9+1) but it is not going to be installed
       Depends: libassuan0 (>= 2.5.0) but 2.4.3-2 is to be installed
       Depends: libgpg-error0 (>= 1.35) but 1.26-2 is to be installed
E: Unable to correct problems, you have held broken packages.

Yup, I've left comments on the change about this already. TL;DR: the package name is gnupg. I did manage to get a container to build correctly with the proposed changes, although, because Flink has to be downloaded, be prepared for a slow process.
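
For the verification step itself, a minimal sketch follows. It assumes the `gnupg` package is installed in the build environment (so that the `gpg` binary is available) and that the signing keys have already been obtained from a trusted source. The function name `verify_tarball` and the file names in the usage note are hypothetical, and the change under review may handle this differently.

```shell
# verify_tarball TARBALL SIGNATURE KEYS_FILE
# Import the signing keys into a throwaway keyring and check the detached
# signature. Returns non-zero if the import or the verification fails.
verify_tarball() {
    keyring_dir=$(mktemp -d)
    status=0
    gpg --homedir "$keyring_dir" --batch --import "$3" &&
        gpg --homedir "$keyring_dir" --batch --verify "$2" "$1" ||
        status=$?
    # Remove the temporary keyring whether or not verification succeeded.
    rm -rf "$keyring_dir"
    return "$status"
}
```

A build script could then call `verify_tarball flink.tgz flink.tgz.asc KEYS` and abort when it returns non-zero, rather than skipping verification.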

Change 643090 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[wikidata/query/flink-rdf-streaming-updater@master] move pipeline directory to separate repo

https://gerrit.wikimedia.org/r/643090

Change 635074 abandoned by Mstyles:
[wikidata/query/rdf@master] add pipeline directory

Reason:
going to use different repo instead, see https://gerrit.wikimedia.org/r/c/wikidata/query/flink-rdf-streaming-updater/+/643090

https://gerrit.wikimedia.org/r/635074

Change 643378 had a related patch set uploaded (by Mstyles; owner: Mstyles):
[integration/config@master] remove test stage

https://gerrit.wikimedia.org/r/643378

Change 643378 abandoned by Mstyles:
[integration/config@master] remove test stage

Reason:
@hashar thank you for explaining the necessity of the test step. I will put a test in!

https://gerrit.wikimedia.org/r/643378

Change 643090 merged by jenkins-bot:
[wikidata/query/flink-rdf-streaming-updater@master] move pipeline directory to separate repo

https://gerrit.wikimedia.org/r/643090

The Blubberfile is done and the Docker image is present in the Wikimedia Docker repository.

Gehel closed this task as Resolved. Dec 14 2020, 2:03 PM