Create Flink Base Image
Closed, Declined (Public)

Description

This is an alternative to https://phabricator.wikimedia.org/T266495. Flink releases updates quite frequently, and every time a new version is released, the previous versions are removed from the download servers. This means that an unrelated change to our streaming updater can also result in a Flink version upgrade that we might or might not want. Having a base image (or a Debian package) allows us to control which version of Flink we use.

Base images are controlled by SRE but anyone can make patches (I think). The repo is here. The flink-rdf-streaming-updater blubberfile would then need to be updated to reference the base image once it's merged. Swift will remain in the flink-rdf-streaming-updater project as it is specific to that use case. The downside of having a base image owned by SRE means that we are reliant on them to merge any Flink version updates.

AC

  • flink-rdf-streaming-updater project uses Flink base image from production images

Event Timeline

Mstyles renamed this task from Create Base Flink Image to Create Flink BaseImage. Jan 28 2021, 6:52 PM
Mstyles renamed this task from Create Flink BaseImage to Create Flink Base Image.

Given the very well put "The downside of having a base image owned by SRE means that we are reliant on them to merge any Flink version updates." stated in the task description, I'd suggest that we don't create a base image but rather fetch the jars and put them in a repository Search controls, e.g. archiva.wikimedia.org. That would avoid unnecessary friction between the teams. This all assumes T273097 is rejected as an approach.

The problem is that we need more than just the jars. We need the entire tar file. @dcausse suggested people.wikimedia.org, but that was not considered a good place for CI to hit.

Hmm, I see your point. In that case, if there isn't a good place that Search controls to upload the version of flink that they want, the most prudent way out is the debian package described in T266495.

@akosiaris Is your concern with the idea of using a `flink` base image solution mainly just centered around the inefficiency/inconvenience of needing SRE to merge any flink version upgrades? Since we have an embedded SRE on search (me) and to a lesser extent Guillaume, I think it wouldn't be too much of a problem. In general having our dependencies managed by a docker image will make it easier for us to be explicit about what version we're using, and it seems like the default docker-y way of doing things. Is there a technical reason why a base image might not be a good idea?

@akosiaris Is your concern with the idea of using a `flink` base image solution mainly just centered around the inefficiency/inconvenience of needing SRE to merge any flink version upgrades?

I wouldn't call it merely an inconvenience. I would call it a source of potential friction between teams, see below for an elaboration.

Since we have an embedded SRE on search (me) and to a lesser extent Guillaume, I think it wouldn't be too much of a problem.

So, say a bus factor of 1.5 or so? Not ideal, but workable. That being said, it all depends on the urgency, see below.

In general having our dependencies managed by a docker image will make it easier for us to be explicit about what version we're using, and it seems like the default docker-y way of doing things. Is there a technical reason why a base image might not be a good idea?

Unfortunately, even being explicit on the level of docker images won't make it easier. The reason is that the source of the content (the flink tar.gz file) will still not be under Search's control (correct me if I am wrong, I might have misunderstood something) but instead on the flink project's servers. This means that the creation of the base image will succeed only for a short period of time after bumping the version of flink, since the flink project, per my understanding, removes old versions from their servers.

However, that is not the only time we build images. We semi-regularly have to rebuild the entire tree of docker images for some reason, the most usual ones being security updates in the base images, some misconfiguration or some newer features being added to our image building toolkit. What that means is that on the next shellshock/heartbleed/younameit the rebuild breaks. Then, SRE comes knocking on Search's door, asking for a bump of the version of flink in that container cause we need to rebuild it. Now, you will obviously point out that since Search will probably be relying on some already-built version of the container, they can still rely on the old one and upgrade on their own timeline. However, during the timeframe before the upgrade happens:

  • SRE has an unbuildable image which it does not control and knows little about. So it does what comes naturally, which is complain.
  • Search is running an image that does not have the security upgrades.
  • SRE is pushing for having images without known security vulnerabilities.
  • Search is forced to alter plans and reprioritize upgrading the flink version cause SRE is complaining.

So, a source of friction between teams.

All of this can be solved by just moving the source of the problem (the flink .tar.gz) somewhere that Search controls so they can guarantee that a) it's the version Search wants b) fetching it during the building phase will not fail. But if that happens, most of this discussion becomes moot, and the layer that the docker image would provide is just a potential (depending on whether the layer is present or not on the build host) speedup during building (which we can do, but for the right reasons).

Interestingly, the debian package approach discussed in T266495 does have the 2 attributes outlined above which is why it's preferable to me. It still relies on an SRE of course to upgrade it which is a minus, but it shares this minus with this proposal. Ideally we wouldn't even need that and any member of Search would be able to update it.

@akosiaris I just learned that there are archive links that have all of the Flink packages. I'm proposing that we close both this ticket and https://phabricator.wikimedia.org/T266495 and just use the Flink archive links where we won't have to worry about the packages no longer being available.

@akosiaris I just learned that there are archive links that have all of the Flink packages. I'm proposing that we close both this ticket and https://phabricator.wikimedia.org/T266495 and just use the Flink archive links where we won't have to worry about the packages no longer being available.

Oh, that's nice. +1 on my side.
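
For illustration only, here is a minimal Python sketch of what pinning a Flink version against the archive could look like. It assumes the "archive links" mentioned above are the ones under archive.apache.org/dist/flink and that the binary tarball follows the `flink-<version>-bin-scala_<scala>.tgz` naming; the version number, the Scala suffix and both helper names are hypothetical and not taken from the blubberfile.

```python
import hashlib

# Assumption: the "archive links" are the Apache archive, which keeps old releases.
ARCHIVE_ROOT = "https://archive.apache.org/dist/flink"


def flink_archive_url(version: str, scala: str = "2.12") -> str:
    """Build the archive URL of the binary tarball for one pinned Flink version."""
    name = f"flink-{version}-bin-scala_{scala}.tgz"
    return f"{ARCHIVE_ROOT}/flink-{version}/{name}"


def matches_sha512(payload: bytes, checksum_line: str) -> bool:
    """Compare downloaded bytes with a checksum line of the form '<hex digest>  <file name>'."""
    parts = checksum_line.split()
    if not parts:
        return False
    return hashlib.sha512(payload).hexdigest() == parts[0].lower()


if __name__ == "__main__":
    # Example version only; use whichever release the updater is built against.
    print(flink_archive_url("1.12.1"))
```

Pinning the full URL in the build keeps the version explicit, and checking the digest guards against fetching a different file than the one intended.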