Setup change-propagation service CI
Open, Normal, Public

Description

mediawiki/services/change-propagation and mediawiki/services/change-propagation/deploy lack Jenkins jobs on WMF CI.

hashar created this task. Dec 8 2016, 2:45 PM
Restricted Application removed a project: Patch-For-Review. Dec 8 2016, 2:45 PM

Change 325933 had a related patch set uploaded (by Hashar):
Jobs for mediawiki/services/change-propagation

https://gerrit.wikimedia.org/r/325933

Change 325933 merged by jenkins-bot:
Jobs for mediawiki/services/change-propagation

https://gerrit.wikimedia.org/r/325933

Change 325934 had a related patch set uploaded (by Hashar):
Jenkins job validation (DO NOT SUBMIT)

https://gerrit.wikimedia.org/r/325934

Change 325935 had a related patch set uploaded (by Hashar):
Jenkins job validation (DO NOT SUBMIT)

https://gerrit.wikimedia.org/r/325935

hashar added a comment. Dec 8 2016, 3:12 PM

Both builds fail with:

> sh test/utils/clean_kafka.sh

test/utils/clean_kafka.sh: 1: test/utils/clean_kafka.sh: nc: not found

That is netcat.
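
A pre-flight check along these lines would surface the missing dependency with a clearer message. This is only a sketch: the `require_cmd` helper is hypothetical and not part of the change-propagation repository.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing required command: $1" >&2
        return 1
    fi
}

# clean_kafka.sh relies on netcat, so check for it before running the script.
if require_cmd nc; then
    echo "nc found"
else
    echo "nc missing; the netcat-openbsd package provides it"
fi
```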

hashar added a comment. Dec 8 2016, 3:21 PM

Trusty images have it because netcat-openbsd is part of the minimal install:

apt-cache rdepends --installed netcat-openbsd
netcat-openbsd
Reverse Depends:
  ubuntu-minimal
  ubuntu-minimal

Change 325937 had a related patch set uploaded (by Hashar):
dib: provision netcat-openbsd

https://gerrit.wikimedia.org/r/325937

Change 325937 merged by jenkins-bot:
dib: provision netcat-openbsd

https://gerrit.wikimedia.org/r/325937

Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-releng) [2016-12-08T15:28:55Z] <hashar> Updating Nodepool Jessie image to ship netcat T151469 T152684

Change 325952 had a related patch set uploaded (by Hashar):
Move change-propagation jobs to experimental

https://gerrit.wikimedia.org/r/325952

Change 325952 merged by jenkins-bot:
Move change-propagation jobs to experimental

https://gerrit.wikimedia.org/r/325952

hashar added a comment. Dec 8 2016, 4:36 PM

I gave it a try on https://gerrit.wikimedia.org/r/#/c/325934/

npm test eventually wants to start Kafka via npm run start-kafka. That could be done in the clean-up script by starting Kafka whenever it runs in the Jenkins environment (Jenkins sets the $JENKINS_URL environment variable, which is quite useful for that purpose).

I guess we will need Kafka provisioned, but maybe npm test installs it already. The $KAFKA_HOME env variable will have to be set when starting the server.
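
The idea could be sketched roughly as follows. It assumes that `$JENKINS_URL` is set only on CI, that Kafka is already unpacked on the build host, and that `test/utils/start_kafka.sh` reads `$KAFKA_HOME`. The function names and the `../kafka` default are placeholders introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: succeed when running under Jenkins, which exports
# JENKINS_URL into the build environment.
under_jenkins() {
    [ -n "${JENKINS_URL:-}" ]
}

# Hypothetical wrapper: start Kafka only on CI, leave local runs untouched.
maybe_start_kafka() {
    if under_jenkins; then
        # Assumption: Kafka lives here unless the job config says otherwise.
        export KAFKA_HOME="${KAFKA_HOME:-../kafka}"
        sh test/utils/start_kafka.sh start
    fi
}
```

A clean-up or test script would then call `maybe_start_kafka` before running the suite.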

The npm job is in the experimental CI pipeline. You can trigger it for mediawiki/services/change-propagation by commenting in Gerrit: check experimental. See https://gerrit.wikimedia.org/r/#/c/325934/ for an example and the resulting Jenkins build https://integration.wikimedia.org/ci/job/npm-node-4/18905/console

> I guess we will need Kafka provisioned, but maybe npm test installs it already.

No, npm test doesn't install Kafka. In Travis we do that by running the following:

export KAFKA_HOME=../kafka
wget http://www.us.apache.org/dist/kafka/0.9.0.1/kafka_2.10-0.9.0.1.tgz -O kafka.tgz
mkdir -p ${KAFKA_HOME} && tar xzf kafka.tgz -C ${KAFKA_HOME} --strip-components 1
sh test/utils/start_kafka.sh start

Nuria added a comment. Dec 8 2016, 6:43 PM

Sorry to be the party pooper here, but this way of doing things seems (to me) very brittle. Imagine a scenario where we want to verify that things work smoothly with the next Kafka upgrade (an imaginary version 2.0). This is going to take a while because the upgrade is a pretty major one. At the same time we need to keep our production system running and executing tests on a lower version of Kafka, say 0.9.1.

Is there a way to encapsulate the CI config so you could run those two versions of kafka side by side and thus have proper testing of what is deployed in production versus what is upcoming?

@Nuria running tests against both the production environment and the moving head of development is a legitimate request. In my experience CI tends to use the development version, and integration with production versions is checked on the beta cluster (which is a bit too late). I don't think we have any case in CI of running a suite of integration tests against both dev and prod.

What I thought of was to provision in the CI images whatever Kafka version is in apt.wikimedia.org (so that would be production?!), then run the tests against that. In a second phase, download whatever newer version (we could even get the one from operations/debs/kafka @master) and run the tests against it, thus essentially running the test suite twice.

So a sequence could be:

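npm install
npm test
  # npm test would then do the following;
  # kafka2.0.1.tar.gz stands for the real download URL
  mkdir -p ../kafka-2.0.1
  curl kafka2.0.1.tar.gz | tar xz -C ../kafka-2.0.1 --strip-components 1
  for KAFKA_HOME in ../kafka-2.0.1 /usr/lib/kafka; do
      export KAFKA_HOME
      sh test/utils/start_kafka.sh start
      sh test/utils/clean_kafka.sh
      mocha
      sh test/utils/start_kafka.sh stop
  done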

I've made a PR to make the process simpler: https://github.com/wikimedia/change-propagation/pull/147

I've moved the scripts to a separate package that would allow reusing them between change-prop and the trending service. After the PR is merged, all we need to do in Change-Prop is: npm install; npm run install-kafka; npm run start-kafka; npm test. How do I configure Jenkins to do that?

hashar added a comment. Dec 9 2016, 8:41 AM

The Jenkins job only does npm install && npm test.

Either we craft another set of jobs to handle Kafka, or we get clean_kafka.sh to attempt to start Kafka for us whenever it finds that Kafka is not running (or has the wrong version) and it is being run in the Jenkins environment.
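
The "is Kafka running" part of that check could look something like the sketch below. It only probes the TCP port with netcat and does not verify the Kafka version. The `kafka_listening` name is hypothetical, and 9092 is Kafka's default listener port, which the actual test setup may override.

```shell
#!/bin/sh
# Sketch: succeed if something accepts TCP connections on host:port.
# Defaults to localhost:9092, Kafka's default listener.
kafka_listening() {
    nc -z -w 2 "${1:-localhost}" "${2:-9092}" >/dev/null 2>&1
}

if kafka_listening; then
    echo "Kafka port is open"
else
    echo "Kafka port is closed; a CI wrapper could start the broker here"
fi
```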

> The Jenkins job only does npm install && npm test.

> Either we craft another set of jobs to handle Kafka, or we get clean_kafka.sh to attempt to start Kafka for us whenever it finds that Kafka is not running (or has the wrong version) and it is being run in the Jenkins environment.

Hm, I can make npm test conditional on whether it runs under Jenkins, and have it set up the env variables, install Kafka, start it, and run the tests, but I honestly believe that conditioning like this is extremely ugly. I think that at least the ENV variables should be controlled by the Jenkins config. Getting back to @Nuria's point: what if we want to run tests on 2 Kafka versions? I believe the best way to do that is to spawn one more separate job with a different value of the KAFKA_VERSION env variable, not hacking up the test script. (Actually, hacking up the test script to run 2 times wouldn't work, because we need to properly clean up the environment before reinstalling Kafka, which requires some non-trivial removal of data here and there; stopping Kafka is also not a very easy task, etc.)

Also, let's not forget that we want to test both node 4 and node 6. It's been an established practice within the services team and it helped us a lot during node upgrades because we're confident the next version will work since we're testing with it from the very beginning.
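
As a rough sketch of the one-job-per-version idea, an install step could derive everything from `KAFKA_VERSION`, so that jobs differ only in their environment. The URL pattern and default version follow the Travis snippet quoted earlier; `SCALA_VERSION`, `kafka_url` and `kafka_home` are names introduced here for illustration, and the mirror may not host every version.

```shell
#!/bin/sh
# Sketch: the job config exports KAFKA_VERSION; defaults mirror the
# Travis snippet (Kafka 0.9.0.1 built for Scala 2.10).
KAFKA_VERSION="${KAFKA_VERSION:-0.9.0.1}"
SCALA_VERSION="${SCALA_VERSION:-2.10}"

# Build the tarball URL following the pattern of the Travis download line.
# $1 = Kafka version, $2 = Scala version.
kafka_url() {
    echo "http://www.us.apache.org/dist/kafka/$1/kafka_$2-$1.tgz"
}

# One KAFKA_HOME per version keeps installs of different versions apart.
kafka_home() {
    echo "../kafka-$1"
}

echo "would download: $(kafka_url "$KAFKA_VERSION" "$SCALA_VERSION")"
echo "would install to: $(kafka_home "$KAFKA_VERSION")"
```

Two Jenkins jobs, or four when combined with node 4 and node 6, could then run the same script with different exported values.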

Nuria added a comment. Dec 9 2016, 6:59 PM

And (call me crazy) couldn't we make Jenkins spawn Docker containers to simplify this version management? Installed versions would be managed inside containers, so testing an upgrade would mean testing a branch in which the software versions installed in the container differ from the ones on the master branch.

greg added a comment. Dec 9 2016, 7:33 PM

> And (call me crazy) couldn't we make Jenkins spawn Docker containers to simplify this version management?

There's an implied "just" right before "make" in that sentence, and that "just" is quite large :)

But, yes, (spoiler) that is the current plan for CI: to support that very option. It won't be available in weeks, more like months, but yes. We're talking with Ops about the right way of going about this and how it ties in with production deploys (if at all).

/me dangles a carrot

GWicke added a comment. Dec 9 2016, 7:36 PM

@Nuria, we discussed using containers for CI several times before (large thread started by Antoine in June 2014 on the engineering list, for example). It has not happened so far, partly because it is a complex undertaking spanning several teams.

However, with the Kubernetes effort in ops, the timing seems promising for developing a largely shared Docker infrastructure for production, CI, development & third-party use soon. I'm working on a proposal, and intend to share it in the next few days.

Change 325935 abandoned by Hashar:
Jenkins job validation (DO NOT SUBMIT)

https://gerrit.wikimedia.org/r/325935

Change 325934 abandoned by Hashar:
Jenkins job validation (DO NOT SUBMIT)

https://gerrit.wikimedia.org/r/325934