Migrate CI job search-xgboost-maven to use a Docker container
Closed, Resolved (Public)


The Jenkins job search-xgboost-maven runs on disposable virtual machines (via Nodepool). It should be migrated to use a Docker container.


jvm-packages/xgboost4j/src/main/java/ml/dmlc/xgboost4j/java/ shells out to python but does not handle a non-zero exit code (e.g. ImportError: No module named argparse).

Docker job:

The job fails, though. This has to be investigated.

Maven is invoked with --file jvm-packages/pom.xml clean verify

Nodepool: Success (search-xgboost-maven #23)
Docker: Failure (search-xgboost-maven-java8-docker #13)

Some differences:


When the tests start, there is a major difference, though. Under Docker, the tracker environment variables seem to be missing:

--- nodepool
+++ docker
 [INFO] --- scalatest-maven-plugin:1.0:test (test) @ xgboost4j-spark ---
 Discovery starting.
 Discovery completed in X milliseconds.
 Run starting. Expected test count is: 43
 Using Spark's default log4j profile: org/apache/spark/
 18/03/19 10:MM:ss WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 [Stage 0:>                                                          (0 + 0) / 2]
 - tracker should not affect execution result
 - tracker should throw exception if parallelism is not sufficient
+Tracker started, with env={}
+- test consistency and order preservation of dataframe-based model *** FAILED ***

Somehow DMLC_NUM_SERVER=0, DMLC_TRACKER_URI=10.x.y.z and DMLC_TRACKER_PORT=9091 are not set under Docker. Maybe they come from a process that is started in the background and fails on Docker.

They are supposed to be in the environment:

public class RabitTracker implements IRabitTracker {
  // ...
  public boolean start(long timeout) {
    // ...
    if (startTrackerProcess()) {
      logger.debug("Tracker started, with env=" + envs.toString());
      System.out.println("Tracker started, with env=" + envs.toString());
      // ...

The environment is loaded via a python script:

  private boolean startTrackerProcess() {
    try {
      String trackerExecString = this.addTrackerProperties("python " + tracker_py +
          " --log-level=DEBUG --num-workers=" + String.valueOf(numWorkers));
      // ...
      trackerProcess.set(Runtime.getRuntime().exec(trackerExecString));
      loadEnvs(trackerProcess.get().getInputStream());
      return true;
    } catch (IOException ioe) {
      ioe.printStackTrace();
      return false;
    }
  }
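The excerpt above returns true as soon as the process has been spawned, even if python exits immediately. A minimal sketch of how a launcher could detect that case follows. It is not the xgboost4j implementation: the class TrackerLaunchCheck and its methods are hypothetical, and it assumes the tracker prints KEY=VALUE lines on stdout. The required keys are the ones that show up in the tracker output quoted in this task.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of xgboost4j.
class TrackerLaunchCheck {
  // Keys seen in the "Tracker started, with env=" output of a passing build.
  static final List<String> REQUIRED_KEYS = Arrays.asList(
      "DMLC_NUM_SERVER", "DMLC_TRACKER_URI", "DMLC_TRACKER_PORT", "DMLC_NUM_WORKER");

  // Reads KEY=VALUE lines (assumed format) until every required key
  // has been seen or the stream ends.
  static Map<String, String> readEnvs(BufferedReader reader) throws IOException {
    Map<String, String> envs = new HashMap<>();
    String line;
    while ((line = reader.readLine()) != null) {
      int eq = line.indexOf('=');
      if (eq > 0) {
        envs.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
      }
      if (envs.keySet().containsAll(REQUIRED_KEYS)) {
        break;
      }
    }
    return envs;
  }

  // Returns the required keys that are absent from envs.
  static List<String> missingKeys(Map<String, String> envs) {
    List<String> missing = new ArrayList<>(REQUIRED_KEYS);
    missing.removeAll(envs.keySet());
    return missing;
  }

  // Starts the command and fails loudly when the environment is incomplete.
  static Map<String, String> start(List<String> command, long timeoutMillis)
      throws IOException, InterruptedException {
    Process process = new ProcessBuilder(command)
        .redirectError(ProcessBuilder.Redirect.INHERIT) // show Python tracebacks
        .start();
    // The reader is deliberately left open: closing it would close the
    // stdout pipe of a tracker that is still running.
    BufferedReader reader = new BufferedReader(
        new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
    Map<String, String> envs = readEnvs(reader);
    List<String> missing = missingKeys(envs);
    if (!missing.isEmpty()) {
      boolean exited = process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
      String status = exited ? "exit code " + process.exitValue() : "still running";
      process.destroy();
      throw new IOException("Tracker did not report " + missing + " (" + status + ")");
    }
    return envs;
  }
}
```

With a check like this, the failing Docker build would presumably have stopped with an error naming the missing variables and the exit code instead of printing env={} and carrying on.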

Using a local checkout of search/xgboost and the container:

$ cd projects/search/xgboost
$ docker run --pull --rm -it --entrypoint=/bin/bash -v "$(pwd):/src"
nobody:/src$ python ./dmlc-core/tracker/dmlc_tracker/ --log-level=DEBUG --num-workers=1
Traceback (most recent call last):
  File "./dmlc-core/tracker/dmlc_tracker/", line 19, in <module>
    import argparse
ImportError: No module named argparse
$ echo $?

That is because the container has python-minimal installed. According to /usr/share/doc/python2.7-minimal/README.Debian, it is stripped of a lot of modules.
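A missing module like this could also be caught before the tracker is launched. The following is only a sketch under stated assumptions: the class PythonPreflight and its methods are hypothetical, and it relies on nothing more than the interpreter's standard -c option to attempt the imports and report the result through the exit code.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical preflight check, not part of xgboost4j or the CI configuration.
class PythonPreflight {
  // Builds a command such as: python -c "import argparse"
  static List<String> importCheckCommand(String interpreter, List<String> modules) {
    return Arrays.asList(interpreter, "-c", "import " + String.join(", ", modules));
  }

  // Returns true when the interpreter can import every listed module.
  static boolean modulesAvailable(String interpreter, List<String> modules)
      throws IOException, InterruptedException {
    Process process = new ProcessBuilder(importCheckCommand(interpreter, modules))
        .inheritIO() // let any ImportError traceback reach the console
        .start();
    return process.waitFor() == 0;
  }
}
```

Calling modulesAvailable("python", Arrays.asList("argparse")) inside the python-minimal container described above would be expected to return false, since the import fails there.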

Event Timeline

hashar triaged this task as High priority. Mar 19 2018, 10:40 AM
hashar created this task.
hashar updated the task description.
hashar added a subscriber: EBernhardson.

Change 420311 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Use a full python for xgboost

Change 420311 merged by jenkins-bot:
[integration/config@master] Use a full python for xgboost

Mentioned in SAL (#wikimedia-releng) [2018-03-19T11:56:01Z] <hashar> Creating docker container | | T190032

00:09:17.674 XGBoostDFSuite:
00:09:24.660 Tracker started, with env={DMLC_NUM_SERVER=0, DMLC_TRACKER_URI=, DMLC_TRACKER_PORT=9091, DMLC_NUM_WORKER=2}

search-xgboost-maven-java8-docker/ passed.

Change 420316 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Migrate xgboost maven job to Docker

Change 420316 merged by jenkins-bot:
[integration/config@master] Migrate xgboost maven job to Docker

hashar claimed this task.

Tested on a dummy change and the build worked \o/