The Jenkins job search-xgboost-maven runs on disposable virtual machines (via Nodepool); it should be migrated to run in a Docker container.
TLDR:
jvm-packages/xgboost4j/src/main/java/ml/dmlc/xgboost4j/java/RabitTracker.java shells out to python tracker.py but does not handle a non-zero exit code (e.g. ImportError: No module named argparse).
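The missing exit-code handling can be sketched as follows. `TrackerProcessCheck` and `failIfExitedWithError` are hypothetical names, not xgboost4j API. The idea: once `loadEnvs(...)` has returned, give the tracker process a short grace period, and if it has already exited with a non-zero status, fail with the exit code and stderr instead of reporting "Tracker started". The demo assumes a POSIX `sh` is available to simulate the crashing tracker.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical helper (not part of xgboost4j): surfaces a tracker process that
// died on startup instead of silently treating it as started.
public final class TrackerProcessCheck {

    private TrackerProcessCheck() {}

    /**
     * Waits up to graceMillis for the process to exit. If it exits with a
     * non-zero status, throws an IOException carrying the exit code and stderr.
     * A process still running after the grace period (e.g. a healthy tracker
     * waiting for workers) is left alone.
     */
    public static void failIfExitedWithError(Process process, long graceMillis)
            throws IOException, InterruptedException {
        if (!process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
            return; // still running after the grace period
        }
        int code = process.exitValue();
        if (code != 0) {
            throw new IOException("tracker process exited with code " + code
                    + ": " + readAll(process.getErrorStream()));
        }
    }

    // Reads the whole stream; only safe here because the process has exited.
    private static String readAll(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates tracker.py dying on "import argparse"; assumes a POSIX sh.
        Process p = new ProcessBuilder("sh", "-c",
                "echo 'ImportError: No module named argparse' >&2; exit 1").start();
        try {
            failIfExitedWithError(p, 5000);
            System.out.println("tracker still running");
        } catch (IOException e) {
            System.out.println("tracker failed to start: " + e.getMessage());
        }
    }
}
```

The grace period is a trade-off: a healthy tracker never exits, so `waitFor` blocks for the full `graceMillis` on every successful start, which argues for keeping it short.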
Docker job: https://integration.wikimedia.org/ci/job/search-xgboost-maven-java8-docker/
The job fails, though, and has to be investigated.
Maven is invoked with --file jvm-packages/pom.xml clean verify
Type | Result | Console |
---|---|---|
Nodepool | Success | search-xgboost-maven #23 |
Docker | Failure | search-xgboost-maven-java8-docker #13 |
Some differences:
Component | Nodepool | Docker |
---|---|---|
Maven | 3.5.0 | 3.5.2 |
gcc | 4.9.2 | 6.3.0 |
When the tests start, there is a major difference, though: under Docker, the tracker environment variables are missing:
--- nodepool
+++ docker
 [INFO] --- scalatest-maven-plugin:1.0:test (test) @ xgboost4j-spark ---
 Discovery starting.
 Discovery completed in X milliseconds.
 Run starting. Expected test count is: 43
 SparkParallelismTrackerSuite:
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 18/03/19 10:MM:ss WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 [Stage 0:> (0 + 0) / 2]
 - tracker should not affect execution result
 - tracker should throw exception if parallelism is not sufficient
 XGBoostDFSuite:
-Tracker started, with env={DMLC_NUM_SERVER=0, DMLC_TRACKER_URI=10.68.23.140, DMLC_TRACKER_PORT=9091, DMLC_NUM_WORKER=2}
+Tracker started, with env={}
+- test consistency and order preservation of dataframe-based model *** FAILED ***
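Somehow DMLC_NUM_SERVER=0, DMLC_TRACKER_URI=10.x.y.z, DMLC_TRACKER_PORT=9091 and DMLC_NUM_WORKER=2 are not set. Maybe they come from a process that is started in the background and fails under Docker.
They are supposed to be in the envs map that RabitTracker prints once the tracker process has started: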
public class RabitTracker implements IRabitTracker {
  ...
  public boolean start(long timeout) {
    ...
    if (startTrackerProcess()) {
      logger.debug("Tracker started, with env=" + envs.toString());
      System.out.println("Tracker started, with env=" + envs.toString());
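A sketch of a guard that `start()` could apply before printing "Tracker started": treat the tracker as started only when the envs map actually contains the variables seen in the Nodepool log. `TrackerEnvCheck`, `REQUIRED_KEYS` and `missingKeys` are hypothetical names, and which keys are strictly required is an assumption based on that log.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical guard (not part of xgboost4j): refuses to call the tracker
// "started" when the environment it reported is incomplete.
public final class TrackerEnvCheck {

    private TrackerEnvCheck() {}

    // Keys taken from the Nodepool log; assumed to be what workers need.
    static final List<String> REQUIRED_KEYS = Arrays.asList(
            "DMLC_TRACKER_URI", "DMLC_TRACKER_PORT", "DMLC_NUM_WORKER");

    /** Returns the required keys absent from envs; an empty list means OK. */
    public static List<String> missingKeys(Map<String, String> envs) {
        return REQUIRED_KEYS.stream()
                .filter(key -> !envs.containsKey(key))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> nodepool = new HashMap<>();
        nodepool.put("DMLC_NUM_SERVER", "0");
        nodepool.put("DMLC_TRACKER_URI", "10.68.23.140");
        nodepool.put("DMLC_TRACKER_PORT", "9091");
        nodepool.put("DMLC_NUM_WORKER", "2");
        System.out.println(missingKeys(nodepool));        // []
        System.out.println(missingKeys(new HashMap<>())); // [DMLC_TRACKER_URI, DMLC_TRACKER_PORT, DMLC_NUM_WORKER]
    }
}
```

With such a check, the Docker run would have failed at tracker startup with an explicit list of missing variables rather than later inside the test suite.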
The environment is read from the standard output of the Python tracker script:
private boolean startTrackerProcess() {
  try {
    String trackerExecString = this.addTrackerProperties("python " + tracker_py +
        " --log-level=DEBUG --num-workers=" + String.valueOf(numWorkers));

    trackerProcess.set(Runtime.getRuntime().exec(trackerExecString));
    loadEnvs(trackerProcess.get().getInputStream());
    return true;
  } catch (IOException ioe) {
    ioe.printStackTrace();
    return false;
  }
}
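Nothing in startTrackerProcess() looks at stderr or at the exit status, and that alone explains env={}. A minimal sketch, assuming the tracker prints one KEY=VALUE pair per line on stdout (`EnvParser` and `parseEnvs` are hypothetical stand-ins for loadEnvs, whose body is not shown here): a Python traceback goes to stderr, so when tracker.py dies at `import argparse`, stdout hits EOF immediately, no pair is read, the map stays empty, and startTrackerProcess() still returns true.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for loadEnvs(); the real parsing may differ.
public final class EnvParser {

    private EnvParser() {}

    /** Collects DMLC_*=value lines from the tracker's stdout until EOF. */
    public static Map<String, String> parseEnvs(InputStream stdout) throws IOException {
        Map<String, String> envs = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stdout, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] kv = line.trim().split("=", 2);
                if (kv.length == 2 && kv[0].startsWith("DMLC_")) {
                    envs.put(kv[0], kv[1]);
                }
            }
        }
        return envs;
    }

    public static void main(String[] args) throws IOException {
        byte[] healthy = "DMLC_TRACKER_URI=10.68.23.140\nDMLC_TRACKER_PORT=9091\n"
                .getBytes(StandardCharsets.UTF_8);
        // The ImportError traceback went to stderr, so stdout was empty.
        byte[] crashed = new byte[0];
        System.out.println(parseEnvs(new ByteArrayInputStream(healthy))); // {DMLC_TRACKER_URI=10.68.23.140, DMLC_TRACKER_PORT=9091}
        System.out.println(parseEnvs(new ByteArrayInputStream(crashed))); // {}
    }
}
```

This version reads until EOF, which only suits a process that has already exited; a live tracker keeps stdout open, so the real loadEnvs presumably stops at some end-of-environment marker instead.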
Reproducing with a local checkout of search/xgboost and the container:
$ cd projects/search/xgboost
$ docker run --pull --rm -it --entrypoint=/bin/bash -v "$(pwd):/src" docker-registry.wikimedia.org/releng/java8-xgboost:0.1.0
nobody:/src$ python ./dmlc-core/tracker/dmlc_tracker/tracker.py --log-level=DEBUG --num-workers=1
Traceback (most recent call last):
  File "./dmlc-core/tracker/dmlc_tracker/tracker.py", line 19, in <module>
    import argparse
ImportError: No module named argparse
$ echo $?
1
$
That is because the container only has python-minimal installed. According to /usr/share/doc/python2.7-minimal/README.Debian, that package is stripped of many standard-library modules, and argparse is evidently one of them.
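A preflight step could catch this before any test runs. The sketch below is hypothetical (`PythonPreflight.canImport` is not part of the job or of xgboost4j): it runs `python -c "import argparse"` and checks the exit status, the same check as the manual `echo $?` above. A missing interpreter or a hung process is also treated as failure.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical preflight check: verifies the container's interpreter can
// import a module that tracker.py needs.
public final class PythonPreflight {

    private PythonPreflight() {}

    /** Returns true only if `<python> -c "import <module>"` exits with status 0. */
    public static boolean canImport(String python, String module) throws InterruptedException {
        Process p;
        try {
            p = new ProcessBuilder(python, "-c", "import " + module)
                    .redirectErrorStream(true) // the traceback is short; it is not read here
                    .start();
        } catch (IOException e) {
            return false; // interpreter could not be started (e.g. not on PATH)
        }
        if (!p.waitFor(30, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            return false; // treat a hang as failure
        }
        return p.exitValue() == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        // On the java8-xgboost:0.1.0 image this is expected to print false,
        // matching the ImportError above.
        System.out.println("argparse importable: " + canImport("python", "argparse"));
    }
}
```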