The Jenkins job [[ https://integration.wikimedia.org/ci/job/search-xgboost-maven/ | search-xgboost-maven ]] runs on disposable virtual machines (via Nodepool). It should be migrated to use a Docker container.
Docker job: https://integration.wikimedia.org/ci/job/search-xgboost-maven-java8-docker/
The Docker job fails, though, and the failure has to be investigated.
Maven is invoked with `--file jvm-packages/pom.xml clean verify`
| Type | Result | Console
|--|--|--
| Nodepool | Success | [[ https://integration.wikimedia.org/ci/job/search-xgboost-maven/23/consoleFull | search-xgboost-maven #23 ]]
| Docker | Failure | [[ https://integration.wikimedia.org/ci/job/search-xgboost-maven-java8-docker/13/consoleFull | search-xgboost-maven-java8-docker #13 ]]
Some differences:
| | Nodepool | Docker
|--|--|--
| Maven | 3.5.0 | 3.5.2
| gcc | 4.9.2 | 6.3.0
When the tests start, there is a major difference, though. Under Docker, the environment variables passed to the tracker seem to be missing:
```
lang=diff
--- nodepool
+++ docker
[INFO] --- scalatest-maven-plugin:1.0:test (test) @ xgboost4j-spark ---
Discovery starting.
Discovery completed in X milliseconds.
Run starting. Expected test count is: 43
SparkParallelismTrackerSuite:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/03/19 10:MM:ss WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[Stage 0:> (0 + 0) / 2]
- tracker should not affect execution result
- tracker should throw exception if parallelism is not sufficient
XGBoostDFSuite:
-Tracker started, with env={DMLC_NUM_SERVER=0, DMLC_TRACKER_URI=10.68.23.140, DMLC_TRACKER_PORT=9091, DMLC_NUM_WORKER=2}
+Tracker started, with env={}
+- test consistency and order preservation of dataframe-based model *** FAILED ***
```
Somehow `DMLC_NUM_SERVER=0`, `DMLC_TRACKER_URI=10.x.y.z`, `DMLC_TRACKER_PORT=9091` and `DMLC_NUM_WORKER=2` are set on Nodepool but not under Docker. Maybe they come from a process that is started in the background and fails on Docker.
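To compare the two console logs without reading them by eye, a small script along these lines could pull out the `Tracker started, with env={...}` line and list which keys are absent. It is only a sketch: the function names `parse_tracker_env` and `missing_keys` are made up for illustration, and the parsing assumes the `KEY=VALUE, KEY=VALUE` layout seen in the two log lines above.

```python
import re

# Matches the tracker line as it appears in the console output above.
TRACKER_LINE = re.compile(r"Tracker started, with env=\{(.*?)\}")


def parse_tracker_env(console_text):
    """Return the env of the first 'Tracker started' line as a dict.

    Returns None when the console text has no such line.
    """
    match = TRACKER_LINE.search(console_text)
    if match is None:
        return None
    body = match.group(1).strip()
    if not body:
        return {}  # "env={}" as seen in the Docker run
    # Assumption: entries are separated by ", " and look like KEY=VALUE.
    return dict(pair.split("=", 1) for pair in body.split(", "))


def missing_keys(reference_env, other_env):
    """Keys present in reference_env but absent from other_env, sorted."""
    return sorted(set(reference_env) - set(other_env))


# The two log lines quoted in this task.
nodepool_log = (
    "Tracker started, with env={DMLC_NUM_SERVER=0, "
    "DMLC_TRACKER_URI=10.68.23.140, DMLC_TRACKER_PORT=9091, "
    "DMLC_NUM_WORKER=2}"
)
docker_log = "Tracker started, with env={}"

nodepool_env = parse_tracker_env(nodepool_log)
docker_env = parse_tracker_env(docker_log)
print(missing_keys(nodepool_env, docker_env))
```

Fed the full `consoleFull` text of both builds instead of the single lines, the same functions would show whether later builds still start the tracker with an empty environment.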