
Create a debian package for scikit-learn
Closed, ResolvedPublic

Description

A recent package exists at https://sources.debian.net/src/scikit-learn/0.16.1-2/debian/control/ and builds the enigmatically named python-sklearn package.

Adapt it for Python 3.

WIP patch, based on debian/0.16.1-2
https://github.com/adamwight/scikit-learn/tree/debian-python3-0.16.1

Event Timeline

awight claimed this task.
awight raised the priority of this task to Needs Triage.
awight updated the task description.
awight added subscribers: awight, yuvipanda, Aklapper, Halfak.
awight set Security to None.

Mailing list thread: "Trying to build python3 version of scikit-learn, python3-* pkgs come out empty" https://lists.debian.org/debian-python/2014/11/msg00002.html

Aha! A PR for Python 3 support: https://github.com/yarikoptic/scikit-learn/pull/1

Not out of the woods yet... My rebase of zackw's PR#1 fails some tests: fixture data is missing, even though it's listed in MANIFEST.in and present in the build/ directories. The tests pass if run directly.

I like that his patch ports from python_distutils to pybuild, but that might be too big of a leap forward. If we wanted to move to pybuild, it would be more stable to port without adding py3 support. I don't even want to go there, though--my next thought is to split the difference between my more conservative patch and this other one, and borrow the minimum amount of stuff to get the package to build.

awight triaged this task as High priority. Aug 21 2015, 9:18 PM

One doctest and one test are still failing.

The test below passes when I run it directly in the build directory. Unfortunate.

======================================================================
ERROR: Test that linear regression also works with sparse data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/linear_model/tests/test_base.py", line 78, in test_linear_regression_sparse
    ols.fit(X, y.ravel())
  File "/usr/local/lib/python3.4/dist-packages/sklearn/linear_model/base.py", line 359, in fit
    out = lsqr(X, y)
  File "/usr/local/lib/python3.4/dist-packages/scipy/sparse/linalg/isolve/lsqr.py", line 436, in lsqr
    test3 = 1 / acond
ZeroDivisionError: float division by zero

The second failure looks like a sys.path issue, but there's very explicit path magic in conf.py that I imagine is working correctly.

======================================================================
ERROR: Failure: ImportError (No module named github_link)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/loader.py", line 414, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/local/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/adamw/work/scikit-learn/doc/conf.py", line 26, in <module>
    from github_link import make_linkcode_resolve
ImportError: No module named github_link
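One way a sys.path problem like this can arise is sketched below. It assumes that conf.py extends sys.path with a directory given relative to the current working directory, which is a common Sphinx pattern but is not confirmed in this thread; the directory name `sphinxext` and both helper names are hypothetical. A cwd-relative path resolves differently when nose imports conf.py from outside doc/, whereas a path anchored on the file's own location does not.

```python
import os


def extension_dir(conf_file, subdir="sphinxext"):
    """Return the absolute path of `subdir` located next to `conf_file`.

    `subdir` is a hypothetical directory name. The result does not depend
    on the current working directory.
    """
    conf_dir = os.path.dirname(os.path.abspath(conf_file))
    return os.path.join(conf_dir, subdir)


def cwd_relative_dir(subdir="sphinxext"):
    """Resolve `subdir` against the current working directory.

    This is the fragile variant: the answer changes with the directory
    the test runner happens to be started from.
    """
    return os.path.abspath(subdir)


if __name__ == "__main__":
    conf = "/home/adamw/work/scikit-learn/doc/conf.py"
    print("anchored on conf.py:", extension_dir(conf))
    print("anchored on cwd:    ", cwd_relative_dir())
```

If the two printed paths differ when the tests are launched from the top of the source tree, the import of github_link would fail in the way shown above.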

Test failure is due to something in scipy, so maybe I'm testing with a different version... See https://github.com/scikit-learn/scikit-learn/issues/3648

Note that we have to build with python{,3}-scipy >= 0.14.1, and some distros only have 0.14.0, so a custom scipy build or pip will have to be used to install a newer version.
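A quick check for the scipy version constraint can be sketched as follows. The helper names are made up for illustration, and the parsing is deliberately simple: it compares only the leading numeric components and ignores suffixes such as `rc1`, so it is not a substitute for a real version-comparison library.

```python
import re


def version_tuple(version):
    """Turn a dotted version string such as '0.14.1' into a tuple of ints.

    Parsing stops at the first component that does not start with a digit,
    and any non-numeric suffix on a component is dropped (a simplification).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def scipy_is_new_enough(installed, minimum="0.14.1"):
    """Return True if `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    for candidate in ("0.14.0", "0.14.1", "0.16.0"):
        print(candidate, scipy_is_new_enough(candidate))
```

On a machine with scipy installed, the same check could be run as `scipy_is_new_enough(scipy.__version__)`.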

Current errors, which persist with scipy 0.16.0 and sklearn from master:

======================================================================
ERROR: sklearn.gaussian_process.tests.test_gaussian_process.test_2d
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/tests/test_gaussian_process.py", line 64, in test_2d
    gp.fit(X, y)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 340, in fit
    self._arg_max_reduced_likelihood_function()
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 732, in _arg_max_reduced_likelihood_function
    iprint=0)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/cobyla.py", line 177, in fmin_cobyla
    **opts)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/cobyla.py", line 244, in _minimize_cobyla
    f = c['fun'](x0, *c['args'])
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 709, in <lambda>
    log10t[i] - np.log10(self.thetaL[0, i]))
IndexError: index 1 is out of bounds for axis 0 with size 1

======================================================================
ERROR: sklearn.gaussian_process.tests.test_gaussian_process.test_2d_2d
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/tests/test_gaussian_process.py", line 95, in test_2d_2d
    gp.fit(X, y)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 340, in fit
    self._arg_max_reduced_likelihood_function()
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 732, in _arg_max_reduced_likelihood_function
    iprint=0)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/cobyla.py", line 177, in fmin_cobyla
    **opts)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/cobyla.py", line 244, in _minimize_cobyla
    f = c['fun'](x0, *c['args'])
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 709, in <lambda>
    log10t[i] - np.log10(self.thetaL[0, i]))
IndexError: index 1 is out of bounds for axis 0 with size 1

======================================================================
ERROR: sklearn.gaussian_process.tests.test_gaussian_process.test_more_builtin_correlation_models
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/tests/test_gaussian_process.py", line 115, in test_more_builtin_correlation_models
    test_2d(regr='constant', corr=corr, random_start=random_start)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/tests/test_gaussian_process.py", line 64, in test_2d
    gp.fit(X, y)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 340, in fit
    self._arg_max_reduced_likelihood_function()
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 732, in _arg_max_reduced_likelihood_function
    iprint=0)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/cobyla.py", line 177, in fmin_cobyla
    **opts)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/cobyla.py", line 244, in _minimize_cobyla
    f = c['fun'](x0, *c['args'])
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 709, in <lambda>
    log10t[i] - np.log10(self.thetaL[0, i]))
IndexError: index 1 is out of bounds for axis 0 with size 1

======================================================================
ERROR: sklearn.gaussian_process.tests.test_gaussian_process.test_ordinary_kriging
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/tests/test_gaussian_process.py", line 124, in test_ordinary_kriging
    test_2d(regr='linear', beta0=[0., 0.5, 0.5])
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/tests/test_gaussian_process.py", line 64, in test_2d
    gp.fit(X, y)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 340, in fit
    self._arg_max_reduced_likelihood_function()
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 732, in _arg_max_reduced_likelihood_function
    iprint=0)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/cobyla.py", line 177, in fmin_cobyla
    **opts)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/cobyla.py", line 244, in _minimize_cobyla
    f = c['fun'](x0, *c['args'])
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 709, in <lambda>
    log10t[i] - np.log10(self.thetaL[0, i]))
IndexError: index 1 is out of bounds for axis 0 with size 1

======================================================================
ERROR: sklearn.gaussian_process.tests.test_gaussian_process.test_random_starts
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/tests/test_gaussian_process.py", line 151, in test_random_starts
    verbose=False).fit(X, y)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 340, in fit
    self._arg_max_reduced_likelihood_function()
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 732, in _arg_max_reduced_likelihood_function
    iprint=0)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/cobyla.py", line 177, in fmin_cobyla
    **opts)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/cobyla.py", line 244, in _minimize_cobyla
    f = c['fun'](x0, *c['args'])
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/gaussian_process/gaussian_process.py", line 709, in <lambda>
    log10t[i] - np.log10(self.thetaL[0, i]))
IndexError: index 1 is out of bounds for axis 0 with size 1

======================================================================
ERROR: sklearn.tests.test_common.test_non_meta_estimators('KernelRidge', <class 'sklearn.kernel_ridge.KernelRidge'>)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/utils/estimator_checks.py", line 312, in check_dtype_object
    estimator.fit(X, y.astype(object))
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/kernel_ridge.py", line 159, in fit
    copy)
  File "/X/scikit-learn/debian/python-sklearn/usr/lib/python2.7/dist-packages/sklearn/linear_model/ridge.py", line 145, in _solve_cholesky_kernel
    overwrite_a=False)
  File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/basic.py", line 77, in solve
    b1 = _asarray_validated(b, check_finite=check_finite)
  File "/usr/local/lib/python2.7/dist-packages/scipy/_lib/_util.py", line 140, in _asarray_validated
    raise ValueError('object arrays are not supported')
ValueError: object arrays are not supported

Got past the testing issues. This might have produced a working install by the time I tear off the gift wrapping tomorrow morning.

Woohoooo, that was the ticket! Cleaning up the patches and running tests against installed files now...

One more fairly major thing to iron out, and I have no idea why this isn't a problem for the python2 package: there is a flurry of downloading during the install step, so we have to subvert that.
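One common way to keep a build step off the network is to point the proxy environment variables at an unreachable local address, so any download attempt fails quickly instead of fetching data. The sketch below assumes the downloading code honours `http_proxy` and `https_proxy`; the helper names are hypothetical, and this is not necessarily how the package ended up handling it.

```python
import os
import subprocess

# Port 9 is the "discard" service; usually nothing listens there,
# so connections through this proxy are refused immediately.
DEAD_PROXY = "http://127.0.0.1:9/"


def offline_env(base=None):
    """Return a copy of the environment with the HTTP(S) proxies disabled.

    `base` defaults to the current process environment and is not modified.
    """
    env = dict(os.environ if base is None else base)
    for name in ("http_proxy", "https_proxy"):
        env[name] = DEAD_PROXY
    return env


def run_offline(command):
    """Run `command` (a list of arguments) with downloads blocked.

    Returns the exit status of the command.
    """
    return subprocess.call(command, env=offline_env())
```

For example, `run_offline(["python3", "setup.py", "install"])` would run the install step with both proxy variables pointing at the dead address.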

Success! I was able to install the deb and run the tests--the only failures were the 6 I mentioned earlier, due to incompatibility with scipy 0.14.0. That should be resolved by either backporting the sklearn packaging, or updating the scipy package.

@yuvipanda: Holler whenever you're ready to build this package, so I can wave lanterns from the jetty. I believe the package we produce is good, but the build process is lacking some safety features.

Probably best to just block on the subtask T110658, which will resolve the version issues.

Needs backporting to 0.15.2

Issues: this is some backward-compatibility stuff, which makes me sad.

======================================================================
FAIL: test_warn (sklearn.utils.tests.test_testing.TestWarns)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "sklearn/utils/tests/test_testing.py", line 89, in test_warn
    "assert_warns does not preserver warnings state")
AssertionError: assert_warns does not preserver warnings state

----------------------------------------------------------------------

@Halfak: Please remind me whether it's a pain if we update scikit-learn to 0.16.1 and have to regenerate the serialized models.

@awight, it is a pain, but it is not a critical issue. If we'd be better off making a switch, then let's make it and not make another switch again soon!

Yes, let's switch to sklearn 0.16.1, if only because it's turning out to be an easier packaging port. I'm rebasing my work so far onto that tag...

Current obstacle: gfortran is failing to build shared libraries; the link command is missing at least the -shared flag.

Shouldn't we just try to use the exact versions that python-sklearn uses, or is that causing problems as well? Packaging a new version dependency matrix sounds like it'll be a bit of a hell...

Seems like we're out of the woods. I was able to build python3-sklearn-0.16.1, and the tests even ran. We're depending on python3-scipy-0.14.0 which is the version that ships with jessie, so no trickery required.