Inference Service pipeline intermittent failures
Closed, Resolved (Public)

Description

The inference service pipelines seem to have an intermittent bug: some images occasionally fail during the image build when invoking pip.

See: https://integration.wikimedia.org/ci/job/inference-services-pipeline-draftquality/36/execution/node/47/log/

/usr/bin/python3.7: Error while finding module specification for 'pip' (AttributeError: module '__main__' has no attribute '__file__')

It seems that this is due to a bug in a recent release of setuptools: https://github.com/pypa/setuptools/issues/3002#issuecomment-1006266993

The bug should be fixed in setuptools v60.3.1:
https://github.com/pypa/setuptools/issues/3002#issuecomment-1006710343

This error doesn't happen on every build (I can do 'recheck' and sometimes it will pass). This makes me think that there is a node using an outdated version of pip somewhere.
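One way to test the "outdated node" theory would be to log the tool versions at the start of each build and compare a passing run against a failing one. The following is only a minimal sketch, not part of the actual pipeline: the helper names `version_tuple` and `installed_version` are hypothetical, and it assumes both pip and setuptools expose `__version__` and use plain numeric dotted versions.

```python
import subprocess
import sys


def version_tuple(version):
    """Turn '60.3.1' into (60, 3, 1); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(module, python=sys.executable):
    """Return module.__version__ as seen by `python`, or None if that fails."""
    code = "import {0}; print({0}.__version__)".format(module)
    proc = subprocess.run(
        [python, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.stdout.strip() if proc.returncode == 0 else None


if __name__ == "__main__":
    # Print the interpreter version plus pip and setuptools as this node sees them.
    print("python", sys.version.split()[0])
    for name in ("pip", "setuptools"):
        print(name, installed_version(name))
```

If the output differs between a passing and a failing build, `version_tuple` can be used to compare the reported setuptools version against the release mentioned in the linked issue.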

Event Timeline

I updated the base image to the most recent version of buster and things seem to work again.

I think we may need to update the base image for all of the images (model-server and transformers) to make sure we are using the most recent version of pip.

Change 754015 had a related patch set uploaded (by Accraze; author: Accraze):

[machinelearning/liftwing/inference-services@main] outlink: update base image version

https://gerrit.wikimedia.org/r/754015

Change 754015 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] outlink: update base image version

https://gerrit.wikimedia.org/r/754015

OK, I think all images that needed to be updated have been updated. Going to mark this as RESOLVED.

ACraze claimed this task.