
Prepare docker image for hosting the logo-detection model-server on LiftWing
Closed, Resolved | Public | 2 Estimated Story Points

Description

In T361803, we created a custom KServe model-server for the logo-detection model. To host the model-server on LiftWing, in this task we are going to:

  • Create test and production docker images for the model-server using blubber
  • Add configurations for the image build process into the CI pipeline
  • Publish the production image to the Wikimedia docker registry

Event Timeline

I successfully built the logo-detection model-server docker image locally. The image layers are listed below; the largest layer is ~2.37GB.

$ docker history c7907df16443
IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
c7907df16443   48 seconds ago   LABEL blubber.variant=production blubber.ver…   0B        buildkit.dockerfile.v0
<missing>      48 seconds ago   ENTRYPOINT ["./entrypoint.sh"]                  0B        buildkit.dockerfile.v0
<missing>      48 seconds ago   COPY common_settings.sh common_settings.sh #…   1.16kB    buildkit.dockerfile.v0
<missing>      48 seconds ago   COPY model_server_entrypoint.sh entrypoint.s…   294B      buildkit.dockerfile.v0
<missing>      48 seconds ago   COPY /opt/lib/python/site-packages /opt/lib/…   2.37GB    buildkit.dockerfile.v0
<missing>      51 seconds ago   COPY python python/ # buildkit                  23.2kB    buildkit.dockerfile.v0
<missing>      20 minutes ago   COPY logo_detection/model_server/. model_ser…   9.1kB     buildkit.dockerfile.v0
<missing>      20 minutes ago   ENV PATH=/opt/lib/python/site-packages/bin:/…   0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   ENV PYTHONPATH=/srv/logo_detection              0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   WORKDIR /srv/logo_detection                     0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   ENV HOME=/home/somebody                         0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   USER 65533                                      0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   RUN |6 LIVES_AS=somebody LIVES_UID=65533 LIV…   9.33kB    buildkit.dockerfile.v0
<missing>      20 minutes ago   ARG RUNS_GID=900                                0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   ARG RUNS_UID=900                                0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   ARG RUNS_AS=runuser                             0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   RUN |3 LIVES_AS=somebody LIVES_UID=65533 LIV…   9.14kB    buildkit.dockerfile.v0
<missing>      20 minutes ago   ARG LIVES_GID=65533                             0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   ARG LIVES_UID=65533                             0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   ARG LIVES_AS=somebody                           0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   RUN /bin/sh -c apt-get update && apt-get ins…   36.7MB    buildkit.dockerfile.v0
<missing>      20 minutes ago   ENV DEBIAN_FRONTEND=noninteractive              0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   ENV HOME=/root                                  0B        buildkit.dockerfile.v0
<missing>      20 minutes ago   USER 0                                          0B        buildkit.dockerfile.v0
<missing>      8 days ago       /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B        
<missing>      8 days ago       /bin/sh -c #(nop)  ENV LC_ALL=C.UTF-8           0B        
<missing>      8 days ago       /bin/sh -c #(nop) ADD file:99ecae923f6a2afcc…   80.6MB    

We have previously had issues pushing images with layers larger than 2GB to the docker registry. I am going to investigate which package installs the largest files and whether a lighter alternative is available.
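The 2GB check can also be scripted rather than read off the table by eye. The following is a minimal sketch, not part of the actual patch: it parses the human-readable SIZE column of `docker history` output and lists layers above a threshold. The function names, the `LIMIT_BYTES` value, and the assumption that sizes are printed in base-1000 units are all illustrative.

```python
import re

# Assumption: docker history prints sizes in decimal (base-1000) units.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)(B|kB|MB|GB|TB)")
LIMIT_BYTES = 2 * 10**9  # hypothetical threshold taken from the "2GB" remark

# Three rows copied from the docker history output above.
SAMPLE = """\
<missing>      48 seconds ago   COPY /opt/lib/python/site-packages /opt/lib/…   2.37GB    buildkit.dockerfile.v0
<missing>      20 minutes ago   RUN /bin/sh -c apt-get update && apt-get ins…   36.7MB    buildkit.dockerfile.v0
<missing>      8 days ago       /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B
"""


def layer_sizes(history_text):
    """Return the size in bytes of every layer row found in the text."""
    sizes = []
    for line in history_text.splitlines():
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        # Scan from the right: SIZE is the last column, or the one before COMMENT.
        for field in reversed(fields):
            match = SIZE_RE.fullmatch(field)
            if match:
                sizes.append(round(float(match.group(1)) * UNITS[match.group(2)]))
                break
    return sizes


def oversized_layers(history_text, limit=LIMIT_BYTES):
    """Return the sizes of layers strictly larger than the limit."""
    return [size for size in layer_sizes(history_text) if size > limit]


if __name__ == "__main__":
    for size in oversized_layers(SAMPLE):
        print(f"layer of {size / 10**9:.2f}GB exceeds the limit")
```

Header rows and rows without a recognisable size are skipped, so the full `docker history` output can be piped in unchanged.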

The tensorflow python package is the main contributor to the largest layer shown above. Its installation includes several large files, the largest being ~965MB, as shown below:

$ du -hs /opt/lib/python/site-packages/tensorflow/* | sort -h | tail
580K	/opt/lib/python/site-packages/tensorflow/dtensor
612K	/opt/lib/python/site-packages/tensorflow/tools
6.3M	/opt/lib/python/site-packages/tensorflow/_api
6.7M	/opt/lib/python/site-packages/tensorflow/core
22M	/opt/lib/python/site-packages/tensorflow/lite
31M	/opt/lib/python/site-packages/tensorflow/compiler
104M	/opt/lib/python/site-packages/tensorflow/python
130M	/opt/lib/python/site-packages/tensorflow/libtensorflow_framework.so.2
422M	/opt/lib/python/site-packages/tensorflow/include
965M	/opt/lib/python/site-packages/tensorflow/libtensorflow_cc.so.2

Since this model-server will run on CPU until the GPU procurement is complete, tensorflow has been replaced with tensorflow-cpu, which installs smaller files:

$ du -hs /opt/lib/python/site-packages/tensorflow/* | sort -h | tail
580K	/opt/lib/python/site-packages/tensorflow/dtensor
612K	/opt/lib/python/site-packages/tensorflow/tools
6.2M	/opt/lib/python/site-packages/tensorflow/core
6.3M	/opt/lib/python/site-packages/tensorflow/_api
21M	/opt/lib/python/site-packages/tensorflow/lite
30M	/opt/lib/python/site-packages/tensorflow/compiler
48M	/opt/lib/python/site-packages/tensorflow/libtensorflow_framework.so.2
100M	/opt/lib/python/site-packages/tensorflow/python
174M	/opt/lib/python/site-packages/tensorflow/include
556M	/opt/lib/python/site-packages/tensorflow/libtensorflow_cc.so.2
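To make the comparison between the two listings explicit, here is a small sketch that computes the per-entry savings for the three largest entries. The helper names are hypothetical, and it assumes `du -h` reports 1024-based units; only the three largest entries from each listing are included.

```python
# Factors converting du -h suffixes to MiB (assumes 1024-based units).
DU_UNITS = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}

# Three largest entries with tensorflow (first listing).
FULL = """\
130M /opt/lib/python/site-packages/tensorflow/libtensorflow_framework.so.2
422M /opt/lib/python/site-packages/tensorflow/include
965M /opt/lib/python/site-packages/tensorflow/libtensorflow_cc.so.2
"""

# The same entries with tensorflow-cpu (second listing).
CPU = """\
48M /opt/lib/python/site-packages/tensorflow/libtensorflow_framework.so.2
174M /opt/lib/python/site-packages/tensorflow/include
556M /opt/lib/python/site-packages/tensorflow/libtensorflow_cc.so.2
"""


def parse_du(text):
    """Map each entry's final path component to its size in MiB."""
    sizes = {}
    for line in text.strip().splitlines():
        size, path = line.split(None, 1)
        sizes[path.rsplit("/", 1)[-1]] = float(size[:-1]) * DU_UNITS[size[-1]]
    return sizes


def savings(before, after):
    """Return size reductions in MiB for entries present in both listings."""
    return {name: before[name] - after[name] for name in before if name in after}


if __name__ == "__main__":
    saved = savings(parse_du(FULL), parse_du(CPU))
    for name, mib in sorted(saved.items(), key=lambda item: item[1]):
        print(f"{name}: {mib:.0f}M smaller")
    print(f"total: {sum(saved.values()):.0f}M")
```

For these three entries the reductions are 409M, 248M and 82M, 739M in total, so they account for nearly all of the difference between the two installations.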

The largest layer of the logo-detection model-server docker image has been reduced from ~2.37GB to ~1.61GB, as shown below:

$ docker history 504c84534141
IMAGE          CREATED              CREATED BY                                      SIZE      COMMENT
504c84534141   About a minute ago   LABEL blubber.variant=production blubber.ver…   0B        buildkit.dockerfile.v0
<missing>      About a minute ago   ENTRYPOINT ["./entrypoint.sh"]                  0B        buildkit.dockerfile.v0
<missing>      About a minute ago   COPY common_settings.sh common_settings.sh #…   1.16kB    buildkit.dockerfile.v0
<missing>      About a minute ago   COPY model_server_entrypoint.sh entrypoint.s…   294B      buildkit.dockerfile.v0
<missing>      About a minute ago   COPY /opt/lib/python/site-packages /opt/lib/…   1.61GB    buildkit.dockerfile.v0
<missing>      50 minutes ago       COPY python python/ # buildkit                  23.2kB    buildkit.dockerfile.v0
<missing>      50 minutes ago       COPY logo_detection/model_server/. model_ser…   9.11kB    buildkit.dockerfile.v0
<missing>      3 hours ago          ENV PATH=/opt/lib/python/site-packages/bin:/…   0B        buildkit.dockerfile.v0
<missing>      3 hours ago          ENV PYTHONPATH=/srv/logo_detection              0B        buildkit.dockerfile.v0
<missing>      3 hours ago          WORKDIR /srv/logo_detection                     0B        buildkit.dockerfile.v0
<missing>      3 hours ago          ENV HOME=/home/somebody                         0B        buildkit.dockerfile.v0
<missing>      3 hours ago          USER 65533                                      0B        buildkit.dockerfile.v0
<missing>      3 hours ago          RUN |6 LIVES_AS=somebody LIVES_UID=65533 LIV…   9.33kB    buildkit.dockerfile.v0
<missing>      3 hours ago          ARG RUNS_GID=900                                0B        buildkit.dockerfile.v0
<missing>      3 hours ago          ARG RUNS_UID=900                                0B        buildkit.dockerfile.v0
<missing>      3 hours ago          ARG RUNS_AS=runuser                             0B        buildkit.dockerfile.v0
<missing>      3 hours ago          RUN |3 LIVES_AS=somebody LIVES_UID=65533 LIV…   9.14kB    buildkit.dockerfile.v0
<missing>      3 hours ago          ARG LIVES_GID=65533                             0B        buildkit.dockerfile.v0
<missing>      3 hours ago          ARG LIVES_UID=65533                             0B        buildkit.dockerfile.v0
<missing>      3 hours ago          ARG LIVES_AS=somebody                           0B        buildkit.dockerfile.v0
<missing>      3 hours ago          RUN /bin/sh -c apt-get update && apt-get ins…   36.7MB    buildkit.dockerfile.v0
<missing>      3 hours ago          ENV DEBIAN_FRONTEND=noninteractive              0B        buildkit.dockerfile.v0
<missing>      3 hours ago          ENV HOME=/root                                  0B        buildkit.dockerfile.v0
<missing>      3 hours ago          USER 0                                          0B        buildkit.dockerfile.v0
<missing>      8 days ago           /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B        
<missing>      8 days ago           /bin/sh -c #(nop)  ENV LC_ALL=C.UTF-8           0B        
<missing>      8 days ago           /bin/sh -c #(nop) ADD file:99ecae923f6a2afcc…   80.6MB  

Change #1019773 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] logo-detection: containerize model-server

https://gerrit.wikimedia.org/r/1019773

Change #1019774 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[integration/config@master] inference-services: add CI pipeline jobs for logo-detection model-server

https://gerrit.wikimedia.org/r/1019774

Change #1019774 merged by jenkins-bot:

[integration/config@master] inference-services: add CI pipeline jobs for logo-detection model-server

https://gerrit.wikimedia.org/r/1019774

Change #1019773 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] logo-detection: containerize model-server

https://gerrit.wikimedia.org/r/1019773

The logo-detection model-server has been containerized and added to the CI pipeline, which successfully published it to the Wikimedia docker registry: https://docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-logo-detection/tags/
In T362749, we are going to deploy this model-server to LiftWing staging for user acceptance testing.
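The published tags could also be checked from a script. The sketch below assumes that the Wikimedia registry serves the standard Docker Registry HTTP API v2 `tags/list` endpoint without authentication, which has not been verified here. The function names are hypothetical; the repository name is taken from the URL above.

```python
import json
import urllib.request

REGISTRY = "https://docker-registry.wikimedia.org"
REPOSITORY = "wikimedia/machinelearning-liftwing-inference-services-logo-detection"


def tags_list_url(registry, repository):
    """Build the Registry API v2 URL that lists a repository's tags."""
    return f"{registry.rstrip('/')}/v2/{repository}/tags/list"


def parse_tags(payload):
    """Extract a sorted tag list from a tags/list JSON response body."""
    data = json.loads(payload)
    # The API may return null instead of a list when there are no tags.
    return sorted(data.get("tags") or [])


def fetch_tags(registry=REGISTRY, repository=REPOSITORY, timeout=10):
    """Fetch and parse the tag list (needs network access; assumed endpoint)."""
    url = tags_list_url(registry, repository)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_tags(response.read().decode("utf-8"))
```

Calling `fetch_tags()` returns the tag names as a sorted list of strings; a non-empty result would confirm that the CI pipeline has published at least one image.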