Update blubber version in inference services images
Open, In Progress, Needs Triage, Public

Description

We are currently updating the base images of all models from bullseye to bookworm (https://phabricator.wikimedia.org/T400144). While working on that image update, we noticed that many of these models still use an old blubber syntax (<= 0.25), while the latest buildkit version is 1.3.1.

Since we are behind the latest blubber versions, the blubber syntax needs to be updated in multiple models.
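One way to find the affected models is to read the version pinned in each blubber file and compare it against 0.25.0. The sketch below is only an illustration: it assumes the version is pinned in a "# syntax=...buildkit:vX.Y.Z" comment line, and the names pinned_version, needs_update and OLD_MAX are hypothetical, not taken from the inference-services repository.

```python
import re

# Assumption: the blubber file pins its version in a comment line such as
#   # syntax=<registry path>/buildkit:v0.25.0
# The registry path is deliberately not matched in detail.
SYNTAX_TAG = re.compile(
    r"^#\s*syntax\s*=\s*\S*buildkit:v(\d+)\.(\d+)(?:\.(\d+))?",
    re.MULTILINE,
)

# Versions at or below this one are treated as "old" in this task.
OLD_MAX = (0, 25, 0)


def pinned_version(blubber_text):
    """Return the pinned version as a (major, minor, patch) tuple, or None."""
    match = SYNTAX_TAG.search(blubber_text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def needs_update(blubber_text):
    """True when a pinned version is found and it is <= 0.25.0."""
    version = pinned_version(blubber_text)
    return version is not None and version <= OLD_MAX
```

Files without a recognizable syntax line return None from pinned_version and would need to be checked by hand.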

Models that are using blubber version <= 0.25.0:

Blubber documentation

Event Timeline

gkyziridis edited subscribers, added: BWojtowicz-WMF; removed: Batorsz.
isarantopoulos renamed this task from Update blubber syntax to Update blubber version in inference services images. Jul 25 2025, 10:00 AM
isarantopoulos moved this task from Ready To Go to Unsorted on the Machine-Learning-Team board.

RevertRisk blubber Update

I am quoting here some comments from https://phabricator.wikimedia.org/T400266#11034580 about issues we faced while updating blubber for the revertrisk model.

@gkyziridis o/ I think that the problem may be due to setting use-system-site-packages to false, since we explicitly copy from /opt/lib/python/site-packages in the production variant (from the build one).

Thank you for your fast response, much appreciated.
I set use-system-site-packages: false because the existing blubber file had use-system-flag: false under the old schema.
I opened a new task to tackle that, because we want to proceed with the bookworm updates for the rest of the models as well: https://phabricator.wikimedia.org/T400446

Yes yes it makes sense! In Bookworm the Python convention changed: the "system" path is by default reserved for Debian Python packages, and pip-related installs should go in a venv. I am not 100% sure what the guideline from Releng is, but it may be ok to just create a venv in the blubber file and pip install packages into it (and eventually copy that from the build variant to the prod one).
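To make the venv idea concrete, here is a minimal sketch using Python's standard venv module. It is not the Releng guideline and not blubber configuration; it only shows what a venv without system site-packages looks like. The helper names create_isolated_venv and venv_config are hypothetical, and pip is left out (with_pip=False) so the sketch does not depend on ensurepip being installed.

```python
import venv
from pathlib import Path


def create_isolated_venv(target):
    """Create a venv that does not see the system site-packages."""
    # with_pip=False keeps the sketch independent of ensurepip; a real
    # build step would need pip available inside the venv.
    builder = venv.EnvBuilder(system_site_packages=False, with_pip=False)
    builder.create(target)
    return Path(target)


def venv_config(venv_dir):
    """Parse pyvenv.cfg into a dict of key/value strings."""
    config = {}
    for line in (Path(venv_dir) / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config
```

The include-system-site-packages entry in pyvenv.cfg records whether the venv falls back to the system path, so it is a quick way to confirm that packages installed there are isolated from the Debian-managed ones.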

OKarakaya-WMF changed the task status from Open to In Progress. Oct 15 2025, 12:40 PM
OKarakaya-WMF claimed this task.
OKarakaya-WMF updated the task description.
OKarakaya-WMF moved this task from Ready To Go to In Progress on the Machine-Learning-Team board.

Change #1196622 had a related patch set uploaded (by Ozge; author: Ozge):

[machinelearning/liftwing/inference-services@main] feat: upgrades article quality buildkit 1.x

https://gerrit.wikimedia.org/r/1196622

Change #1196622 merged by Ozge:

[machinelearning/liftwing/inference-services@main] feat: upgrades article quality buildkit 1.x

https://gerrit.wikimedia.org/r/1196622

Change #1196650 had a related patch set uploaded (by Ozge; author: Ozge):

[operations/deployment-charts@master] feat: upgrades article quality buildkit 1.x

https://gerrit.wikimedia.org/r/1196650

Change #1196650 merged by Ozge:

[operations/deployment-charts@master] feat: upgrades article quality buildkit 1.x

https://gerrit.wikimedia.org/r/1196650

articlequality deployed to staging successfully:

ozge@deploy2002:/srv/deployment-charts/helmfile.d/ml-services/experimental$ kubectl get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
article-country-predictor-00011-deployment-7c7457ff56-7hnpv       3/3     Running   0          99d
articlequality-predictor-00007-deployment-7b6588d674-sl9lj        3/3     Running   0          2m29s
edit-check-predictor-00048-deployment-6888f64455-wp6qj            4/4     Running   0          41d
reference-need-predictor-00052-deployment-7449b75f5f-4vxtz        3/3     Running   0          99d
reference-risk-predictor-00003-deployment-754ffd956f-jh56w        3/3     Running   0          99d
revertrisk-wikidata-predictor-default-00018-deployment-74987sjg   3/3     Running   0          99d
ozge@deploy2002:/srv/deployment-charts/helmfile.d/ml-services/experimental$ cd
ozge@deploy2002:~$ ls
articlequalityinput.json  temp.txt
ozge@deploy2002:~$ curl  "https://inference-staging.svc.codfw.wmnet:30443/v1/models/articlequality:predict" -X POST  -d @articlequalityinput.json -i -H "Host: articlequality.experimental.wikimedia.org" -A "articlequality-staging okarakaya@wikimedia.org" --http1.1
HTTP/1.1 200 OK
content-length: 118
content-type: application/json
date: Thu, 16 Oct 2025 11:57:37 GMT
server: istio-envoy
x-envoy-upstream-service-time: 113

{"score":0.10748992767177634,"model_name":"articlequality","model_version":"1","wiki_db":"enwiki","revision_id":12345}
ozge@deploy2002:~$
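The staging check above can also be expressed as a small smoke test on the response body. This is only a sketch: the field list is taken from the single response shown above, the type expectations are assumptions rather than a documented schema, and check_prediction is a hypothetical helper.

```python
import json

# Fields and types observed in the staging response above (assumed, not a spec).
EXPECTED_FIELDS = {
    "score": float,
    "model_name": str,
    "model_version": str,
    "wiki_db": str,
    "revision_id": int,
}

# Response body copied from the curl output above.
SAMPLE_BODY = (
    '{"score":0.10748992767177634,"model_name":"articlequality",'
    '"model_version":"1","wiki_db":"enwiki","revision_id":12345}'
)


def check_prediction(body, expected_model="articlequality"):
    """Parse a response body and verify the expected fields are present."""
    data = json.loads(body)
    for field, kind in EXPECTED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], kind):
            raise TypeError(f"unexpected type for {field}: {type(data[field]).__name__}")
    if data["model_name"] != expected_model:
        raise ValueError(f"unexpected model: {data['model_name']}")
    return data
```

Calling check_prediction(SAMPLE_BODY) returns the parsed dict; a body missing any of the fields raises ValueError instead.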

Change #1196912 had a related patch set uploaded (by Ozge; author: Ozge):

[machinelearning/liftwing/inference-services@main] feat: upgrades article descriptions buildkit 1.x

https://gerrit.wikimedia.org/r/1196912

Change #1196912 abandoned by Ozge:

[machinelearning/liftwing/inference-services@main] feat: upgrades article descriptions buildkit 1.x

https://gerrit.wikimedia.org/r/1196912