Goal: Increase the number of models hosted on Lift Wing
Open, Lowest, Public

Description

User story: As a product owner, I want the machine learning model that supports my product/feature to be hosted on Lift Wing.

Event Timeline

calbon renamed this task from "Increase the number of models hosted on Lift Wing" to "Goal: Increase the number of models hosted on Lift Wing". Oct 4 2023, 3:19 PM

Update: Language ID model deployed. Diff blog posted. Recommendation API work continues.

1. Isaac from the research team tested the deployed rec-api and shared 2 edge cases:

  • the rec-api wasn't returning results besides the 'spec' param; we investigated this in T347475, found envoy proxy constraints, and fixed them in T348607.
  • the rec-api was returning empty results when a query was made without the 'seed' param; we discovered that the pageviews envoy settings were incorrect and fixed them. We later updated them following the wp-analytics team's notice in T348607#9283681.

2. Started migrating the machine-generated article descriptions model from toolforge to LiftWing.

calbon triaged this task as Medium priority. Nov 27 2023, 3:56 PM

Working on migrating the machine-generated article descriptions model from toolforge to LiftWing:

  • added article-descriptions model-server to LiftWing inference services repo
  • added CI pipeline jobs to test and publish the article-descriptions model-server image to the Wikimedia docker registry
  • uploaded the article-descriptions model files to swift under the mbart-large-cc25 and bert-base-multilingual-uncased paths.
  • added the article-descriptions inference service to the experimental namespace on LiftWing
  • fixed the model-server to pass the local_files_only parameter, so the pretrained pytorch tokenizer is instantiated from local files without downloading from huggingface.co.
  • fixed the AsyncSession host header issue experienced in T351940#9358303 (see T351940#9359437).
  • currently working on fixing the wikipedia api summary endpoint as we have to use a k8s internal endpoint to access it.
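
The local_files_only fix mentioned above can be sketched roughly as follows. This is a minimal illustration, not the actual model-server code: it assumes the Hugging Face transformers `AutoTokenizer` class is used, and the `model_dir` argument and both helper names are hypothetical.

```python
from pathlib import Path


def tokenizer_kwargs(model_dir: str) -> dict:
    """Build arguments that force loading from local files only.

    model_dir is a hypothetical path to the directory holding the
    tokenizer files (for example, files fetched from swift).
    """
    return {
        "pretrained_model_name_or_path": str(Path(model_dir)),
        # With local_files_only=True, transformers does not try to
        # download anything from huggingface.co.
        "local_files_only": True,
    }


def load_tokenizer(model_dir: str):
    # Imported lazily so tokenizer_kwargs() works without transformers installed.
    # AutoTokenizer is an assumption; the real model-server may use another class.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(**tokenizer_kwargs(model_dir))
```

If the directory lacks the expected files, loading fails instead of falling back to a network download, which is the behaviour wanted inside a cluster without external access.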

Working on migrating the machine-generated article descriptions model from toolforge to LiftWing:

  • fixed the wikipedia api summary endpoint: the model-server now uses the http://rest-gateway.discovery.wmnet:4111/en.wikipedia.org/v1/page/summary/Clandonald k8s internal endpoint to access it.
  • built an image and tested the article-descriptions local-run setup, then reviewed its patch.
  • ran load tests for the article-descriptions isvc currently hosted in the experimental namespace on LiftWing.
  • asked Isaac and Seddon to test the model-server before we move it out of the experimental namespace; so far a prediction bug and latency issues have been reported.
  • investigating prediction bug in article-descriptions model-server.
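
For reference, a small sketch of how the internal summary URL shown above could be built for an arbitrary page. The host, port, and path come from the example URL in this update; generalising the language prefix and the `summary_url` helper itself are assumptions for illustration, not the model-server's actual code.

```python
from urllib.parse import quote

# Internal rest-gateway base taken from the example URL in this update.
REST_GATEWAY = "http://rest-gateway.discovery.wmnet:4111"


def summary_url(lang: str, title: str) -> str:
    """Return the internal page-summary URL for a wiki language and title.

    Hypothetical helper: spaces become underscores and the title is
    percent-encoded so that characters such as "/" stay in one path segment.
    """
    safe_title = quote(title.replace(" ", "_"), safe="")
    return f"{REST_GATEWAY}/{lang}.wikipedia.org/v1/page/summary/{safe_title}"


if __name__ == "__main__":
    print(summary_url("en", "Clandonald"))
```
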
calbon lowered the priority of this task from Medium to Lowest. Dec 20 2023, 3:35 PM