
[SPIKE] Test MGAD Model on LiftWing
Open, Low, Public

Description

Background

For the initial experiment with machine-generated article descriptions, the model was hosted on Cloud VPS and Toolforge. It was recently migrated to Lift Wing by the Machine Learning and Research teams, and is ready for testing by Android engineers in a production setting.

Related tasks:

  • Migrate machine-generated article descriptions from Toolforge to Lift Wing (T343123) - the comments contain access instructions and documentation: T343123#9607328
  • Investigate increased preprocessing latencies on Lift Wing for the article-descriptions model (T358195) - ML team working on it; Android team tracking
  • Put API on Cloud VPS (T318384) - task for the initial setup

The task
  • Test out the API endpoint directly (see the sketch after this list)
    • Record latency (ideally under 3 seconds)
    • Verify that it returns 2 article descriptions per article
  • Establish/document any relevant differences between the Lift Wing-hosted model and the previous model
  • Express your opinion: is the model in a good enough state for us to re-release this feature?
  • Document implementation steps based on the outcome of the engineering investigation and share them with the PM before proceeding with implementation
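
A quick way to exercise the first two items is to call the model endpoint directly and time the round trip. The sketch below is a rough starting point rather than the documented procedure: the endpoint URL, the lang/title/num_beams payload fields, the "prediction" response key, and whether an API Gateway access token is required are all assumptions based on the general Lift Wing pattern; T343123#9607328 has the authoritative access instructions.

```python
# Rough sketch for timing one request to the Lift Wing article-descriptions model.
# Assumptions (verify against T343123#9607328): the public endpoint URL below,
# the lang/title/num_beams payload fields, and the "prediction" response key.
# An API Gateway access token (Authorization: Bearer ...) may also be required.
import time
import requests

ENDPOINT = "https://api.wikimedia.org/service/lw/inference/v1/models/article-descriptions:predict"
HEADERS = {"User-Agent": "MGAD-spike-latency-test"}

def suggest_descriptions(lang: str, title: str, num_beams: int = 2):
    """Request num_beams suggested descriptions and return (descriptions, latency_seconds)."""
    payload = {"lang": lang, "title": title, "num_beams": num_beams}
    start = time.monotonic()
    resp = requests.post(ENDPOINT, json=payload, headers=HEADERS, timeout=30)
    latency = time.monotonic() - start
    resp.raise_for_status()
    # "prediction" is assumed to hold the list of suggested descriptions;
    # print resp.json() once to confirm the actual response shape.
    return resp.json().get("prediction", []), latency

if __name__ == "__main__":
    descriptions, latency = suggest_descriptions("en", "Douglas Adams")
    print(f"{latency:.2f}s -> {descriptions}")
    assert len(descriptions) == 2, "expected two suggested descriptions per article"
```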

Event Timeline

HNordeenWMF updated the task description.
HNordeenWMF moved this task from Needs Triage to Up Next on the Wikipedia-Android-App-Backlog board.

Putting this in Blocked for now:
I found an issue at the gateway level (T365439) that makes it difficult for us to query the URLs of Lift Wing services without an ugly workaround in our network layer.

Otherwise, from some preliminary testing, the latency of the Lift Wing service for providing generated article descriptions seems to be on par with the previous service on Toolforge, and should therefore be perfectly good for us to start consuming.

The gateway API issue was fixed, so we can now continue testing the Lift Wing model. Here is an APK for anyone else who would like to try it:
https://github.com/wikimedia/apps-android-wikipedia/actions/runs/9305222379/artifacts/1552968133

From my testing so far, the latency seems to be in the same range as the previous wmcloud-hosted model. The latency is quite variable depending on the article, but the average seems to be ~3 seconds or less.

Most frequently, the suggested descriptions take around 3 seconds to load.
I found that, for roughly every 10 articles:

  • 1 took 8-10 seconds (which felt long)
  • 1-2 took 4-5 seconds
  • the rest took around 3 seconds
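
To reproduce this spread, a small batch run over a sample of articles can bucket the latencies the same way. This is a rough sketch under the same assumptions as the earlier one (endpoint URL and payload fields); the article list is purely illustrative, not the set tested above.

```python
# Sketch: measure the latency spread over a batch of articles and bucket the
# results roughly like the breakdown above. Same endpoint/payload assumptions
# as the earlier sketch; the article sample here is illustrative only.
import statistics
import time
import requests

ENDPOINT = "https://api.wikimedia.org/service/lw/inference/v1/models/article-descriptions:predict"
TITLES = [
    "Douglas Adams", "Toronto", "Cricket", "Baroque", "Photosynthesis",
    "Nairobi", "Mount Everest", "Impressionism", "Bioluminescence", "Sourdough",
]

latencies = []
for title in TITLES:
    start = time.monotonic()
    requests.post(ENDPOINT, json={"lang": "en", "title": title, "num_beams": 2}, timeout=30)
    latencies.append(time.monotonic() - start)

# Bucket into the same rough bands as observed: ~3s, 4-5s, 8s+.
buckets = {"~3s or less": 0, "4-5s": 0, "6-7s": 0, "8s+": 0}
for t in latencies:
    if t < 3.5:
        buckets["~3s or less"] += 1
    elif t < 5.5:
        buckets["4-5s"] += 1
    elif t < 8:
        buckets["6-7s"] += 1
    else:
        buckets["8s+"] += 1

print(f"median={statistics.median(latencies):.1f}s max={max(latencies):.1f}s buckets={buckets}")
```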



Is it possible to have all suggested article descriptions load in around 3 seconds?

> Is it possible to have all suggested article descriptions load in around 3 seconds?

Unfortunately not; there are bound to be outlier articles that cause the model to take a little longer to generate a suggestion, and I'm not sure the model can be optimized much further at the moment.

Yes, just linking back to some old performance data to back up @OTichonova's findings: T343123#9573432. @Dbrant is right, and I would advocate for launching. In parallel, the one thing that can be done is to follow up with ML Platform to see how close they are to being able to host on GPUs. If they can, that should be a silent change from your perspective (no code updates needed) while having a noticeable impact on latency, per T343123#9520331.

@JTannerWMF The logic for showing machine-suggested descriptions is still subject to an A/B/C test that we were originally running. Is this still applicable, or can it be removed?