Support recommendation API improvements
Closed, Resolved, Public

Description

This task will capture the Research portion of support that is needed to improve the recommendation API for the Language team.

Event Timeline

leila triaged this task as Medium priority. (Apr 3 2023, 10:39 PM)
leila created this task.
leila moved this task from Backlog to FY2022-23-Research-April-June on the Research board.

Weekly Updates

  • I'm collecting information about the current API and analyzing the discussion on this task: T308165

I have met with @Pginer-WMF, and there are two main functionalities the Language team wants to add to the recommendation API:

  • Section Translation Recommendation (article expansion): The goal here is to recommend existing articles in the target language that are missing sections.
  • Topic-based recommendations: Users should be able to choose a topic (using this taxonomy) and receive recommendations within that topic.

These two functionalities will require developing and implementing an algorithm. We (Research) need to analyze the resources we have to work on this, in order to provide a plan and schedule for creating such models.

Weekly Updates

  • No updates this week.

Weekly Updates

  • I'm testing some heuristics to create Section Translation Recommendations. This model focuses on expanding stubs in the target language, using high-quality articles in the source language. I'm planning to add some "impact-related features", such as the number of page views in the target language.
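
A minimal sketch of how such a heuristic could look, for illustration only. The `Candidate` fields, the 0-1 quality score, and the `min_quality` threshold are all hypothetical and not taken from the actual model: the idea is simply to keep target-language stubs that have a high-quality source article, then order them by target-language page views as the "impact" signal.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    source_quality: float  # hypothetical 0-1 quality score of the source article
    target_is_stub: bool   # hypothetical flag: the target-language article is a stub
    target_pageviews: int  # hypothetical page-view count in the target language


def rank_candidates(candidates, min_quality=0.7):
    """Keep stubs backed by a high-quality source article, most viewed first."""
    eligible = [
        c for c in candidates
        if c.target_is_stub and c.source_quality >= min_quality
    ]
    # Page views act as the impact-related feature: higher traffic ranks first.
    return sorted(eligible, key=lambda c: c.target_pageviews, reverse=True)


# Made-up sample data, only to exercise the function.
sample = [
    Candidate("A", 0.9, True, 120),
    Candidate("B", 0.4, True, 900),    # dropped: source quality too low
    Candidate("C", 0.8, False, 500),   # dropped: target article is not a stub
    Candidate("D", 0.95, True, 300),
]
ranked_titles = [c.title for c in rank_candidates(sample)]
print(ranked_titles)  # ['D', 'A']
```

Filtering before sorting keeps the two concerns separate, so the quality threshold and the impact signal can be tuned independently.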

Weekly Updates

  • We are coordinating with ML-Platform team to define responsibilities and tasks on this project.
  • I'll be contacting @kevinbazira this week to define an action plan.

Weekly Updates

  • I've talked with Kevin, and the ML team's availability only covers moving the existing model to LiftWing.
  • If we want to add new features, as requested by the Language team (@Pginer-WMF), Research will need to set aside time and resources for this.
  • Considering our current workload, I think a realistic time frame for adding those features would be Q1 23-24.

@diego thanks for the update.

For everyone's visibility: Diego, Kevin, Chris and I just met and coordinated on what's left:

  • As Diego said: Kevin will work on moving recommendation-API to LiftWing. That work is tracked in T308164.
  • ML Platform will work on adding the topic feature: given a topic, pass the list of articles that can be created through translation to the user. Given that we already have topic models available, ML Platform expects no extensive support from Research.
  • We discussed whether the category option should also be offered as part of this batch of work. We understand that the Language team is aware of the limitations of categories and between topics and categories, they prefer topics. As a result, if ML Platform has time, they can implement categories as well but that's to-be-decided. In this case, again, no Research support is expected at this point.
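
The topic feature described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the ML Platform implementation: articles are keyed by hypothetical item IDs shared across languages, topic labels come from an existing topic model, and the label strings used here are placeholders.

```python
def recommend_for_topic(source_topics, target_items, topic):
    """Return source items tagged with `topic` that are missing in the target wiki.

    source_topics: dict mapping a hypothetical item ID to its set of topic labels
    target_items:  IDs of items that already have an article in the target wiki
    """
    existing = set(target_items)
    return [
        item_id
        for item_id, topics in source_topics.items()
        if topic in topics and item_id not in existing
    ]


# Made-up sample data; IDs and topic labels are placeholders.
source_topics = {
    "Q1": {"STEM.Physics"},
    "Q2": {"Culture.Music"},
    "Q3": {"STEM.Physics", "STEM.Space"},
}
target_items = ["Q3"]

physics_gaps = recommend_for_topic(source_topics, target_items, "STEM.Physics")
print(physics_gaps)  # ['Q1']
```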

@diego will remain available for support if @kevinbazira or @elukey need support from Research.

@diego please continue tracking weekly work on this task by reporting any support you may offer to ML Platform until the end of the quarter. Thanks!

I checked the deployment-charts repo and found out that we already have a recommendation-api service (fully bootstrapped) on the Wikikube cluster (handled by service ops), and IIUC from https://phabricator.wikimedia.org/T241230 it is the same service that we are discussing here (please correct me if I am wrong). If so, I'd avoid moving the service from one k8s cluster to another; we could concentrate only on the missing features.

Is my understanding correct that we expect the recommendation-api to be fully owned by ML? As in code development, deployments, etc. (to understand how we should structure/organize things).

Ok something very confusing:

I think that we should try to do the following:

  1. Figure out if the service deployed on Wikikube was requested by Research, and/or who manages/handles it. We shouldn't have services named the same way, since it impacts a lot of configurations, etc.
  2. Decide whether the current service needs to be undeployed in favor of the new one.

@diego do you have any context on the above?

Hi @elukey, I don't have much information about this. My information comes from Isaac's analysis (T308165#7983559), but I don't know anything about the deployment process.

I think it would be better to double-check with @santhosh to be sure which endpoint and code they are actually hitting/using.

Weekly updates

  • The current status of this project is detailed in the coordination summary above: Kevin will move recommendation-API to LiftWing (tracked in T308164), ML Platform will add the topic feature, and categories are still to be decided.

CX is using the API at https://recommend.wmflabs.org/types/translation/ . This is the client code: https://github.com/wikimedia/mediawiki-extensions-ContentTranslation/blob/master/modules/dashboard/ext.cx.recommendtool.client.js
The source code for https://recommend.wmflabs.org is https://gerrit.wikimedia.org/r/plugins/gitiles/research/recommendation-api

@diego @santhosh thanks a lot for the extra context! So the current issue that we are trying to solve is to identify what runs in the current recommendation-api service that is deployed on K8s Wikikube, and that responds to recommendation-api.discovery.wmnet in production. The main reason to ask is that one of the goals of putting your service in production on k8s is that you'll have an internal endpoint like the aforementioned, that will be called by CX. The main problem now is that we cannot have two services called the same way, and I suspect that the already deployed one was an older version not used anymore.

I found some tasks related to the API:

https://phabricator.wikimedia.org/T170877
https://phabricator.wikimedia.org/T203041
https://phabricator.wikimedia.org/T241230
https://phabricator.wikimedia.org/T148129

https://meta.wikimedia.org/wiki/Research:Technology/Article-Recommendation-Pipeline-Overview

One example could be:

elukey@stat1004:~$ curl -s "https://recommendation-api.discovery.wmnet:4632/commons.wikimedia.org/v1/caption/addition/it" | jq '.'
[
  {
    "pageid": 8190637,
    "ns": 6,
    "title": "File:US Navy 050301-N-1550W-003 Director, Naval Forces Europe, Plans and Operations-Commander, Submarine Group Eight, Rear Adm. Carl V. Mauney, speaks with Sailors.jpg",
    "mime": "image/jpeg",
    "structured": {
      "captions": {}
    },
    "globalusage": {
      "it": []
    }
  },
  {
    "pageid": 12843181,
    "ns": 6,
    "title": "File:Papaver alpinum subsp alpinum s str (AlpMohn) IMG 3067.jpg",
    "mime": "image/jpeg",
    "structured": {
      "captions": {}
    },
    "globalusage": {
      "it": []
    }
  },
[..]
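
To make the shape of that response easier to read, here is a small sketch that parses a payload like the one above and lists the files lacking a caption in a given language. The first entry is copied from the output above; the second entry is hypothetical, and the assumption that `captions` is keyed by language code is mine, not something confirmed in this thread.

```python
import json

sample_response = """
[
  {
    "pageid": 12843181,
    "ns": 6,
    "title": "File:Papaver alpinum subsp alpinum s str (AlpMohn) IMG 3067.jpg",
    "mime": "image/jpeg",
    "structured": {"captions": {}},
    "globalusage": {"it": []}
  },
  {
    "pageid": 1,
    "ns": 6,
    "title": "File:Hypothetical example.jpg",
    "mime": "image/jpeg",
    "structured": {"captions": {"it": "Didascalia di esempio"}},
    "globalusage": {"it": []}
  }
]
"""


def files_missing_caption(payload, lang):
    """Return titles of files that have no caption in `lang`.

    Assumes captions are keyed by language code (not confirmed by the thread).
    """
    return [
        item["title"]
        for item in json.loads(payload)
        if lang not in item["structured"]["captions"]
    ]


missing = files_missing_caption(sample_response, "it")
print(missing)
```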

It also seems to be exposed via RESTBase, etc. So my questions:

  1. Is the above an older / deprecated version of the new service? Or is it something different?
  2. Depending on the answer above, do we need to change the name of the new service?

Hi @elukey

I've been digging in the https://recommend.wmflabs.org/ instance that Santhosh mentioned above. That is hosted here: tool.recommendation-api.eqiad1.wikimedia.cloud.
I see that the installation code there points to https://gerrit.wikimedia.org/r/research/recommendation-api

About the "discovery" end-point, I don't have any information :(
Maybe if @bmansurov is around, he can shed more light about this.

Anyhow, I don't think that the discovery service is different from the one in wmfcloud (at least I didn't find any documentation about it).

@leila @diego we are working in T338471 to figure out what to do with the old nodejs application called "recommendation-api", created years ago, but the path to deprecate it may be slow. Would you consider calling the new recommendation-api differently? Or is it mandatory that it is called that way?

I am asking since we cannot have, for infrastructure reasons etc.., two apps in k8s named in the same way. It would help a lot if we could come up with a different/new name, so that we'd proceed independently with the deprecation of the old API (it is still used by some clients afaics, very low traffic, but not zero).

@elukey I think it is OK to change the name; we just need to notify the Language team folks. I think we can go for something like translation-rec-api
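
As a rough illustration of the naming constraint discussed above, the sketch below checks a proposed service name against the names already deployed. It assumes names follow the RFC 1123 label convention that Kubernetes commonly requires for object names (lowercase alphanumerics and '-', at most 63 characters); any extra rules in the deployment tooling are not covered, and `is_usable_name` is a hypothetical helper.

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', starting and ending alphanumeric.
DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def is_usable_name(candidate, existing_names):
    """True if `candidate` is a valid label and does not clash with existing names."""
    return (
        len(candidate) <= 63
        and DNS_LABEL.match(candidate) is not None
        and candidate not in existing_names
    )


deployed = {"recommendation-api"}  # the existing service named in this thread

print(is_usable_name("recommendation-api", deployed))   # False: name already taken
print(is_usable_name("translation-rec-api", deployed))  # True
```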

Weekly updates

  • As can be read above, there was a name collision between an old recommendation-api implementation (in nodejs) and the new one (in python).
  • The ML team is going to upload the newer version to their system, using a new name.

@diego after a chat with Leila the best way forward seems to be the deprecation of the old one, so we clean up a service that we don't really use. I'll post updates in T338471 :)

@diego we discovered that the current recommendation api is used by some Wikimedia clients, so we cannot deprecate it. We need to find a new name for the service, do you have any ideas?

@elukey I think it is OK to change the name; we just need to notify the Language team folks. I think we can go for something like translation-rec-api

Missed this one among the conversation - is the new api only available to support the Content Translation team? I am asking since if it may expand its scope in the future then the name could be misleading.

@elukey

@diego we discovered that the current recommendation api is used by some Wikimedia clients, so we cannot deprecate it. We need to find a new name for the service, do you have any ideas?

What about: translation-rec ?

Missed this one among the conversation - is the new api only available to support the Content Translation team? I am asking since if it may expand its scope in the future then the name could be misleading.

To the best of my knowledge, yes.

@elukey is there something else needed from @diego or Research on this front?

I'm going to resolve this task given that we didn't receive any asks. However, if there is something missing, please re-open and we're happy to help.

I'm not sure about the scope of this sub-task. In case it helps, the parent task (T293648) contains more details (examples in the Product Requirements Document) about the new features needed for recommendations that the current system does not support.