
Move dedcode/mwaddlink from github to gerrit
Closed, ResolvedPublic

Description

We should move https://github.com/martingerlach/mwaddlink-query to gerrit (docs), both for code review purposes and because I think it would be a prerequisite for the proposed deployment pipeline solution we are discussing in T258978.

Note: previously, the endpoint for querying the trained model was the API (hosted on Toolforge) in https://github.com/martingerlach/mwaddlink-api . We decided not to follow that approach, and the API is no longer maintained.

Event Timeline

kostajh renamed this task from Move mwaddlink from github to gerrit to Move mwaddlink-api from github to gerrit. Sep 15 2020, 10:27 AM
kostajh updated the task description. (Show Details)

@MGerlach how does the following sound to you:

  1. take https://github.com/martingerlach/mwaddlink-query and move utility methods from https://github.com/dedcode/mwaddlink into it
    1. maybe that involves making a small shared library between the two repos, depending on whether the model trainer also needs access to these methods? The overall goal would be to remove https://github.com/martingerlach/mwaddlink-query/blob/main/addlink-query_links.py#L8-L11
  2. you or I can file a request to make a gerrit repo, at which point we should switch to using that for development
  3. Growth engineers can work with you and Release Engineering to set up the Deployment Pipeline scaffolding
  4. Growth engineers can work with you to create a docker-compose / Dockerfile setup for local development with a MySQL backend as proposed by SRE
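For illustration, the removal mentioned in step 1.1 would roughly amount to replacing an ad-hoc path hack with a normal package import. A minimal sketch, assuming the lines in question add the sibling checkout to sys.path (the package and function names below are hypothetical, not the actual mwaddlink code):

```python
# Before (hypothetical): the query script reaches into the sibling repo checkout.
# import sys
# sys.path.insert(0, "../mwaddlink")
# from utils import getLinks

# After: the shared helpers live in a small installable package (called
# "mwaddlink_utils" here purely for illustration) that both the training code
# and the query code list in their requirements files.
from mwaddlink_utils.parsing import get_link_candidates  # hypothetical module/function

candidates = get_link_candidates("Berlin is the capital of Germany.", language="en")
```

Whether that shared package lives in its own repo or inside one of the existing ones is exactly the open question discussed below.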

@kostajh

  1. take https://github.com/martingerlach/mwaddlink-query and move utility methods from https://github.com/dedcode/mwaddlink into it
    1. maybe that involves making a small shared library between the two repos, depending on whether the model trainer also needs access to these methods? The overall goal would be to remove https://github.com/martingerlach/mwaddlink-query/blob/main/addlink-query_links.py#L8-L11

This could work as a temporary solution. The better option would probably be to have a shared library for both the training and the query part, in order to keep the parsing consistent across both. This will probably become more important later as we make tweaks to the training of the model (when we see what needs improvement as we apply it to different languages). Maybe we can deal with the more general solution later.
I also want to incorporate some of the suggestions mentioned in T258978#6532612.

  1. you or I can file a request to make a gerrit repo, at which point we should switch to using that for development

If you could help me with requesting a gerrit repo, that would be great (I have no experience with that yet). Also:

  1. Growth engineers can work with you and Release Engineering to set up the Deployment Pipeline scaffolding
  2. Growth engineers can work with you to create a docker-compose / Dockerfile setup for local development with a MySQL backend as proposed by SRE

Not sure what this means. Perhaps best to discuss in person.

MGerlach renamed this task from Move mwaddlink-api from github to gerrit to Move mwaddlink-query from github to gerrit. Oct 13 2020, 1:42 PM
MGerlach updated the task description. (Show Details)

If you could help me with requesting a gerrit repo, that would be great (I have no experience with that yet).

Pinging Release-Engineering-Team and serviceops with some questions about this process:

First, as I understand it, since we are using the Deployment pipeline, we just need a single gerrit repository that will have the .pipeline directory (and not a separate deploy repo, as ORES appears to have). Is that right? (Question for serviceops, I think)

Second, should this go under mediawiki/services/{service-name}, or somewhere else?

And third, @MGerlach, should the service be called something more descriptive than mwaddlink-query? I've proposed the boring name "Link Recommendation Service" but I leave this one up to you :)

If you could help me with requesting a gerrit repo, that would be great (I have no experience with that yet).

You can just hit the request button on https://www.mediawiki.org/wiki/Gerrit/New_repositories/Requests and it shouldn't take very long.

If you could help me with requesting a gerrit repo, that would be great (I have no experience with that yet).

Pinging Release-Engineering-Team and serviceops with some questions about this process:

First, as I understand it, since we are using the Deployment pipeline, we just need a single gerrit repository that will have the .pipeline directory (and not a separate deploy repo, as ORES appears to have). Is that right? (Question for serviceops, I think)

That's correct; one hope is that the pipeline lessens the necessity for deploy repos. Additionally, there's some setup needed in integration/config to get the pipeline to work with a repo (documented at https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial#project-pipelines.yaml).

Second, should this go under mediawiki/services/{service-name}, or somewhere else?

Seems reasonable. I think in the past, repos under this namespace were generally owned by the WMF services team, so someone may need to fiddle with refs/* owners in gerrit to get it set up correctly (rather than just inheriting from mediawiki/services).

@kostajh

  1. take https://github.com/martingerlach/mwaddlink-query and move utility methods from https://github.com/dedcode/mwaddlink into it
    1. maybe that involves making a small shared library between the two repos, depending on whether the model trainer also needs access to these methods? The overall goal would be to remove https://github.com/martingerlach/mwaddlink-query/blob/main/addlink-query_links.py#L8-L11

This could work as a temporary solution. The better option would probably be to have a shared library for both the training and the query part, in order to keep the parsing consistent across both. This will probably become more important later as we make tweaks to the training of the model (when we see what needs improvement as we apply it to different languages). Maybe we can deal with the more general solution later.
I also want to incorporate some of the suggestions mentioned in T258978#6532612.

@MGerlach, maybe it would be easier if we just have a single repository, and then use multiple requirements.txt files so that the code we ship in the production query service doesn't have all of the heavier libraries used for training the model, and the code used for training the model doesn't have the HTTP API libraries, etc? The advantage would be reduced overhead in making updates to code shared across training / querying (e.g. you wouldn't have to update a library, commit and push, then update the training and query repos to use the updated library).
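To sketch how that single-repository split could look in practice (the module names, file names, and web framework below are assumptions for illustration, not the actual mwaddlink layout):

```python
# query_app.py -- hypothetical entry point for the production query service.
# It imports only the lightweight dependencies that a hypothetical
# requirements-query.txt would list (web framework plus the shared parsing
# helpers), so the deployed image never needs the heavy training stack.
from flask import Flask, jsonify, request

from mwaddlink.parsing import tokenize_wikitext  # shared helper used by both halves (hypothetical)

app = Flask(__name__)


@app.route("/query")
def query():
    text = request.args.get("text", "")
    return jsonify({"tokens": tokenize_wikitext(text)})


# A separate training module (say, train.py) would be the only place importing
# the heavy libraries listed in a hypothetical requirements-train.txt; the
# query image simply never installs them.
```

Which requirements file an environment installs from then determines its footprint: the production image would only install the query requirements, while the training environment installs both.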

Second, should this go under mediawiki/services/{service-name}, or somewhere else?

Seems reasonable. I think in the past, repos under this namespace were generally owned by the WMF services team, so someone may need to fiddle with refs/* owners in gerrit to get it set up correctly (rather than just inheriting from mediawiki/services).

Thanks @thcipriani. So, @MGerlach, since there isn't a requirement or pattern to use mediawiki/services/{name}, I think we could just use research/{name-of-service} once we finalize name-of-service per T261403#6540180.

  1. Growth engineers can work with you and Release Engineering to set up the Deployment Pipeline scaffolding
  2. Growth engineers can work with you to create a docker-compose / Dockerfile setup for local development with a MySQL backend as proposed by SRE

Not sure what this means. Perhaps best to discuss in person.

@MGerlach https://gerrit.wikimedia.org/r/plugins/gitiles/wikibase/termbox/+/refs/heads/master is probably a good example to look at:

  1. It has a [docker-compose.yml configuration](https://gerrit.wikimedia.org/r/plugins/gitiles/wikibase/termbox/+/refs/heads/master/docker-compose.yml) so you can set up the tool locally for development. For us, that would mean bringing up a container with Python (with the HTTP API), and another one with MySQL. You'd open a shell into the Python container to use the command line interface, or you could expose ports from the container to your host so you could make requests via the browser on your local machine.
  2. The deployment pipeline is a way to define a Docker image that we can run in production; the files go in .pipeline (see the example from termbox). This is what we'd set up to generate the image that will run in production for handling queries.
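To make the docker-compose part a bit more concrete: the query service would typically read its MySQL connection settings from environment variables that docker-compose sets on the Python container, so the same code works locally and in the pipeline-built production image. A minimal sketch (the variable names and the use of pymysql are assumptions, not the actual mwaddlink configuration):

```python
import os

import pymysql  # assumed MySQL client library; the real service may use a different one


def get_db_connection():
    """Connect to MySQL using settings injected as environment variables by
    docker-compose locally, or by the production deployment configuration."""
    return pymysql.connect(
        host=os.environ.get("DB_HOST", "mysql"),  # "mysql" = service name in a docker-compose.yml
        port=int(os.environ.get("DB_PORT", "3306")),
        user=os.environ.get("DB_USER", "mwaddlink"),
        password=os.environ.get("DB_PASSWORD", ""),
        database=os.environ.get("DB_NAME", "mwaddlink"),
    )
```

Locally, docker-compose would define the two containers (Python and MySQL) and set these variables; in production, the image built by the deployment pipeline gets them from the deployment configuration instead.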

https://github.com/dedcode/mwaddlink is now imported at https://gerrit.wikimedia.org/r/plugins/gitiles/research/mwaddlink, so we should use gerrit for pushing code / code review now instead of GitHub.

kostajh renamed this task from Move mwaddlink-query from github to gerrit to Move dedcode/mwaddlink from github to gerrit. Oct 19 2020, 4:56 PM

Change 635070 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[integration/config@master] Add noop job for test/gate-and-submit

https://gerrit.wikimedia.org/r/635070

Change 635070 merged by jenkins-bot:
[integration/config@master] layout: Add noop job for research/mwaddlink test and gate-and-submit

https://gerrit.wikimedia.org/r/635070

Mentioned in SAL (#wikimedia-releng) [2020-10-20T09:39:41Z] <hashar> Reloading Zuul for "layout: Add noop job for research/mwaddlink test and gate-and-submit - https://gerrit.wikimedia.org/r/635070" # T261403