Add Link engineering: Consolidate dedcode/addlink and mgerlach/mwaddlink-query into single repository
Closed, Resolved (Public)

Assigned To

Authored By

	kostajh
	Oct 15 2020, 1:39 PM

Description

From T261403#6541529:

In T261403#6538868, @MGerlach wrote:

@kostajh

take https://github.com/martingerlach/mwaddlink-query and move utility methods from https://github.com/dedcode/mwaddlink into it

maybe that involves making a small shared library between the two repos, depending on whether the model trainer also needs access to these methods? The overall goal would be to remove https://github.com/martingerlach/mwaddlink-query/blob/main/addlink-query_links.py#L8-L11

this could work as a temporary solution. the better option would probably be to have a shared library for both the training and the query part, in order to make sure the parsing is consistent across both. this will probably become more important later as we make tweaks to the training of the model (once we see what needs improvement when applying it to different languages). maybe we can deal with the more general solution later.
I also want to incorporate some of the suggestions mentioned in T258978#6532612

@MGerlach, maybe it would be easier if we just have a single repository, and then use multiple requirements.txt files so that the code we ship in the production query service doesn't have all of the heavier libraries used for training the model, and the code used for training the model doesn't have the HTTP API libraries, etc? The advantage would be reduced overhead in making updates to code shared across training / querying (e.g. you wouldn't have to update a library, commit and push, then update the training and query repos to use the updated library).

Related Objects

Status	Assigned	Task
Resolved	MMiller_WMF	T252822 [EPIC] Growth: "add a link" structured task 1.0
Resolved	kostajh	T266437 Add a link engineering: backend product specifications
Resolved	kostajh	T261396 Add a link: engineering tasks for initial release
Resolved	kostajh	T265603 Add Link engineering: Link recommendation service setup
Resolved	MGerlach	T265605 Add Link engineering: Consolidate dedcode/addlink and mgerlach/mwaddlink-query into single repository

Event Timeline

Assigning to you, but if you'd like help with this (reviewing or implementing) let Growth-Team know.

kostajh edited projects, added Growth-Team (Sprint 0 (Growth Team)); removed Growth-Team. Oct 15 2020, 1:41 PM

Looks like the code is in a single repo (and will soon be imported to gerrit, where we should push patches), but leaving this open to implement the multiple requirements.txt approach.

In T265605#6560116, @kostajh wrote:

Looks like the code is in a single repo (and will soon be imported to gerrit, where we should push patches), but leaving this open to implement the multiple requirements.txt approach.

mwaddlink-query is deprecated and all its functionality has been merged into the main mwaddlink repo, which is the one repo to be maintained and moved to gerrit (via T261403).

@kostajh are there any naming/structuring conventions for virtual environments in production that I should follow? For example, in the solution described above there will be several requirements files in a requirements folder, with the requirements.txt in the main folder mirroring the production environment.

In T265605#6561512, @MGerlach wrote:

@kostajh are there any naming/structuring conventions for virtual environments in production that I should follow? For example, in the solution described above there will be several requirements files in a requirements folder, with the requirements.txt in the main folder mirroring the production environment.

I think it is flexible. Maybe requirements-training.txt and requirements.txt, where the latter is the slimmed down version used for the production query service? They could both be in the root of the repository. Also, AIUI, the production environment wouldn't use a virtual environment; we'd install the libraries during the process of building the Docker image. But we can talk to Release Engineering about it in T265893: Add Link engineering: Deployment Pipeline setup.

kostajh closed this task as Resolved. Oct 22 2020, 10:25 AM

@kostajh at the moment the gerrit-repo contains two requirements-files:

requirements.txt (the full environment required for training and querying)
requirements_query.txt (the lighter environment only for querying)

We could easily switch the names according to your suggestion above but if it works either way I would just leave as is. also wanted to check whether there is any dependence to setups?
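With two requirements files in one repo, one thing that could drift is a package being added to requirements_query.txt but not to the full requirements.txt. A minimal sketch of a check for that, assuming plain `name==version` style lines; the helper names and the package names in the example are hypothetical and not taken from the repository:

```python
import re


def parse_requirements(text):
    """Return the set of normalized package names found in a requirements file body."""
    names = set()
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # The name is everything before the first specifier, extra, marker or space.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        # Normalize so that "Scikit_Learn" and "scikit-learn" compare equal.
        names.add(re.sub(r"[-_.]+", "-", name).lower())
    return names


def missing_from_full(full_text, query_text):
    """List packages required for querying that the full environment does not include."""
    return sorted(parse_requirements(query_text) - parse_requirements(full_text))


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    full = "numpy==1.19.2\nxgboost>=1.2\nFlask==1.1.2\n"
    query = "flask==1.1.2\nmwparserfromhell\n"
    print(missing_from_full(full, query))
```

In practice the two strings would be read from the files in the repository root. The check only compares package names, not version pins, and it does not handle URL-style requirements.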

MGerlach mentioned this in T260206: Add a link: testing API. Oct 23 2020, 3:06 PM
