Add Link engineering: Local environment setup
Closed, Resolved (Public)

Description

We'll want to be able to run the link recommendation service locally alongside MediaWiki-Docker and/or #vagrant.

For the Docker setup, we probably want documentation on how to use the production image with MediaWiki-Docker, but we might also want a Dockerfile in the research/mwaddlink repo for building an image locally. A script that imports model data into a MySQL database when a container is created would probably also be useful.

For vagrant, see T266490: Add Link engineering: Vagrant role.

Remaining work

Event Timeline

For vagrant, probably just set up the service in a local virtualenv and point MediaWiki to it? I don't think we want to run kubelets inside VirtualBox.

It will be a bit more complicated, in that the tool will be reading from a MySQL database instead of SQLite. So the vagrant role will have to support creating a new database and providing the correct configuration to research/mwaddlink to use that database. And we will also have to launch the Flask app and make its port available to GrowthExperiments.
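
On the MediaWiki side, pointing GrowthExperiments at a locally running service could look roughly like the fragment below. This is only a sketch: the setting name is derived from the GELinkRecommendationServiceUrl config key used elsewhere in this task, while the host and port are assumptions that depend on how the Flask app is launched.

```php
// LocalSettings.php fragment (sketch).
// Assumption: the Flask app is reachable at localhost:8000; replace this with
// the address and port your local service actually listens on.
$wgGELinkRecommendationServiceUrl = 'http://localhost:8000';
```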

In recent work reviewing / testing out Roan's VE prototype patch and the refreshLinkRecommendations.php script, I had to use a few hacks to get my local environment set up. In no particular order, some pain points:

  • The hasrecommendation:link query in Elasticsearch involves some custom processing (T262226) that doesn't happen in our local environments. Maybe there should be a developer mode in refreshLinkRecommendations.php that issues a POST request to the Elasticsearch instance to set this field directly.
  • refreshLinkRecommendations.php sends the local wiki ID (my_wiki, the name of my local site database) to the link recommendation provider, but I want to use a value such as cswiki or arwiki instead. We need to document a cleaner way to fake/override this. Maybe it could be an argument to the script.
  • refreshLinkRecommendations.php iterates over articles by ORES topic locally, but the topic data doesn't exist; it needs to be POSTed manually, see P10461.
    • It would be nice to have a script that imports several thousand articles from a remote wiki and updates the local search index with topic data.
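
To illustrate the "set this field directly" idea from the first bullet, here is a minimal sketch that builds (but does not send) a partial-update request for Elasticsearch's update API. The index name, mapping type, field name, and value are hypothetical placeholders, and using the page ID as the document ID is an assumption about the local index; none of this is taken from refreshLinkRecommendations.php.

```php
<?php
// Sketch only: builds an Elasticsearch partial-update request without sending it.
// The index name, mapping type, and field name used below are hypothetical.

/**
 * @param string $baseUrl Elasticsearch base URL, e.g. http://localhost:9200
 * @param string $index Index holding the page documents
 * @param int $pageId Page ID, assumed here to double as the document ID
 * @param array $fields Field name => value pairs to set on the document
 * @param string|null $type Mapping type for pre-7.x clusters, null for typeless paths
 * @return array With keys 'method', 'url' and 'body'
 */
function buildFieldUpdateRequest(
	string $baseUrl, string $index, int $pageId, array $fields, ?string $type = null
): array {
	$path = $type === null
		? sprintf( '/%s/_update/%d', rawurlencode( $index ), $pageId )
		: sprintf( '/%s/%s/%d/_update', rawurlencode( $index ), rawurlencode( $type ), $pageId );
	return [
		'method' => 'POST',
		'url' => rtrim( $baseUrl, '/' ) . $path,
		// The update API merges the "doc" object into the stored document.
		'body' => json_encode( [ 'doc' => $fields ] ),
	];
}

$request = buildFieldUpdateRequest(
	'http://localhost:9200', 'my_wiki_content', 42,
	[ 'example_field' => [ 'example_value' ] ], 'page'
);
echo $request['method'], ' ', $request['url'], "\n", $request['body'], "\n";
```

The resulting method, URL, and body could then be sent with any HTTP client. Keeping the request construction separate from the sending makes it easy to inspect the exact syntax before touching the index.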

hasrecommendation does not exist yet, so it will be interpreted as a plaintext search (which is convenient, you can just add the literal string hasrecommendation:link to the article to get it into the search results). I think even in the long run disabling the search keyword and relying on that will be more convenient than setting the flag in CirrusSearch. The same works for ORES topics as well. That said, a CirrusSearch maintenance script for setting arbitrary fields could be useful - figuring out the exact syntax for doing it manually can be very annoying.

For the wiki name, just renaming the (SQLite) databases worked fine for me. You could also do something like

use GrowthExperiments\GrowthExperimentsServices;
use GrowthExperiments\NewcomerTasks\AddLink\LinkRecommendationProvider;
use GrowthExperiments\NewcomerTasks\AddLink\ServiceLinkRecommendationProvider;
use MediaWiki\MediaWikiServices;

// Replace the default provider with one that always reports the wiki ID as 'enwiki'.
$wgHooks['MediaWikiServices'][] = function ( MediaWikiServices $services ) {
    $services->redefineService( 'GrowthExperimentsLinkRecommendationProvider',
        function ( MediaWikiServices $services ): LinkRecommendationProvider {
            return new ServiceLinkRecommendationProvider(
                $services->getTitleFactory(),
                $services->getRevisionLookup(),
                $services->getHttpRequestFactory(),
                GrowthExperimentsServices::wrap( $services )->getConfig()->get( 'GELinkRecommendationServiceUrl' ),
                // Wiki ID sent to the service instead of the local database name.
                'enwiki'
            );
        } );
};

But an override would certainly be more convenient.
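
As a rough sketch of what such an override might look like if it were a script argument: the function below prefers an explicitly passed wiki ID and falls back to the local one. The option name "wiki-id" and the function name are hypothetical; this is not an existing argument of refreshLinkRecommendations.php.

```php
<?php
// Sketch only: "wiki-id" is a hypothetical option name used for illustration.

/**
 * Pick the wiki ID to send to the link recommendation service.
 *
 * @param array $options Parsed command-line options (name => value)
 * @param string $localWikiId ID of the local wiki, e.g. "my_wiki"
 * @return string The override if one was given, otherwise the local wiki ID
 */
function resolveWikiId( array $options, string $localWikiId ): string {
	$override = $options['wiki-id'] ?? '';
	return is_string( $override ) && $override !== '' ? $override : $localWikiId;
}

echo resolveWikiId( [ 'wiki-id' => 'cswiki' ], 'my_wiki' ), "\n"; // prints "cswiki"
echo resolveWikiId( [], 'my_wiki' ), "\n"; // prints "my_wiki"
```

Falling back to the local wiki ID keeps the default behavior unchanged when no override is supplied.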

hasrecommendation does not exist yet, so it will be interpreted as a plaintext search (which is convenient, you can just add the literal string hasrecommendation:link to the article to get it into the search results).

Sure, but it will soon (T269493 has a patch up). And manually adding the literal string to articles is annoying.

I think even in the long run disabling the search keyword and relying on that will be more convenient than setting the flag in CirrusSearch. The same works for ORES topics as well. That said, a CirrusSearch maintenance script for setting arbitrary fields could be useful - figuring out the exact syntax for doing it manually can be very annoying.

Agreed that a maintenance script for CirrusSearch would be nice. I'll file a task.

Change 657980 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[research/mwaddlink@main] Add docker-compose config for local environment

https://gerrit.wikimedia.org/r/657980

Change 657980 merged by jenkins-bot:
[research/mwaddlink@main] Add docker-compose config for local environment

https://gerrit.wikimedia.org/r/657980

I think the remaining thing here is to review the documentation on wiki and in the repo, then we could close this.

Change 664823 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[research/mwaddlink@main] docker-compose: Switch to flask as default HTTP server

https://gerrit.wikimedia.org/r/664823

Change 665994 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[research/mwaddlink@main] README: Add docker-compose notes

https://gerrit.wikimedia.org/r/665994

Change 664823 merged by jenkins-bot:
[research/mwaddlink@main] docker-compose: Switch to flask as default HTTP server

https://gerrit.wikimedia.org/r/664823

kostajh claimed this task.

Resolving, I don't think there's anything to QA here especially.

Change 665994 merged by jenkins-bot:
[research/mwaddlink@main] README: Add docker-compose notes

https://gerrit.wikimedia.org/r/665994