
Move backend of ORES MediaWiki extension to Lift Wing
Open, Needs Triage, Public

Description

The idea is to keep the current ORES extension active on MediaWiki deployments, migrating its backend calls from ORES to Lift Wing.

Some notes:

  • As described in T312518#8108190, the PHP code change shouldn't be too hard. The main difficulty is that we'll need to add meaningful HTTP Host headers to the extension's HTTP calls.
  • To ease the transition and the code change, we should make the output of the revscoring models on Lift Wing identical to that of ORES. See T318932.
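To make the Host-header point concrete, here is a minimal Python sketch that builds a Lift Wing request for one model and one revision without sending it. Python is used for illustration only; the extension itself is PHP.

Several details are assumptions modeled on the Lift Wing documentation for the editquality models (damaging, goodfaith):

- the two base URLs;
- the `/v1/models/<wiki>-<model>:predict` path;
- the `{"rev_id": ...}` payload;
- the Host pattern.

`build_liftwing_request` is a hypothetical helper, and other model families would use other hostnames.

```python
import json
from urllib.request import Request

# Assumed endpoints; verify against the current Lift Wing documentation.
INTERNAL_BASE = "https://inference.svc.eqiad.wmnet:30443"
EXTERNAL_BASE = "https://api.wikimedia.org/service/lw/inference"


def build_liftwing_request(wiki: str, model: str, rev_id: int, internal: bool = True) -> Request:
    """Build (but do not send) a POST request asking Lift Wing to score one revision."""
    path = f"/v1/models/{wiki}-{model}:predict"
    body = json.dumps({"rev_id": rev_id}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if internal:
        # The internal endpoint is shared by many model servers; the Host header
        # (assumed pattern, editquality models only) selects the right one.
        headers["Host"] = f"{wiki}-{model}.revscoring-editquality-{model}.wikimedia.org"
        base = INTERNAL_BASE
    else:
        # The external API Gateway is addressed by URL path alone, so no Host override is set.
        base = EXTERNAL_BASE
    return Request(base + path, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_liftwing_request("enwiki", "damaging", 14563)
    print(req.get_method(), req.full_url)
    print("Host:", req.get_header("Host"))
```

Keeping the endpoint choice behind a single flag mirrors the situation later in this task, where testing had to go through the external endpoint while production would use the internal one.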

The goal of the task is to update the ORES extension code to support Lift Wing, and then configure it via MediaWiki deployments.

Event Timeline

Umherirrender renamed this task from "Move ORES MediaWiki extension to Lift Wing" to "Move backend of ORES MediaWiki extension to Lift Wing". Oct 4 2022, 4:53 PM

There is a caveat that I just realized today :)

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/ORES/+/refs/heads/master/includes/ORESService.php#96 shows how the ORES extension builds the URL it calls. As far as I can see, it is something like: https://ores.wikimedia.org/v3/scores/enwiki/14563

The main problem with this URL is that ORES returns a list of scores (one per model), and, if I understand correctly, the MediaWiki extension is configured to pick one or more of them. Lift Wing has no equivalent endpoint, so multiple calls will be needed (done in the background by an async task). Once all the scores are returned, they should be "packed" into a single row in the MediaWiki database.
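Here is a minimal sketch of the "multiple calls, then pack" flow, in Python for illustration only. It assumes a hypothetical `fetch(wiki, model, rev_id)` callable that performs a single Lift Wing call and returns its decoded JSON. A thread pool stands in for whatever background mechanism the extension actually uses (e.g. a MediaWiki job), so treat this as a sketch of the pattern rather than the extension's code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# fetch(wiki, model, rev_id) -> decoded JSON of one Lift Wing response (hypothetical).
Fetch = Callable[[str, str, int], dict[str, Any]]


def fetch_all_models(
    wiki: str, models: list[str], rev_id: int, fetch: Fetch, max_workers: int = 4
) -> dict[str, dict[str, Any]]:
    """Issue one call per model concurrently and wait for all of them.

    Where ORES answers a single /v3/scores/<wiki>/<rev_id> request, Lift Wing
    needs one request per model; results are keyed by model name so they can
    be packed into a single row afterwards.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {model: pool.submit(fetch, wiki, model, rev_id) for model in models}
        # .result() blocks until each call finishes and re-raises its exception, if any.
        return {model: future.result() for model, future in futures.items()}
```

Passing a stub `fetch` makes the fan-out easy to test without network access.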

It seems to me that there are two main changes that will enable the transition to Lift Wing:

  • ORESService.php: as @elukey notes above, we need to be able to answer which models are available for each wiki (https://ores.wikimedia.org/v3/scores/enwiki/). Since Lift Wing has no such endpoint, we could save this data in a configuration file, load it once, and then serve it.
  • ScoreFetcher.php: the getScores function will be modified to make multiple async calls to Lift Wing, based on the parameters passed. We will also create a helper function that merges the results into the current format.
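Here is a sketch of the merge helper described for ScoreFetcher.php, in Python for illustration. It assumes that each per-model Lift Wing response already has the ORES v3 shape for a single model, `{wiki: {"models": {...}, "scores": {rev_id: {model: ...}}}}`, which is what T318932 aims for. `merge_scores` is a hypothetical name.

```python
from typing import Any


def merge_scores(responses: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge single-model responses into one ORES v3-style response.

    Model metadata and per-revision scores from every response end up under the
    same wiki key, so callers see what a single ORES request used to return.
    """
    merged: dict[str, Any] = {}
    for response in responses:
        for wiki, body in response.items():
            target = merged.setdefault(wiki, {"models": {}, "scores": {}})
            target["models"].update(body.get("models", {}))
            for rev_id, per_model in body.get("scores", {}).items():
                # Each revision collects the scores of every model that was called.
                target["scores"].setdefault(rev_id, {}).update(per_model)
    return merged
```

If the Lift Wing output matches ORES exactly, the rest of the extension can keep consuming the merged result unchanged.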

Of course there will be other things we need to change, so let me know if I am missing something big. (We will also need to remove references and calls to ORES precache.)

We set up a local development environment for MediaWiki + ORES extension using the following guide:

MediaWiki Extension

1. Install MediaWiki with Docker

Clone the MediaWiki repo:

```bash
git clone ssh://<USERNAME>@gerrit.wikimedia.org:29418/mediawiki/core.git mediawiki
```

Add this to docker-compose.override.yml to use MariaDB (a MySQL-compatible database) and expose its port on localhost:

```yaml
version: '3.7'
services:
  database:
    image: mariadb
    environment:
      MYSQL_ALLOW_EMPTY_PASSWORD: 1
    volumes:
      - ./cache/dbdata:/var/lib/mysql
    ports:
      - "3306:3306"
volumes:
  dbdata:
    driver: local
```

Then start the containers:

```bash
docker compose up
```

We run the following command to install and configure MediaWiki. We name the database enwiki; otherwise the ORES service won't be able to find a model, because it has no models for a wiki name that doesn't exist. Alternatively, we could point the extension to another service that does have such a wiki name, e.g. a locally running instance of the ores-legacy app.

```bash
# Run install.php directly and single-quote the password, so that no shell expands "$R" inside it.
docker compose exec mediawiki php maintenance/install.php --server=http://localhost:8080 --scriptpath=/w --dbuser root --dbserver database --dbname enwiki --lang en --skins=Vector --with-extensions --pass '!Q@W#E$R%T^Y' enwiki isaranto
```

2. Install ORES

```bash
# The MediaWiki core checkout cloned in step 1
cd mediawiki

git clone "https://gerrit.wikimedia.org/r/mediawiki/skins/Vector" skins/Vector

git clone "https://gerrit.wikimedia.org/r/mediawiki/extensions/ORES" extensions/ORES
```

Add the following code at the bottom of your LocalSettings.php:

```php
wfLoadSkin( 'Vector' );

wfLoadExtension( 'ORES' );
```

3. Score existing revisions

Score revisions and populate database tables

```bash
docker compose exec mediawiki php extensions/ORES/maintenance/PopulateDatabase.php
```

I have added the above guide to the ORES extension documentation under the install section. Link to guide

I've thought about how we could tackle this, and there are three strategies I can think of:

  1. Extend the ORESServices objects to allow them to use Lift Wing: this would require a large amount of refactoring to make the current services extensible.
  2. Change the current code to use Lift Wing instead of ORES. The big downside is that rolling the extension back to ORES (in case something goes wrong in the early phases) would mean reverting and re-applying the commit (not preferable).
  3. Create duplicate classes that use Lift Wing. This way we just use a different configuration file (extension.json) for Lift Wing, and switching between Lift Wing and ORES is simply a matter of switching configuration files. The downside is that we will have a lot of duplicate code (with just some renaming), but the plan is to delete this code once ORES is completely deprecated. This will make reviews difficult, but it gives us speed of execution and an easy way to manage and roll back.

So the plan, as discussed within the team, is to proceed with the third option.
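The appeal of option 3 is that rollback becomes a configuration change rather than a revert. Below is a language-neutral Python sketch of that idea. In the extension, the switch is made by loading a different extension.json; here, a hypothetical `backend` setting and stand-in classes play that role.

```python
from typing import Any


class Backend:
    """Common interface shared by both implementations (hypothetical)."""
    name = "base"


class OresBackend(Backend):
    """Stand-in for the existing ORES-backed classes."""
    name = "ores"


class LiftWingBackend(Backend):
    """Stand-in for the duplicated classes that call Lift Wing."""
    name = "liftwing"


# Hypothetical mapping from a configuration value to an implementation.
BACKENDS: dict[str, type[Backend]] = {"ores": OresBackend, "liftwing": LiftWingBackend}


def select_backend(config: dict[str, Any]) -> Backend:
    """Instantiate the implementation named in config, defaulting to ORES."""
    key = config.get("backend", "ores")
    try:
        return BACKENDS[key]()
    except KeyError:
        raise ValueError(f"unknown backend: {key!r}") from None
```

Switching back to ORES is then a one-line configuration change. The price is the duplicated code described in option 3, until ORES is retired.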

Another thing we need to take care of is temporarily covering some of the functionality that ORES provides for the old revscoring models, e.g. getting thresholds for models: https://ores.wikimedia.org/v3/scores/enwiki/?models=damaging&model_info=statistics.thresholds.true.%22maximum+recall+@+precision+%3E=+0.9%22
To deal with this, we can extract this information and save it either in the ores-legacy app or as files in the ORES extension repo.

One thing that needs to be taken care of is the following:
ORES models have calculated thresholds that correspond to specific statistics (precision, recall, f1). These thresholds answer queries of the form "maximum recall @ precision >= 0.995", which are mapped to the categories we see in the Recent Changes filters, e.g. "Likely good". The thresholds are fetched from ORES with a query that looks like this:

https://ores.wikimedia.org/v3/scores/frwiki/?models=damaging&model_info=statistics.thresholds.false.%22maximum+recall+%40+precision+%3E%3D+0.995%22%7Cstatistics.thresholds.true.%22maximum+filter_rate+%40+recall+%3E%3D+0.9%22%7Cstatistics.thresholds.true.%22maximum+recall+%40+precision+%3E%3D+0.6%22%7Cstatistics.thresholds.true.%22maximum+recall+%40+precision+%3E%3D+0.9%22&format=json
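Decoded, the `model_info` parameter above is a `|`-separated list of `statistics.thresholds.<label>."<query>"` entries, where `<label>` is the class ("true" or "false"). The Python sketch below rebuilds the frwiki URL from those parts, using only the standard `urllib.parse.urlencode`. `threshold_query_url` is a hypothetical helper. It is included to show which (label, query) pairs would have to be exported to replace this call.

```python
from urllib.parse import urlencode

ORES_SCORES = "https://ores.wikimedia.org/v3/scores"


def threshold_query_url(wiki: str, model: str, queries: list[tuple[str, str]]) -> str:
    """Rebuild an ORES threshold query.

    Each (label, query) pair becomes statistics.thresholds.<label>."<query>",
    and the pairs are joined with "|" into the model_info parameter.
    """
    model_info = "|".join(
        f'statistics.thresholds.{label}."{query}"' for label, query in queries
    )
    params = {"models": model, "model_info": model_info, "format": "json"}
    # urlencode quotes with quote_plus: spaces become "+", and '"', "@", ">", "=", "|" are percent-encoded.
    return f"{ORES_SCORES}/{wiki}/?{urlencode(params)}"
```

Calling it for frwiki/damaging with ("false", "maximum recall @ precision >= 0.995") followed by the three "true" queries from the URL, in the same order, reproduces the URL above character for character.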

To overcome this, we are going to extract this information and save (hard-code) it in the ORES extension for the time being, since this functionality is not likely to be required from Lift Wing.
We are going to commit these files to the ORES extension repository for the wikis defined in this file and for the statistics defined in this conf file.
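Here is a minimal sketch of reading such hard-coded thresholds instead of calling ORES, in Python for illustration. The file layout `{wiki: {model: {label: {query: threshold}}}}` is an assumption made for this example, not the format used in the actual patch. `parse_thresholds` and `get_threshold` are hypothetical names.

```python
import json
from typing import Any, Optional


def parse_thresholds(raw: str) -> dict[str, Any]:
    """Parse the text of a thresholds file committed to the repository (assumed layout)."""
    return json.loads(raw)


def get_threshold(
    thresholds: dict[str, Any], wiki: str, model: str, label: str, query: str
) -> Optional[float]:
    """Look up one threshold, e.g. label="true", query="maximum recall @ precision >= 0.9".

    Returns None when the wiki, model or query was not exported, leaving the
    caller to decide what to do in that case.
    """
    try:
        return float(thresholds[wiki][model][label][query])
    except KeyError:
        return None
```

Keying the lookup on the exact query string keeps it aligned with the `model_info` queries the extension sent to ORES.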

Change 915541 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[mediawiki/extensions/ORES@master] feat: hardcode threshold calls to switch to Lift Wing

https://gerrit.wikimedia.org/r/915541

Change 910439 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[mediawiki/extensions/ORES@master] feat: use Lift Wing instead of ORES

https://gerrit.wikimedia.org/r/910439

In patch https://gerrit.wikimedia.org/r/915541 I have a first approach for loading thresholds hard-coded in the repository instead of calling ORES.
A second approach would be to serve them from the ores-legacy app. This would require few changes to this part of the extension (just pointing it to ores-legacy instead of ORES).
I will look into this if we find a blocker in the current approach.

I did some research on how we can test the changes, and there are two options:

  1. Deployment-prep beta cluster. There are some MediaWiki deployments on this cluster. Every time a change is merged into the repository's master/main branch, it is deployed on them (only if the extension is one deployed by the WMF). An example of such a website is https://en.wikipedia.beta.wmflabs.org/. Only merged changes are deployed, and these then reach production in the next release cycle.
  2. Use patchdemo. With patchdemo, one can spin up a MediaWiki + extensions deployment in Wikimedia Cloud Services for one's own testing purposes; it is deleted afterwards. Any patch can be deployed, even before it is merged.

I find patchdemo amazing and it allows a great development experience (develop/test locally -> deploy branch on patchdemo -> deploy on staging -> deploy on prod).
I opened a pull request for patchdemo since it doesn't have ORES in the list of available extensions. -> https://github.com/MatmaRex/patchdemo/pull/560

The downside at the moment is that neither option has access to the internal Lift Wing endpoint, so we are going to test using the external one (through the API Gateway).

Current status of patchdemo for ORES

[Screenshot 2023-05-11 at 7.12.29 PM.png]

ORES has been added to patchdemo. Thanks @matmarex!

I found an issue affecting every new installation. Currently, the first time the extension is installed, the ores_models table is initialized by parsing the response from https://ores.wikimedia.org/v3/scores/enwiki/, which looks like this:

```json
{
  "enwiki": {
    "models": {
      "articlequality": {
        "version": "0.9.2"
      },
      "articletopic": {
        "version": "1.3.0"
      },
      "damaging": {
        "version": "0.5.1"
      },
      "draftquality": {
        "version": "0.2.1"
      },
      "drafttopic": {
        "version": "1.3.0"
      },
      "goodfaith": {
        "version": "0.5.1"
      },
      "wp10": {
        "version": "0.9.2"
      }
    }
  }
}
```

Again, we have two options: hard-code this information in the extension, or fetch it from ores-legacy.
For now, I am proceeding with the first option to test things.
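The initialization step boils down to turning the JSON response into model name => model version pairs for ores_models. Here is a Python sketch, assuming exactly the response shape shown for enwiki. `model_version_rows` is a hypothetical helper, and the database write itself is left out.

```python
from typing import Any


def model_version_rows(response: dict[str, Any], wiki: str) -> list[tuple[str, str]]:
    """Turn a /v3/scores/<wiki>/ response into (model name, model version) pairs.

    The same function works whether the JSON comes from ORES or from a copy of
    that response hard-coded in the extension.
    """
    models = response.get(wiki, {}).get("models", {})
    # Sort by model name so the result is deterministic.
    return sorted((name, info["version"]) for name, info in models.items())
```

For the enwiki response this yields seven pairs, from ("articlequality", "0.9.2") to ("wp10", "0.9.2").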

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/55c5454d6e/w

Change 922512 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/mediawiki-config@master] ORES: add model versions configuration

https://gerrit.wikimedia.org/r/922512

I have provided two alternative solutions to the above issue, both of which are available in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/910439

  1. Save the response from ores.wikimedia.org/v3/scores to a file and serve the model versions from that file.
  2. Save this information in the MediaWiki configuration, as done in this patch: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/922512

Keep in mind that this procedure is only used during the initialization phase, when the ores_models table (which maps model name => model version) needs to be populated.
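Whichever source is used (file or MediaWiki configuration), the data only needs to be read once and can then be served from memory. This is the same "load once, then serve" idea suggested earlier in the thread for the list of models per wiki.

Here is a Python sketch of that pattern. `ModelVersionRegistry` and its `read_config` callable are hypothetical, and the JSON is assumed to have the /v3/scores/<wiki>/ shape. In PHP, such a cache would normally last for a single request, or be backed by a shared cache; the sketch only shows the lazy-load pattern.

```python
import json
from typing import Any, Callable, Optional


class ModelVersionRegistry:
    """Reads model-version data on first use, then serves it from memory."""

    def __init__(self, read_config: Callable[[], str]) -> None:
        # read_config returns the raw JSON text (e.g. a file saved from ORES).
        self._read_config = read_config
        self._data: Optional[dict[str, Any]] = None

    def _load(self) -> dict[str, Any]:
        if self._data is None:  # only the first call reads and parses the source
            self._data = json.loads(self._read_config())
        return self._data

    def models_for_wiki(self, wiki: str) -> dict[str, str]:
        """Map model name => model version for one wiki (empty if the wiki is unknown)."""
        models = self._load().get(wiki, {}).get("models", {})
        return {name: info["version"] for name, info in models.items()}
```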

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/55c5454d6e/w/

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/6671d13c4b/w

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/6671d13c4b/w/

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/9ba961d7f2/w

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/3d36483f25/w

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/9ba961d7f2/w/

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/95f72f9247/w

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/2206c4b777/w

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/2206c4b777/w/

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/95f72f9247/w/

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/b79ae47b5a/w

The patches for the switch are the following:

The first patch, which contains the configuration changes required by the other two patches, will be deployed according to the backport window schedule: https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230601T0700
A week later, we can start relying on it in the other patches.

Change 922512 merged by jenkins-bot:

[operations/mediawiki-config@master] ORES: add model versions configuration and thresholds

https://gerrit.wikimedia.org/r/922512

Mentioned in SAL (#wikimedia-operations) [2023-06-01T08:18:12Z] <daniel@deploy1002> Started scap: Backport for [[gerrit:922512|ORES: add model versions configuration and thresholds (T319170)]]

Mentioned in SAL (#wikimedia-operations) [2023-06-01T08:19:54Z] <daniel@deploy1002> daniel and isaranto: Backport for [[gerrit:922512|ORES: add model versions configuration and thresholds (T319170)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-06-01T08:28:25Z] <daniel@deploy1002> Finished scap: Backport for [[gerrit:922512|ORES: add model versions configuration and thresholds (T319170)]] (duration: 10m 12s)

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/3d36483f25/w/

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/1b862a0beb/w

Change 926420 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[mediawiki/extensions/ORES@master] feat: use Lift Wing instead of ORES (2) - with one Scorefetcher

https://gerrit.wikimedia.org/r/926420