
Move backend of ORES MediaWiki extension to Lift Wing
Closed, Resolved | Public

Description

The idea is to keep the current ORES extension active on MediaWiki deployments, migrating its backend calls from ORES to Lift Wing.

Some notes:

  • As described in T312518#8108190, the PHP code change shouldn't be too hard. The main difficulty is that we'll need to add meaningful HTTP Host headers to the extension's HTTP calls.
  • To ease the transition and the code change, we should make the output of the revscoring models match the ORES output. See T318932.

The goal of the task is to update the ORES extension code to support Lift Wing, and then configure it via MediaWiki deployments.
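
A minimal sketch of what a "meaningful HTTP Host header" could look like for a single Lift Wing call. The base URL, the `:predict` path, and the Host naming pattern below are assumptions made for illustration and are not taken from this task; the sketch only builds the request and sends nothing.

```python
import json

# Assumed values for illustration only; the real endpoint and the
# Host naming scheme are not stated in this task.
LIFTWING_BASE_URL = "https://inference.example.internal:30443"
HOST_TEMPLATE = "{wiki}-{model}.revscoring.example.org"


def build_liftwing_request(wiki, model, rev_id):
    """Return (url, headers, body) for scoring one revision with one model."""
    url = f"{LIFTWING_BASE_URL}/v1/models/{wiki}-{model}:predict"
    headers = {
        "Content-Type": "application/json",
        # Assumption: the backend routes on Host to pick the model server.
        "Host": HOST_TEMPLATE.format(wiki=wiki, model=model),
    }
    body = json.dumps({"rev_id": rev_id})
    return url, headers, body
```

Keeping request construction in one pure function like this would make the Host header logic easy to unit-test without a network.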

Details

Repo                          Branch             Lines +/-
mediawiki/extensions/ORES     master             +1 -1
mediawiki/extensions/ORES     master             +9 -2
mediawiki/extensions/ORES     master             +0 -2
operations/deployment-charts  master             +16 -0
operations/mediawiki-config   master             +1 -1
operations/deployment-charts  master             +5 -2
mediawiki/extensions/ORES     wmf/1.41.0-wmf.17  +5 -3
operations/deployment-charts  master             +4 -0
mediawiki/extensions/ORES     master             +5 -3
operations/puppet             production         +1 -0
operations/mediawiki-config   master             +1 -0
operations/mediawiki-config   master             +4 -0
operations/puppet             production         +14 -0
operations/deployment-charts  master             +4 -0
mediawiki/extensions/ORES     master             +208 -19
operations/mediawiki-config   master             +3 -0
mediawiki/extensions/ORES     master             +329 -21
mediawiki/extensions/ORES     master             +138 -16
operations/mediawiki-config   master             +3 -0
mediawiki/extensions/ORES     master             +395 -16
operations/mediawiki-config   master             +4 K -0

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 910439 abandoned by Ilias Sarantopoulos:

[mediawiki/extensions/ORES@master] feat: use Lift Wing instead of ORES

Reason:

covered in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/926420

https://gerrit.wikimedia.org/r/910439

Change 929352 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/mediawiki-config@master] ores: enable per wiki deployment of Ores deprecation

https://gerrit.wikimedia.org/r/929352

Change 929352 merged by jenkins-bot:

[operations/mediawiki-config@master] ores: override Beta cluster liftwing URL

https://gerrit.wikimedia.org/r/929352

Change 915541 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] feat: hardcode threshold calls to switch to Lift Wing

https://gerrit.wikimedia.org/r/915541

Change 926420 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] feat: use Lift Wing instead of ORES

https://gerrit.wikimedia.org/r/926420

Change 931939 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/mediawiki-config@master] ores: enable liftwing on beta

https://gerrit.wikimedia.org/r/931939

Change 931939 merged by jenkins-bot:

[operations/mediawiki-config@master] ores: enable liftwing on beta

https://gerrit.wikimedia.org/r/931939

Lift Wing has been enabled in the beta cluster, but at the moment the ORES extension seems to be broken. This seems to be related to the patch that hardcodes thresholds:
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/915541
I'm looking into it in order to fix it.

Change 932289 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[mediawiki/extensions/ORES@master] fix: corrent config variable access

https://gerrit.wikimedia.org/r/932289

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/00c407be43/w

I have the following issue while testing the extension with Lift Wing: there are some dummy revisions which of course don't exist in mwapi, and Lift Wing returns a 400 error for them (I'd prefer 404, but it is a KServe error by default).
Is it possible for this to happen in production, i.e. a request for a rev_id that doesn't exist?
Either way, we should handle this in the exception handling. The difference is that ORES returns a 200 response with an error entity in the message, so I'm thinking of just logging the error response in that case, as this is what the current version of the extension does.

The ORES extension will throw an error if the above happens anyway, so I propose keeping the same behavior, unless someone believes otherwise.
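
One possible way to bridge the two behaviors, sketched under assumptions: a non-200 reply from Lift Wing is logged and turned into an ORES-style error entity, so downstream code sees the same shape as before. The payload layout, the `error` key, and the `LiftWingError` type name are hypothetical and not confirmed by this task.

```python
import logging

logger = logging.getLogger("ores.liftwing")


def to_ores_style_score(status_code, payload, wiki, model, rev_id):
    """Map one Lift Wing reply to an ORES-like per-revision entry.

    Assumption: on success the payload already mirrors the ORES layout
    (wiki -> "scores" -> rev_id -> model), as T318932 aims for.
    """
    if status_code == 200:
        return payload[wiki]["scores"][str(rev_id)][model]
    # Non-200 (e.g. 400 for a missing revision): log it and return an
    # error entity, mirroring how ORES reported this case inside a 200.
    message = payload.get("error", f"HTTP {status_code}")
    logger.warning(
        "Lift Wing error for %s rev %s (%s): %s", wiki, rev_id, model, message
    )
    return {"error": {"type": "LiftWingError", "message": message}}
```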

I have the following issue while testing the extension with Liftwing: There are some dummy revisions which of course don't exist in mwapi and lift wing returns a 400 error (I'd prefer 404 but it is a kserve error by default).

I think we chose a 400 to indicate a client input error, but 404 may also work (although 404 is a little more specific for this use case, I'd still prefer a 400).

Is there the possibility that this can happen in production? Requesting a rev_id that doesn't exist?

I don't follow the question; do you mean whether any client would do it on purpose? I'd say no, but we should be resilient to weird use cases (to avoid HTTP 500s or similar that would impact our SLO).

Nevertheless we should handle this in the exception. The difference is that ORES returns a 200 response with an error entity in the message so I'm thinking to just log the error response in that case as this is what it is done in the current version of the extension,

In the extension you mean? +1

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/c9245e125c/w

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/c9245e125c/w/

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/c05ca5c76e/w

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/c05ca5c76e/w/

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/00c407be43/w/

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/ed72f0f24b/w

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/ed72f0f24b/w/

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/2a38ebe9b7/w

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/ad97984435/w

Change 932289 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] fix: corrent config variable access

https://gerrit.wikimedia.org/r/932289

ORES filters now appear on the Beta cluster.
However, the extension on the Beta cluster hasn't provided any scores since May (before we deployed anything related to Lift Wing), so I am investigating this at the moment.

Change 935712 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: deploy revscoring models for test.wikipedia.org

https://gerrit.wikimedia.org/r/935712

Change 935712 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: deploy revscoring models for test.wikipedia.org

https://gerrit.wikimedia.org/r/935712

Change 935742 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/puppet@production] httpbb: add testwiki model tests

https://gerrit.wikimedia.org/r/935742

Change 935742 merged by Elukey:

[operations/puppet@production] httpbb: add testwiki model tests

https://gerrit.wikimedia.org/r/935742

Change 935743 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/mediawiki-config@master] ores extension: deploy LiftWing usage on testwiki

https://gerrit.wikimedia.org/r/935743

We will wait for the wmf.16 train to be completed (to be sure it isn't reverted), and then we will start deploying to wikis, starting from testwiki.

Change 935743 merged by jenkins-bot:

[operations/mediawiki-config@master] ores extension: deploy LiftWing usage on testwiki

https://gerrit.wikimedia.org/r/935743

Mentioned in SAL (#wikimedia-operations) [2023-07-10T13:18:59Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:935743|ores extension: deploy LiftWing usage on testwiki (T319170)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-10T13:20:22Z] <ladsgroup@deploy1002> isaranto and ladsgroup: Backport for [[gerrit:935743|ores extension: deploy LiftWing usage on testwiki (T319170)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-07-10T13:28:02Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:935743|ores extension: deploy LiftWing usage on testwiki (T319170)]] (duration: 09m 02s)

Change 936796 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Override liftwing hostname

https://gerrit.wikimedia.org/r/936796

Change 936796 merged by jenkins-bot:

[operations/mediawiki-config@master] Override liftwing hostname

https://gerrit.wikimedia.org/r/936796

Mentioned in SAL (#wikimedia-operations) [2023-07-11T09:49:00Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:936796|Override liftwing hostname (T319170)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-11T09:52:57Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:936796|Override liftwing hostname (T319170)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet

Change 937056 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::services_proxy::envoy: add inference to enabled_listeners

https://gerrit.wikimedia.org/r/937056

Mentioned in SAL (#wikimedia-operations) [2023-07-11T10:03:34Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:936796|Override liftwing hostname (T319170)]] (duration: 14m 34s)

Change 937056 merged by Elukey:

[operations/puppet@production] profile::services_proxy::envoy: add inference to enabled_listeners

https://gerrit.wikimedia.org/r/937056

Change 937142 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[mediawiki/extensions/ORES@master] fix lift wing URL by adding slash suffix

https://gerrit.wikimedia.org/r/937142

Change 937158 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[mediawiki/extensions/ORES@master] fix: add request headers properly

https://gerrit.wikimedia.org/r/937158

Change 937158 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] fix: add request headers properly

https://gerrit.wikimedia.org/r/937158

Change 937438 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: deploy models for testwiki

https://gerrit.wikimedia.org/r/937438

Change 937438 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: deploy models for testwiki

https://gerrit.wikimedia.org/r/937438

Change 937122 had a related patch set uploaded (by Ladsgroup; author: Ilias Sarantopoulos):

[mediawiki/extensions/ORES@wmf/1.41.0-wmf.17] fix: add request headers properly

https://gerrit.wikimedia.org/r/937122

Change 937122 merged by jenkins-bot:

[mediawiki/extensions/ORES@wmf/1.41.0-wmf.17] fix: add request headers properly

https://gerrit.wikimedia.org/r/937122

Mentioned in SAL (#wikimedia-operations) [2023-07-12T10:50:00Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:937122|fix: add request headers properly (T319170)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-12T10:51:33Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:937122|fix: add request headers properly (T319170)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-07-12T11:00:21Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:937122|fix: add request headers properly (T319170)]] (duration: 10m 20s)

There was an issue where headers were not set correctly, so requests to the internal endpoint didn't work correctly. This is now fixed. We also deployed the draftquality and articlequality models for testwiki, because they are enabled on it.
The ORES extension now works on test.wikipedia.org using Lift Wing (recent changes page).
Next stop: rolling out gradually to the other wikis.

Note that the Lift Wing backend makes one request per model, meaning it will possibly make four times more requests than the old system. That makes it quite important to use envoy before deploying to any large wiki.
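
The request multiplication described above can be spelled out with a small sketch. It assumes, as the comment implies, that the old backend scored all models of a revision in one call; the function names are made up for illustration.

```python
def ores_request_count(models, revisions):
    """Assumption: the old backend takes all models of a revision in one call."""
    return revisions if models else 0


def liftwing_request_count(models, revisions):
    """One request per model per revision, as described above."""
    return len(models) * revisions
```

With four models enabled on a wiki, scoring 10 revisions would mean 10 calls under the first assumption and 40 under the second.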

Change 937453 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/mediawiki-config@master] ores: use envoy proxy for Lift Wing

https://gerrit.wikimedia.org/r/937453

I added the envoy proxy for Lift Wing. At the moment all wikis have at most 2 models enabled (goodfaith and damaging) except testwiki which has 4, so this will mean 2 times more reqs.

Before deploying the extension we need to make sure that we have deployed on LiftWing all models that are available in ORES for the following wikis:

'default' => false,
	'arwiki' => true, // T192498
	'bswiki' => true, // T197010
	'cawiki' => true, // T192501
	'cswiki' => true, // T151611
	'enwiki' => true, // T140003
	'eswiki' => true, // T130279
	'eswikibooks' => true, // T145394
	'eswikiquote' => true, // T219160
	'etwiki' => true, // T159609
	'fawiki' => true, // T130211
	'fiwiki' => true, // T163011
	'frwiki' => true,
	'hewiki' => true, // T161621
	'huwiki' => true, // T192496
	'itwiki' => true, // T211032
	'kowiki' => true, // T161628
	'lvwiki' => true, // T192499
	'nlwiki' => true, // T139432
	'plwiki' => true, // T140005
	'ptwiki' => true, // T139692
	'rowiki' => true, // T170723
	'ruwiki' => true,
	'simplewiki' => true, // T182012
	'sqwiki' => true, // T170723
	'srwiki' => true, // T197012
	'svwiki' => true, // T174560
	'testwiki' => true, // T199913
	'test2wiki' => true, // T200412
	'trwiki' => true, // T139992
	'ukwiki' => true, // T256887
	'wikidatawiki' => true, // T130212
	'zhwiki' => true, // T225562

I am working on that.

I added the envoy proxy for Lift Wing. At the moment all wikis have at most 2 models enabled (goodfaith and damaging) except testwiki which has 4, so this will mean 2 times more reqs.

In enwiki, where a significant portion of jobs and reqs come from, it's four: https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/ext-ORES.php#L67

I cross checked the models available on LW vs ORES.
The following models are available in ORES but not in Lift Wing:

articletopic
  -simplewiki
  -testwiki
damaging
  -fakewiki
  -simplewiki
goodfaith
  -fakewiki
  -simplewiki
reverted
  -testwiki
articlequality
  -simplewiki
  -wikidata
  -itemquality
draftquality
  -simplewiki
drafttopic
  -testwiki
wp10
  -simplewiki
  -wikidata
  -itemquality
pagelevel
  -frwikisource
itemquality
  -wikidatawiki
itemtopic
  -wikidatawiki

Of these models, wp10 is a renamed version of articlequality.
The rest, if enabled in the extension, will result in a failure.
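
A cross-check like the one above boils down to a per-model set difference. The sketch below is one way to do it; the inventories are toy data, not the real deployments, and how the two inventories are actually obtained is not described in this task.

```python
def missing_on_liftwing(ores_models, liftwing_models):
    """Return {model: sorted wikis} served by ORES but absent from Lift Wing."""
    missing = {}
    for model, wikis in ores_models.items():
        gap = set(wikis) - set(liftwing_models.get(model, ()))
        if gap:
            missing[model] = sorted(gap)
    return missing


# Toy inventories (hypothetical data, not the real deployments).
ores = {"damaging": {"enwiki", "simplewiki"}, "reverted": {"testwiki"}}
liftwing = {"damaging": {"enwiki"}}
print(missing_on_liftwing(ores, liftwing))
```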

I am currently debugging the following warnings related to database insertions and updates.

I am currently debugging the following warnings related to database insertions and updates.

My rough guess (without looking at anything) is that since everything in a req or job in mw gets wrapped in a transaction, we are inserting the model rows twice in case they are new and that is leading to errors.

I haven't had any luck trying to check the above errors when running the extension locally.
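
The double-insert guess above can be illustrated outside MediaWiki with a toy table. This is only a sketch of the failure mode and of an idempotent insert; the table layout is hypothetical, and SQLite stands in for the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout; the real schema is not shown in this task.
conn.execute(
    "CREATE TABLE model (name TEXT, version TEXT, is_current INTEGER, "
    "UNIQUE (name, version))"
)


def plain_insert(name, version):
    conn.execute("INSERT INTO model VALUES (?, ?, 1)", (name, version))


def idempotent_insert(name, version):
    # INSERT OR IGNORE is SQLite syntax; other engines spell this differently.
    conn.execute("INSERT OR IGNORE INTO model VALUES (?, ?, 1)", (name, version))


plain_insert("damaging", "0.5.1")
try:
    plain_insert("damaging", "0.5.1")  # second insert of the same new row
    duplicate_failed = False
except sqlite3.IntegrityError:
    duplicate_failed = True

idempotent_insert("damaging", "0.5.1")  # no error, no extra row
row_count = conn.execute("SELECT COUNT(*) FROM model").fetchone()[0]
```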

Change 938267 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[mediawiki/extensions/ORES@master] add flag for host header

https://gerrit.wikimedia.org/r/938267

In the above patch I am attempting to resolve the issue that occurs when one sets a Host header while using the API gateway. The response returned is the following:

<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title>Wikimedia Error</title>
<style>
* { margin: 0; padding: 0; }
body { background: #fff; font: 15px/1.6 sans-serif; color: #333; }
.content { margin: 7% auto 0; padding: 2em 1em 1em; max-width: 640px; }
.footer { clear: both; margin-top: 14%; border-top: 1px solid #e5e5e5; background: #f9f9f9; padding: 2em 0; font-size: 0.8em; text-align: center; }
img { float: left; margin: 0 2em 2em 0; }
a img { border: 0; }
h1 { margin-top: 1em; font-size: 1.2em; }
.content-text { overflow: hidden; overflow-wrap: break-word; word-wrap: break-word; -webkit-hyphens: auto; -moz-hyphens: auto; -ms-hyphens: auto; hyphens: auto; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645ad; text-decoration: none; }
a:hover { text-decoration: underline; }
code { font-family: sans-serif; }
.text-muted { color: #777; }
</style>
<div class="content" role="main">
<a href="https://www.wikimedia.org"><img src="https://www.wikimedia.org/static/images/wmf-logo.png" srcset="https://www.wikimedia.org/static/images/wmf-logo-2x.png 2x" alt="Wikimedia" width="135" height="101">
</a>
<h1>Error</h1>
<div class="content-text">
<p>Our servers are currently under maintenance or experiencing a technical problem.

Please <a href="" title="Reload this page" onclick="window.location.reload(false); return false">try again</a> in a few&nbsp;minutes.</p>

<p>See the error message at the bottom of this page for more&nbsp;information.</p>
</div>
</div>
<div class="footer"><p>If you report this error to the Wikimedia System Administrators, please include the details below.</p><p class="text-muted"><code>Request from 2a02:85f:fcaf:c922:a827:22ac:9414:b45b via cp3060 cp3060, Varnish XID 405548350<br>Upstream caches: cp3060 int<br>Error: 400,  at Fri, 14 Jul 2023 15:44:48 GMT</code></p>
</div>
</html>

The flag OresLiftWingAddHostHeader is intended to be false only for Beta Cluster (or any other external/local deployment).
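
A sketch of how such a flag might gate the header, written in Python rather than the extension's PHP. The function name and its arguments are made up for illustration; only the idea of an on/off switch for the Host header comes from the patch description.

```python
def liftwing_headers(hostname, add_host_header):
    """Build request headers; hostname is whatever the caller computed."""
    headers = {"Content-Type": "application/json"}
    if add_host_header:
        # Internal endpoint: route by explicit Host header.
        headers["Host"] = hostname
    # When False (API gateway, Beta Cluster, local setups) the Host header
    # is left for the HTTP client to derive from the URL.
    return headers
```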

Change 938856 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: add new variable in chart for s3 path

https://gerrit.wikimedia.org/r/938856

Change 938859 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: deploy models for simplewiki

https://gerrit.wikimedia.org/r/938859

Change 938856 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: add new variable in chart for s3 path

https://gerrit.wikimedia.org/r/938856

Change 937453 merged by jenkins-bot:

[operations/mediawiki-config@master] ores: use envoy proxy for Lift Wing

https://gerrit.wikimedia.org/r/937453

Mentioned in SAL (#wikimedia-operations) [2023-07-18T08:53:15Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:937453|ores: use envoy proxy for Lift Wing (T319170)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-18T08:57:33Z] <ladsgroup@deploy1002> isaranto and ladsgroup: Backport for [[gerrit:937453|ores: use envoy proxy for Lift Wing (T319170)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Change 938859 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: deploy models for simplewiki

https://gerrit.wikimedia.org/r/938859

Mentioned in SAL (#wikimedia-operations) [2023-07-18T09:08:11Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:937453|ores: use envoy proxy for Lift Wing (T319170)]] (duration: 14m 56s)

Change 939240 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/ORES@master] Fix model row upsert warning

https://gerrit.wikimedia.org/r/939240

We have deployed the model servers on Lift Wing for simplewiki (using enwiki models), and now all models that are used by the ORES extension are available on LW.

Change 939240 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] Fix model row upsert warning

https://gerrit.wikimedia.org/r/939240

Change 938267 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] Add flag for host header

https://gerrit.wikimedia.org/r/938267

All backend calls have been moved to LW, and ORES has zero traffic from MediaWiki, as verified in Logstash.

Change 937142 abandoned by Ilias Sarantopoulos:

[mediawiki/extensions/ORES@master] fix: set default lift wing url to null

Reason:

not needed

https://gerrit.wikimedia.org/r/937142

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/b79ae47b5a/w/

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/2a38ebe9b7/w/

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/ad97984435/w/

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/1b862a0beb/w/