Make RCFilters compatible with both the old and new thresholds APIs
Closed, ResolvedPublic

Description

See https://phabricator.wikimedia.org/phame/post/view/68/more_better_model_information_and_threshold_optimizations/

This is a breaking change about how "thresholds" are reported. Currently you can test against ores-misc.wmflabs.org. Soon we'll deploy to ores-beta.wmflabs.org too.

The ORES extension currently resolves thresholds by calling the model_info API and interpreting its output (see Stats.php from line 77 onwards). Currently, the thresholds to look up are specified using the names from the old-style model_info output; see the default value of wgOresFiltersThresholds and how it's overridden in InitialiseSettings.php for specific wikis.

I think the best way to go about this would be to make the wgOresFiltersThresholds config format more structured, so that we have something like 'recall' => 0.9 instead of 'filter_rate_at_recall(min_recall=0.9)'. That would give us enough information to generate requests against both the old and new APIs, so then we could either have a config var that tells us which to use, or code that tries one and falls back on the other (or in some other way discovers which version is used by the ORES API it's talking to).

In the future, I'd like for a threshold to be defined as the min or max of multiple lookups as explained in T173019: Demonstrate collab threshold detection needs in new 'thresholds' system. That can be done separately from this though, and is probably best off waiting until the new API is deployed and compatibility code for the old one is removed, but in designing the format of our config var we might want to keep that in mind.

Strawman config proposal:

"wgOresFiltersThresholds": {
    "damaging": {
        "likelygood": {"min": 0, "max": { "precision": 0.995 } },
        "maybebad": { "min": { "recall": 0.9 }, "max": 1 },
       ....
     },
    "goodfaith": {
        ...
    }
}

Then maybe for multi-lookup stuff we can express that as "maybebad": { "min": [ { "recall": 0.9 }, { "precision": 0.15 } ], "max": 1" }, or maybe "maybebad": { "min": { "recall": 0.9, "precision": "0.15" } }? (Whether the min or max of those should be used could be inferred: if specified for min, use the max; if specified for max, use the min. Always use the narrower of the two intervals.)

Halfak created this task.Sep 5 2017, 6:17 PM
Restricted Application added a project: Scoring-platform-team. · View Herald TranscriptSep 5 2017, 6:17 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Catrope renamed this task from Confirm that "thresholds" change will not affect RCFilters to Make RCFilters compatible with both the old and new thresholds APIs.Sep 6 2017, 10:05 PM
Catrope updated the task description. (Show Details)Sep 6 2017, 10:31 PM
awight claimed this task.Sep 26 2017, 6:26 PM
awight triaged this task as High priority.

Deploying this without a break in service might be tricky. The new API cannot return the old format test_stats, even on the v1 route, so the ext-ORES code will have to accept both old and new formats, and somehow automatically detect which request format is appropriate. It would make this easier if we could have both revscoring 1 and 2 servers available in production, and switch ext-ORES between them using configuration. IMO we've broken the concept of versioned API endpoints.

Change 380893 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/ORES@master] [WIP] Support new thresholds API

https://gerrit.wikimedia.org/r/380893

Deploying this without a break in service might be tricky. The new API cannot return the old format test_stats, even on the v1 route, so the ext-ORES code will have to accept both old and new formats, and somehow automatically detect which request format is appropriate. It would make this easier if we could have both revscoring 1 and 2 servers available in production, and switch ext-ORES between them using configuration. IMO we've broken the concept of versioned API endpoints.

I originally told Aaron that it was OK to break b/c on the server side and offered to write forwards-compatible client code. But that was before you were on the team, and before writing that code became something you would do. My apologies if that screwed you over, I thought I was going to be the one doing it, and I didn't mind screwing myself over in this way :) . Also at that time I thought we were just talking about the same URL returning a different result, but as Aaron actually implemented thresholds his understanding of what he was implementing changed, and now we have different URLs.

No apologies needed, it's shallow water under the bridge! I'm mostly just taking notes about the horrors that lie ahead :-). FWIW, I think it was a perfectly reasonable call to break API v1 and v2 compatibility, while we're the only consumer of the broken feature. But I did want to call it out and encourage us to either maintain or deprecate in the future.

Back compat turned out to be easy :-D.

We'll be able to toggle between APIs using MediaWiki configuration, $wgOresApiVersion = 1 or 3.

Actually, Revscoring 2.x doesn't support the API v1 stats so this change cannot be gracefully deployed. We'll need to schedule an hour or two of downtime to deploy.

Change 381261 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/ORES@master] Update built-in config to API v3

https://gerrit.wikimedia.org/r/381261

awight added a comment.Oct 2 2017, 4:05 PM

@Halfak This is the MW thresholds work.

awight added a comment.Oct 2 2017, 4:30 PM

@Halfak is suggesting that we have the frontend fallback from v3 to v1 if it's getting bad results. This could be cached and might allow us a graceful upgrade.

awight added a comment.Oct 3 2017, 9:56 PM

A bit of discussion from IRC,

[18:12:46] <awight|afk>	 RoanKattouw: o/ I realized the thresholds deployment will still require down time.  Donno why I keep overlooking this key point:
[18:13:10] <awight>	 revscoring 2.x breaks the v1 thresholds API
[18:13:18] <RoanKattouw>	 Wait what
[18:13:23] <RoanKattouw>	 There are TWO breaking changes?
[18:13:31] <awight>	 Which means, no matter what back compat exists in ext-ORES we’re hosed
[18:13:45] <awight>	 lemme double-check that now
[18:14:06] <awight>	 https://ores-misc.wmflabs.org/v2/scores/enwiki/draftquality/?model_info=test_stats
[18:14:29] <awight>	 That’s the old thresholds style, called against revscoring 2.0
[18:15:18] <awight>	 However, that specific bug is masking what the server *should* do.  I’ll see if I can patch the bug and make the old-style call.
[18:15:30] <awight>	 (working on that under T176830)
[18:15:30] <stashbot>	 T176830: Bug: ORES thresholds API fails - https://phabricator.wikimedia.org/T176830
[18:19:24] <awight>	 RoanKattouw: Down time wouldn’t be too horrible, I would disable the ORES UI by config and somehow stop the FetchScoreJob.  Not totally sure how to backfill the missed RCs, but that might be an acceptable loss.
[18:19:50] <RoanKattouw>	 There's a script for that
[18:19:54] <RoanKattouw>	 Backfilling I mean
[18:20:17] <RoanKattouw>	 Also we drop scores on RCs every now and then already, presumably due to ORES errors/timeouts
[18:22:40] <awight>	 k thanks for knowing that.  I won’t worry too much about downtime, if that turns out to be necessary.
Halfak added a comment.Oct 4 2017, 2:42 PM

Looks like there is a misunderstanding here. There are not "TWO breaking changes". The only breaking change is how model information is reported.

awight added a comment.Oct 4 2017, 4:36 PM

Looks like there is a misunderstanding here. There are not "TWO breaking changes". The only breaking change is how model information is reported.

The API parameters have also changed. But yeah I consider that a single breaking change.

awight added a comment.Oct 6 2017, 1:43 AM

The code is in good shape and ready for another review.

Change 382778 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/ORES@master] Fallback to old thresholds API as necessary

https://gerrit.wikimedia.org/r/382778

Change 381261 abandoned by Awight:
[DO NOT DEPLOY] Update built-in config to API v3

https://gerrit.wikimedia.org/r/381261

Change 380893 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Support new thresholds API

https://gerrit.wikimedia.org/r/380893

Change 382778 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Fallback to old thresholds API as necessary

https://gerrit.wikimedia.org/r/382778

The ORES extension code merged to master doesn't quite check out on beta Wikipedia. It seems to work still, although I can't say why. There seems to be no attempt to use the new-style API, and caching doesn't seem to work because there are requests for test_stats every few seconds.

Change 383593 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/ORES@master] Revert "Fallback to old thresholds API as necessary"

https://gerrit.wikimedia.org/r/383593

Change 383594 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/ORES@master] Revert "Support new thresholds API"

https://gerrit.wikimedia.org/r/383594

Change 383593 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Revert "Fallback to old thresholds API as necessary"

https://gerrit.wikimedia.org/r/383593

Change 383594 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Revert "Support new thresholds API"

https://gerrit.wikimedia.org/r/383594

JobRunner exception coming from the new code,

2017-10-11 06:52:30 [Wd2jjQpEFhUAAELgsiMAAAAE] deployment-jobrunner02 wikidatawiki 1.31.0-alpha exception ERROR: [Wd2jjQpEFhUAAELgsiMAAAAE] /rpc/RunJobs.phpwiki=wikidatawiki&type=ORESFetchScoreJob&maxtime=30&maxmem=300M   RuntimeException from line 172 of /srv/mediawiki/php-master/extensions/ORES/includes/Cache.php: No model available for [models] {"exception_id":"Wd2jjQpEFhUAAELgsiMAAAAE","exception_url":"/rpc/RunJobs.php?wiki=wikidatawiki&type=ORESFetchScoreJob&maxtime=30&maxmem=300M","caught_by":"mwe_handler"} 
[Exception RuntimeException] (/srv/mediawiki/php-master/extensions/ORES/includes/Cache.php:172) No model available for [models]
  #0 /srv/mediawiki/php-master/extensions/ORES/includes/Cache.php(207): ORES\Cache->getModelId(string)
  #1 /srv/mediawiki/php-master/extensions/ORES/includes/Cache.php(49): ORES\Cache->processRevision(array, string, array)
  #2 /srv/mediawiki/php-master/extensions/ORES/includes/FetchScoreJob.php(72): ORES\Cache->storeScores(array)
  #3 /srv/mediawiki/php-master/includes/jobqueue/JobRunner.php(295): ORES\FetchScoreJob->run()
  #4 /srv/mediawiki/php-master/includes/jobqueue/JobRunner.php(193): JobRunner->executeJob(ORES\FetchScoreJob, Wikimedia\Rdbms\LBFactoryMulti, BufferingStatsdDataFactory, integer)
  #5 /srv/mediawiki/rpc/RunJobs.php(47): JobRunner->run(array)
  #6 {main}

Change 383625 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/ORES@master] Support new thresholds API (take 2)

https://gerrit.wikimedia.org/r/383625

Change 383626 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/ORES@master] Fallback to old thresholds API as necessary (take 2)

https://gerrit.wikimedia.org/r/383626

Change 383625 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Support new thresholds API (take 2)

https://gerrit.wikimedia.org/r/383625

Change 383626 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Fallback to old thresholds API as necessary (take 2)

https://gerrit.wikimedia.org/r/383626

I've left the beta cluster in another pickle for the night. The service is rolled back to 1.3, using scap to roll back to -r 42c56632e. The Extension:ORES has new code with the compatibility, however its FetchScoreJob is broken. The request is made using the v3 API, but we parse the response as v1. I missed this in testing.

I'll push a workaround for review, in which the Scoring class uses ApiV1. In a day or so, I'll push another patch to clean this up and parse v3 responses correctly.

Change 383776 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/ORES@master] Temporarily use the v1 API for fetching scores.

https://gerrit.wikimedia.org/r/383776

Change 383776 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Temporarily use the v1 API for fetching scores.

https://gerrit.wikimedia.org/r/383776

Change 383895 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/ORES@master] Choose more magical magic

https://gerrit.wikimedia.org/r/383895

Change 383895 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Choose more magical magic

https://gerrit.wikimedia.org/r/383895

Change 386374 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/ORES@master] Fix undefined variable in Stats::mungeV1Forumula()

https://gerrit.wikimedia.org/r/386374

Change 386374 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Fix undefined variable in Stats::mungeV1Forumula()

https://gerrit.wikimedia.org/r/386374

awight closed this task as Resolved.Oct 25 2017, 9:11 PM