Introduce ORES rvprop
Closed, Resolved · Public

Description

See T122689#1939440

We need to use ORES in several places, but most importantly we need it as a revision property (rvprop). In most cases we query like this and the result is like this, but if a user adds "oresscore" to rvprop, the result should include an extra JSON part like this:

"oresscore": {
    "damaging": {
        "true": 0.4320,
        "false": 0.5680
    }
}
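For a client, reading the proposed block could look roughly like this. It is a minimal sketch that assumes the key is spelled "oresscore" as in the proposal above; the revid 12345 and the helper name `damaging_probability` are made up for illustration.

```python
import json
from typing import Optional

# Hypothetical revision entry carrying the proposed "oresscore" block.
revision_json = """
{
  "revid": 12345,
  "oresscore": {
    "damaging": {"true": 0.4320, "false": 0.5680}
  }
}
"""


def damaging_probability(revision: dict) -> Optional[float]:
    """Return P(damaging=true), or None when the revision carries no score."""
    scores = revision.get("oresscore", {}).get("damaging")
    if scores is None:
        return None
    return scores.get("true")


rev = json.loads(revision_json)
print(damaging_probability(rev))
```

Returning None rather than raising keeps the client working if some revisions come back without a score.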

Once we agree on the design, implementing it is easy, given that everything is stored in the ores_classification table.
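A rough sketch of turning stored rows into that block. The row shape `(rev_id, model_name, class_label, probability)` is an assumption; the real ores_classification columns (and whether the model is stored by name or by id) may differ, and `build_oresscore` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical row shape: (rev_id, model_name, class_label, probability).
# The actual ores_classification schema may differ; illustrative only.
Row = tuple[int, str, str, float]


def build_oresscore(rows: list[Row]) -> dict[int, dict]:
    """Group stored probabilities into a per-revision "oresscore" block."""
    scores: dict[int, dict] = defaultdict(lambda: {"oresscore": {}})
    for rev_id, model, label, probability in rows:
        # Nest as oresscore -> model -> class label -> probability.
        scores[rev_id]["oresscore"].setdefault(model, {})[label] = probability
    return dict(scores)


rows = [
    (12345, "damaging", "true", 0.4320),
    (12345, "damaging", "false", 0.5680),
]
print(build_oresscore(rows))
```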

Restricted Application added a subscriber: Aklapper. Aug 22 2016, 9:09 PM
Anomie added a subscriber: Anomie. Aug 23 2016, 1:54 PM
Halfak triaged this task as Normal priority.
Anomie updated the task description. Sep 19 2016, 6:31 PM
Anomie added a comment (edited). Sep 20 2016, 5:10 PM

One open question is what to do for revisions that are not already saved in the database:

  1. Return no scores for those revisions
  2. Fetch scores for those revisions
    • How many would be sane to fetch in one API request? We'd need to choose a limit that won't take too long to execute.
    • If there are more than that many revisions that need fetching, is it worth the added complexity of scheduling jobs to hopefully load-and-cache the remaining revisions before the client submits the continuation?
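The two options can be sketched as a split of the requested revisions into "already stored", "fetch now" and "defer to a job". This is a hypothetical illustration, not the extension's code: `plan_scores`, `ScorePlan` and the placeholder limit of 50 are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical cap on scores fetched synchronously per API request.
FETCH_LIMIT = 50


@dataclass
class ScorePlan:
    cached: list[int] = field(default_factory=list)     # already in the database
    fetch_now: list[int] = field(default_factory=list)  # fetch from ORES in this request
    deferred: list[int] = field(default_factory=list)   # hand off to a background job


def plan_scores(rev_ids: list[int], stored: set[int], fetch: bool,
                limit: int = FETCH_LIMIT) -> ScorePlan:
    """Split revisions per option 1 (fetch=False) or option 2 (fetch=True)."""
    plan = ScorePlan()
    for rev_id in rev_ids:
        if rev_id in stored:
            plan.cached.append(rev_id)
        elif fetch and len(plan.fetch_now) < limit:
            plan.fetch_now.append(rev_id)
        elif fetch:
            # Over the limit: optionally schedule a job so the scores are
            # cached before the client sends its continuation request.
            plan.deferred.append(rev_id)
        # With fetch=False, missing revisions simply get no score.
    return plan
```

For example, `plan_scores([1, 2, 3, 4], stored={1}, fetch=True, limit=2)` keeps 1 as cached, fetches 2 and 3, and defers 4.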

We might answer the question differently per endpoint. For example, list=recentchanges and list=watchlist might return no score (to avoid hundreds of clients all fetching scores for the same just-created revision before the FetchScoreJob runs), while the others fetch.
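A per-endpoint answer could be as small as a lookup table. The module names as keys and the `should_fetch` helper are hypothetical; the default of fetching for unlisted modules follows the "while the others fetch" idea above.

```python
# Hypothetical per-module policy: True means "fetch missing scores",
# False means "return no score for revisions not yet in the database".
FETCH_MISSING_SCORES = {
    "recentchanges": False,  # avoid many clients fetching the same new revision
    "watchlist": False,
}


def should_fetch(module: str) -> bool:
    """Modules not listed (e.g. prop=revisions) default to fetching."""
    return FETCH_MISSING_SCORES.get(module, True)
```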

> How many would be sane to fetch in one API request? We'd need to choose a limit that won't take too long to execute.

For the service endpoint, 50 is our safest bet, especially for recentchanges and watchlist, because the data is probably already stored in the ORES service's Redis cache.
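If a request ever needs more than that, the revision IDs could be chunked into groups of at most 50 before calling the service. A minimal sketch; the `batches` helper is hypothetical.

```python
from typing import Iterator

BATCH_SIZE = 50  # the per-request figure suggested above


def batches(rev_ids: list[int], size: int = BATCH_SIZE) -> Iterator[list[int]]:
    """Yield consecutive slices of at most `size` revision IDs."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rev_ids), size):
        yield rev_ids[start:start + size]
```

For 120 revisions this yields slices of 50, 50 and 20.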

> If there are more than that many revisions that need fetching, is it worth the added complexity of scheduling jobs to hopefully load-and-cache the remaining revisions before the client submits the continuation?

We already have an abstraction layer for triggering jobs that score and store revisions. We can use it.

> We already have an abstraction layer for triggering jobs that score and store revisions. We can use it.

Although I note FetchScoreJob only does one revision at a time. BTW, what is the 'precache' parameter it passes to ORES in the one case it's currently triggered?

> Although I note FetchScoreJob only does one revision at a time.

We can work on it and make it accept more than one; I'm guessing it won't be hard.

> BTW, what is the 'precache' parameter it passes to ORES in the one case it's currently triggered?

It tells the ORES service the source of a request: precache marks requests made when the edit is saved. So we don't need precache in rvprop or other API modules.

> Although I note FetchScoreJob only does one revision at a time.

> We can work on it and make it accept more than one; I'm guessing it won't be hard.

Looking at it closer, it accepts multiple revids for the revid parameter without any issue. Nothing in the job actually depends on being passed only one revid.
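Scoring several revisions in one request amounts to joining the IDs into a single parameter. The sketch below assumes the request takes `models` and `revids` as pipe-separated lists and a `precache` flag set only for edit-time requests (so it stays off for rvprop, per the answer above); these parameter names are assumptions, not a confirmed description of what FetchScoreJob sends, and `ores_query` is hypothetical.

```python
from urllib.parse import urlencode


def ores_query(rev_ids: list[int], models: list[str], precache: bool = False) -> str:
    """Build a query string that scores several revisions in one request.

    Assumed parameter names: 'models', 'revids' (pipe-separated) and a
    'precache' flag that is only set for edit-time requests.
    """
    if not rev_ids:
        raise ValueError("need at least one revision ID")
    params = {
        "models": "|".join(models),
        "revids": "|".join(str(r) for r in rev_ids),
    }
    if precache:
        params["precache"] = "true"
    # urlencode percent-encodes the pipe separator as %7C.
    return urlencode(params)
```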

Change 313830 had a related patch set uploaded (by Anomie):
API: Add hooks for ApiQueryBase's query and row-processing

https://gerrit.wikimedia.org/r/313830

Change 313831 had a related patch set uploaded (by Anomie):
Action API integration for ORES

https://gerrit.wikimedia.org/r/313831

Change 313830 merged by jenkins-bot:
API: Add hooks for ApiQueryBase's query and row-processing

https://gerrit.wikimedia.org/r/313830

Halfak closed this task as Resolved. Oct 11 2016, 11:51 PM
Halfak claimed this task.
Anomie reopened this task as Open. Oct 12 2016, 2:35 PM
Anomie claimed this task.
Anomie added a subscriber: Halfak.

This isn't done yet.

Change 313831 merged by jenkins-bot:
Action API integration for ORES

https://gerrit.wikimedia.org/r/313831

DarTar awarded a token. Nov 7 2016, 8:53 PM