
[Discuss] api.php integration with ORES
Closed, Resolved, Public

Description

A discussion started in T112956 that probably deserves its own ticket.

Let's discuss how ORES could integrate into api.php and what that would look like.

This task is done when: We have a plan for which integration points will happen first (if any).

Next steps

  • T143614: Introduce ORES rvprop
  • T143616: Introduce rcshow=oresreview and similar ones
  • T143617: Expose ores_model data in API using meta=ores

Background
Here are some useful quotes from T112956:

Chiming in here because this question came up in discussions regarding the ORES API. It seems that we are discussing this as either/or when I think we can have both since APIs can consume other APIs.

As an API developer, I don't want to be constrained in that *everything* needs to operate behind either api.php? or restbase_v1/ (so, +1 @Nuria). This is because the API that we develop might not always make sense within the conventions of these spaces. I think it makes more sense that new APIs adopt an appropriately flexible endpoint first and that we work on integration/bridges afterwards.

E.g. ORES scores edits, so it fits well within api.php?query=revisions and restbase_v1/pages/revisions, but it also has endpoints that provide information about model fitness and other details that don't really make sense in these locations. api.php and restbase_v1/ can consume the service just like any user and act as a bridge. In this case, we'd have both the flexibility to do new things with APIs *and* we can include the relevant outputs in api.php? and restbase_v1/. @Anomie, at some point, I'd like to discuss with you what ORES integration in api.php would look like. I've already talked to @GWicke about what a bridge into restbase_v1 will look like.

The tradeoff of this strategy, of course, is that we'll have two ways to get at the same data. Honestly, I think that this is more desirable than the alternatives. If you want to get at the data through a familiar interface and are happy with the limitation imposed, use the extension to a standard API. If you need more or you want to work with something that's cutting edge and doesn't have a bridge/integration yet, learn how the flexible endpoint works.

> E.g. ORES scores edits, so it fits well within api.php?query=revisions and restbase_v1/pages/revisions, but it also has endpoints that provide information about model fitness and other details that don't really make sense in these locations.

Model fitness and such could fit into the action API as a query meta module, if it's useful data.

> @Anomie, at some point, I'd like to discuss with you what ORES integration in api.php would look like.

Yes, we should definitely do that.

> The tradeoff of this strategy, of course, is that we'll have two ways to get at the same data. Honestly, I think that this is more desirable than the alternatives. If you want to get at the data through a familiar interface and are happy with the limitation imposed, use the extension to a standard API. If you need more or you want to work with something that's cutting edge and doesn't have a bridge/integration yet, learn how the flexible endpoint works.

+1

Event Timeline

Halfak set the priority of this task to Needs Triage.
Halfak updated the task description.
Halfak moved this task to Parked on the Machine-Learning-Team (Active Tasks) board.
Halfak added subscribers: Halfak, Anomie, Nuria, GWicke.

Generically, there are four kinds of modules in the action API:

  1. Query "prop" modules, that return information about pages (in sets of up to 5000 pages).
  2. Query "list" modules, that list something. Often they list titles.
  3. Query "meta" modules, that return metadata of some sort about the site.
  4. Action modules, that generally do something that's not querying.
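To make the four module types concrete, here is a small Python sketch that builds one example request URL for each. The module names used (prop=revisions, list=recentchanges, meta=siteinfo, action=parse) are standard action API modules; the endpoint URL is only an example, and the `query_url` helper is hypothetical.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # example endpoint


def query_url(**params: str) -> str:
    """Build an action API GET URL, defaulting to JSON output."""
    params.setdefault("format", "json")
    return API + "?" + urlencode(params)


# 1. prop module: information about the given pages
prop_example = query_url(action="query", prop="revisions", titles="Main Page")
# 2. list module: lists something, here recent changes
list_example = query_url(action="query", list="recentchanges", rclimit="10")
# 3. meta module: metadata about the site
meta_example = query_url(action="query", meta="siteinfo")
# 4. action module: does something other than querying (parse renders a page)
action_example = query_url(action="parse", page="Main Page")
```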

A prop module can take revision IDs as input rather than page IDs (prop=revisions supports this, for example, and a few generators support generating revision IDs), which would be a good fit for the main functionality of ORES. When given page IDs instead of revision IDs, the module would probably look up the latest rev_id for each page to pass to the backend service/API.
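A minimal sketch of that input handling, assuming the module receives either revision IDs or page IDs: revision IDs pass straight through, while page IDs are mapped to each page's latest revision before anything is sent to the backend. The `page_latest` mapping is a hypothetical in-memory stand-in for the page table's page_latest column.

```python
from typing import Dict, Iterable, List, Optional


def revids_to_score(revids: Optional[Iterable[int]] = None,
                    pageids: Optional[Iterable[int]] = None,
                    page_latest: Optional[Dict[int, int]] = None) -> List[int]:
    """Normalize module input into the revision IDs passed to the scoring backend.

    page_latest is a hypothetical lookup of page ID -> latest revision ID.
    """
    if revids is not None:
        # Revision IDs were given directly (as with prop=revisions&revids=...).
        return sorted(set(revids))
    if pageids is None:
        return []
    lookup = page_latest or {}
    # Page IDs were given: score each page's latest revision; unknown pages are skipped.
    return sorted({lookup[p] for p in pageids if p in lookup})


print(revids_to_score(pageids=[1, 2, 9], page_latest={1: 100, 2: 250}))  # prints [100, 250]
```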

Or we could add some hooks in ApiQueryRevisionsBase to make it easy to add a new rvprop (and in this case an "rvoresmodels" parameter too) and fetch the relevant data. There are a few places in the API where something like this would be useful: ApiQueryInfo (page-level stuff), ApiQueryRevisionsBase (revision-level stuff like this), and ApiQueryImageInfo (files, see T89971) immediately come to mind. On the other hand, ApiQuerySiteInfo shows the disadvantage of such a scheme: where do we draw the line that something is complicated enough that it should be its own module instead of hooking into an existing one?
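The hook-based alternative can be illustrated with a small registry, sketched here in Python rather than MediaWiki's PHP: the revisions module owns a set of core rvprop values, and an extension registers an extra value (here "oresscore") together with a handler that scores a whole batch of revisions at once. The class, method names and handler signature are invented for illustration; they are not MediaWiki hook names.

```python
from typing import Callable, Dict, List

# A handler scores a batch of revision IDs and returns {revid: value}.
RevPropHandler = Callable[[List[int]], Dict[int, object]]


class RevisionsModule:
    # A small subset of core rvprop values, for illustration only.
    core_props = {"ids", "timestamp", "user", "comment"}

    def __init__(self) -> None:
        self.extra_props: Dict[str, RevPropHandler] = {}

    def register_prop(self, name: str, handler: RevPropHandler) -> None:
        """Hypothetical hook: an extension adds a new rvprop value."""
        if name in self.core_props or name in self.extra_props:
            raise ValueError(f"rvprop {name!r} is already defined")
        self.extra_props[name] = handler

    def allowed_props(self) -> List[str]:
        # The values action=help / action=paraminfo would list for rvprop.
        return sorted(self.core_props | set(self.extra_props))

    def run(self, revids: List[int], props: List[str]) -> Dict[int, dict]:
        unknown = set(props) - set(self.allowed_props())
        if unknown:
            raise ValueError(f"unrecognized rvprop values: {sorted(unknown)}")
        result: Dict[int, dict] = {r: {"revid": r} for r in revids}
        for name in props:
            handler = self.extra_props.get(name)
            if handler is None:
                continue  # core props would be filled in by the module itself
            # One batched call per extra prop instead of one call per revision.
            for revid, value in handler(revids).items():
                if revid in result:
                    result[revid][name] = value
        return result


module = RevisionsModule()
module.register_prop(
    "oresscore",
    lambda revids: {r: {"damaging": {"true": 0.1, "false": 0.9}} for r in revids},
)
out = module.run([101, 102], ["ids", "oresscore"])
```

The registry makes the tradeoff above visible: each extra prop is cheap to add, but nothing in the mechanism itself says when a feature has grown large enough to deserve its own module.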

As for the "endpoints that provide information about model fitness and other details", is that referring to this data? That would work well as a meta module if there's a use case for it. We'll have to query that data internally to determine the available models for action=help and action=paraminfo (unless the backend service/API has a more specific endpoint for getting just the models without the extraneous data that isn't being exposed by the public API at ores.wmflabs.org), but unless there's a use case beyond "what models are available?" that's already served by action=help and action=paraminfo I wouldn't recommend we bother with the meta module right away.
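If the model list has to be queried internally for action=help and action=paraminfo, it is probably worth caching so that help requests do not hit the backend every time. This sketch assumes a hypothetical `fetch` callable that returns the backend's model-info mapping; it keeps only the model names and refreshes them after a TTL.

```python
import time
from typing import Callable, Dict, List, Optional


class ModelListCache:
    """Cache of available model names fetched from the scoring backend.

    fetch is a hypothetical callable returning the backend's model-info mapping
    (model name -> details such as version or fitness statistics). Only the names
    are kept, since that is all action=help and action=paraminfo need.
    """

    def __init__(self, fetch: Callable[[], Dict[str, dict]], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self._names: Optional[List[str]] = None
        self._fetched_at = 0.0

    def models(self) -> List[str]:
        now = self.clock()
        if self._names is None or now - self._fetched_at >= self.ttl:
            # Cache is empty or stale: refetch and keep only the model names.
            self._names = sorted(self.fetch())
            self._fetched_at = now
        return self._names
```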

My proposal for the ORES extension's API is as follows:

  • We need to use ORES in several places, but most importantly we need it as a revision property (rvprop). For example, a typical revisions query returns the usual result, but if users add "oresscore" to rvprop, each revision in the result should also include an extra JSON part like this:
"oresscore": {
    "damaging": {
        "true": 0.4320,
        "false": 0.5680
    }
}

Once we agree on the design, implementing it should be easy, since everything is already stored in the ores_classification table.
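As a rough sketch of that implementation step, the function below groups classification rows into the proposed "oresscore" fragment per revision. The row shape (rev_id, model name, class name, probability) is an assumption for illustration, not the actual ores_classification schema.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed row shape: (rev_id, model_name, class_name, probability). The real
# ores_classification columns may differ (e.g. a model ID instead of a name).
Row = Tuple[int, str, str, float]


def oresscore_by_revision(rows: Iterable[Row]) -> Dict[int, dict]:
    """Group classification rows into an "oresscore" fragment per revision."""
    grouped: Dict[int, Dict[str, Dict[str, float]]] = defaultdict(lambda: defaultdict(dict))
    for rev_id, model, cls, probability in rows:
        grouped[rev_id][model][cls] = probability
    # Convert to plain dicts so the result serializes like the JSON above.
    return {rev: {"oresscore": {m: dict(c) for m, c in models.items()}}
            for rev, models in grouped.items()}


rows = [(123, "damaging", "true", 0.432), (123, "damaging", "false", 0.568)]
print(json.dumps(oresscore_by_revision(rows)[123]))
```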

  • We need to add two values to the "show" parameters of list=watchlist, list=recentchanges and list=usercontribs (wlshow, rcshow, ucshow), for the sake of consistency between the GUI and the API. I suggest we add "oresreview" and "!oresreview": the first shows only edits whose damaging score passes a certain threshold [1], and the second shows only good edits. We can simply query the ores_classification table for this.
  • We need to add "ores" as a meta module, e.g. action=query&meta=ores should return and expose the data from the ores_model table.

[1]: This threshold is $wgOresDamagingThreshold. It is currently set wiki-wide, but my plan is to make it something users can change in their preferences.
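The proposed show filters and the per-user threshold can be sketched as follows. The default of 0.7 is a hypothetical stand-in for $wgOresDamagingThreshold, "passes the threshold" is read as strictly greater than, and changes without a cached score are excluded from both views; all three are assumptions rather than decisions recorded in this task.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical stand-in for $wgOresDamagingThreshold; the real default may differ.
WIKI_DEFAULT_THRESHOLD = 0.7


def effective_threshold(user_pref: Optional[float],
                        wiki_default: float = WIKI_DEFAULT_THRESHOLD) -> float:
    """A user's preference, when set, overrides the wiki-wide default."""
    return wiki_default if user_pref is None else user_pref


def filter_changes(changes: Iterable[Dict], show: str, threshold: float) -> List[Dict]:
    """Apply the proposed "oresreview" / "!oresreview" show values.

    Each change is assumed to carry its damaging probability under "damaging";
    changes without a cached score are excluded from both views (an assumption).
    """
    if show not in ("oresreview", "!oresreview"):
        raise ValueError(f"unsupported show value: {show!r}")
    scored = [c for c in changes if c.get("damaging") is not None]
    if show == "oresreview":
        # Edits that pass the threshold and likely need review.
        return [c for c in scored if c["damaging"] > threshold]
    # Everything else that was scored: the "good" edits.
    return [c for c in scored if c["damaging"] <= threshold]
```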

@Halfak is this something you're still interested in? What sorts of use cases for the api.php endpoint would you envision?

Hey @dr0ptp4kt, I envision uses that combine ORES results with other queries, e.g. getting revision info that includes ORES scores, or filtering a query based on scores. We already keep a cache of ORES scores in the DB for wikis with MediaWiki-extensions-ORES installed. This cache/table tracks RecentChanges, and I imagine it could be populated historically based on API requests.
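Populating the cache "historically based on API requests" amounts to a read-through cache: look up each requested revision, fetch only the misses from the ORES service in one batch, and store the results. In this sketch the `store` dict stands in for the scores table and `fetch_scores` is a hypothetical callable wrapping the ORES service.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical backend call: (revision IDs, model name) -> {revid: probability}.
FetchScores = Callable[[List[int], str], Dict[int, float]]


class ScoreCache:
    """Read-through cache of scores keyed by (rev_id, model).

    store stands in for the scores table; fetch_scores stands in for a call
    to the ORES service. Both are assumptions for illustration.
    """

    def __init__(self, fetch_scores: FetchScores) -> None:
        self.fetch_scores = fetch_scores
        self.store: Dict[Tuple[int, str], float] = {}

    def get(self, revids: Iterable[int], model: str) -> Dict[int, float]:
        ids = list(revids)
        missing = [r for r in ids if (r, model) not in self.store]
        if missing:
            # One batched request for all misses; results are persisted, so the
            # cache fills in historically as requests arrive.
            for rev, score in self.fetch_scores(missing, model).items():
                self.store[(rev, model)] = score
        # Revisions the backend could not score are simply absent from the result.
        return {r: self.store[(r, model)] for r in ids if (r, model) in self.store}
```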

Halfak renamed this task from [Discussion] api.php integration with ORES to api.php integration with ORES. (Aug 18 2016, 9:31 PM)
Halfak triaged this task as Low priority.
Halfak moved this task from Ideas to New development on the Machine-Learning-Team board.

I made related Phabricator cards to implement this schema. The schema was approved by @Catrope, and I will keep iterating on the development.

Halfak renamed this task from api.php integration with ORES to [Discuss] api.php integration with ORES. (Aug 29 2016, 4:47 PM)

@Halfak @Ladsgroup can you guys briefly comment on this ticket being closed as resolved? Is there "a plan for which integration points will happen first (if any)".

@Ladsgroup: if you created separate cards can you link them / mention them here?

@DarTar, please see the cards mentioned above:

  • T143614: Introduce ORES rvprop (Mon, Aug 22, 21:09)
  • T143616: Introduce rcshow=oresreview and similar ones
  • T143617: Expose ores_model data in API using meta=ores

@Halfak excellent, thanks. I'll copy them into the description.