Page MenuHomePhabricator

Set up revscoring entry points in RESTBase
Open, NormalPublic

Description

We'd like to expose revision scoring data produced by the revscoring service in a way that

  • can be used by production tools and gadgets,
  • does not introduce security vulnerabilities,
  • performs and scales well,
  • is discoverable and well documented, and
  • is easy to use for developers and gadget authors.

We can achieve many of these objectives by setting up entry points in the RESTBase REST API:

  • the entry points become part of the wider REST content API, which makes them discoverable
  • interactive sandbox & documentation encourages experimentation & helps developers get started
  • revision scores can be stored
  • request latency is low
  • RESTBase enforces valid JSON syntax and sends CSP headers for security
  • metrics covering request latencies and status are automatically provided per entry point, as well as all requests to the revscoring service

API strawman

The current entry points exposed by the revscoring service are:

Additional types of scores will likely be added later.

Based on this, we could consider hooking this up as

  • /api/rest_v1/page/revision/{revision}/score/wp10
  • /api/rest_v1/page/revision/{revision}/score/reverted

By default, /api/rest_v1/page/revision/{revision}/score/ would provide a listing of available scores. Each score type has its own documentation section, which can include rich markdown content with links to background information.

Event Timeline

GWicke assigned this task to Halfak.
GWicke raised the priority of this task from to High.
GWicke updated the task description. (Show Details)
GWicke added subscribers: csteipp, mobrovac, GWicke.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 28 2015, 7:43 PM
GWicke updated the task description. (Show Details)Jul 28 2015, 7:43 PM
GWicke set Security to None.
Sitic added a subscriber: Sitic.Jul 28 2015, 7:49 PM

The revscoring API also supports scoring multiple revivsions at once, and that's the reccomended calling pattern as well. Not sure how exactly to support that via restbase.

GWicke added a comment.EditedJul 28 2015, 9:17 PM

The revscoring API also supports scoring multiple revivsions at once, and that's the reccomended calling pattern as well.

Hmm, indeed:

http://ores.wmflabs.org/scores/enwiki/?revids=668898500|34854258&models=reverted

Not sure how exactly to support that via restbase.

One option would be to provide a similar entry point, and convert the list of revision ids into individual requests in RESTBase. That way stored scores for each of the revision ids could still be reused.

I do wonder though if the batching is worth the complexity, considering that 80+% of our clients are using SPDY with very low per-request overheads. Without batching, we could also cache responses in Varnish to further speed up response latencies.

DarTar added a subscriber: DarTar.Jul 29 2015, 12:21 AM
Ricordisamoa added a subscriber: Ricordisamoa.

+1 for just using the single revision score request pattern within RESTBase.

Generally it is nice to support batching because we get about a 2x speedup on scoring new revisions and many of our users will be sending requests this way. However, once we have a pre-caching daemon in place, this will only matter for historically analyses and we can have a either a separate instance or direct access to ORES to serve those types of requests.

He7d3r added a subscriber: He7d3r.Jul 30 2015, 10:57 PM
Halfak moved this task from Done to Backlog on the Scoring-platform-team (Current) board.
DarTar moved this task from Staged to Radar on the Research board.Aug 6 2015, 10:18 PM
Halfak added a comment.Aug 7 2015, 4:52 PM

@GWicke, it looks like this might be done. Is there anything else you would like me to do right now?

GWicke added a comment.EditedAug 8 2015, 12:28 AM

@Halfak: No, I think at this point it's up to us to hook it up. I'm waiting for T105975 to land, as that will make it more convenient.

Edit: Just spoke with @yuvipanda, and he's working on packaging & deployment preps. So, we can prepare the entry points, but don't have a production backend to point this to yet.

GWicke moved this task from Backlog to Ready / next on the RESTBase board.Aug 8 2015, 12:28 AM
Halfak added a comment.Aug 8 2015, 2:15 PM

Makes sense. Thanks for the update.

Halfak removed Halfak as the assignee of this task.Sep 3 2015, 10:41 PM
yuvipanda closed this task as Declined.Nov 20 2015, 4:23 AM
yuvipanda claimed this task.

I'm not really sure if we need this - we might just send traffic directly from varnish to ores instead. RESTBase adds an extra layer of complexity without much benefit in ORES's case (since nothing is cacheable at the HTTP layer atm), and I'd prefer to just keep the entire setup simpler by skipping RESTBase.

@yuvipanda: The discussion above indicated that responses should be fairly easy to cache. Has this changed?

That said, even for non-cacheable end points there are benefits like consistent documentation / swagger specs, discoverability, request metrics, and authentication support for private wikis. It also lets us reuse the regular project domain connection, which is good for performance if clients hit revscoring directly. Finally, it avoids adding even more special-case logic to Varnish.

We are happy to help with writing the swagger spec needed for hooking this up in the REST API. There shouldn't be any need to write code.

From talking to @BBlack a while ago current vague plan is to have things similar to en.wikipedia.org/api/<ores-specific-name>, similar to how we have en.wikipedia.org/api/rest_v1 for restbase, and have that hit the lvs endpoint from varnish. This would allow it to continue using the regular project domain connection. Also revscoring would never be appropriate for private wikis, and I think the swagger stuff will be extrenous in this case since someone (who?) will need to maintain it forever to sync with any change that happen in the python codebase itself.

From my perspective, the options are:

  1. varnish -> ores
  2. varnish -> restbase -> ores

and #1 is simpler, and there aren't much advantages to #2 in this case and there's an additional layer of both http requests and complexity that I'd rather avoid.

GWicke added a comment.EditedNov 20 2015, 6:28 AM

I think the swagger stuff will be extrenous in this case since someone (who?) will need to maintain it forever to sync with any change that happen in the python codebase itself.

Creating a basic spec is pretty easy, and quickly pays for itself by getting users started quickly. There are also benefits like automatic service monitoring & integration tests for the end point & service, which make sure that the spec stays in sync with the python codebase.

If there are no objections we'll probably set up an entry point in any case, even if it's just to make the API more discoverable. Whether all requests pass through restbase is a different question, and really up to the Varnish configuration. It is absolutely an option to set up a rule for /api/rest_v1/page/revision/{revision}/score that sends those requests straight to LVS, if the hop through RESTBase is adding too much overhead.

Let us know when the LVS end point is ready.

GWicke reopened this task as Open.Nov 20 2015, 6:29 AM
Joe added a subscriber: Joe.Nov 20 2015, 8:38 AM

While I don't see any even remotely sensible reason to want to put restbase in front of this (instead of just going directly via varnish), I do agree that adding and maintaining a spec for any new service is a good idea for a few reasons:

  1. Reciprocal autodiscovery would have a standardized format
  2. Any service exposing a swagger spec can get automatic monitoring of selected endpoints

Since that has nothing to do with restbase, I'm re-closing this ticket. I think we might still open one on creating a spec for revscoring.

Joe closed this task as Declined.Nov 20 2015, 8:38 AM
Joe added a project: Operations.
GWicke removed yuvipanda as the assignee of this task.Nov 20 2015, 3:20 PM
GWicke lowered the priority of this task from High to Normal.

@Joe: If you read carefully, your proposal is actually compatible with what I said just above in T107196#1820063. The only difference is that you would like to keep the documentation separate, while I think that there is value in promoting / documenting this API as part of the wider REST API.

Since this is a services task primarily I'm going to reopen this task as a reminder to ourselves to revisit this. In particular, I'd be interested in hearing @Halfak's input on whether he would prefer revscoring to be documented in the wider REST API or not.

GWicke reopened this task as Open.Nov 20 2015, 3:57 PM

Just got done talking to @GWicke on IRC. We talked about what it would take to set up ORES within restbase or beside it. We also talked about how swagger specs work. It seems clear that we want that regardless, so I opened a task for ORES: T119271

Given that we're not planning on making use of RESTBase, I think it makes the most sense to have the end-point *next to* RESTBase and listed at /api/ along with the Action API and RESTBase. Reading through @GWicke's goals for this ticket (in the description), it seems that we achieve all of them with this strategy.

I don't see any concrete reason in the ticket as to why not to go through RESTBase. The big benefit from my point of view is a unified API, all the more so because ORES' endpoints can be really well integrated into it, since they complete revision information.

That said, this is also an option I'd endorse:

It is absolutely an option to set up a rule for /api/rest_v1/page/revision/{revision}/score that sends those requests straight to LVS, if the hop through RESTBase is adding too much overhead.