Page MenuHomePhabricator

Implement JS ORES client in mw-ORES extension
Closed, ResolvedPublic

Description

Inspired by:

apiClient = new mw.Api();
apiClient.get({action: "query", prop: "revisions", ...}).done(function(data){})

We should have an ORES api client available for use in MW for supporting gadgets and other JS-based ORES usages.

oresClient = new ores.API();
ores.score({revids: [1234, 1235, ...], models: ["damaging", "wp10"]}).done(function(scores){})

To complicate things, we should also have the means to submit a large quantity of revisions for scoring to ORES and to optimize the request pattern by batch-size and # of parallel connection threads.

ores.score({revids: [1234, ...], ..., batch_size: 50, connections: 2}).progress(function(score_batch){})

Here, the function passed to "progress" would be called multiple times -- one for each batch that returns from ORES. Order would not be guaranteed.

Event Timeline

Halfak created this task.Aug 10 2018, 3:25 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 10 2018, 3:25 PM
Restricted Application added a project: User-Ladsgroup. · View Herald TranscriptAug 17 2018, 7:34 PM

Change 459549 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/ORES@master] Expose basic ORES API configs to frontend

https://gerrit.wikimedia.org/r/459549

Change 459561 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/ORES@master] [WIP] Introduce mw.Ores.Api

https://gerrit.wikimedia.org/r/459561

With my patches, this works:

oresClient = new mw.Ores.Api();
oresClient.get({revids: [1234, 1235], models: ["damaging", "articlequality"]}).done(function(result){ console.log(result)});

But it needs more polishing.

Change 459561 abandoned by Ladsgroup:
[WIP] Introduce mw.Ores.Api

Reason:
No need, merged in the parent.

https://gerrit.wikimedia.org/r/459561

Halfak added a comment.EditedOct 19 2018, 6:13 PM

In T207505#4681434, I describe a complete batching approach. This differs from the specifications originally put forward in the task. In the end, I think it is much better. With this code, a pool is constructed for a specific type of request and revIds can be added to the queue adhoc. As described in that comment, the whole system works transparently. E.g. one could do the following and expect that batching and pool will take care of the details.

oresApi = new OresApi({host: "https://ores.wikimedia.org", dbname: "enwiki"})
aqPool = oresApi.pool(["articlequality"], {maxWorkers: 2, batchSize: 10})

aqPool.score(1234567)
    .done(function(scoreDoc){...do things with score...})
aqPool.score(1234568)
    .done(function(scoreDoc){...do things with score...})
aqPool.score(1234569)
    .done(function(scoreDoc){...do things with score...})
...

As a result of this call, a maximum of 2 connections to ORES will be made at a time and the revIds will be batched as expected.

awight added a subscriber: awight.Oct 30 2018, 4:51 PM

I'd like to see the pooling done transparently, maybe defaulting to 1 worker but keeping the promise signature.

Halfak added a comment.Nov 8 2018, 2:23 AM

This would be doubly-hard because we can't batch without specifying a set of models to use in the pool. So, I'm not sure we can do this transparently -- at least not with the current version of ORES.

OK so new idea: separate queues. One queue per unique set of models.

Each requested score is given an index that is incremented based on what order the request was added to the queue. Batches are drawn from queues depending on which one contains the lowest index.

Expected performance: Identical to single-set-of-models batching when only querying for a unique set. Behavior would be somewhat strange in the scenario where two or more different model-sets are requested in a collated pattern. Requests for one model-set would return in batch earlier than one might expect. But with multiple query threads, the difference should be minor.

Considering the enterprise-y level of machinery going into this library, I suggest making it a CommonJS module, reusable from nodejs or the browser.

The separate queues sound like a good workaround for the queue being coupled to models. Alternatively, it seems like we could tie the list of models to a request rather than have it be part of the queues state (if I'm understanding correctly). That lets us multiplex the requests using a single thread pool.

I don't see how to fit this into a general library.

I'm not sure what you mean by "tie the list of models to a request rather than have it be part of the queues state". ORES does not let us make batch requests with a mixture of different models.

I should mention that ORES has this constraint for a good reason -- different models need different data. So you can't batch a mixture of different data needs.

I don't see how to fit this into a general library.

"multiplatform" was the direction I was headed in, just cos Node.js applications deserve a high-performance ORES library too. I agree that the library shouldn't be generalized or rescoped.

I'm not sure what you mean by "tie the list of models to a request rather than have it be part of the queues state". ORES does not let us make batch requests with a mixture of different models.

Sorry—now I'm catching up. I hadn't noticed the cool feature where new requests get added to an *existing* queue, so client code can make single-revision requests and they're transparently batched. Your solution to maintain a queue per (wiki, model) makes perfect sense.

However, this single-request feature might lead to bad client behaviors. Ideally, the client will be able to find all articles that need scoring in a single scan of the active page, and will push them as a single, large batch that we can split. This is how jQuery works at least, everything is a set of zero or more elements, so it's a familiar paradigm for front-end code. Is there a good example showing why single-request is needed?

Yeah. The thing I built the code for works way better in a single-request pattern because it allows you to do simple closures to maintain references to specific elements that need to be modified. If we were manually making a bulk request, we need to explicitly associate key data and elements with the set of revisions to score so that it can be available when the score returns. In https://meta.wikimedia.org/wiki/User:EpochFail/ArticleQuality-system.js I was able to not make any code changes to the main system because one could make calls to ORES as though individual scores were being requested. When I was working on a bulk-request-only implementation, I was over 200 lines into rewrites trying to get that workflow to work.

In the end, it turns out that because of the way that Javascript does not actually support real concurrency, we don't have a problem with batching requests. Making a bunch of individual requests tends to manifest as full batches because the main function needs to complete before any callbacks (~threads) can be called.

@Ladsgroup I made a bunch of notes here. Would you like to implement batching before calling this task done or submit a followup task?

Change 459549 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Introduce ext.ores.api

https://gerrit.wikimedia.org/r/459549

@Halfak This task is for a basic implementation of the API. I think we should followups for those.

Ladsgroup closed this task as Resolved.Feb 21 2019, 7:18 PM