
Allow prop modules to decrease generator limits
Closed, Invalid (Public)

Description

Sometimes a prop module is more expensive than the underlying generator and needs a lower per-request limit. A current example is revision diffs; two upcoming examples are T143895: [Epic] Implement ORES service proxy in api.php and T144865: Add pageview stats to the action API. In all three cases the limit is not really predictable since data for some revisions might be available from cache for free.

Currently this is solved by omitting the data from requests beyond the limit, which is unhelpful if that data is the main reason the user made the query. All three modules are/will be able to produce data for a small number of items even if it is not in the cache, but there is no way to allow the user to continue in a separate request from the first item with missing data.

To fix that, prop modules would need a way to peek at the resultset after the generator returned it but before other query modules started to process it, and truncate it once they decide their limit has been reached.

Event Timeline

Anomie subscribed.

Currently this is solved by omitting the data from requests beyond the limit, which is unhelpful if that data is the main reason the user made the query. All three modules are/will be able to produce data for a small number of items even if it is not in the cache, but there is no way to allow the user to continue in a separate request from the first item with missing data.

This is incorrect. The action API provides a robust continuation mechanism which can and should be used in this circumstance.

I am not sure what you mean. The problem as I understand it is that the continuation mechanism cannot take the limits of prop modules into account. E.g. I want to get pageviews for the recent changes; I ask the query API with generator=recentchanges, grclimit=100, prop=pageviews; but the pageview module can only provide data for the first 10 RC items. What should the module do in such a situation?

The same thing that a module like prop=links does if there are more links on all the generated pages than it can do in one query. See the behavior of this query for example.

generator=recentchanges is a slightly special case if you're using the default parameters (rcdir=older, no rcstart): it starts with the most recent changes, and on a busy wiki like enwiki there will probably be more "most recent changes" by the time you get to continuation. The solution for that is to make it possible for a generator to specify an "I started here" value to ApiContinuationManager rather than trying to have every prop module pre-limit the generator. See T146176 for that.

I think I get it now, thanks. So if both the generator and a prop module set a continuation, the prop continuation will be followed until it's exhausted, and the generator continuation only after that?

Exactly right. The main motivation for the "new" continuation (remember T96858: Use "new" continuation by default) was to have the server handle that logic instead of making every client reimplement it.
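A minimal client-side sketch of that ordering, using a stand-in server. Everything specific here is an assumption made for illustration: `fake_fetch`, the `gcontinue`/`pcontinue` keys, the `views` field, and the two limits are hypothetical and do not reproduce the real API's continuation values. The only behavior taken from the discussion is that the client echoes back whatever `continue` values it receives, and that the prop continuation is exhausted before the generator advances.

```python
from typing import Callable, Dict, Iterator

PAGES = ["A", "B", "C", "D"]  # hypothetical generator output, in order
GEN_LIMIT = 2                 # pages per generator batch (hypothetical)
PROP_LIMIT = 1                # pages the prop module can fill per request (hypothetical)


def fake_fetch(request: Dict[str, str]) -> dict:
    """Stand-in for the server; this is NOT the real action API."""
    batch_start = int(request.get("gcontinue", 0))
    prop_start = int(request.get("pcontinue", batch_start))
    batch_end = min(batch_start + GEN_LIMIT, len(PAGES))
    prop_end = min(prop_start + PROP_LIMIT, batch_end)

    pages = {}
    for i in range(batch_start, batch_end):
        page = {"title": PAGES[i]}
        if prop_start <= i < prop_end:
            page["views"] = 100 + i  # made-up data
        pages[PAGES[i]] = page

    response = {"query": {"pages": pages}}
    if prop_end < batch_end:
        # Prop module not finished: same generator batch, next prop offset.
        response["continue"] = {
            "gcontinue": str(batch_start),
            "pcontinue": str(prop_end),
        }
    elif batch_end < len(PAGES):
        # Prop module exhausted: only now does the generator advance.
        response["continue"] = {"gcontinue": str(batch_end)}
    return response


def query_all(fetch: Callable[[Dict[str, str]], dict],
              params: Dict[str, str]) -> Iterator[dict]:
    """Yield every response, echoing back the server's 'continue' values."""
    base = {**params, "action": "query", "format": "json"}
    request = dict(base)
    while True:
        response = fetch(request)
        yield response
        if "continue" not in response:
            return
        # The client does not interpret the values; it only sends them back.
        request = {**base, **response["continue"]}


responses = list(query_all(
    fake_fetch, {"generator": "recentchanges", "prop": "pageviews"}))
for r in responses:
    print(sorted(r["query"]["pages"]), r.get("continue"))
```

In this sketch the same page set comes back repeatedly while the prop offset moves, and a new page set appears only once the prop continuation is gone. The client loop itself never needs to know which module asked for the continuation.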

Isn't that still a bit painful from the client's POV? For example, mw:API:Query#Continuing_queries has a code example, but it wouldn't really help in such a case; instead, the query method would have to keep querying while the same page set is returned, and do some sort of recursive merge on the results. For links that is hard to avoid, since you might need multiple queries to get all the links for a single page. For modules which return exactly one data item per page, returning one fully filled page result per response seems much nicer from the user's point of view. (OTOH maybe most client libraries already deal with this problem?)

Since client libraries have to handle the more complicated link case, the "one result" case is trivially taken care of by the same code.
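To illustrate why one merge routine covers both cases, here is a rough sketch of the kind of per-page merge a client library might perform. The function name `merge_pages` and the `links` and `views` fields are hypothetical; real libraries may structure this differently.

```python
from typing import Dict


def merge_pages(accumulated: Dict[str, dict],
                response_pages: Dict[str, dict]) -> Dict[str, dict]:
    """Fold one response's pages into the accumulated result, in place."""
    for page_id, page in response_pages.items():
        target = accumulated.setdefault(page_id, {})
        for key, value in page.items():
            if isinstance(value, list) and isinstance(target.get(key, []), list):
                # Multi-item props (e.g. links) may arrive in several pieces.
                target.setdefault(key, []).extend(value)
            else:
                # Single-item props simply appear once some response has them.
                target[key] = value
    return accumulated


result: Dict[str, dict] = {}
# First response: page 1 has a partial link list, page 2 has no data yet.
merge_pages(result, {
    "1": {"title": "A", "links": [{"title": "X"}]},
    "2": {"title": "B"},
})
# Second response: same page set, with the remaining data filled in.
merge_pages(result, {
    "1": {"title": "A", "links": [{"title": "Y"}]},
    "2": {"title": "B", "views": 7},
})
print(result)
```

The list branch handles the multi-response link case, and the scalar branch handles the "one result per page" case with no extra code.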