Limit on number of piped values in API should throw an error
Closed, ResolvedPublic

Description

There is a limit on number of piped values (arrays) in the API.


Version: unspecified
Severity: normal
See Also:
T41937: Aliases edit in UI hits the bug on maximum number of piped values in the API

Details

Reference
bz39936
bzimport raised the priority of this task to Normal.
bzimport set Reference to bz39936.
bzimport added a subscriber: Unknown Object (MLST).
jeblad created this task.Sep 3 2012, 12:39 PM
Denny added a comment.Sep 6 2012, 9:53 AM

This is about piped values in the request parameters. There is a limit on them (by default, 50 for normal users), which is fine. But if more than the limit number of items are supplied, the rest get cut off silently and the action is still performed with the truncated parameter list. There should be an error message instead.
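The truncation being described can be sketched as a pure function. This is a minimal illustration of the two behaviors under discussion, not MediaWiki's actual validator code; the function name and messages are made up for the sketch:

```python
def parse_multi_value(raw, limit=50, truncate=True):
    """Split a pipe-separated parameter value and enforce the limit.

    truncate=True mimics the current behavior (extras dropped, only a
    warning attached); truncate=False is the behavior the report asks
    for (refuse the request outright).
    """
    values = raw.split("|")
    if len(values) <= limit:
        return values, None
    if truncate:
        # Current behavior: drop the extras and attach only a warning.
        return values[:limit], f"Too many values supplied: the limit is {limit}"
    # Requested behavior: fail instead of performing a partial action.
    raise ValueError(f"Too many values supplied: the limit is {limit}")

kept, warning = parse_multi_value("|".join(f"t{i}" for i in range(60)))
# kept holds only the first 50 values; the other 10 were silently dropped
```

The point of the report is that for write actions the `truncate=True` branch performs the edit with the shortened list, which is exactly the silent data loss discussed below.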

(In reply to comment #0)

There is a limit on number of piped values (arrays) in the API.

You can try to get the apihighlimit user right, which raises the limit to 500.

(In reply to comment #1)

This is about piped values in the request parameters. There is a limit on them
(by default, 50 for normal users) which is fine. But if there are more than the
limit number of items, the rest gets cut off silently and the action is still
performed with the cut off parameter list. There should be an error message
instead.

Check the warnings:

<warnings>
  <query xml:space="preserve">Too many values supplied for parameter 'titles': the limit is 50</query>
</warnings>

a.d.bergi wrote:

I guess the warning is enough and marks the "gets cut off silently" as invalid.

However, it would be nice if that limit could be queried somehow; currently you can only guess from your userrights.contains("apihighlimit") whether it is 50 or 500.

Reedy added a comment.Sep 24 2012, 7:46 PM

(In reply to comment #3)

I guess the warning is enough and marks the "gets cut off silently" as invalid.

However, it would be nice if that limit could be queried somehow, you only have
to guess from your userrights.contains("apihighlimit") whether it is 50 or 500.

Not the best example...

https://en.wikipedia.org/w/api.php?action=paraminfo&modules=parse

<param name="prop" description="Which pieces of information to get&#10; text - Gives the parsed text of the wikitext&#10; langlinks - Gives the language links in the parsed wikitext&#10; categories - Gives the categories in the parsed wikitext&#10; categorieshtml - Gives the HTML version of the categories&#10; languageshtml - Gives the HTML version of the language links&#10; links - Gives the internal links in the parsed wikitext&#10; templates - Gives the templates in the parsed wikitext&#10; images - Gives the images in the parsed wikitext&#10; externallinks - Gives the external links in the parsed wikitext&#10; sections - Gives the sections in the parsed wikitext&#10; revid - Adds the revision ID of the parsed page&#10; displaytitle - Adds the title of the parsed wikitext&#10; headitems - Gives items to put in the &lt;head&gt; of the page&#10; headhtml - Gives parsed &lt;head&gt; of the page&#10; iwlinks - Gives interwiki links in the parsed wikitext&#10; wikitext - Gives the original wikitext that was parsed&#10; properties - Gives various properties defined in the parsed wikitext" default="text|langlinks|categories|links|templates|images|externallinks|sections|revid|displaytitle|iwlinks|properties" multi="" limit="500" lowlimit="50" highlimit="500">

  <type>
    <t>text</t>
    <t>langlinks</t>
    <t>languageshtml</t>
    <t>categories</t>
    <t>categorieshtml</t>
    <t>links</t>
    <t>templates</t>
    <t>images</t>
    <t>externallinks</t>
    <t>sections</t>
    <t>revid</t>
    <t>displaytitle</t>
    <t>headitems</t>
    <t>headhtml</t>
    <t>iwlinks</t>
    <t>wikitext</t>
    <t>properties</t>
  </type>
</param>
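The lowlimit/highlimit attributes in the paraminfo output above can be read programmatically. A hedged sketch: the dict below mirrors the XML attributes shown (multi, lowlimit, highlimit); a real client would fetch action=paraminfo&format=json over HTTP rather than hard-code the response:

```python
# Sample shaped like a JSON paraminfo response for action=parse.
sample_paraminfo = {
    "paraminfo": {
        "modules": [{
            "name": "parse",
            "parameters": [{
                "name": "prop",
                "multi": True,
                "lowlimit": 50,
                "highlimit": 500,
            }],
        }],
    }
}

def multi_value_limits(paraminfo, module, param):
    """Return (lowlimit, highlimit) for a multi-valued parameter."""
    for mod in paraminfo["paraminfo"]["modules"]:
        if mod["name"] != module:
            continue
        for p in mod["parameters"]:
            if p["name"] == param and p.get("multi"):
                return p["lowlimit"], p["highlimit"]
    raise KeyError(f"{module}.{param} is not a known multi-valued parameter")

low, high = multi_value_limits(sample_paraminfo, "parse", "prop")
# low == 50, high == 500
```

A client still has to combine this with a userinfo query to know which of the two limits applies to it, which is part of the complaint below.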

I'm not sure I would mark this as resolved, as I believe what the API does now is wrong. If some limitation exists that makes the API unable to fulfill its contract with the requester, then the API should fail. If the requester asks for an operation that implies a list of 50+ values and the API can only handle 50 of them, then the request should fail. Only if the requester includes some parameter saying that a partial result is acceptable should the API attempt to continue with truncated lists; something like "partial=continue" that defaults to "partial=fail".

In our case where we observed this, we had lists of aliases: someone could create a list of 50+ aliases, which would then be truncated when someone with lesser rights edited the same aliases. This has now been changed so that anyone with a limit of 50 values can only change up to 50 at a time. This kind of works, but it creates weird errors during bulk upload.

There is at least one way we can get around this, but I tend to think that this is an issue that should be fixed in core.

Perhaps the most obvious argument isn't clear from the above: we need an error report before data gets corrupted, not after the API has corrupted the data due to truncated lists.

(In reply to comment #3)

I guess the warning is enough and marks the "gets cut off silently" as invalid.

However, it would be nice if that limit could be queried somehow, you only have
to guess from your userrights.contains("apihighlimit") whether it is 50 or 500.

Sadly, warnings don't use machine-readable codes, so they are only useful for humans, not for client bots.

I guess we could add a flag to mw.config when userrights.contains("apihighlimit") and then limit the requests accordingly. If the UI tries to build a request that would end up truncated, it could intercept its own request and warn the user before sending it to the server.

a.d.bergi wrote:

OK, I agree that the non-machine-readable warning is a bug.

However, I don't think we need an extra parameter for the API to throw an error instead of truncating when there are too many values.

As we've seen, you can query the API (userinfo, paraminfo) for your current limit and, if needed, split the request into chunks of the correct size.
Even if you don't want to do that, you can count the results you got and continue requesting if you went over the limit. I did that once when getting paraminfo for *all* query modules :-)
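The chunking workaround mentioned above can be sketched in a few lines (an illustration only; the names are made up):

```python
def chunked(values, limit):
    """Yield successive slices of at most `limit` values, each ready to
    be joined with '|' into a separate API request."""
    for i in range(0, len(values), limit):
        yield values[i:i + limit]

titles = [f"Page{i}" for i in range(120)]
# 120 titles at a limit of 50 become 3 requests (50, 50 and 20 titles).
requests = ["|".join(chunk) for chunk in chunked(titles, 50)]
```

This works for read queries, but as the following comments point out, it only helps if the client can reliably learn its limit in the first place.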

(In reply to comment #9)

However I don't think we need an extra parameter for the api to throw an error
instead of truncating when there are too many values.

Right: it should *always* throw an error.

Note that this issue is far more severe for queries that update data; in that case, it causes silent data loss. That's what caused John to file the initial report.

As we've seen, you can query the api (userinfo, paraminfo) for your current
limit and split the request if needed into chunks of the correct size.

For that to work, the client needs a way to even determine that limit. I don't think there's currently any reliable way to do that.

Even if you don't want to do that, you can count the results you got and
continue requesting if got over the limit - I did that once when getting
paraminfo for *all* querymodules :-)

So... that would basically be a paging mechanism. Orthogonal to the existing paging mechanism. I think that can get pretty confusing...

a.d.bergi wrote:

(In reply to comment #10)

Right: it should *always* throw an error.

A warning is enough imho.

Note that this issue is far more severe for queries that update data - in that
case, it causes silent data loss. That's what caused John to file the initial
report.

Which query (in terms of http://www.mediawiki.org/wiki/API:Query) updates data?
Or do we really encounter too-long piped lists in any of the update modules (http://www.mediawiki.org/wiki/API:Changing_wiki_content)? That might be worth an error, yes.

> Even if you don't want to do that, you can count the results you got and
> continue requesting if got over the limit - I did that once when getting
> paraminfo for *all* querymodules :-)

So... that would basically be a paging mechanism. Orthogonal to the existing
paging mechanism. I think that can get pretty confusing...

Oh, we already have that three-dimensional paging mechanism for queries (generators, properties, and big page sets) :-) But yes, it *is* confusing and not easy to implement.
However, a kind of query-continue value in the result (instead of an error/warning?) which contains the part of the list that was not handled yet (to continue with) would be very nice.
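That query-continue idea could look roughly like this. This is purely a sketch of the proposal, not an existing API feature; the names are invented:

```python
def handle_with_continue(values, limit=50):
    """Process at most `limit` values and hand the unhandled tail back
    to the caller as a continuation, instead of dropping it silently."""
    handled, remainder = values[:limit], values[limit:]
    result = {"handled": handled}
    if remainder:
        # The client would re-submit this tail in a follow-up request.
        result["continue"] = remainder
    return result

result = handle_with_continue([f"alias{i}" for i in range(60)])
# result["handled"] has 50 entries; result["continue"] carries the rest
```

Nothing is lost: the server does bounded work per request and the client decides whether to follow the continuation.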

> Even if you don't want to do that, you can count the results you got and
> continue requesting if got over the limit - I did that once when getting
> paraminfo for *all* querymodules :-)

This isn't about the query modules; it's about core functionality in the API, which has implications for other uses of it, such as the Wikidata project. In this project we set, get, and manipulate data, and if we use lists for some of the operations, then they will be truncated and still passed on. That creates silent data loss.

In some cases we can code around this, but in other cases it's not possible. For the UI we can pass on the limits in mw.config, but for bots it gets very cumbersome.

It would be far better to throw an error if the request is over the limit and there is no way to survive the data loss. Perhaps this could be marked in getAllowedParams as "truncating this list is fatal".

'values' => array(
    ApiBase::PARAM_TYPE => 'string',
    ApiBase::PARAM_ISMULTI => true,
    ApiBase::PARAM_NOTRUNC => true,
),

(In reply to comment #12)

It would be far better to throw an error if the request is out of limit and
there is no way to survive a data loss. Perhaps this could be marked in the
getAllowedParams as "changes to this list is fatal".

'values' => array(
    ApiBase::PARAM_TYPE => 'string',
    ApiBase::PARAM_ISMULTI => true,
    ApiBase::PARAM_NOTRUNC => true,
),

That seems reasonable to me. Returning less data than requested in action=query because something got cut off is fine (and that behavior should be kept), but if you have a module where truncation would be catastrophic, then telling the validator to error out rather than truncating is a good idea.

Change 306457 had a related patch set uploaded (by Anomie):
API: Log when too many values are passed for a multi-valued parameter

https://gerrit.wikimedia.org/r/306457

Change 306457 merged by jenkins-bot:
API: Log when too many values are passed for a multi-valued parameter

https://gerrit.wikimedia.org/r/306457

Anomie set Security to None.
Anomie renamed this task from Limit on number of piped values in API should throw machine-readable warning to Limit on number of piped values in API should throw an error.Jan 17 2017, 5:29 PM

Change 355863 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/TextExtracts@master] Chunk page ids in internal API call to avoid too-many-pageids-for-query

https://gerrit.wikimedia.org/r/355863

Change 355864 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/PageImages@master] Chunk page ids in internal API call to avoid too-many-pageids-for-query

https://gerrit.wikimedia.org/r/355864

Change 355866 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/ProofreadPage@master] Chunk page ids in internal API call to avoid too-many-pageids-for-query

https://gerrit.wikimedia.org/r/355866

Change 355866 merged by jenkins-bot:
[mediawiki/extensions/ProofreadPage@master] Chunk page ids in internal API call to avoid too-many-pageids-for-query

https://gerrit.wikimedia.org/r/355866

Change 355863 merged by jenkins-bot:
[mediawiki/extensions/TextExtracts@master] Chunk page ids in internal API call to avoid too-many-pageids-for-query

https://gerrit.wikimedia.org/r/355863

Change 355864 merged by jenkins-bot:
[mediawiki/extensions/PageImages@master] Chunk page ids in internal API call to avoid too-many-pageids-for-query

https://gerrit.wikimedia.org/r/355864

Change 433742 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] API: Raise an error when too many values are passed

https://gerrit.wikimedia.org/r/433742

Change 433742 merged by jenkins-bot:
[mediawiki/core@master] API: Raise an error when too many values are passed

https://gerrit.wikimedia.org/r/433742

Anomie closed this task as Resolved.May 23 2018, 12:59 PM
Anomie claimed this task.