
Action API: avoid using POST with action=query
Open, Stalled, Medium, Public

Description

Problem: POST is intended for requests that modify state, which queries should not do. Using POST for queries hides the query parameters from logs and metrics, making it harder to investigate incidents and identify usage patterns. It also makes it impossible to implement rules for specific queries (e.g. meta=tokens) in layers that sit in front of MediaWiki (e.g. for rate limiting in the API/REST gateway). Finally, POST requests have to be handled by the primary data center, preventing effective load balancing.

Proposal: The API documentation should include a recommendation to not use POST requests for queries, and if POST must be used, to include as many of the parameters as possible in the URL (compare T421288). Eventually, we should log a warning when receiving query requests as POST, and finally we should refuse such requests (possibly with some exceptions).
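As an illustration of "include as many of the parameters as possible in the URL", here is a minimal sketch of how a client might split a POST request. The endpoint URL, the function name, and the choice of `action`, `list` and `meta` as the parameters kept in the URL are assumptions made for this example, not something the task prescribes.

```python
from urllib.parse import urlencode

# Assumption: these are the parameters an intermediate layer would want to see.
# The task only mentions action, list and meta; a real list could differ.
ROUTING_KEYS = ("action", "list", "meta")


def split_post_request(endpoint, params):
    """Return (url, body): routing parameters go in the URL, the rest in the body."""
    in_url = {k: v for k, v in params.items() if k in ROUTING_KEYS}
    in_body = {k: v for k, v in params.items() if k not in ROUTING_KEYS}
    url = endpoint + "?" + urlencode(in_url) if in_url else endpoint
    body = urlencode(in_body).encode("utf-8")
    return url, body


# Hypothetical endpoint, used only for illustration.
url, body = split_post_request(
    "https://example.org/w/api.php",
    {"action": "query", "meta": "tokens", "type": "csrf", "format": "json"},
)
print(url)
print(body)
```

With this split, a layer in front of MediaWiki can see that the request is `action=query&meta=tokens` without reading the body.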

Exceptions:
Certain query modules require the request to be posted (codesearch). These would obviously have to be exempt from this rule. However, these modules should be examined to determine whether that requirement is actually needed, or accidental.

Sometimes it is necessary to list a large number of titles or IDs for processing in a query, which would exceed the maximum length of the URL if encoded as query parameters. Using POST requests for these cases could be allowed for users that have the apihighlimits right, since that indicates that the account is allowed to make expensive API queries.

Event Timeline

See also: T410883: Support HTTP QUERY method as standard alternative to Promise-Non-Write-API-Action header

IMHO we should make the QUERY method work before we deprecate using POST for the same purpose.

Proposal: The API documentation should include a recommendation to not use POST requests for queries, and if POST must be used, to include as many of the parameters as possible in the URL

How is the end user, tool writer, etc supposed to know how long of a URL can be used?

It can vary by browser, by servers...

The proposed solution here doesn't really seem to address the problem statement. I also agree with the other folks here that: a) size is limited for query params, which might be overly restrictive, b) it would be a change in established user behavior/integrations, c) the QUERY HTTP method is more compelling than simply forcing people to use GET requests if we want to continue offering a query capability. QUERY is also supported in the latest version of OpenAPI, so it could be worth exploring next year.

That being said, the part of the problem statement that stood out to me (in addition to the comment about data center performance) is:

POST for queries hides the query parameters from logs and metrics

That seems like a more basic root cause, where we should figure out how to more effectively log parameters provided in POST bodies instead of creating a workaround for our own convenience/measurement purposes. I understand that there might be some POST body params that we wouldn't want to log (like long parse requests, for example), but creating a system where we can specifically exclude those values seems more correct. If the specific issue is the token handling though, I think it would be reasonable to say that including the token request in a POST body is explicitly not supported/"use at your own risk of rate limits" or something like that, since I do see where you're coming from with it being impossible to handle otherwise.

Additionally, the API Etiquette page already calls out that we prefer if people use GET over POST for the Action API:

POST requests
Whenever you're reading data from the web service API, you should try to use GET requests if possible, not POST, as the latter are not cacheable and, in multi-datacenter configurations (including Wikimedia sites), may go to a farther data center.

That suggests that if folks are using POST anyway, they likely have good reason to (for things like exceeding URL sizes or frankly ease of use of setting a POST body over a long URL encoded value, which also seems somewhat valid). The fact that some query modules expect or require a POST also suggests its validity and that it would potentially be a pretty big breaking change for Action API users. If we do want to make a callout about specifically not requesting tokens through a POST body though, the etiquette page might be a good place to add a note about it.

The other questions I have here are:

  1. How frequently is POST used instead of GET for queries? Looking at turnilo, it looks like it's less than 1% of requests, but I want to confirm that.
  2. What proportion of POST requests would exceed URL max lengths? (assuming a modern browser + our actual server config)
  3. What types of POSTs are you specifically concerned about? Is it restricted to the token request, or just in general?
  4. What is the risk & performance impact of unnecessary POSTs today? What actions besides query should we be considering in that? (for example, I think people typically use POST for parse requests even though they don't technically have to).

But yeah, I definitely tend to agree with @Reedy & @LucasWerkmeister on this one, where it seems like a bit of a knee jerk reaction to disable POST entirely. My strong preference would be to explore QUERY as an HTTP verb, potentially as a new REST endpoint alternative instead of breaking the existing Action API functionality.

But yeah, I definitely tend to agree with @Reedy & @LucasWerkmeister on this one, where it seems like a bit of a knee jerk reaction to disable POST entirely. My strong preference would be to explore QUERY as an HTTP verb, potentially as a new REST endpoint alternative instead of breaking the existing Action API functionality.

I'm not proposing to shut it off tomorrow, I just think we should actively work to move away from using POST for queries. The reasons are practical as well as semantic:

  • Conceptually, a query *gets* data, it doesn't *post* data. Using the POST method is a hack to overcome technical limitations, and it should be constrained to situations where that is actually needed.
  • GET requests are defined to be "safe" (idempotent) and may be cacheable, so they can be handled by a passive DC or even a cache DC. POST always has to hit the application servers on the active DC.
  • We can't make decisions on rate limiting or routing based on the request body, since buffering and decoding the body in intermediate layers is impractical. So any information we need in layers that sit on top of MediaWiki will have to be in the URL. So if we need to e.g. exempt requests for tokens from API rate limiting, we can only do that if meta=tokens is specified as part of the URL. Using QUERY does not fix that, but more REST-ish routes would.
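The point about deciding on the URL alone can be sketched as follows. This is a hypothetical classifier, not the gateway's actual logic: the function name, the example URLs, and the policy of exempting only requests whose sole meta submodule is `tokens` are all assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def is_token_query(url):
    """True if the URL alone identifies an action=query&meta=tokens request.

    The request body is never inspected, mirroring a layer that cannot
    buffer and decode it.
    """
    query = parse_qs(urlsplit(url).query)
    actions = query.get("action", [])
    # meta takes a pipe-separated list of submodules.
    metas = {m for value in query.get("meta", []) for m in value.split("|")}
    # Assumed policy: only a pure token request qualifies.
    return actions == ["query"] and metas == {"tokens"}


print(is_token_query("https://example.org/w/api.php?action=query&meta=tokens"))
print(is_token_query("https://example.org/w/api.php"))  # parameters hidden in a POST body
```

A POST request that carries all of its parameters in the body looks like the second call: the classifier has nothing to go on, so the request cannot be exempted.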

All this said, I'm in favor of moving towards a pattern where the relevant information is part of the path (not a query). So /query/v2/list/{something} could map to action=query&list={something}, and /query/v2/meta/{something} maps to action=query&meta={something}. If there is a list of titles or such that should go with the query, it could be encoded in a QUERY body. That would be nice and clean.
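A minimal sketch of the path mapping described above, assuming exactly the two route shapes mentioned (`/query/v2/list/{something}` and `/query/v2/meta/{something}`). No such routes exist today; the function name and the behaviour for unknown paths are illustrative only.

```python
def route_to_action_params(path):
    """Map /query/v2/{list|meta}/{name} to Action API parameters, or None."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[:2] != ["query", "v2"]:
        return None
    kind, name = parts[2], parts[3]
    if kind not in ("list", "meta") or not name:
        return None
    return {"action": "query", kind: name}


print(route_to_action_params("/query/v2/list/allpages"))
print(route_to_action_params("/query/v2/meta/tokens"))
```

Because the module name is part of the path, an intermediate layer could route or rate-limit on it no matter where the remaining parameters are sent.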

See also: T410883: Support HTTP QUERY method as standard alternative to Promise-Non-Write-API-Action header

IMHO we should make the QUERY method work before we deprecate using POST for the same purpose.

That sounds reasonable and shouldn't be too hard. Though it could be fiddly to make sure it's supported through all layers. The gateway currently doesn't support it, but it should be trivial to add.

How is the end user, tool writer, etc supposed to know how long of a URL can be used?

It can vary by browser, by servers...

True. We'd have to give clear guidance, e.g. "up to five titles and up to 20 IDs" or something like that.

daniel renamed this task from Action API: do not use POST with action=query to Action API: avoid using POST with action=query. Apr 7 2026, 7:23 PM

How is the end user, tool writer, etc supposed to know how long of a URL can be used?

It can vary by browser, by servers...

True. We'd have to give clear guidance, e.g. "up to five titles and up to 20 IDs" or something like that.

These days, apparently, 8000 bytes is commonly recommended as the minimum URL length that should be supported, and 2000 is a safer limit. https://stackoverflow.com/a/417184

There is indeed no way to know for sure, though. You can only try, and see if you get an error (and an error may not even be guaranteed; it seems the URL may also just get cut off).

Adding support for QUERY seems like a better long-term idea than documenting complex limitations on POST and GET.

Change #1269471 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Action API landing page improvements

https://gerrit.wikimedia.org/r/1269471

My patch adds a mention in the docs on the API landing page, as a soft recommendation:

Action API requests may use GET and POST methods. Prefer using the GET method, unless the length of the URL with parameters would exceed its length limit (commonly 8000 bytes), or a module requires the POST method.
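A client could apply that recommendation roughly as sketched below. The 8000-byte figure is the commonly quoted value from the recommendation, not a guaranteed limit, and the endpoint and function name are hypothetical.

```python
from urllib.parse import urlencode

URL_LIMIT = 8000  # bytes; commonly quoted figure, not a guarantee


def choose_method(endpoint, params, limit=URL_LIMIT):
    """Return (method, url, body), preferring GET while the full URL fits the limit."""
    encoded = urlencode(params)  # percent-encoded, so the result is plain ASCII
    url = endpoint + "?" + encoded
    if len(url.encode("ascii")) <= limit:
        return "GET", url, None
    # Too long for a URL: fall back to POST with the parameters in the body.
    return "POST", endpoint, encoded.encode("ascii")


endpoint = "https://example.org/w/api.php"  # hypothetical endpoint
short = {"action": "query", "titles": "Foo|Bar"}
long_titles = {"action": "query", "titles": "|".join(f"Page {i}" for i in range(2000))}
print(choose_method(endpoint, short)[0])
print(choose_method(endpoint, long_titles)[0])
```

Since the real limit depends on every client, proxy and server along the way, a more cautious client might pass a lower `limit`, such as the 2000 mentioned earlier in the thread.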

I hope that one day we'll update it to recommend using QUERY instead :)

  • GET requests are defined to be "safe" (idempotent) and may be cacheable, so they can be handled by a passive DC or even a cache DC. POST always has to hit the application servers on the active DC. […]

Note that the Promise-Non-Write-API-Action header allows POST requests to route to the nearest/passive DC.
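For reference, a read-only POST carrying that header might be built as in the sketch below. The thread only names the header; the value `true`, the endpoint, and the function name are assumptions here. The request is constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_read_only_post(endpoint, params):
    """Build (without sending) a POST request that promises not to write."""
    body = urlencode(params).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Assumed value; the thread mentions the header name only.
            "Promise-Non-Write-API-Action": "true",
        },
    )


req = build_read_only_post(
    "https://example.org/w/api.php",  # hypothetical endpoint
    {"action": "query", "titles": "Foo|Bar", "format": "json"},
)
print(req.get_method(), req.header_items())
```

Note that `urllib.request.Request` stores header names in capitalized form, so the header reads back as `Promise-non-write-api-action`; HTTP header names are case-insensitive.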

  • We can't make decisions on rate limiting or routing based on the request body, […]. So if we need to e.g. exempt requests for tokens from API rate limiting, we can only do that if meta=tokens is specified as part of the URL. Using QUERY does not fix that, but more REST-ish routes would.

I assume that the vast majority of clients follow the API etiquette to make requests in series (instead of in parallel) and thus will never discover, hear about, understand, or need to implement rate limiting in any way. If they do, that's gonna be a bug for us to fix (like we found at the Arnhem Hackathon this year).

Whether they use the POST method or formulate an uncacheable GET request doesn't make a difference in cost to us, and they'll presumably be under the overall rate limit (before exemptions) either way. Given that the lower rate limit is already the forcing hand that gets us our desired outcome, I don't see a need for deprecation on top of that.

For the small set of clients that need to violate this etiquette, by operating from multiple instances that perhaps individually follow the etiquette but that we treat as one (through their contact-address-in-user-agent or logged-in status), I can see how they may be more likely to trip a rate limit without incurring a heavy cost on us, due to certain cheap meta requests being counted the same as the heavier ones. It seems like a fair deal in this case to document that, to keep those requests from counting towards your main limit, you may need to accommodate a few details. This wouldn't be so much a deprecation, but rather an optional guideline for <1% of clients that are given two choices:

  • slow down, or;
  • adapt their client slightly by moving action/list/meta fields to the query string, so that we can count cheap requests like these separately from their main limit.

Sigh. More stupid ideas to break tons of existing clients because of the poorly thought through rate limiting garbage.

and finally we should refuse such requests

Try to do this and you'll likely run into similar issues as T406283.

That seems like a more basic root cause, where we should figure out how to more effectively log parameters provided in POST bodies instead of creating a workaround for our own convenience/measurement purposes.

https://wikitech.wikimedia.org/wiki/Data_Platform/Data_Lake/Traffic/mediawiki_api_request

Similarly to T421288, I think this should be a recommendation, but not following it should never result in warnings, deprecations, errors, and so on. It may result in your requests being rate limited when they otherwise wouldn't be, and it may result in them being slower due to being routed to the wrong datacenter, but it doesn't seem like a necessary breaking change.

As @bd808 says, we have better ways of logging API actions that were taken, but as @daniel says, for POST requests with params in the body, we can't determine what action is about to be taken until we're halfway through processing the request.

I will also say again that I hope we'll be able to update this recommendation to suggest QUERY instead one day, which is obviously better.

That seems like a more basic root cause, where we should figure out how to more effectively log parameters provided in POST bodies instead of creating a workaround for our own convenience/measurement purposes.

The problem is that doing anything with the post body requires buffering and decoding it. That's not something we want to do before hitting the application servers, if we can avoid it.

I ran a superset query to see how often POST is used for queries:

Data for April 4, 8:00 to 9:00 UTC:

method	requests	UAs
POST	 3,461,122	 2,213
GET	26,804,830	75,245

So, about 11% of requests (and 3% of clients) use POST for action=query; the POST volume is about 13% of the GET volume. That's more than I hoped, but less than I feared.
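The percentages can be checked directly from the table. The short calculation below uses only the figures quoted above and separates two readings that are easy to conflate: POST as a share of all query requests, and POST relative to the GET count. Summing the UA counts would double-count clients that use both methods, so only the ratio of the two UA counts is computed.

```python
post_requests, get_requests = 3_461_122, 26_804_830
post_uas, get_uas = 2_213, 75_245

total_requests = post_requests + get_requests
share_of_total = post_requests / total_requests  # POST as a fraction of all query requests
ratio_to_get = post_requests / get_requests      # POST volume relative to GET volume
ua_ratio = post_uas / get_uas                    # POST UAs relative to GET UAs

print(f"POST share of all query requests: {share_of_total:.1%}")
print(f"POST relative to GET requests:    {ratio_to_get:.1%}")
print(f"POST UAs relative to GET UAs:     {ua_ratio:.1%}")
```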

It's clear that we can't make this a hard requirement. I still think we should push for GET or QUERY as a best practice. But we should implement support for QUERY first. I'd consider this blocked until we do.

Change #1269471 merged by jenkins-bot:

[mediawiki/core@master] Action API landing page improvements

https://gerrit.wikimedia.org/r/1269471

matmarex changed the task status from Open to Stalled. Mon, Apr 27, 9:42 PM

I guess no further changes will happen here until T410883: Support HTTP QUERY method as standard alternative to Promise-Non-Write-API-Action header is done (and supported on Wikimedia wikis too).