API intermittently returns the help page instead of a valid JSON response (due to POST lacking a Content-Type header)
Closed, Invalid (Public)

Description

See P8914 for a demonstration.

I make a fairly standard API request expecting a JSON response. Most of the time, it works fine. Increasingly (about 20% of the time?), I'm getting back the HTML for the main API help page without any indication that I did anything wrong.

I've only been able to reproduce this with POSTs. Is POSTing incorrect? I've been doing it this way for years, so that would be surprising, especially since it only fails sometimes.

Alternatively, I can add logic to my bot to retry in this case, except it seems more like a symptom that I'm doing something wrong rather than a server error—but I can't figure out what it is!

Event Timeline

Restricted Application added a subscriber: Aklapper.
Anomie subscribed.

What happened is that the POST data somehow didn't make it to MediaWiki, so it served the help page, which is the default when no action is specified.

I don't see any logs on the server side for the request beyond one that simply states that the request was made and no parameters were found. So if the POST data was dropped server-side, it was either dropped without logging or logged somewhere I don't know to look. If you can reproduce it while logging the full request headers and body on your end, that would help narrow it down.

Hey Anomie. Here's the full request generated by Python's requests:

POST /w/api.php HTTP/1.1
Host: en.wikipedia.org
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: EarwigBot/0.4.dev0 (wikipedia.earwig@gmail.com)
Cookie: WMF-Last-Access-Global=16-Aug-2019; GeoIP=:::::v4; WMF-Last-Access=16-Aug-2019
Content-Length: 145

inprop=protection%7Curl&rvslots=main&titles=User%3AThe_Earwig&rvlimit=1&format=json&action=query&rvprop=content%7Ctimestamp&prop=info%7Crevisions

In context with the response:

>>> resp = session.post(url, data=data)
send: 'POST /w/api.php HTTP/1.1\r\nHost: en.wikipedia.org\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: EarwigBot/0.4.dev0 (wikipedia.earwig@gmail.com)\r\nCookie: WMF-Last-Access-Global=16-Aug-2019; GeoIP=:::::v4; WMF-Last-Access=16-Aug-2019\r\nContent-Length: 145\r\n\r\ninprop=protection%7Curl&rvslots=main&titles=User%3AThe_Earwig&rvlimit=1&format=json&action=query&rvprop=content%7Ctimestamp&prop=info%7Crevisions'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Fri, 16 Aug 2019 01:36:37 GMT
header: Content-Type: text/html; charset=utf-8
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: Server: mw1225.eqiad.wmnet
header: X-Powered-By: PHP/7.2.16-1+0~20190307202415.17+stretch~1.gbpa7be82+wmf1
header: X-Content-Type-Options: nosniff
header: P3P: CP="This is not a P3P policy! See https://en.wikipedia.org/wiki/Special:CentralAutoLogin/P3P for more info."
header: Content-language: en
header: Expires: Thu, 01 Jan 1970 00:00:00 GMT
header: X-Frame-Options: SAMEORIGIN
header: Content-Disposition: inline; filename=api-help.html
header: Cache-Control: private, must-revalidate, max-age=0
header: Backend-Timing: D=315408 t=1565919397423687
header: Vary: Accept-Encoding,Cookie,Authorization,X-Seven
header: Content-Encoding: gzip
header: X-Varnish: 724862107, 854798546
header: Accept-Ranges: bytes
header: Age: 0
header: X-Cache: cp1079 pass, cp1075 pass
header: X-Cache-Status: pass
header: Server-Timing: cache;desc="pass"
header: Strict-Transport-Security: max-age=106384710; includeSubDomains; preload
header: X-Analytics: ns=-1;special=ApiHelp;WMF-Last-Access=16-Aug-2019;WMF-Last-Access-Global=16-Aug-2019;https=1
header: X-Client-IP: 172.16.7.167
DEBUG:urllib3.connectionpool:https://en.wikipedia.org:443 "POST /w/api.php HTTP/1.1" 200 None

Your POST lacks a Content-Type header. According to RFC 7231 § 3.1.1.5, in this situation the server may interpret the content as application/octet-stream or it may attempt to sniff the content type.

A sender that generates a message containing a payload body SHOULD generate a Content-Type header field in that message unless the intended media type of the enclosed representation is unknown to the sender. If a Content-Type header field is not present, the recipient MAY either assume a media type of "application/octet-stream" ([RFC2046], Section 4.5.1) or examine the data to determine its type.

It looks like PHP7 does the former (and so sees no form parameters) while HHVM does the latter (and so does see them). Thus your code works when the request is handled by HHVM but fails when it is handled by PHP7; see T176370: Migrate to PHP 7 in WMF production for progress on the migration.
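
To illustrate the difference described above, here is a minimal sketch of the two recipient behaviours the RFC allows. It is not PHP7's or HHVM's actual code: the function `form_params` and its `sniff` flag are hypothetical names made up for this example, and the "sniffing" is simplified to "assume form data".

```python
from urllib.parse import parse_qs

FORM_TYPE = "application/x-www-form-urlencoded"


def form_params(headers, body, sniff=False):
    """Return the form parameters a server would extract from a POST body.

    Hypothetical helper for illustration only.
    sniff=False: a missing Content-Type is treated as application/octet-stream,
                 so the body is never parsed as a form.
    sniff=True:  a missing Content-Type makes the server examine the data;
                 this sketch simply assumes it is form-encoded.
    Header lookup here is case-sensitive to keep the example short.
    """
    content_type = headers.get("Content-Type")
    if content_type is None:
        if not sniff:
            return {}
        content_type = FORM_TYPE
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != FORM_TYPE:
        return {}
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(body).items()}


body = "format=json&action=query&titles=User%3AThe_Earwig"
strict = form_params({}, body)                 # no parameters at all
lenient = form_params({}, body, sniff=True)    # parameters recovered
explicit = form_params({"Content-Type": FORM_TYPE}, body)
```

With no parameters, no `action` is seen, which matches the symptom in this task: the API falls back to its help page. With an explicit Content-Type, both behaviours agree.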

You should update your code to generate requests that include a Content-Type header. Since this doesn't seem to be a problem with MediaWiki, I'm going to close the task.

At a quick test, it looks like Python's requests library sets a default content type when you pass a dictionary for data but not when you pass a string as you're doing in P8914.
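
A minimal sketch of the suggested fix, assuming the body is already an encoded string as in the request shown earlier. It uses the standard library's `urllib` rather than `requests` so that it runs without third-party packages; `build_post` is a hypothetical helper name, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://en.wikipedia.org/w/api.php"
FORM_TYPE = "application/x-www-form-urlencoded"


def build_post(url, body):
    """Build a POST request that states its Content-Type explicitly.

    Hypothetical helper: `body` is form-encoded bytes.
    """
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": FORM_TYPE},
    )


# Parameters taken from the request in this task (a subset, for brevity).
params = {
    "action": "query",
    "format": "json",
    "titles": "User:The_Earwig",
    "prop": "info|revisions",
}
body = urlencode(params).encode("ascii")
req = build_post(API_URL, body)
# urllib stores header names capitalized, hence "Content-type" here.
content_type = req.get_header("Content-type")
```

With `requests`, the corresponding change would be to pass the same header through the `headers` argument of `session.post`, or to pass `data` as a dictionary so that the library sets the header itself, as noted above.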

Aklapper renamed this task from "API intermittently returns the help page instead of a valid JSON response" to "API intermittently returns the help page instead of a valid JSON response (due to POST lacking a Content-Type header)". Aug 18 2019, 10:48 AM