I am trying to understand why my Toolforge tool (https://wdreconcile.toolforge.org) returns HTTP 502 errors for some URLs.
The tool is written in Python 3.7 and deployed on Kubernetes as a WSGI application.
In my experience, HTTP 502 errors signal a lack of connectivity between the HTTP server and the WSGI process. This normally happens when the service is overloaded or simply times out. This time, however, specific URLs reliably return 502 errors for an extended period of time, while slight variations of them work fine.
For instance, the following URL returns an HTTP 502 error at the moment:
https://wdreconcile.toolforge.org/en/api?queries=%7B%22q0%22%3A%7B%22query%22%3A%22Ujjwal%22%2C%22type%22%3A%22Q202444%22%2C%22type_strict%22%3A%22should%22%7D%2C%22q1%22%3A%7B%22query%22%3A%22Ant%C3%B3nio%22%2C%22type%22%3A%22Q202444%22%2C%22type_strict%22%3A%22should%22%7D%2C%22q2%22%3A%7B%22query%22%3A%22Milan%22%2C%22type%22%3A%22Q202444%22%2C%22type_strict%22%3A%22should%22%7D%2C%22q3%22%3A%7B%22query%22%3A%22Sevag%22%2C%22type%22%3A%22Q202444%22%2C%22type_strict%22%3A%22should%22%7D%2C%22q4%22%3A%7B%22query%22%3A%22Magdalena%22%2C%22type%22%3A%22Q202444%22%2C%22type_strict%22%3A%22should%22%7D%2C%22q5%22%3A%7B%22query%22%3A%22John%22%2C%22type%22%3A%22Q202444%22%2C%22type_strict%22%3A%22should%22%7D%2C%22q6%22%3A%7B%22query%22%3A%22Shelby%22%2C%22type%22%3A%22Q202444%22%2C%22type_strict%22%3A%22should%22%7D%2C%22q7%22%3A%7B%22query%22%3A%22Nicolas%22%2C%22type%22%3A%22Q202444%22%2C%22type_strict%22%3A%22should%22%7D%2C%22q8%22%3A%7B%22query%22%3A%22Earl%22%2C%22type%22%3A%22Q202444%22%2C%22type_strict%22%3A%22should%22%7D%2C%22q9%22%3A%7B%22query%22%3A%22Dan%22%2C%22type%22%3A%22Q202444%22%2C%22type_strict%22%3A%22should%22%7D%7D
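Decoded, the `queries` parameter of this URL is a batch of ten queries (Ujjwal, António, Milan, Sevag, Magdalena, John, Shelby, Nicolas, Earl and Dan), each with `"type": "Q202444"` and `"type_strict": "should"`. The short sketch below shows one way to do that decoding with the Python standard library; `decode_queries` and `summarize` are illustrative helper names, not part of the tool.

```python
import json
from urllib.parse import parse_qs, urlsplit


def decode_queries(url: str) -> dict:
    """Decode the JSON batch carried in the `queries` parameter of a GET URL."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs already percent-decodes the value (as UTF-8 by default).
    return json.loads(params["queries"][0])


def summarize(batch: dict) -> list:
    """Return (key, query, type, type_strict) tuples in batch order."""
    return [
        (key, q.get("query"), q.get("type"), q.get("type_strict"))
        for key, q in batch.items()
    ]
```

For example, `summarize(decode_queries(url))` on the URL above lists the ten names in order, which makes it easier to see exactly which batch triggers the error.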
I am puzzled by three things:
* the URL reliably returns HTTP 502 errors, and does so instantly (no timeout)
* if I change the query slightly (for instance by changing "Magdalena" to "Magdalen", or changing any other character in the query), the problem disappears.
* the query runs fine on a local instance of the service
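To check that the failure is tied to the exact request rather than to load, the failing URL and a one-character control can be rebuilt from the list of names and fetched side by side. This is a minimal sketch using only the standard library; `build_url`, `probe`, `failing` and `control` are hypothetical names, and the JSON encoding choices (compact separators, non-ASCII left unescaped before percent-encoding) are picked to reproduce the URL quoted above.

```python
import json
import time
import urllib.error
import urllib.request
from urllib.parse import quote

BASE = "https://wdreconcile.toolforge.org/en/api"
NAMES = ["Ujjwal", "António", "Milan", "Sevag", "Magdalena",
         "John", "Shelby", "Nicolas", "Earl", "Dan"]


def build_url(names, type_id="Q202444"):
    """Rebuild the GET URL from a list of names (hypothetical helper)."""
    batch = {
        f"q{i}": {"query": name, "type": type_id, "type_strict": "should"}
        for i, name in enumerate(names)
    }
    # Compact separators and ensure_ascii=False match the encoding of the failing URL.
    payload = json.dumps(batch, separators=(",", ":"), ensure_ascii=False)
    # safe="" percent-encodes every reserved character, including { } " : ,
    return BASE + "?queries=" + quote(payload, safe="")


def probe(url, timeout=30):
    """Fetch `url` once and return (status, seconds elapsed); HTTP errors are returned, not raised."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code
    return status, round(time.monotonic() - start, 2)


# No network access happens here; only the two URLs are built.
failing = build_url(NAMES)
control = build_url(["Magdalen" if n == "Magdalena" else n for n in NAMES])
```

Calling `probe(failing)` and `probe(control)` a few times each should show whether the 502 follows the exact bytes of the URL, and how quickly it comes back.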
Therefore, I am starting to wonder: could HTTP 502 errors be cached by some layer between the WSGI process and the HTTP server?
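One way to test the caching hypothesis is to look at the headers that come back with the 502 itself. The sketch below (Python standard library only) keeps the headers that usually betray an intermediate cache. `Age`, `Cache-Control`, `Expires` and `Via` are standard HTTP headers; `X-Cache` and `X-Cache-Status` are conventions used by some proxies, and whether the Toolforge front proxy emits any of them is an assumption to verify. `cache_hints` and `fetch_headers` are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Standard caching headers plus two de-facto proxy headers (assumed, not confirmed for Toolforge).
CACHE_HINTS = ("age", "cache-control", "expires", "via", "x-cache", "x-cache-status")


def cache_hints(headers):
    """Keep only cache-related headers from a mapping or a list of (name, value) pairs."""
    items = headers.items() if hasattr(headers, "items") else headers
    return {name.lower(): value for name, value in items if name.lower() in CACHE_HINTS}


def fetch_headers(url, timeout=30):
    """Return (status, cache-related headers), including for 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, cache_hints(resp.getheaders())
    except urllib.error.HTTPError as err:
        # err.headers holds the headers of the error response (e.g. the 502).
        return err.code, cache_hints(err.headers)
```

Per the HTTP caching specification, the presence of an `Age` header implies the response was not generated by the origin server for this request, so an `Age` value on the 502 would strongly suggest caching; the absence of these headers does not rule it out.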
I also observe 502 errors on POST requests, but it is not clear to me whether they could also be affected by caching (I hope not, since caching POST requests would be a problem in its own right).
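The same batch can be sent as a POST to compare behaviour with the GET request. This sketch assumes the endpoint accepts the batch as a form-encoded `queries` field mirroring the GET parameter, which is how reconciliation clients such as OpenRefine usually send it; `post_body` and `post_status` are hypothetical helpers.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode

URL = "https://wdreconcile.toolforge.org/en/api"


def post_body(batch):
    """Form-encode a query batch as `queries=<json>` (assumed to mirror the GET parameter)."""
    payload = json.dumps(batch, separators=(",", ":"), ensure_ascii=False)
    # urlencode percent-encodes the UTF-8 JSON, so the result is plain ASCII.
    return urlencode({"queries": payload}).encode("ascii")


def post_status(batch, timeout=30):
    """POST the batch once and return the HTTP status, including error statuses."""
    # urlopen adds Content-Type: application/x-www-form-urlencoded for a bytes body.
    req = urllib.request.Request(URL, data=post_body(batch), method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Shared caches rarely store POST responses in practice, so if the same batch also fails over POST, a cached 502 becomes a less likely explanation.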