Current Status
As of T102030: Document and hook up public mathoid end point in RB, RESTBase handles, proxies and stores POST requests to Mathoid. The logic is as follows:
- Clients send a POST request to RESTBase at https://{domain}/api/rest_v1/media/math/{format}, with the same request body they would send when querying Mathoid directly.
- RESTBase checks if the request body has been encountered before (stored). If so, the result is directly returned to the client.
- Otherwise, RESTBase sends a POST request to Mathoid, which renders the given formula
- RESTBase stores the complete result (containing all available formats)
- The desired render is returned to the client, together with the X-Resource-Location header which contains the hash referencing the request
- On subsequent requests, the client uses GET /media/math/{format}/{hash} to obtain the same formula rendered in a different format. Such requests are served exclusively by RESTBase.
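The steps above can be sketched roughly as follows. This is a minimal illustration and not RESTBase's actual code: the in-memory dict stands in for RESTBase storage, fake_mathoid_render stands in for the POST to Mathoid, and the SHA-1 hash of the canonicalised body, the body fields ("q", "type") and the format names ("mml", "svg") are all assumptions made for the example.

```python
import hashlib
import json

# Hypothetical stand-in for RESTBase storage: request hash -> all renders.
store = {}


def request_hash(body):
    """Derive a stable key from the request body (hash choice is assumed)."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def fake_mathoid_render(body):
    """Placeholder for the POST to Mathoid; returns every available format."""
    formula = body["q"]
    return {
        "mml": "<math><!-- " + formula + " --></math>",
        "svg": "<svg><!-- " + formula + " --></svg>",
    }


def handle_post(fmt, body, render=fake_mathoid_render):
    """POST /media/math/{format}: serve from storage, else render and store."""
    key = request_hash(body)
    if key not in store:
        # Unknown body: ask Mathoid once and keep the complete result.
        store[key] = render(body)
    headers = {"X-Resource-Location": key}
    return store[key][fmt], headers


def handle_get(fmt, key):
    """GET /media/math/{format}/{hash}: served from storage only."""
    return store[key][fmt]
```

A client would call handle_post("mml", {"q": "E=mc^2", "type": "tex"}) once, read the hash from the X-Resource-Location header, and then call handle_get("svg", hash) for the other formats without Mathoid being involved again.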
The rationale here is that the Math extension would use the POST /media/math/{format} endpoint to obtain a first MathML render of a formula on page save, and embed GET /media/math/{format}/{hash} calls in the page as fallbacks for other formats.
Problem
The current set-up assumes that both RESTBase and Mathoid are installed on the same network as the Math extension, so RESTBase restricts access to the POST endpoint to internal IPs only. However, third-party users usually lack these services, which makes the extension rather pointless for them. Opening up access would allow:
- third-party users to use the Math extension and simply point it to use WMF's production RESTBase; and
- WMF to host a possibly comprehensive catalogue of mathematical formulae.
The worrying aspects of this move are security and stability. RESTBase can sustain a much higher request rate than Mathoid, which needs a couple of seconds to serve one request and is hosted on only two machines. Mathoid can therefore be saturated easily, especially when swamped with invalid or erroneous requests.
Solution
In order to keep things up and running, RESTBase would need to limit the requests actually reaching Mathoid. In normal operation, the number of requests naturally decreases over time as RESTBase stores more and more renders. However, we need a way to protect Mathoid against attacks. RESTBase could:
- Check all of the requests and ensure they conform to the endpoint's specification. If the request doesn't adhere to it, reject it automatically.
- Limit the size of request body data to 16 kB. No formula to be rendered on-wiki should be longer than that, and we can probably set an even lower limit. If a request is larger than that, we assume it is erroneous.
- Rate-limit the endpoint. This could probably be part of T107934: Reliable and scaleable rate limiting mechanism for RESTBase API entry points. The exact number is yet to be determined, but the logic is that there are 2*32 = 64 Mathoid processes accepting requests. If each of them takes a couple of seconds per request, the sustainable throughput is roughly 32 requests per second, and anything above that would start a backlog of requests. Each process having a backlog of 3 to 4 requests should be fine (both in terms of memory and processing power), so the rate could be set to 100 req/s.
- Create a new endpoint in Mathoid which would only check the correctness of a request and return the appropriate status code (2xx, 4xx). The logic would then be slightly more complicated: when a request whose body is not yet known reaches RESTBase, Mathoid would first be called to inspect it. If the request is legitimate, the body data would be stored, and only then would Mathoid render the formula contained in it.
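To make the size limit and the rate-limit arithmetic concrete, here is a small sketch. It is only an illustration under assumptions: the proposal does not prescribe a rate-limiting algorithm, so a token bucket is used here as one common choice, and the names body_size_ok, sustainable_rate and TokenBucket are hypothetical. The 16 kB limit, the 64 processes, the 2 seconds per request and the 100 req/s figure come from the list above.

```python
import time

MAX_BODY_BYTES = 16 * 1024  # 16 kB limit proposed in the text


def body_size_ok(raw_body, limit=MAX_BODY_BYTES):
    """Reject bodies larger than the limit before they reach Mathoid."""
    return len(raw_body) <= limit


def sustainable_rate(processes, seconds_per_request):
    """Requests per second the workers can absorb without building a backlog."""
    return processes / seconds_per_request


class TokenBucket:
    """Token-bucket limiter (an assumed technique, not named in the proposal)."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.now = now            # injectable clock, useful for testing
        self.last = now()

    def allow(self):
        """Return True and consume a token if the request may pass."""
        current = self.now()
        elapsed = current - self.last
        self.last = current
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With these helpers, sustainable_rate(64, 2.0) gives the 32 req/s figure, and TokenBucket(rate=100, capacity=100) would correspond to the suggested 100 req/s limit. A request would only be forwarded to Mathoid if both body_size_ok and allow() return True.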
Discussion
Thoughts and comments are welcome on the following questions:
- Should we open up the POST route?
- If so, how can we ensure its stability?
See Also
- the original ticket - T102030: Document and hook up public mathoid end point in RB
- PR 339 introducing POST storage for Mathoid in RESTBase (and the discussion on it)
- Mathoid's public API spec
- Gerrit PS 245478 introducing the usage of the POST route in the Math extension