The imageinfo end point is one of the most popular end points in the PHP API. During a brief period of metric collection a few months ago, we saw around 1100 requests per second for this end point. Although the information it returns is fairly static and cacheable, there is currently no support for caching. With a Varnish-cacheable imageinfo end point in the REST API, we should be able to significantly lower response latencies for imageinfo requests while also reducing the load on the PHP API cluster and its storage backends.
The basic requirements for a cacheable end point are:
- deterministic URLs to enable active (Varnish) purging, and
- little or no parametrization of the information contained in the response, to reduce cache fragmentation.
There might be others, and we are soliciting input from the main consumers of the imageinfo end point to discover these.
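As a minimal sketch of what "deterministic URLs" could mean in practice, the Python snippet below maps equivalent spellings of a file name onto a single URL with no query parameters. The route and the normalization rules (underscores for spaces, upper-cased first letter) are assumptions made for illustration; this proposal does not specify them.

```python
from urllib.parse import quote

# Hypothetical route; the actual REST API path is not defined in this proposal.
ROUTE = "/api/rest_v1/file/imageinfo/"


def normalize_file_name(name: str) -> str:
    """Collapse equivalent spellings of a file name into one form (assumed rules)."""
    # Treat underscores and runs of whitespace as a single separator.
    name = "_".join(name.replace("_", " ").split())
    # Upper-case the first character only, leaving the rest untouched.
    return name[:1].upper() + name[1:]


def imageinfo_url(domain: str, name: str) -> str:
    """Build one parameter-free URL per file, so a purge can target it exactly."""
    return f"https://{domain}{ROUTE}{quote(normalize_file_name(name), safe='')}"
```

With this scheme, `imageinfo_url("commons.wikimedia.org", " foo  bar.jpg ")` and `imageinfo_url("commons.wikimedia.org", "Foo_bar.jpg")` yield the same URL, so a single purge would cover both spellings and the cache would hold only one copy.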
One major consumer is Parsoid. As Parsoid has fast local-DC connectivity, minor bandwidth savings are probably not critical. As a result, returning more information than strictly required might be okay.
Properties to include vs. response size
A request for all properties results in about 13 kB of uncompressed JSON. The main bulky properties are:
- extmetadata, especially the Credit and Permission sub-fields. This metadata is HTML formatted, which greatly increases its size.
- html, the HTML returned by the uploadwarning property. This is documented as internal-only in any case, so it should be excluded.
With these two properties removed, the response shrinks to a very reasonable 1.6 kB before compression. To make a decision on extmetadata, we should look into how it is currently used by consumers. Depending on the actual use cases, it might be desirable either to expose this HTML-formatted metadata in a separate API end point, or to include it as structured data without formatting.
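To compare candidate property sets, a rough way to measure their effect on response size is sketched below in Python. The function names and the default exclusion list are hypothetical, and the record has to be supplied by the caller; nothing here reproduces the actual imageinfo response format.

```python
import json

# Properties discussed above as candidates for exclusion (assumed default).
BULKY_PROPERTIES = ("extmetadata", "html")


def trim_imageinfo(info: dict, excluded=BULKY_PROPERTIES) -> dict:
    """Return a copy of an imageinfo record without the excluded properties."""
    return {key: value for key, value in info.items() if key not in excluded}


def json_size(obj) -> int:
    """Size in bytes of the compact, uncompressed UTF-8 JSON encoding."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
```

Calling `json_size(info)` and `json_size(trim_imageinfo(info))` on a sample of real responses would show how much each excluded property contributes before compression.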
Cache invalidation
The caches need to be invalidated on:
- file upload / deletion / rename
- any changes to the file description page
We cover much of this in the RESTBase extension, but going forward we should make sure that the events defined in T116247 support this use case well.
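The mapping from change events to purges could look roughly like the Python sketch below. The event shape (`type`, `title`, `new_title`), the event type names, and the base URL are all hypothetical; they are not taken from T116247 or from the RESTBase extension.

```python
from urllib.parse import quote

# Hypothetical base URL for the cacheable end point.
BASE = "https://commons.wikimedia.org/api/rest_v1/file/imageinfo/"

# Assumed event types mirroring the invalidation triggers listed above.
PURGE_EVENTS = {"upload", "delete", "rename", "description_edit"}


def urls_to_purge(event: dict) -> list[str]:
    """Return the cached URLs that a single change event should invalidate."""
    if event["type"] not in PURGE_EVENTS:
        return []
    titles = [event["title"]]
    # A rename affects both the old and the new name.
    if event["type"] == "rename":
        titles.append(event["new_title"])
    return [BASE + quote(title.replace(" ", "_"), safe="") for title in titles]
```

Because each file maps to exactly one URL, every event yields a short, predictable list of purges; a rename produces two entries, all other listed events produce one.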