
Cached REST end point for imageinfo requests
Closed, Invalid, Public

Description

The imageinfo end point is one of the most popular end points in the PHP API. During a brief period of metric collection a few months ago, we saw around 1100 requests per second for this entry point. Although the information it returns is fairly static and cacheable, there is currently no support for caching it. With a Varnish-cacheable imageinfo end point in the REST API, we should be able to significantly lower response latencies for imageinfo requests, while also reducing the load on the PHP API cluster and its storage backends.

The basic requirements for a cacheable end point are:

  • deterministic URLs, to enable active (Varnish) purging, and
  • little or no parametrization of the information contained in the response, to reduce cache fragmentation.

There might be others, and we are soliciting input from the main consumers of the imageinfo end point to discover these.
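A minimal sketch of what the two basic requirements mean in practice. It assumes a hypothetical route of the form `/api/rest_v1/page/imageinfo/{title}` (the actual path is not specified in this task) and titles passed without the `File:` prefix; the title normalization is a simplified assumption, not MediaWiki's full algorithm. Normalizing the title and fixing the property set maps each file to exactly one URL, which is what makes purging feasible. The last function counts how many distinct cache objects a single file could have if clients could request any subset of properties.

```python
from urllib.parse import quote

# Hypothetical route; the real REST path was never decided in this task.
BASE = "https://en.wikipedia.org/api/rest_v1/page/imageinfo/"


def normalize_title(title: str) -> str:
    """Simplified normalization (assumption): trim whitespace, use
    underscores instead of spaces, upper-case the first character."""
    title = title.strip().replace(" ", "_")
    return title[:1].upper() + title[1:]


def imageinfo_url(title: str) -> str:
    """Build the single canonical URL for a file. There are no query
    parameters, so every client requesting this file hits the same
    cache object, and there is exactly one URL to purge."""
    return BASE + quote(normalize_title(title), safe="")


def cache_variants(num_optional_props: int) -> int:
    """Number of distinct URLs (cache objects) per file if clients could
    request any subset of `num_optional_props` optional properties."""
    return 2 ** num_optional_props
```

With, say, 10 freely selectable properties, `cache_variants(10)` gives 1024 possible URLs per file, each of which would fragment the cache and need purging; a fixed property set gives exactly one.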

One major consumer is Parsoid. As Parsoid has fast local-DC connectivity, minor bandwidth savings are probably not critical. As a result, returning more information than strictly required might be okay.

Properties to include vs. response size

A request for all properties results in about 13 kB of uncompressed JSON. The main bulky properties are:

  • extmetadata, especially the Credit and Permission sub-fields. This metadata is HTML formatted, which greatly increases its size.
  • html, the HTML returned by the uploadwarning property. This is documented as internal-only in any case, so it should be excluded.
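The trimming described above can be sketched as a filter over a decoded imageinfo entry. The property names `extmetadata` and `html` are the ones discussed in this task; the sample entry is a made-up toy with far fewer fields than a real response, so the byte counts it prints are illustrative only and say nothing about real response sizes.

```python
import json

# Properties identified above as bulky (extmetadata) or internal-only (html).
EXCLUDED = {"extmetadata", "html"}


def trim_imageinfo(entry: dict) -> dict:
    """Return a copy of an imageinfo entry without the excluded properties.
    The input dict is left unchanged."""
    return {k: v for k, v in entry.items() if k not in EXCLUDED}


def json_size(obj) -> int:
    """Size in bytes of the compact, uncompressed UTF-8 JSON encoding."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


# Toy entry for illustration only; not a real API response.
sample = {
    "timestamp": "2015-10-01T12:00:00Z",
    "size": 123456,
    "width": 800,
    "height": 600,
    "mime": "image/jpeg",
    "url": "https://example.org/Example.jpg",
    "extmetadata": {
        "Credit": {
            "value": '<span class="int-own-work">Own work</span>',
            "source": "commons-desc-page",
        },
    },
    "html": '<div class="mw-warning">...</div>',
}

trimmed = trim_imageinfo(sample)
print(sorted(trimmed))
print(json_size(sample), "->", json_size(trimmed), "bytes")
```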

With these two properties removed, the response shrinks to a very reasonable 1.6 kB before compression. To make a decision on extmetadata, we should look into how consumers currently use it. Depending on the actual use cases, it might be desirable either to expose this HTML-formatted metadata in a separate API end point, or to include it as structured data without formatting.
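One way to read "structured data without formatting" is to keep each extmetadata field but reduce its HTML value to plain text. A minimal sketch using Python's standard `html.parser`, assuming extmetadata fields have the shape `{"value": ..., "source": ...}`; the field contents below are invented examples. Note that this drops information such as link targets in Credit or Artist, and whether consumers can live without it is exactly the open question above.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities like &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    # Collapse runs of whitespace left behind by removed markup.
    return " ".join("".join(parser.parts).split())


def unformat_extmetadata(extmetadata: dict) -> dict:
    """Map each field name to a plain-text value. Assumes fields look like
    {"value": <str or number>, "source": ...}; non-string values pass through."""
    result = {}
    for name, field in extmetadata.items():
        value = field.get("value")
        result[name] = html_to_text(value) if isinstance(value, str) else value
    return result


# Invented example input, for illustration only.
example = {
    "Credit": {"value": '<span class="int-own-work">Own work</span>',
               "source": "commons-desc-page"},
    "Artist": {"value": '<a href="//example.org/User:X">X</a> &amp; Y',
               "source": "commons-desc-page"},
}
print(unformat_extmetadata(example))
```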

Cache invalidation

The caches need to be invalidated on:

  • file upload / deletion / rename
  • any changes to the file description page

We cover much of this in the RESTBase extension, but going forward we should make sure that the events defined in T116247 support this use case well.
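The invalidation rules above amount to a mapping from change events to the canonical URLs that must be purged. A minimal sketch, assuming hypothetical event dicts with `type`, `title`, and (for renames) `new_title` fields, and a placeholder URL route; the actual event schemas are the subject of T116247 and are not reproduced here.

```python
from urllib.parse import quote

# Hypothetical canonical route (one URL per file, no query parameters).
BASE = "https://en.wikipedia.org/api/rest_v1/page/imageinfo/"


def imageinfo_url(title: str) -> str:
    return BASE + quote(title.replace(" ", "_"), safe="")


def urls_to_purge(event: dict) -> list[str]:
    """Return the cached imageinfo URLs invalidated by a change event.

    The event type names are illustrative stand-ins for the cases listed
    above: file upload, deletion, rename, and description page edits.
    """
    kind = event["type"]
    if kind in ("upload", "delete", "description_edit"):
        return [imageinfo_url(event["title"])]
    if kind == "rename":
        # Both the old and the new title may have cached responses.
        return [imageinfo_url(event["title"]), imageinfo_url(event["new_title"])]
    return []  # unrelated events need no purge


print(urls_to_purge({"type": "rename", "title": "Old name.jpg",
                     "new_title": "New name.jpg"}))
```

Because each file has a single deterministic URL, the purge list stays short; a real implementation would hand these URLs to whatever purge mechanism the caching layer provides.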

Event Timeline

GWicke assigned this task to Pchelolo.
GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description.
GWicke updated the task description.
GWicke set Security to None.
GWicke added subscribers: ssastry, daniel, Jhernandez and 8 others.

@ssastry, @cscott, @tstarling: Does the proposal above sound good to you? Any other thoughts on this?

I think @cscott and @tstarling have thoughts on this, but a couple of things:

  1. Will T112045 and T117003 also be supported as part of this?
  2. I think it makes sense to leave extmetadata out of the core API and have it be a separate API endpoint.

But, separately, it makes sense to make this cacheable. Also, how does this relate to T89971?

To really make a cached image info API useful to clients, I believe that we need to first address some of the issues brought up in T66214. For that reason, I have added that task as a dependency.

GWicke raised the priority of this task from High to Needs Triage. Aug 8 2017, 6:00 PM
GWicke triaged this task as Medium priority.
GWicke moved this task from blocked to designing on the Services board.
GWicke edited projects, added Services (designing); removed Services (blocked).

Possibly related discussion around caching support on other RESTBase endpoints in T184534#4014590.

Given the introduction of the REST API in MediaWiki, I believe this is no longer necessary in RESTBase.