
Thumb API: Varnish / CDN questions
Closed, Invalid · Public

Description

In T66214, we are working on designing an official thumb API, which should let clients select common options like the size of the thumbnail. We expect the introduction of such an API to involve a long transition period, so we need to prepare solutions that let us avoid fragmenting caches during the transition period & beyond.

Concretely, the Varnish / CDN related questions are these:

  • Query string normalization: Most discussion participants favor using regular query strings to specify thumb parameters. To avoid cache fragmentation, we would want to normalize query string order in Varnish, as discussed in T138093. Is supporting query string parameter order normalization in Varnish 4 feasible in the foreseeable future?
  • General redirect following support in Varnish: During the transition period, we will need to rewrite thumb URL formats to avoid cache fragmentation. Supporting the full rewrite, covering every thumbnail feature, in Varnish is not feasible, so backend code should handle it. One possibility is to return HTTP redirects from the backend service. However, returning those to clients would introduce a latency penalty, especially on high-latency connections. To avoid this, Varnish could follow the redirect on behalf of the client (possibly triggered by specific response headers). Does this sound sane / feasible to you?
  • Rewrite simple thumbnails in Varnish: Most thumbnails have no custom parameters except for the size. As an optimization, we could thus speed up the processing of the vast majority of thumb requests by supporting limited rewriting of *simple* thumb URLs only in Varnish. The downside is extra complexity.
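
To make the query string normalization question concrete, here is a minimal sketch of what "normalizing parameter order" means, written in Python rather than VCL purely for illustration. The function name and the example parameters (`width`, `page`) are assumptions, not part of any agreed API; an actual implementation would live in Varnish.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_query_order(url: str) -> str:
    """Return the URL with its query parameters sorted by key, then value.

    Hypothetical helper for illustration only: two URLs that differ just in
    parameter order map to the same string, so they share one cache object.
    """
    parts = urlsplit(url)
    # keep_blank_values so that "?foo=" is not silently dropped
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.sort()
    # Note: parse_qsl decodes and urlencode re-encodes, so percent-encoding
    # is also made uniform as a side effect.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


a = normalize_query_order("https://example.org/thumb?width=120&page=2")
b = normalize_query_order("https://example.org/thumb?page=2&width=120")
print(a == b)
```

Sorting only fixes ordering. It does nothing about duplicate keys or parameters that are explicitly set to their default value.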

Event Timeline

  • Query string normalization: Yes, I think we can support this in Varnish4. I would note from the other ticket though that there are two particular problem areas that our query-strings should try to avoid:
    1. Do not use/support arrays or duplicate keys. As in, we shouldn't design such that any of ?foo[]=1&foo[]=2, ?foo[0]=1&foo[1]=2, or ?foo=1&foo=2 are useful things clients would try to do.
    2. Avoid default-able parameters, and in cases where they seem unavoidable, avoid clients setting default parameters. For example, if the default value of foo is 123, it becomes difficult for Varnish to normalize and de-duplicate the two equivalents ?foo=123&bar=xyz and ?bar=xyz. The best defense against this is to not support default-able parameters (require that foo is always set or it's a request error), but default-able parameters tend to be unavoidable in the long run as parameters get added later in an API's life, customizing something that used to be fixed and now needs to be defaulted. At that point all we can do is try to ensure our own use of the API never sets the parameter when it's the default value, and hope everyone else does too. The worse alternative is having to encode knowledge of all the defaults into Varnish, which seems very unpalatable.
  • General redirect following support in Varnish + Rewrite simple thumbnails in Varnish
    • It would be better if we don't have to support Varnish following application-layer redirects to hide latency issues. In theory, Varnish4 is capable of doing this sort of thing, but we've never done it before, it would be quite hacky and add a lot of special-case cruft in our VCL code, and it would likely cause statistics and debugging confusion at the very least.
    • If the rewrites are mechanical in nature (can be accomplished with a series of regexen, which is probably true for at least your "simple" case), or at worst can be driven by a short and relatively-slowly-changing data table, we'd be better off supporting this as internal rewrites in VCL until the bulk of clients have switched to the new-style URLs, and then at that point we can convert to public redirects for the stragglers.
    • Another thing to keep in mind here: while our major use case is our own wikis, our thumbnail URLs also get used by various 3rd parties directly in the wild (which we intentionally allow), and we have no control over their eventual conversion of their links. I surveyed this at one point a couple of years ago and IIRC I found that something like 40% of our upload.wikimedia.org traffic was in service of 3rd-party referrers. The important thing here is that in practice the shift from old-style to new-style thumbnail access for the bulk of traffic could take much longer than anyone's thinking.
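
As a rough illustration of the "mechanical rewrite via regexen" idea, here is a sketch in Python (not VCL). It assumes the old-style path layout `/<project>/<lang>/thumb/<h>/<hh>/<name>/<width>px-<name>`, and the new-style target `/<project>/<lang>/<h>/<hh>/<name>?width=<width>` is purely hypothetical, since the new URL format has not been decided in this task.

```python
import re
from typing import Optional

# Assumed old-style layout; only the "simple" case (width prefix, same file
# name repeated) is matched. Anything else falls through to the backend.
SIMPLE_THUMB = re.compile(
    r"^/(?P<project>[^/]+)/(?P<lang>[^/]+)/thumb/"
    r"(?P<h1>[0-9a-f])/(?P<h2>[0-9a-f]{2})/(?P<name>[^/]+)/"
    r"(?P<width>\d+)px-(?P=name)$"
)


def rewrite_simple_thumb(path: str) -> Optional[str]:
    """Map a simple old-style thumb path to a hypothetical new-style URL.

    Returns None when the path is not a simple thumbnail, so the caller can
    leave the request untouched.
    """
    m = SIMPLE_THUMB.match(path)
    if m is None:
        return None
    return "/{project}/{lang}/{h1}/{h2}/{name}?width={width}".format(**m.groupdict())


print(rewrite_simple_thumb(
    "/wikipedia/commons/thumb/a/a9/Example.jpg/120px-Example.jpg"
))
```

Paths with extra parameters (a page prefix, a format suffix, and so on) do not match and return None, which corresponds to the "complex" thumbnails that would still need backend handling.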

@BBlack, thanks for the feedback! Some follow-up comments from my end:

  • Query string normalization: The understanding is indeed that we won't use duplicate parameters, and also generally avoid optional (default or otherwise) parameters that don't affect the result. These rules would be enforced by the backend, and requests not conforming to them rejected.
  • Rewrite / redirect support:
    • As far as I'm aware, simple thumbs should indeed be rewritable with regexp replaces.
    • To establish whether redirects or cache fragmentation would be tolerable for the long tail, it would be helpful to have more precise data on simple vs. complex thumbnail requests. The most precise numbers for this should be available from web request logs, as those would include both cache hits & misses. @BBlack, @Gilles: What is the best way to analyze upload.wikimedia.org request logs? Are those included in analytics pageview logs?
    • 40% of upload.wikimedia.org traffic coming from third-party referrers is higher than I expected. So far I assumed that a deprecation period of a year or so might be enough, provided that we track down remaining users. In any case, HTTP redirects sound like a good strategy for handling stragglers while also sending a clear signal.
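
For the backend-side enforcement mentioned above, a minimal sketch of the kind of check that could reject non-conforming requests. This is Python for illustration only; the parameter name `page`, its default of `1`, and the function name are hypothetical and not part of any agreed API.

```python
from typing import List
from urllib.parse import parse_qsl

# Hypothetical defaults table; a real API would define its own.
DEFAULTS = {"page": "1"}


def validate_thumb_query(query: str) -> List[str]:
    """Return a list of rule violations for a raw query string.

    An empty list means the request conforms: no array-style keys, no
    duplicate keys, and no parameter explicitly set to its default value.
    """
    errors = []
    seen = set()
    for key, value in parse_qsl(query, keep_blank_values=True):
        if "[" in key:
            errors.append(f"array-style parameter: {key}")
        if key in seen:
            errors.append(f"duplicate parameter: {key}")
        seen.add(key)
        if DEFAULTS.get(key) == value:
            errors.append(f"parameter set to its default: {key}")
    return errors


print(validate_thumb_query("width=120&page=1"))
```

Rejecting these requests at the backend keeps the knowledge of defaults out of Varnish: the cache only ever sees one spelling of each distinct thumbnail.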

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all such tickets that haven't been updated in 6 months or more. This does not imply any human judgement about the validity or importance of the task, and is simply the first step in a larger task cleanup effort. Further manual triage and/or requests for updates will happen this month for all such tickets. For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!

BCornwall subscribed.

Closing as this was not actionable. For the future, please contact the team at https://wikitech.wikimedia.org/wiki/SRE/Traffic for any questions.

Thanks!