Move thumbnail caching from upload cluster to text
Open, Needs Triage, Public

Description

(Idea by @BBlack. I'm mostly documenting the discussions.)

Currently, 81% of the requests to the upload cluster are for thumbnails. The proposal is to serve these requests from the text cluster instead, changing the separation from "text" vs. "upload" to "main" vs. "aux".

Why?

  • Having one large cluster gives us elasticity: better tolerance of depooled hosts and traffic spikes.
  • Thumbs are by nature smaller, and they are being hurt by originals and large files displacing them from memory. Moving them to text, assuming the same capacity, should raise the cache hit ratio.
  • This would increase our resilience against scrapers, which mostly hit originals. Today that can lead to an outage of the whole upload cluster and a degraded user experience.
  • Thumbs are mostly human traffic while originals are mostly scrapers (47% of the requests to originals have a browser score of 80 or above; that number is 60% for thumbs). This allows us to rate limit originals more strictly.
  • It would also allow us to set network QoS for upload to something lower, to avoid overwhelming backhauls when originals are being scraped to death while still giving human traffic a first-class experience.
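The cache displacement argument in the second bullet can be illustrated with a toy simulation. This is only a sketch: it assumes a plain byte-bounded LRU cache, which is not how the production edge caches decide evictions, and the object sizes, key names, and capacity are made up for illustration.

```python
from collections import OrderedDict


def thumb_hit_ratio(requests, capacity):
    """Replay (key, size) requests through a byte-bounded LRU cache.

    Returns the hit ratio for keys starting with "t" (thumbs) only.
    Toy model: real edge caches use more elaborate admission/eviction.
    """
    cache = OrderedDict()  # key -> size, ordered from LRU to MRU
    used = 0
    hits = total = 0
    for key, size in requests:
        is_thumb = key.startswith("t")
        if is_thumb:
            total += 1
        if key in cache:
            cache.move_to_end(key)  # refresh recency on a hit
            if is_thumb:
                hits += 1
            continue
        # Miss: evict least recently used objects until the new one fits.
        while cache and used + size > capacity:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        if size <= capacity:
            cache[key] = size
            used += size
    return hits / total if total else 0.0


# Hypothetical workload: five 10-byte thumbs requested in four rounds.
thumbs = [("t%d" % i, 10) for i in range(5)]
thumbs_only = thumbs * 4

# Same thumb traffic, but each round ends with a unique 80-byte original.
mixed = []
for round_no in range(4):
    mixed += thumbs + [("o%d" % round_no, 80)]

thumbs_ratio = thumbs_hit_ratio = thumb_hit_ratio(thumbs_only, capacity=100)
mixed_ratio = thumb_hit_ratio(mixed, capacity=100)
```

With these made-up numbers the thumbs fit comfortably on their own, while a single large original per round is enough to push them out before they are requested again.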

Technical implementation:

Notes:

  • Moving to thumb.wikimedia.org also allows us to move the caching to somewhere else easily if/when the need arises (for example a fully dedicated edge cluster, etc.)
  • That leaves upload with these: originals, transcodes, maps
  • We could eventually move MPEG-DASH files to text too, since they are similar in nature. I have no knowledge of the maps infra, so I can't say anything about moving those.

Event Timeline

It would also allow us to set network QoS for upload to be something lower to avoid overwhelming backhauls

Funny, only yesterday I was suggesting that we could possibly do exactly this, and then map “upload” to a lower priority for QoS.

To me it makes sense, the rationale was to prioritize humans browsing wikis over other usage. So - at least on that aspect - this sounds like a sensible and straightforward thing we could do +1

We could eventually move MPEG-DASH files to text too, they are similar in nature.

On the QoS side we might be better keeping all video in low-priority I feel. I’m constantly paranoid of popular video ramping up bandwidth by many orders of magnitude. MPEG-DASH will help as it’s responsive, but to make sure it responds well it should be low priority. We can discuss again.

Some random questions/comments:

  1. Do you already have in mind the final split of # of hosts between text and upload at the end of the migration? How much will this complicate operations on and resilience of the upload cluster given the smaller size?
  2. I think that one step we might need to add to the implementation list is to make sure that the analytics pipeline will be ready for this change, as many statistics and analytics workflows are probably based on the clear split we have now between text and upload and their related content.
  3. I guess you're planning to keep a redirect from old URLs to new URLs, it should probably be explicitly mentioned somewhere ;)

I have only one suggestion, regarding the ticket name and the framing (not the project itself, which looks to me like a great idea). I would pitch it as "having a dedicated cluster for thumbs only", which is what the description seems to imply. We may have limited resources, so that cluster may have to share resources with the existing text one for the reasons stated, but the different domain still establishes a "logical" separation. Framed that way: 1) I wouldn't reject being able to get additional resources for the main cluster this early in the timeline, and 2) it may be more attractive, and relatively accurate as a summary despite later trade-offs, to stress the need to separate thumbs from originals rather than "moving" them. I would do the same for swift, pitching it as a separation even if, due to resource constraints, it ends up sharing the same physical cluster hardware at implementation time. Feel free to disagree with my opinion.

We could eventually move MPEG-DASH files to text too, they are similar in nature.

On the QoS side we might be better keeping all video in low-priority I feel. I’m constantly paranoid of popular video ramping up bandwidth by many orders of magnitude. MPEG-DASH will help as it’s responsive, but to make sure it responds well it should be low priority. We can discuss again.

Yeah, let's first get the MPEG-DASH out of the door, see how it works, cache hit ratio, etc. Then based on measurements we can make a more informed decision.

Some random questions/comments:

  1. Do you already have in mind the final split of # of hosts between text and upload at the end of the migration? How much will this complicate operations on and resilience of the upload cluster given the smaller size?

The requests to upload are 80% thumbnails, so my assumption is that we should move four out of five hosts from upload to text. Whether that could pose a redundancy problem in the upload cluster is a question for the traffic team.
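As a rough illustration of that arithmetic, the sketch below splits a host pool in proportion to the thumbnail share, with an optional floor on how many hosts stay in upload for redundancy. The host counts and the `min_upload_hosts` floor are hypothetical; the thread gives only the 80% share and the "four out of five" ratio.

```python
def split_hosts(total_upload_hosts, thumb_share, min_upload_hosts=0):
    """Return (hosts_moved_to_text, hosts_left_in_upload).

    Moves hosts in proportion to the share of requests that are thumbs,
    but never leaves fewer than min_upload_hosts behind (a hypothetical
    redundancy floor, not a number from the discussion).
    """
    moved = round(total_upload_hosts * thumb_share)
    moved = min(moved, total_upload_hosts - min_upload_hosts)
    moved = max(moved, 0)
    return moved, total_upload_hosts - moved


# Hypothetical pool of 10 hosts and the 80% thumbnail share from the thread.
proportional = split_hosts(10, 0.8)
with_floor = split_hosts(10, 0.8, min_upload_hosts=3)
```

Here `proportional` is `(8, 2)`, matching the four-out-of-five ratio, while `with_floor` is `(7, 3)`, showing how a redundancy requirement would cap the move.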

  2. I think that one step we might need to add to the implementation list is to make sure that the analytics pipeline will be ready for this change, as many statistics and analytics workflows are probably based on the clear split we have now between text and upload and their related content.

Definitely, I will talk to them.

  3. I guess you're planning to keep a redirect from old URLs to new URLs, it should probably be explicitly mentioned somewhere ;)

At least right now, the idea is to have both upload and text share the same backend, so no redirects: we just serve it (from the same source). Obviously, once we are done and comfortable, we can start issuing 302s instead.
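For the later 302 phase, the mapping from an old URL to a new one could look something like the sketch below. It assumes the path stays identical and only the hostname changes from upload.wikimedia.org to thumb.wikimedia.org (the name used in the task notes), and it assumes thumbnail paths carry `thumb` as their third path segment; neither detail is confirmed in this thread, and `redirect_target` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "upload.wikimedia.org"
NEW_HOST = "thumb.wikimedia.org"  # hostname taken from the task notes


def redirect_target(url):
    """Return the Location for a 302, or None if no redirect applies.

    Assumption: only thumbnail URLs on the old host are redirected, and
    the path, query, and fragment are carried over unchanged.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    is_thumb = len(segments) >= 3 and segments[2] == "thumb"
    if parts.hostname != OLD_HOST or not is_thumb:
        return None
    return urlunsplit(
        (parts.scheme, NEW_HOST, parts.path, parts.query, parts.fragment)
    )


old = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/Example.jpg/120px-Example.jpg"
new = redirect_target(old)
original = redirect_target("https://upload.wikimedia.org/wikipedia/commons/a/ab/Example.jpg")
```

In this sketch originals return `None`, so they would keep being served from upload without a redirect.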

I have only one suggestion, regarding the ticket name and the framing (not the project itself, which looks to me like a great idea). I would pitch it as "having a dedicated cluster for thumbs only", which is what the description seems to imply. We may have limited resources, so that cluster may have to share resources with the existing text one for the reasons stated, but the different domain still establishes a "logical" separation. Framed that way: 1) I wouldn't reject being able to get additional resources for the main cluster this early in the timeline, and 2) it may be more attractive, and relatively accurate as a summary despite later trade-offs, to stress the need to separate thumbs from originals rather than "moving" them. I would do the same for swift, pitching it as a separation even if, due to resource constraints, it ends up sharing the same physical cluster hardware at implementation time. Feel free to disagree with my opinion.

My understanding is that the idea is to have a big elastic cluster to give you more freedom. Similar to what wikikube has become. Allowing us to move thumbs around later is a nice benefit but not the main reason. Of course the idea is not mine and I hope Brandon corrects me if I'm wrong.

There's the potential to break various parser tests that hardcode a file URL. I don't think this will be a big concern in practice, but something to look out for.

A couple of thoughts. I am broadly in favour of handling thumbs and originals separately :)

First - our delightful rewrite middleware uses the /thumb/ in the URI path to know to rewrite the request to the relevant thumbnail container. That middleware currently is server-hostname-agnostic (because it assumes it's always the same, as does the associated test suite). So we could at least in theory teach it new URI schemes for accessing the thumbs, but it'd need doing with care.
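To make the hostname-agnostic behaviour concrete, here is a minimal sketch of path-based classification. It is not the actual rewrite middleware: the function name, the return labels, and the assumed path layout (`/<project>/<site>/thumb/...`) are illustrative assumptions only.

```python
def classify_request(path):
    """Classify a request path as "thumb" or "original".

    Looks only at the URI path, never at the Host header, which mirrors
    the hostname-agnostic behaviour described above. The position of the
    "thumb" segment is an assumption about the path layout.
    """
    segments = [s for s in path.split("/") if s]
    if len(segments) >= 3 and segments[2] == "thumb":
        return "thumb"
    return "original"


kind_thumb = classify_request(
    "/wikipedia/commons/thumb/a/ab/Example.jpg/120px-Example.jpg"
)
kind_original = classify_request("/wikipedia/commons/a/ab/Example.jpg")
# A file that merely has "thumb" in its name is not treated as a thumbnail.
kind_tricky = classify_request("/wikipedia/commons/a/ab/thumb.jpg")
```

Because nothing here depends on the hostname, the same logic would keep working if thumbs were requested via a new domain with the path unchanged; a new URI scheme would need a corresponding change to the segment check.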

Second, as you know, I would very much like to start caching thumbs in a cache (cf T345334) rather than in swift; and that process feels like it might be a bit more tractable now we have a much more finite set of thumbs of standard sizes. That's obviously not happening this FY; but can we try and make it at least not-harder to do in future? :)

A couple of thoughts. I am broadly in favour of handling thumbs and originals separately :)

First - our delightful rewrite middleware uses the /thumb/ in the URI path to know to rewrite the request to the relevant thumbnail container. That middleware currently is server-hostname-agnostic (because it assumes it's always the same, as does the associated test suite). So we could at least in theory teach it new URI schemes for accessing the thumbs, but it'd need doing with care.

Then I guess I'll keep it for now; we can revisit later :D I assume it eases the migration of rate limits and data pipelines too.

Second, as you know, I would very much like to start caching thumbs in a cache (cf T345334) rather than in swift; and that process feels like it might be a bit more tractable now we have a much more finite set of thumbs of standard sizes. That's obviously not happening this FY; but can we try and make it at least not-harder to do in future? :)

So that's on the roadmap. It's not set in stone and might change at any second, but tentatively the idea is to do two pieces of work on that front first: 1- Improve the performance of thumbor, so that when we reduce the storage TTL of thumbs from infinity to a finite time, users and the infra won't suffer too much. The idea is to work on that in Q1 of next FY (alongside this ticket). 2- Gather more data on thumb storage, usage, and growth, to have a more informed understanding of what can be done, what hardware is needed, and what TTL should be picked. We might not even need storage: edge caches could be enough (connecting directly to thumbor on misses), but obviously we need to measure things again given the standardization. All of this would inform what we do next.

In other words, I think we will be reworking the storage of thumbs in Q2, once 1 and 2 are done. But don't quote me on that since it's still being discussed and might change.

This has probably been discussed at length, but I've wondered a little if it would be possible to have same-host pathing (or at least shared second level domain, when not already a .wikimedia.org domain - still independent of the main HTML serving...although the task Description suggests a shared text cluster anyway) for the thumbnail URLs so that cookies and other things come in the request when embedded in a page for a normal web browser.

I understand, though, the various notions here and not ending up in a situation where we potentially adversely affect genuine webpage HTML serving in any way. Also, I'm mindful that, supposing we had something like image.<second_level_domain> instead of one shared subdomain, it could induce yet another pattern of scraping more. And yet, it's hard to imagine it being any worse than present state.

In any case, I get the idea for separating the full-res from the thumbs/derivative types. Thought I might put the idea out there, at least, about same cookie SLD thing, as I've been meaning to say it but didn't have the occasion.

Moving images under the subdomain of the wikis brings a lot of complexity. These are the things that come to mind right now (and there might be more): how to do CSP, making sure problematic files can't access sensitive cookies, how to do cache defragmentation, and so on. I personally prefer that files anyone on the internet can upload be served from a different domain, just for the sake of security hardening, although security hardening can happen somewhere else.

Another reason is that I want to reduce the number of moving parts and make changes incrementally. I know it's not super nice to change the canonical URLs of thumbs every year [1], but it's better than planning something extremely ambitious and not getting there, because this change basically has the resources of 0.5 FTE. That being said, if someone else feels like doing that work and getting it over the finish line, I'm a strong believer in do-ocracy and I'll go with the change.

[1] We should eventually get T66214: Define an official thumb API done but the age of that ticket is a testament to how difficult it is to get there :D

Thanks @Ladsgroup, understood on the extra complexity.