
Consider a very short-term cache (5-10 min?) of 404s for thumbnails, bearing in mind the possibility of cache pollution attacks
Open, Needs Triage, Public

Description

Follow-up action item from https://wikitech.wikimedia.org/wiki/Incident_documentation/20200511-thumbor

During the incident, it became clear that a very short cache (e.g. 5-10 minutes) for thumbnail 404s would have sharply reduced the number of requests eventually reaching the service, and thus would have greatly mitigated the incident.
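To make the idea concrete, here is a minimal sketch of a short-TTL negative cache. It is an illustration only, not how the edge caches are actually configured: `NegativeCache`, `backend`, and the 300-second TTL are hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, Dict


class NegativeCache:
    """Caches 404 responses per URL for a short, fixed TTL (hypothetical sketch)."""

    def __init__(
        self,
        backend: Callable[[str], int],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.backend = backend          # returns an HTTP status code for a URL
        self.ttl = ttl_seconds
        self.clock = clock              # injectable so the sketch can be tested
        self.expiry: Dict[str, float] = {}
        self.backend_calls = 0

    def get(self, url: str) -> int:
        now = self.clock()
        expires_at = self.expiry.get(url)
        if expires_at is not None and now < expires_at:
            # A 404 for this URL is still cached: do not touch the backend.
            return 404
        # No entry, or the entry has expired: drop it and ask the backend.
        self.expiry.pop(url, None)
        self.backend_calls += 1
        status = self.backend(url)
        if status == 404:
            self.expiry[url] = now + self.ttl
        return status


# Usage: 1000 requests for a missing thumbnail cause a single backend call.
fake_time = [0.0]
cache = NegativeCache(lambda url: 404, ttl_seconds=300, clock=lambda: fake_time[0])
for _ in range(1000):
    cache.get("/thumb/missing.jpg")
print(cache.backend_calls)
```

Only 404s are stored here; any other status always goes to the backend, which keeps the sketch focused on the negative-caching behaviour discussed in this task.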

That, however, would open a path for cache pollution attacks. One that I can easily think of is the following:

  • Race condition, e.g. thumbnails being requested before the original has been uploaded, and thus the thumbnail taking some time to generate. With request coalescing at the edge (which we currently have) and a short cache period (even 5-10m might be too much), that race condition would probably be mitigated before it became enough of a nuisance.
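The race can be illustrated with a small simulation. This is a rough sketch under simplifying assumptions: a single URL, integer timestamps in seconds, request times given in ascending order, and a backend that answers 200 as soon as the original is uploaded. `simulate_stale_404` and its parameters are hypothetical names, not part of any existing tooling.

```python
from typing import List, Optional


def simulate_stale_404(
    ttl_seconds: int, upload_at: int, request_times: List[int]
) -> List[int]:
    """Return the status each request sees when 404s are cached for ttl_seconds.

    The backend answers 404 before upload_at and 200 from upload_at onwards.
    request_times must be sorted in ascending order.
    """
    cached_404_until: Optional[int] = None
    statuses: List[int] = []
    for now in request_times:
        if cached_404_until is not None and now < cached_404_until:
            # The cached 404 is still valid, even if the upload has happened.
            statuses.append(404)
            continue
        status = 200 if now >= upload_at else 404
        cached_404_until = now + ttl_seconds if status == 404 else None
        statuses.append(status)
    return statuses


# A request at t=5 caches a 404 until t=305; the upload lands at t=10.
print(simulate_stale_404(300, 10, [5, 20, 200, 304, 305, 400]))
# prints [404, 404, 404, 404, 200, 200]
```

In this model the stale 404 lasts at most one TTL after the upload, which is why a shorter cache period shrinks the window in which the race is visible.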

Event Timeline

elukey added a subscriber: elukey.

I removed the SRE-OnFire-Incident-Docs tag since this seems to be an action item rather than a document to review, but please re-add the tag if I am wrong.