To reduce the load on thumbor, we store generated thumbnails in swift. In theory this is a cache, but in practice thumbnails never expire, so about ⅓ of our swift capacity is now spent on storing thumbnails - about 540TB across both data centres. As well as being an inefficient use of swift capacity, the very large number of objects involved (nearly 4 billion) means that the on-disk databases that correspond to these swift containers are large and unwieldy. A further problem is that we are storing thumbnails of original objects that have since been deleted (for copyright or other legal reasons), which we should not be keeping.
These are largely architectural consequences of using swift for a purpose to which it is not really suited - we are attempting to use a persistent storage system for caching. Instead, we propose to use our existing caching infrastructure to cache thumbnails, with the eventual outcome of removing swift from the thumbnailing process entirely: ATS would cache thumbnails, and ask thumbor to (re-)generate any thumbnails that are cache misses.
To achieve this, we propose that ATS gradually start caching thumbnails for longer; increased storage use (and reduced request rate to swift-and-thumbor) would be monitored until we reach the point where swift storage of thumbnails is no longer necessary, at which point ATS would talk to thumbor directly whenever it needs a thumbnail not already in cache. The end result would be simpler infrastructure, and more efficient use of swift capacity.
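
The intended end state (a bounded cache in front of thumbor, with misses regenerated on demand) can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of how ATS manages its cache: the `ThumbnailCache` class, the `render` callback standing in for a request to thumbor, and the strict LRU eviction policy are all hypothetical and chosen for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ThumbnailCache:
    """Toy byte-bounded LRU cache; misses are regenerated on demand."""

    def __init__(self, capacity_bytes: int, render: Callable[[str], bytes]) -> None:
        self.capacity_bytes = capacity_bytes
        self.render = render  # hypothetical stand-in for a request to thumbor
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.used_bytes = 0
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        """Return the thumbnail for key, regenerating it on a cache miss."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        data = self.render(key)
        self._store(key, data)
        return data

    def _store(self, key: str, data: bytes) -> None:
        if len(data) > self.capacity_bytes:
            return  # larger than the whole cache: serve it uncached
        self.entries[key] = data
        self.used_bytes += len(data)
        # Evict least recently used entries until we fit again.
        while self.used_bytes > self.capacity_bytes:
            _, evicted = self.entries.popitem(last=False)
            self.used_bytes -= len(evicted)

    def purge(self, key: str) -> None:
        """Drop a thumbnail, e.g. when its original has been deleted."""
        data = self.entries.pop(key, None)
        if data is not None:
            self.used_bytes -= len(data)
```

In this model nothing is stored persistently: an evicted or purged thumbnail simply costs one more `render` call the next time it is requested, and the `hits` and `misses` counters are the kind of figures that would be watched while cache lifetimes are gradually increased.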
This proposal arose out of discussion regarding T211661, and the deficiencies of an approach based on simply turning on object expiry for thumbnails, or on updating an expiry date when a thumb is re-accessed. While we could try to write a new LRU-based expiry process as some sort of sidecar, it seemed better to see if we could use our existing caching infrastructure to cache enough thumbnails that we wouldn't need swift for this use any more (and potentially could take swift out of the thumbnail request path entirely).




