We have had a few cases where cache-maps needed to be fully wiped to recover from bad tiles in the cache. We need better ways to do selective purging. A first proposal is to add an HTTP header identifying the version of Kartotherian that generated each tile. If a bug in Kartotherian produces bad tiles, this header could be used to selectively wipe only the affected cache objects.
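A minimal sketch of the proposal, written as a plain middleware function so it can run standalone. The header name `X-Kartotherian-Version` and the way the version is obtained are assumptions for illustration, not an agreed-upon scheme:

```javascript
// Hypothetical sketch: stamp every tile response with the version of the
// service that rendered it, so the caching layer can later ban only the
// objects produced by a known-bad release.
const SERVICE_VERSION = '0.0.14'; // would normally be read from package.json

function versionHeader(req, res, next) {
  // X-Kartotherian-Version is an illustrative header name (assumption)
  res.setHeader('X-Kartotherian-Version', SERVICE_VERSION);
  next();
}

// Minimal usage with a mock response object:
const headers = {};
const res = { setHeader: (k, v) => { headers[k] = v; } };
versionHeader({}, res, () => {});
console.log(headers); // { 'X-Kartotherian-Version': '0.0.14' }
```

With such a header on every cached object, a Varnish ban matching the bad version would evict exactly the tiles rendered by the faulty release.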
I guess the only time this is useful is if we catch a problem within the 1-day cache period. If it takes longer, as happened on Friday, all cached objects would already carry the new version number, and we would need to purge objects based on other criteria, such as zoom level.
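Purging by zoom could be done with a Varnish ban on the request URL. As a sketch, here is a hypothetical helper that builds such a ban expression, assuming tile paths of the form `/<style>/<z>/<x>/<y>.png` (the path layout and function name are assumptions):

```javascript
// Hypothetical helper: build a Varnish ban expression that matches all
// cached tiles of a given style within a zoom range, assuming tile URLs
// look like /<style>/<z>/<x>/<y>.png.
function banByZoom(style, minZoom, maxZoom) {
  const zooms = [];
  for (let z = minZoom; z <= maxZoom; z++) {
    zooms.push(z);
  }
  // Anchor on the zoom path segment so e.g. zoom 1 does not match zoom 10
  return `req.url ~ "^/${style}/(${zooms.join('|')})/"`;
}

console.log(banByZoom('osm-intl', 10, 12));
// req.url ~ "^/osm-intl/(10|11|12)/"
```

The resulting expression would be passed to `varnishadm ban`; banned objects are then evicted lazily as they are next requested.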
@MaxSem that's a tricky one - because we don't have any good way to invalidate the tiles on update. It might actually make sense to reduce the cache time rather than increase it, so that Varnish would still absorb the hot-load areas, but any severe problem would resolve itself in hours, removing the need to purge the cache. A cache's value lies in optimizing very expensive calls, whereas Kartotherian has a relatively low tile generation cost (especially if we move to client-side rendering).
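The shorter-TTL idea amounts to setting a smaller `max-age`/`s-maxage` on tile responses. A sketch, where the helper name and the 1-hour figure are illustrative assumptions only:

```javascript
// Illustrative: with a short TTL, bad tiles age out of Varnish on their own,
// capping the damage window without any explicit purge.
function cacheHeaders(ttlSeconds) {
  // s-maxage governs shared caches like Varnish; max-age governs browsers
  return {
    'Cache-Control': `public, max-age=${ttlSeconds}, s-maxage=${ttlSeconds}`,
  };
}

console.log(cacheHeaders(3600));
// { 'Cache-Control': 'public, max-age=3600, s-maxage=3600' }
```

The trade-off is a higher re-render rate on the backends, which is acceptable here precisely because tile generation is cheap.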
I still hope that we will catch problems faster than we did on Friday, at least some of the time. The goal should be to resolve severe problems in minutes, not hours.
It looks to me like the implementation of this should be relatively simple, so it might make sense to do it "just in case". For the larger issue of cache invalidation, we should try to find good natural keys to expose to the caching infrastructure (though I have no idea what those keys might be).