Tilerator should purge Varnish cache
Open, High, Public

Description

Whenever Tilerator updates a tile, it should invalidate it in Varnish. Since we are currently generating about 1700 tiles per second (the whole cluster), this seems a bit excessive for the Varnishes (@BBlack?). Also, generating a tile should invalidate the whole pyramid of tiles underneath it. Otherwise, let's say a tile at z6 was generated, and it was partially (one quarter) over water - this means that out of the 4 tiles at the same location at z7, 3 might not be stored in Cassandra at all (blank space), which means when users request z7, the system will use overzoom to take data from z6, and any cached z7 image built that way goes stale when the z6 tile changes.

We could allow zoom level invalidation - anything that starts with /7/* gets invalidated.

Or, we could introduce "range invalidation", where we send the range of tiles (fromIndex, untilIndex). This requires varnish to do map-specific indexing -- internally all our tiles have an index 0 .. (4^zoom-1) for a given zoom level. We could expose this indexing, and let varnish do the [x,y] coordinate conversion into an index (simple bit manipulation). Thus, all internal requests would become /zoom/index.png, and if varnish can treat that index as an integer, purging is simple. Or we could pad the index string with zeroes, and let varnish do purging based on the string comparison - [from <= index < until].
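A minimal sketch of the index idea above, assuming the index is built by interleaving the bits of x and y (x in the even bit positions, y in the odd ones). The actual bit order used internally is not stated here, so treat that choice and all function names as hypothetical. With this layout the descendants of a tile at any deeper zoom form one contiguous index range, which is what makes a [from <= index < until] purge possible.

```python
def xy_to_index(x: int, y: int, zoom: int) -> int:
    """Interleave the bits of x and y into one index in 0 .. 4**zoom - 1.

    Assumption: x occupies the even bit positions and y the odd ones.
    """
    index = 0
    for bit in range(zoom):
        index |= ((x >> bit) & 1) << (2 * bit)
        index |= ((y >> bit) & 1) << (2 * bit + 1)
    return index


def index_to_xy(index: int, zoom: int) -> tuple[int, int]:
    """Inverse of xy_to_index."""
    x = y = 0
    for bit in range(zoom):
        x |= ((index >> (2 * bit)) & 1) << bit
        y |= ((index >> (2 * bit + 1)) & 1) << bit
    return x, y


def descendant_range(index: int, zoom: int, target_zoom: int) -> tuple[int, int]:
    """Half-open range [from, until) of indexes at target_zoom under a tile."""
    shift = 2 * (target_zoom - zoom)
    return index << shift, (index + 1) << shift


def padded_index(index: int, zoom: int) -> str:
    """Zero-pad the index so that string comparison matches integer order."""
    width = len(str(4 ** zoom - 1))
    return str(index).zfill(width)
```

For example, the four z4 children of the z3 tile (x=3, y=5) all fall inside `descendant_range(xy_to_index(3, 5, 3), 3, 4)`.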

Yurik created this task. Aug 20 2015, 9:45 PM
Yurik updated the task description.
Yurik raised the priority of this task from to Needs Triage.
Yurik added subscribers: Yurik, MaxSem, BBlack.
Restricted Application added a subscriber: Aklapper. · Aug 20 2015, 9:45 PM

Is this 1700 tiles/sec 24/7? Or only during certain batch update periods? 1700 purges per second isn't completely unreasonable in the generic case. Either way, we should assign a separate multicast address for maps HTCP from the one in use by text/upload.

You'll have to explain the overzoom and pyramid invalidation a little better, too:

  • Are you saying that Cassandra never stored data for blank water tiles explicitly?
  • Who is "the system" that does the overzoom? The renderers on maps-test200x?
  • Would non-blank tiles at deeper zoom be regenerated and invalidated directly by tilerator at some point in the update process?

Stepping out a bit from all of that: it seems like non-seamless tile regeneration is a generally-hard problem, and it would be helpful to understand the rest of the process.

Are we updating in batches based on periodic datestamped dumps of the entire dataset? Or is more like a continuous stream of small diffs at the geospatial SQL level, for which we don't know without reprocessing which tiles are actually affected?

1700 is the maximum speed for high zoom level tile updates. It should be lower in practice because lower zoom levels will also be refreshed. A continuous stream should only happen when we batch regenerate certain zoom levels/tiles. In general operations it will fire in short bursts as replication happens.

Yurik added a comment. Aug 21 2015, 1:11 AM

Is this 1700 tiles/sec 24/7? Or only during certain batch update periods? 1700 purges per second isn't completely unreasonable in the generic case. Either way, we should assign a separate multicast address for maps HTCP from the one in use by text/upload.

1700 during peak performance. We will scale it down once the servers start doing image rendering for the users.

You'll have to explain the overzoom and pyramid invalidation a little better, too:

  • Are you saying that Cassandra never stored data for blank water tiles explicitly?

Correct. We store 13% of tiles at z12, less than 10% at z13, about 7% at z14, etc. In GB that means 4.5GB at z12, 7GB at z13, 15.5GB at z14, etc.

  • Who is "the system" that does the overzoom? The renderers on maps-test200x?

Yes. When a user requests a tile image, Kartotherian either takes the corresponding vector tile from Cassandra or, if it's not there, looks at the lower zooms until there is a tile and extracts the needed portion.
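The overzoom lookup described above can be sketched roughly as follows. This is only an illustration, not Kartotherian's code: the `store` mapping keyed by `(zoom, x, y)` and the function name are hypothetical stand-ins for the Cassandra lookup, and the step that extracts the needed portion of the found tile is left out.

```python
def find_source_tile(store, zoom, x, y):
    """Walk up the tile pyramid until a stored tile covers (zoom, x, y).

    Returns the (zoom, x, y) key of the stored ancestor (or of the tile
    itself), or None if nothing is stored all the way up to z0.
    """
    z, tx, ty = zoom, x, y
    while z >= 0:
        if (z, tx, ty) in store:
            return z, tx, ty
        # The parent tile one zoom level up covers this tile.
        z, tx, ty = z - 1, tx >> 1, ty >> 1
    return None
```

This also shows why a change to a z6 tile matters for cached z7 images: any z7 request that resolves to the z6 key was rendered from the z6 data.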

  • Would non-blank tiles at deeper zoom be regenerated and invalidated directly by tilerator at some point in the update process?

Non-blanks - yes. But with overzoom, the tiles that are missing from Cassandra also need to be invalidated, because the origin of their data (the tile from a lower zoom) has changed. Granted, those changes are fairly rare - if there was nothing on the tile (water, forest, etc.), the chances are low for something to appear.

Stepping out a bit from all of that: it seems like non-seamless tile regeneration is a generally-hard problem, and it would be helpful to understand the rest of the process.

Not sure what you mean

Are we updating in batches based on periodic datestamped dumps of the entire dataset? Or is more like a continuous stream of small diffs at the geospatial SQL level, for which we don't know without reprocessing which tiles are actually affected?

My understanding: We subscribe to updates, could do it in smaller or larger batches. The SQL importer generates a list of the affected tiles. We will feed that list to the tilerator to regenerate and purge varnish.

Thanks for the info!

So, a few thoughts (but I'm still thinking, I think):

  • Are the updates time-critical? I get the feeling they'll be async in terms of map consistency regardless (e.g. in an area with no blank tiles, we may get the tilerator update -> cache invalidation for a z14 tile and the covering z6 tile at very different times for the same basic geospatial change, so it will always be the case that there's minor temporal inconsistency in the images?). If that's the case, we could also experiment with the idea of simply keeping cache lifetimes capped to reasonability (e.g. a few days), not purging on normal update flow, and letting the cache lifetimes be part of the update delay for data to end-users? I imagine the update pipeline isn't super high speed from user reports of corrections -> OSM -> us to begin with.
  • Assuming we purge: Having varnish auto-purge the pyramid beneath a purged tile seems like it would be very inefficient in the cases where the underlying tiles weren't blanks. Also, they'd be re-purged several times in one update as we processed the true purges at various levels of the related pyramid. Could tilerator (or related software) that's generating the purge easily generate a list of only the blanks within the underlying pyramid and queue up purges for those (but not non-blanks)?
  • Rates: while peak 1700/s is "doable", if the rate is spiky/batchy, it might make sense to implement some app-level queueing here to smooth out the rate and avoid bursting socket/network buffers with multicast spikes, etc. Instead of firing off HTCP in realtime, fire those into a purge queue and have something process the purge queue with an outbound ratelimiter in place to cap at, say, 500/s or whatever we decide is reasonable, which smears out the spikes.
  • On that same front: assuming we're not so spiky that we overflow socket/network buffers and drop multicast: if varnish ends up limiting the purge rate, vhtcpd (the daemon that receives the multicast and sends them to varnish) does internal queueing as well and can absorb some fairly tremendous spikes too - but it has no concept of rate-limiting the queue drainage either; it would just be limited by the rate at which varnish accepts them.
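A rough sketch of the app-level purge queue with an outbound rate cap suggested above. The class and method names are hypothetical, sending the actual HTCP packets is out of scope, and the caller is assumed to invoke `drain_one_second()` once per second.

```python
import collections


class PurgeQueue:
    """Buffer purge URLs and release at most max_per_second per drain call."""

    def __init__(self, max_per_second):
        self.max_per_second = max_per_second
        self.pending = collections.deque()

    def enqueue(self, url):
        self.pending.append(url)

    def drain_one_second(self):
        """Return the URLs to purge during the next one-second window."""
        batch = []
        while self.pending and len(batch) < self.max_per_second:
            batch.append(self.pending.popleft())
        return batch
```

With a cap of 500/s, a burst of 1700 purges is smeared over four windows (500, 500, 500, 200) instead of hitting the network at once.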

Thanks for the info!

So, a few thoughts (but I'm still thinking, I think):

  • Are the updates time-critical? I get the feeling they'll be async in terms of map consistency regardless (e.g. in an area with no blank tiles, we may get the tilerator update -> cache invalidation for a z14 tile and the covering z6 tile at very different times for the same basic geospatial change, so it will always be the case that there's minor temporal inconsistency in the images?). If that's the case, we could also experiment with the idea of simply keeping cache lifetimes capped to reasonability (e.g. a few days), not purging on normal update flow, and letting the cache lifetimes be part of the update delay for data to end-users? I imagine the update pipeline isn't super high speed from user reports of corrections -> OSM -> us to begin with.

Not purging always has a chance of the following scenario: a change spans two adjacent tiles. One of them is cached, another is not. As a result, users see a feature cut off.

  • Rates: while peak 1700/s is "doable", if the rate is spiky/batchy, it might make sense to implement some app-level queueing here to smooth out the rate and avoid bursting socket/network buffers with multicast spikes, etc. Instead of firing off HTCP in realtime, fire those into a purge queue and have something process the purge queue with an outbound ratelimiter in place to cap at, say, 500/s or whatever we decide is reasonable, which smears out the spikes.

Yes, this should be doable for replication updates, especially since we will need a purge queue anyway - to accommodate Cassandra replication lag. I wonder, though, about batch regeneration that has no spikes - maybe we can sigh and not purge for these updates.

Yurik added a comment. Aug 21 2015, 6:58 PM

Tilerator already uses a job queuing system that supports delayed execution, so it should be very easy to add a delay (how long of a delay is a much harder question). It can also be used for regulating the rate of purges.

Not purging always has a chance of the following scenario: a change spans two adjacent tiles. One of them is cached, another is not. As a result, users see a feature cut off.

Isn't this always a problem even with naive purging, as there could be a significant time-span between updating->purging two related tiles for a given visible feature change (either due to adjacency at a certain zoom, or part of the same zoom pyramid containing the object)?

Yurik added a project: Maps. Nov 7 2015, 7:34 AM
Restricted Application added a project: Discovery. · Nov 7 2015, 7:34 AM
Restricted Application added a subscriber: StudiesWorld.
Yurik moved this task from All map-related tasks to Kartotherian on the Maps board. Nov 7 2015, 7:35 AM
Deskana moved this task from Needs triage to WDQS on the Discovery board. Dec 31 2015, 5:16 AM
BBlack moved this task from WDQS to Maps on the Discovery board. Jan 1 2016, 5:45 PM
Yurik moved this task from Kartotherian to Tilerator on the Maps board. Feb 7 2016, 10:17 PM
Restricted Application added a project: Operations. · May 4 2016, 9:14 AM
chasemp triaged this task as High priority. May 5 2016, 8:43 PM
Yurik added a comment. May 5 2016, 11:22 PM

Some stats to consider: the db update script reports 5-6 million affected z16 tiles per day. That's about 7 million (5m + 5m/4 + 5m/16 + 5m/64...) with all of the lower zooms, plus we also need to update z17 and z18 (which instantly makes it 107 million) -- that's about 1,200 tile updates per second. Sending that many UDP packets might cause some issues.
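The arithmetic behind those figures can be checked with a few lines. This assumes the low end of the reported range (5 million z16 tiles per day), that each zoom level below z16 has a quarter as many affected tiles as the one above it, and that z17 and z18 have 4x and 16x as many; the variable names are just for illustration.

```python
SECONDS_PER_DAY = 86400

z16 = 5_000_000
# z15 down to z0: each level has a quarter of the tiles of the level above.
lower_zooms = sum(z16 / 4 ** k for k in range(1, 17))
z17 = z16 * 4
z18 = z16 * 16

up_to_z16 = z16 + lower_zooms           # roughly 6.7 million
total = up_to_z16 + z17 + z18           # roughly 107 million
per_second = total / SECONDS_PER_DAY    # a little over 1,200

print(round(up_to_z16), round(total), round(per_second))
```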

BBlack added a comment. May 6 2016, 2:47 AM

Yes, invalidating 1200 tiles/sec is ridiculous no matter how we do it. We definitely agree there.

First, let's explore how you're accepting and processing updates, because I really don't have a firm understanding for discussion purposes. Are you using daily changesets? hourly? They even publish per-minute changesets. Is there a natural transactional unit within the flow of changes, where a given change-transaction only makes consistent updates to all affected tiles, avoiding buggy tile edges? I'd like to understand all of this better. Also, is it assumed that zoomed-out tiles are affected when they really aren't due to loss of detail? e.g. does the single tile at z0 get constantly updated even though very few changes affect the actual image at its level of graphical detail? That doesn't matter for the purge-rate problem since there are so few tiles at the top, but it does suck if that tile is constantly being pointlessly invalidated when it's also likely to be very hot for viewing, and ditto really for all of z0 to z4 or so.

As for invalidations themselves: I'd step back first and take a look at what these z16 and beyond numbers really mean. Updates to ~5M z16 tiles per day seems reasonable: that's only somewhere around 0.1% of the geographical area at that zoom level. However, the latest estimated stats we see at http://wiki.openstreetmap.org/wiki/Tile_disk_usage say that even with general-purpose usage, we really only expect ~10% of tiles at z16 to ever, ever, ever be viewed at all, because most of them are uninteresting to humans. We'd expect an even smaller fraction to be viewed on any given day. So sending purges for every single theoretically-touched tile daily would be extremely wasteful (never mind z17/18, which are the same with even worse stats in that direction).

There are a few different strategies you could use (or combine) for dealing with that general sort of problem:

  1. You can keep a single-bit flag in your database (or whatever, even a separate sqlite DB on-disk with just a list of viewed tiles?) that flags whether a given high-zoom tile was ever rendered (at varnish's request) since the last update. You set the flag on-render. When you process a batch update, you only send invalidations for the affected tiles actually viewed since the previous update and wipe the flags down as you invalidate them. Should cut down the purges sent dramatically.
  2. Beyond a certain zoom cutoff, you could decrease the TTL and not purge at all. e.g. with some arbitrary tunable choices here and assuming per-hour or per-minute update pulls: you could decide that in general you set 7-day TTLs on z0-z10 and purge them explicitly always (assume they get viewed often enough). For z11-14 you set 1-day TTLs and use the "was it rendered?" flagging strategy above to cut down on pointless invalidations, and for z15+ you set 1-hour TTLs and don't purge at all, letting them naturally expire and never be more than an hour out of date.
  3. In combination with any of the above (or not): you could also explicitly batch-invalidate beyond a certain zoom using special headers. For example, for tiles at z11+, when you output a rendered image, you also include an XKey: z10tile:XXX header, which indicates the z10 tile this tile is contained within. Then you only explicitly purge individual tiles to the z10 level, and make use of these indices to purge z11+ tiles in z10-sized batches, one invalidation request per affected z10. That is, of course, assuming we can work around the LOD problem described at the top (are some z16 updates not affecting their enclosing z10 image, and there's no easy way to summarize them as a z10-based invalidation?).
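The strategies in this list could be combined along these lines. It is only a sketch using the example cutoffs from the text (z0-z10, z11-14, z15+). The function names, the in-memory `viewed` set standing in for the flag store, and the `x/y` form of the XKey value are all assumptions; the comment leaves the key format unspecified.

```python
DAY = 86400


def ttl_seconds(zoom):
    """TTL per zoom band, using the example values from the discussion."""
    if zoom <= 10:
        return 7 * DAY
    if zoom <= 14:
        return DAY
    return 3600


def should_purge(zoom, x, y, viewed):
    """Decide whether an updated tile needs an explicit purge.

    viewed is a set of (zoom, x, y) tiles rendered since the last update;
    the flag is cleared once the purge has been decided.
    """
    if zoom <= 10:
        return True                  # always purge low zooms
    if zoom <= 14:
        tile = (zoom, x, y)
        if tile in viewed:
            viewed.discard(tile)     # wipe the flag as we invalidate
            return True
        return False
    return False                     # z15+: rely on the short TTL


def xkey_for_tile(zoom, x, y, group_zoom=10):
    """Hypothetical XKey value naming the enclosing tile at group_zoom."""
    if zoom <= group_zoom:
        return None
    shift = zoom - group_zoom
    return f"z{group_zoom}tile:{x >> shift}/{y >> shift}"
```

A z12 tile at (2049, 1365) would carry the key of its enclosing z10 tile (512, 341), so one invalidation of that key covers every cached z11+ tile beneath it.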

Another thing we could look at doing is, of course, simply not setting a very high TTL and not invalidating at all. Most of the time it would be fine to use e.g. a 1-day TTL. Maybe it wouldn't be very often that users run into a jarring transition from an update, and nothing will ever be more than a day behind the update stream. That's basically what we're doing today. Varnish is capping TTLs at 1d and we're not purging at all.

Yurik added a comment. May 6 2016, 3:41 AM

Are you using daily changesets? hourly? They even publish per-minute changesets.

We are using daily, but I would like to switch to 1-15 minute ones.

Is there a natural transactional unit within the flow of changes, where a given change-transaction only makes consistent updates to all affected tiles, avoiding buggy tile edges?

Not to my knowledge

Also, is it assumed that zoomed-out tiles are affected when they really aren't due to loss of detail?

No, most of the time they don't change. I heard that others re-generate low zooms on schedule rather than on update.

One other thing we can do is to binary-compare the new generated tile with what we had in its place before, because even though there is an update, it is possible that nothing actually changed for that location at that zoom level.
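As a sketch of that comparison, assuming the old and new tiles are available as byte strings keyed by tile coordinates (the dict-based interface and the function name are hypothetical):

```python
def changed_tiles(old_tiles, new_tiles):
    """Return keys whose newly generated bytes differ from the stored ones.

    Tiles that did not exist before count as changed.
    """
    return [key for key, data in new_tiles.items()
            if old_tiles.get(key) != data]
```

Only the returned keys would then need to be saved and purged.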

Really like your analysis and the ideas. The bit database has been poking around in my head for a while, will think about it some more (especially because we don't have to create the whole 512MB file for z16 - we can create 1MB chunks for the requested tiles only, and most of them will probably never get created).
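A minimal in-memory sketch of the chunked bit database idea, assuming one bit per z16 tile index and 1MB chunks that are allocated only when a tile in them is first flagged. The class name and the dict of bytearrays are illustrative; the real thing would presumably live on disk.

```python
CHUNK_BYTES = 1 << 20              # 1MB per chunk
BITS_PER_CHUNK = CHUNK_BYTES * 8


class SparseTileFlags:
    """One bit per tile index, stored in lazily created 1MB chunks."""

    def __init__(self):
        self.chunks = {}           # chunk number -> bytearray

    def set(self, index):
        chunk_no, bit = divmod(index, BITS_PER_CHUNK)
        chunk = self.chunks.setdefault(chunk_no, bytearray(CHUNK_BYTES))
        chunk[bit >> 3] |= 1 << (bit & 7)

    def get(self, index):
        chunk_no, bit = divmod(index, BITS_PER_CHUNK)
        chunk = self.chunks.get(chunk_no)
        return chunk is not None and bool(chunk[bit >> 3] & (1 << (bit & 7)))
```

A full z16 bitmap is 4^16 bits, i.e. 512 such chunks; regions that are never requested never allocate one.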

Rendering time once we have the tile is very small, so maybe it's really not worth it to UDP-invalidate anything for high zooms, and simply let them expire.

ema moved this task from Triage to Caching on the Traffic board. Sep 30 2016, 2:40 PM
Yurik removed a project: Maps. Dec 15 2016, 4:36 AM

I'd rather see max-age significantly reduced and stale-while-revalidate set to the current max-age value. This avoids the need to invalidate the cache during routine operations but keeps everything fresh.

When a user loads a tile, varnish would serve immediately from cache while sending a GET request with the If-Modified-Since header. kartotherian would either return 304 or a 200 and new tile based on the age of the vector data.
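The origin side of that revalidation could look roughly like this. It is a sketch, not Kartotherian's handler: the function name and the `max_age` and `swr` values are placeholders, and `data_modified` is assumed to be a timezone-aware UTC datetime for when the vector data last changed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(if_modified_since, data_modified, max_age=300, swr=86400):
    """Return (status, headers) for a tile request.

    if_modified_since is the raw header value or None.
    """
    headers = {
        "Cache-Control": f"max-age={max_age}, stale-while-revalidate={swr}",
        "Last-Modified": format_datetime(data_modified, usegmt=True),
    }
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds.
        if data_modified.replace(microsecond=0) <= since:
            return 304, headers
    return 200, headers
```

A 304 lets the cache keep its stored image and just refresh its freshness lifetime, so a tile is only re-rendered when the vector data is newer than the cached copy.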

Mholloway added a project: Maps-Sprint.
Mholloway moved this task from Backlog to In progress on the Maps-Sprint board.

Change 456463 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[operations/puppet@production] Add variables for map tile invalidation

https://gerrit.wikimedia.org/r/456463

Change 456474 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[maps/tilerator/deploy@master] Add variables for cache invalidation of map tiles

https://gerrit.wikimedia.org/r/456474

This is ready to deploy to beta as soon as the deploy config template and puppet changes land.

Change 456463 merged by Gehel:
[operations/puppet@production] Add variables for map tile invalidation

https://gerrit.wikimedia.org/r/456463

Change 457512 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: configure maps-test for new tile invalidation variables

https://gerrit.wikimedia.org/r/457512

Change 457512 merged by Gehel:
[operations/puppet@production] maps: configure maps-test for new tile invalidation variables

https://gerrit.wikimedia.org/r/457512

Change 456474 merged by jenkins-bot:
[maps/tilerator/deploy@master] Add variables for cache invalidation of map tiles

https://gerrit.wikimedia.org/r/456474

This is now ready to be deployed to beta, but note that deploying to production is blocked on T198622: migrate maps servers to stretch with the current style.

Mentioned in SAL (#wikimedia-releng) [2018-09-17T17:24:59Z] <mdholloway> deployment-maps04 updated kartotherian and tilerator to latest (T109776)

Confirmed that events are being sent by Tilerator upon tile generation, and received and produced to kafka by eventlogging-service on beta:

mholloway-shell@deployment-kafka-main-1:/var/log/eventlogging$ kafka-console-consumer --bootstrap-server localhost:9092 --topic eqiad.resource_change
{"meta": {"domain": "maps-beta.wmflabs.org", "dt": "2018-09-18T12:28:41.096Z", "id": "60308887-bb3e-11e8-b4b1-12d89242bb88", "schema_uri": "resource_change/2", "topic": "resource_change", "uri": "https://maps-beta.wmflabs.org/osm/8/153/104.png"}, "tags": ["tilerator"]}
{"meta": {"domain": "maps-beta.wmflabs.org", "dt": "2018-09-18T12:28:41.096Z", "id": "60308888-bb3e-11e8-9e5d-ad2abe896fda", "schema_uri": "resource_change/2", "topic": "resource_change", "uri": "https://maps-beta.wmflabs.org/osm-intl/8/153/104.png"}, "tags": ["tilerator"]}

However, we still need to update the production changeprop config with the change earlier approved for config.example.wikimedia.yaml so that changeprop acts on these events with a purge request.
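For reference, a consumer of these events could pull the tile coordinates out of the `meta.uri` field along these lines. This is only an illustration based on the two sample events shown above; it is not the changeprop rule, and the function name and the assumption that the path is `/<source>/<z>/<x>/<y>.png` are mine.

```python
import json
import re

# Assumed URI layout, inferred from the sample events: /<source>/<z>/<x>/<y>.png
TILE_URI = re.compile(r"/(?P<source>[^/]+)/(?P<z>\d+)/(?P<x>\d+)/(?P<y>\d+)\.png$")


def tile_from_event(line):
    """Parse one resource_change event; return (source, z, x, y) or None."""
    event = json.loads(line)
    if "tilerator" not in event.get("tags", []):
        return None
    match = TILE_URI.search(event["meta"]["uri"])
    if match is None:
        return None
    return (match["source"], int(match["z"]), int(match["x"]), int(match["y"]))
```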

Change 461141 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/change-propagation/deploy@master] Add map tile purge rule

https://gerrit.wikimedia.org/r/461141

Change 461141 merged by Ppchelko:
[mediawiki/services/change-propagation/deploy@master] Add map tile purge rule

https://gerrit.wikimedia.org/r/461141

Mholloway added a comment. Edited Oct 2 2018, 8:40 AM

This needs some work:

  • We need to document that this should be disabled on initial data import / tile generation (or, better yet, enforce that in code somehow)
  • These events should probably also be sent in batches of n tiles, rather than in a separate HTTP request for each tile. (Need to talk to services or analytics to figure out what the optimal value of n is)
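Grouping events into batches of n before sending could be as simple as the following sketch. The helper name is hypothetical, and how a batch is actually posted to EventBus, as well as the right value of n, are left open as noted above.

```python
from itertools import islice


def batches_of(events, n):
    """Yield lists of up to n events from any iterable of events."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, n))
        if not batch:
            return
        yield batch
```

Each yielded list would then go out in a single HTTP request instead of one request per tile.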

Finally, I should note that there was a misconfiguration sending all HTTP requests from Kartotherian and Tilerator, even internal ones, through a proxy at http://url-downloader.[eqiad,codfw].wikimedia.org:8080. This proxy blocks requests to EventBus, and should not have been used in any case. This proxy setting is now updated in the deploy repo configs and has been fixed in beta and on maps1004, but is still in place in the remainder of the maps cluster until the stretch update is complete and we can redeploy.

Blocking this until the service/config is up to date on all hosts.