
Set up proper edge Varnish caching for maps cluster
Closed, ResolvedPublic

Description

In T105076, Varnish caching servers were deployed in a very creative way in order to get maps deployed for testing, hosted on production hardware (note: this is not the same as "deployed as a production-ready service"!). This ticket is about deploying the necessary hardware to make it ready for real production usage.

  • Adding a note here: decom cp104[34] when the rest of this is resolved and they're freed up from their experimental usage.

Event Timeline

Yurik raised the priority of this task from to Needs Triage.
Yurik updated the task description. (Show Details)
Yurik subscribed.
Yurik set Security to None.
Yurik added subscribers: BBlack, MaxSem.

To import some questions from IRC earlier:

  1. Does maps need its own cache cluster?
    • My opinion is that yes, it does, especially in these early days when we don't understand where the project will go and what kind of limitations we can place on external usage and data set growth.
    • This project as conceived today is designed to rely on the varnish cache layer to offload most of the repetitive load, with the service backend providing dynamic rendering for cache misses. All things considered, I think that's a reasonable design choice here, but it does mean the cache layer considerations are important in this case.
    • The potential of the dataset in terms of cache objects and total storage is substantial. If it shares a cluster with another service like text or upload, it has the potential to dramatically impact the (generally quite high and desirable) cache hitrate of those caches for non-maps traffic.
    • Also, upload in particular is the subject of ongoing discussions about re-architecting. Throwing another service into its caches would muddy those statistics and decisions, and possibly create undesirable inter-dependencies between maps + upload-re-arch plans.
    • To get a rough feel for the dataset, OSM has data on how theirs looked in 2012 here: http://wiki.openstreetmap.org/wiki/Tile_disk_usage . Using those numbers as a rough guide (see the back-of-the-envelope check after this list), we'd expect:
      • With broad legitimate use: ~1.6 billion tiles (cache objects), consuming ~1.2TB of storage space
        • This is in-range for a 2-layer cache with 4 machines per site to contain the entire dataset in the 2nd-level consistent-hashed (chash'd) SSD cache, but not by a very big margin (~2x).
      • Potential total dataset: ~56x the above size (92 billion objects, 71TB)
        • This is not in-range for anything we offer in terms of cache storage size today
      • The difference between the two sets of figures is basically all of the junk tiles that exist in theory (simple multiplication of the tile grid across zoom levels) versus those that are actually commonly viewed. Many of the potential tiles are uninteresting zooms of random spots in the ocean, for example, and the odds of them ever being rendered are remote.
      • However, there's a potential for abuse here. Someone could, for example, set a botnet off on a mission to ask us to render randomly-positioned maximum zoom tiles. Our LRU eviction with current software isn't well-equipped to isolate that from impacting legit caching.
    • None of the above considers whatever multipliers may end up being in effect for various different tile-sets / layers / overlays we may support beyond the set of basic tiles.
  2. What's a minimum "proper" deployment of traffic infrastructure for maps?
    • As a general rule for a new cache cluster of this nature: 4x servers per datacenter. This is mostly about the ability to manage load, latency, cache miss-rate, and availability in the face of various overlapping incident scenarios at the machine, rack, site, or link level, plus DDoS and/or legitimate traffic spikes, etc. At the 4x min for those concerns, we've also got very substantial capacity to handle large production traffic loads as well. We can scale further if eventually warranted, but the amount of traffic it takes to outgrow a 4x cluster is pretty big. As noted above, the 4x size also happens to work out well in terms of 2nd-layer persistent storage capacity for maps.
    • Given 4x datacenters today (Tier1: eqiad+codfw, Tier2: ulsfo+esams) that means 16x cache machines total.
    • We prefer to keep cache hardware standardized. Trying to optimize cost on a per-cluster basis with special hardware configs works against us in the long run in terms of flexibility for re-arranging and re-architecting the edge layer or re-balancing cluster loads between projects/services. It also adds needless complexity with a real long-term cost. There are real low-level hardware concerns in the internet-facing cache clusters for optimization and configuration. We can't avoid having overlapping generational differences in cache hardware regardless, so intentional deviations just multiply the set of hardware we have to manage.
    • Our standard cache hardware is not cheap, but exact pricing and config is not in scope for a public ticket like this.
  3. Do we have to buy 16 machines (just at the edge layer - I don't even know what the final codfw+eqiad plan looks like for sufficient backend service capacities) to get this going?
    • Not necessarily, if we're willing to delay a bit on unrelated dependencies.
    • There are other ongoing efforts to consolidate and optimize the cache cluster deployments which could free up enough capacity from existing clusters to provide the above. Rough expectation is completion sometime in the next couple of months (let's say: sometime in 2015). The machines in question would likely come from what is currently the independent cache_mobile cluster, which will probably be folded into the cache_text cluster without requiring any new hardware in cache_text.
    • I think there's substantial timeline left in testing and refining this new maps service before broad production usage regardless, so we should have that time.
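
To make the dataset figures above concrete, here is a rough back-of-the-envelope check. This is a sketch, not an authoritative capacity plan: it assumes standard square slippy-map tiling up to z18 and reuses the ~1.6 billion / ~1.2TB and ~92 billion / 71TB totals quoted from the 2012 OSM data; the average object size and per-site storage figure are derived from those totals rather than measured.

```python
# Back-of-the-envelope check of the tile-count figures above.
# Assumption: standard slippy-map tiling, where zoom level z is a
# 2**z x 2**z grid, i.e. 4**z tiles per level.

max_zoom = 18
potential_tiles = sum(4 ** z for z in range(max_zoom + 1))
print(f"potential tiles, z0-z{max_zoom}: {potential_tiles:.1e}")   # ~9.2e10, i.e. ~92 billion

legit_tiles = 1.6e9    # ~1.6 billion commonly requested tiles (OSM 2012 figure quoted above)
legit_bytes = 1.2e12   # ~1.2TB of storage for that set

print(f"implied average object size: ~{legit_bytes / legit_tiles:.0f} bytes")
print(f"potential / legitimate ratio: ~{potential_tiles / legit_tiles:.0f}x")   # ~57x (the '~56x' above)

# With the ~2x headroom mentioned for a 4-machine site, the legitimate set
# implies on the order of 2 * 1.2TB = 2.4TB of 2nd-layer persistent cache
# storage per site (a derived figure, not a hardware spec).
print(f"implied per-site 2nd-layer storage: ~{2 * legit_bytes / 1e12:.1f} TB")
```
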
akosiaris triaged this task as Medium priority. Aug 27 2015, 10:45 AM

I agree with the statements above. A couple of notes.

  • A botnet requesting all high zoom level tiles would indeed cause cache eviction, but high zoom tiles get generated really fast: vector tiles at a rate of ~50/s for z14 if I recall correctly, and PNG tiles even faster? @Yurik, is that correct? There are also a number of tricks done by @Yurik to avoid generating unneeded vector tiles. I am not sure about low zoom level tiles though.
  • At the beginning we will only be serving PNG tiles due to security considerations, but later on vector tiles will also be served. In fact the plan is to make vector tiles the standard in the future, IIRC. That might change a lot of the calculations, so the service is still in a state of flux as far as this goes and the numbers from OSM are quite probably not accurate. Our current vector tile total in Cassandra is around ~40G for the v1 keyspace and ~80G for the v2 keyspace. Not sure though how this helps to make more accurate calculations.
> A botnet requesting all high zoom level tiles would indeed cause cache eviction, but high zoom tiles get generated really fast: vector tiles at a rate of ~50/s for z14 if I recall correctly, and PNG tiles even faster? @Yurik, is that correct? There are also a number of tricks done by @Yurik to avoid generating unneeded vector tiles. I am not sure about low zoom level tiles though.

Vector tiles are slow because each tile requires about 15 slow geoindex SQL queries - and that's mostly a concern for Tilerator, unrelated to this. PNGs are a simple conversion of a vector tile (stored in Cassandra, fetched by ID) into an image - orders of magnitude faster. If a vector tile does not exist at the requested zoom, the renderer will pull the tile from a lower zoom level, so there may be a few extra requests as it searches.
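
A minimal sketch of the lower-zoom fallback described above, assuming standard slippy-map tile addressing in which tile (z, x, y) has parent (z - 1, x // 2, y // 2); the in-memory dict stands in for the Cassandra keyspace, and the names here are illustrative rather than the service's actual storage API:

```python
from typing import Optional, Tuple

# Hypothetical tile store: (z, x, y) -> stored vector tile bytes.
TILE_STORE = {
    (7, 66, 43): b"<vector tile data at z7>",
}

def find_vector_tile(z: int, x: int, y: int) -> Optional[Tuple[int, bytes]]:
    """Return the requested tile, or walk up the zoom levels until a stored
    ancestor is found - a few extra lookups at most, as described above."""
    while z >= 0:
        tile = TILE_STORE.get((z, x, y))
        if tile is not None:
            return z, tile   # the caller can over-zoom/clip this ancestor tile
        z, x, y = z - 1, x // 2, y // 2
    return None

# A z9 request falls back to the stored z7 ancestor after two extra lookups.
print(find_vector_tile(9, 265, 172))
```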

> At the beginning we will only be serving PNG tiles due to security considerations, but later on vector tiles will also be served. In fact the plan is to make vector tiles the standard in the future, IIRC. That might change a lot of the calculations, so the service is still in a state of flux as far as this goes and the numbers from OSM are quite probably not accurate. Our current vector tile total in Cassandra is around ~40G for the v1 keyspace and ~80G for the v2 keyspace. Not sure though how this helps to make more accurate calculations.

Not exactly - at this point we already serve vector tiles directly, or, if the client is not capable of converting vector to PNG, we do the conversion for them and serve PNGs (see the sketch below). Asking for a vector tile is a much faster process because the server does not need to convert it. For most zoom levels, vector tiles are also much smaller than PNGs.

P.S. keyspace v1 can be safely dropped - we don't need it. We are using v2 only.
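
For illustration, a minimal sketch of the serving model described above: hand back the stored vector tile when the client can render it, otherwise rasterize it server-side and return a PNG. Everything here (the dict store, rasterize_to_png, the content types) is a placeholder for the idea, not the actual Kartotherian/Cassandra API:

```python
from typing import Tuple

# Hypothetical v2-keyspace contents: tile ID -> vector tile bytes.
VECTOR_TILES = {(7, 66, 43): b"<vector tile bytes>"}

def rasterize_to_png(vector_tile: bytes) -> bytes:
    # Stand-in for the real (comparatively expensive) vector-to-PNG conversion.
    return b"<png rendered from vector tile>"

def serve_tile(z: int, x: int, y: int, client_accepts_vector: bool) -> Tuple[bytes, str]:
    vector = VECTOR_TILES[(z, x, y)]              # fetched by tile ID - cheap
    if client_accepts_vector:
        # Fast path: no conversion work, and usually a smaller payload too.
        return vector, "application/x-protobuf"
    return rasterize_to_png(vector), "image/png"  # fallback for PNG-only clients

print(serve_tile(7, 66, 43, client_accepts_vector=True)[1])    # application/x-protobuf
print(serve_tile(7, 66, 43, client_accepts_vector=False)[1])   # image/png
```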

Change 268233 had a related patch set uploaded (by BBlack):
cache_maps: define tier-2 backending

https://gerrit.wikimedia.org/r/268233

Change 268234 had a related patch set uploaded (by BBlack):
cache_maps: define global service IPs

https://gerrit.wikimedia.org/r/268234

Change 268235 had a related patch set uploaded (by BBlack):
cache_maps: add all sites to ganglia

https://gerrit.wikimedia.org/r/268235

Change 268236 had a related patch set uploaded (by BBlack):
cache_maps: re-role old mobile servers

https://gerrit.wikimedia.org/r/268236

Change 268237 had a related patch set uploaded (by BBlack):
cache_maps: remove cp104[34] test caches

https://gerrit.wikimedia.org/r/268237

Change 268238 had a related patch set uploaded (by BBlack):
cache_maps: add all sites in LVS

https://gerrit.wikimedia.org/r/268238

Change 268239 had a related patch set uploaded (by BBlack):
maps DNS 1/2: define at all DCs

https://gerrit.wikimedia.org/r/268239

Change 268240 had a related patch set uploaded (by BBlack):
maps DNS 2/2: enable geodns routing

https://gerrit.wikimedia.org/r/268240

Change 268239 merged by BBlack:
maps DNS 1/2: define at all DCs

https://gerrit.wikimedia.org/r/268239

Change 268233 merged by BBlack:
cache_maps: define tier-2 backending

https://gerrit.wikimedia.org/r/268233

Change 268234 merged by BBlack:
cache_maps: define global service IPs

https://gerrit.wikimedia.org/r/268234

Change 268235 merged by BBlack:
cache_maps: add all sites to ganglia

https://gerrit.wikimedia.org/r/268235

Change 274746 had a related patch set uploaded (by BBlack):
r::c::instances - hack around cache_maps eqiad-only

https://gerrit.wikimedia.org/r/274746

Change 274746 merged by BBlack:
r::c::instances - hack around cache_maps eqiad-only

https://gerrit.wikimedia.org/r/274746

Mentioned in SAL [2016-04-25T19:27:28Z] <gehel> start configuration of new maps caching servers (T109162)

Change 268236 merged by Gehel:
cache_maps: re-role old mobile servers

https://gerrit.wikimedia.org/r/268236

Change 285241 had a related patch set uploaded (by BBlack):
fixup maps cache routing: T109162

https://gerrit.wikimedia.org/r/285241

Change 285245 had a related patch set uploaded (by BBlack):
esams cache_maps puppetization fixup

https://gerrit.wikimedia.org/r/285245

Change 285245 merged by Gehel:
esams cache_maps puppetization fixup

https://gerrit.wikimedia.org/r/285245

Mentioned in SAL [2016-04-25T21:02:43Z] <gehel> adding cp10(46|47|59|60)\.eqiad\.wmnet to maps caching cluster (T109162)

Change 268237 merged by BBlack:
cache_maps: remove cp104[34] test caches

https://gerrit.wikimedia.org/r/268237

Change 268238 merged by Gehel:
cache_maps: add all sites in LVS

https://gerrit.wikimedia.org/r/268238

Change 268240 merged by Gehel:
maps DNS 2/2: enable geodns routing

https://gerrit.wikimedia.org/r/268240

The Varnish maps cluster is now fully configured; some traffic can already be seen on https://grafana.wikimedia.org/dashboard/db/varnish-aggregate-client-status-codes. Closing this as resolved. Feel free to reopen if needed.