Background: https://wikitech.wikimedia.org/wiki/Thumbor#Throttling
Why?
Thumbor currently uses Memcached via Nutcracker: its Memcached backend is a group of small Memcached instances, one running on each physical Thumbor host, accessed (and sharded) via Nutcracker. Nutcracker is software we would like to retire from our infrastructure completely, and with Thumbor's migration to K8s the physical servers will be going away as well.
- ~1 MB allocated on each host https://grafana-rw.wikimedia.org/d/000000316/memcache?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=thumbor&var-instance=All
- ~40 objects stored in each server https://grafana-rw.wikimedia.org/d/000000317/memcache-slabs?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=thumbor&var-instance=thumbor1005&var-slab=All
- Memcached is used here for rate limiting: essentially, to avoid repeatedly retrying files we have already failed to process
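To make the rate-limiting use case concrete, here is a minimal sketch of a failure throttle backed by a memcached-style counter. It is not Thumbor's actual implementation: `FakeMemcached`, `FailureThrottle`, the key prefix, and the thresholds are all hypothetical, and the in-memory cache only mimics the `add`/`incr`/`get` semantics so the example runs without a server.

```python
import hashlib
import time


class FakeMemcached:
    """Hypothetical in-memory stand-in for the few memcached commands used below."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def _live(self, key):
        # Return the (value, expires_at) entry, dropping it if it has expired.
        entry = self._data.get(key)
        if entry is not None and self._clock() >= entry[1]:
            del self._data[key]
            return None
        return entry

    def get(self, key):
        entry = self._live(key)
        return None if entry is None else entry[0]

    def add(self, key, value, ttl):
        # Like memcached `add`: store only if the key is absent.
        if self._live(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def incr(self, key, delta=1):
        # Like memcached `incr`: bump an existing counter, keep its expiry.
        entry = self._live(key)
        if entry is None:
            return None
        self._data[key] = (entry[0] + delta, entry[1])
        return entry[0] + delta


class FailureThrottle:
    """Skip files that failed too often within a time window (assumed policy)."""

    def __init__(self, cache, max_failures=4, window_seconds=3600):
        self.cache = cache
        self.max_failures = max_failures
        self.window_seconds = window_seconds

    def _key(self, path):
        # Hash the path so the key stays short and free of whitespace.
        return "thumbor-failure-" + hashlib.sha1(path.encode()).hexdigest()

    def record_failure(self, path):
        key = self._key(path)
        # The first failure creates the counter with a TTL; later ones increment it.
        if not self.cache.add(key, 1, self.window_seconds):
            self.cache.incr(key)

    def should_skip(self, path):
        count = self.cache.get(self._key(path))
        return count is not None and count >= self.max_failures
```

A state of one small counter per failing file is consistent with the tiny footprint noted above (about 1 MB and ~40 objects per host).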
What?
While we are still working out the details of how Mcrouter will exist in Kubernetes, we can proceed with freeing the bare-metal Thumbor servers from the burden of running a Memcached instance.
Possible solutions:
- Create 2 VMs in each DC: one to be the main Memcached host for Thumbor, and the second to be the "gutter pool".
- Use our main memcached cluster
- Use the wikifunctions cluster
- Thumbor needs to be migrated to mcrouter, unless there are good reasons not to
We should evaluate the solutions against Thumbor's needs, and consider whether the first proposed solution (mcrouter + Memcached VMs) is like bringing a steamroller to cover a hole in the ground.
How?
- update Thumbor's chart and helmfile to include mcrouter
- temporarily, we will use MediaWiki's Memcached cluster for this (fully understanding that this is overkill)
- start mcrouter on a different port, and ensure it works as it should
- make the appropriate changes so that Thumbor starts using mcrouter
- kill nutcracker with fire
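For the "ensure it works as it should" step, one option is a small set/get round trip against the new mcrouter port before pointing Thumbor at it. The sketch below assumes mcrouter is reachable over TCP and speaks the memcached ASCII protocol; the function name, key, and the host and port passed in are hypothetical placeholders, not values from our configuration.

```python
import socket


def memcached_roundtrip(host, port, key="thumbor-smoke-test", value=b"ok",
                        ttl=60, timeout=2.0):
    """Return True if a value written with `set` can be read back with `get`."""
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rwb") as stream:
        # set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
        stream.write(b"set %s 0 %d %d\r\n%s\r\n"
                     % (key.encode(), ttl, len(value), value))
        stream.flush()
        if stream.readline().strip() != b"STORED":
            return False

        stream.write(b"get %s\r\n" % key.encode())
        stream.flush()
        # A hit answers "VALUE <key> <flags> <bytes>"; a miss answers just "END".
        header = stream.readline().split()
        if len(header) != 4 or header[0] != b"VALUE":
            return False
        payload = stream.read(int(header[3]))
        stream.readline()  # consume the \r\n that terminates the data block
        return stream.readline().strip() == b"END" and payload == value
```

Calling `memcached_roundtrip("localhost", port)` with the temporary mcrouter port should return `True` once routing to the backing pool works; a `False` or a timeout points at the mcrouter config rather than at Thumbor.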
Extras:
- we could consider spinning up 2 small VMs to serve Thumbor's Memcached needs, but this is TBD