Thumbor's servers are getting older, and since Thumbor is a stateless service, it makes sense to move it to Kubernetes. There are a few challenges ahead to figure out. The general goal is to have Thumbor running under Bullseye in Kubernetes in both datacentres.
- Create Thumbor Helm chart and helmfile definition (will need nutcracker sidecar, network rules and passwords for Swift and memcached)
- Create Thumbor LVS service or use ingress - will leverage existing Thumbor LVS service.
- Decide whether to use the statsd gateway or whether the Prometheus plugin is suitable
- Plan for number of instances in each DC - currently Thumbor runs 160 instances per DC (40 instances of Thumbor on each of 4 physical hosts). 160 pods is probably not suitable or correct.
- Cutover plan between new instances and old
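As a starting point for the instance-count question, here is a rough sketch of the arithmetic. The 40 instances per host and 4 hosts per DC come from the current setup described above; the idea of packing several Thumbor instances into one pod, and the specific per-pod values tried, are assumptions for comparison only and not a proposal.

```python
import math

# Figures from the current deployment: 40 Thumbor instances on each of 4 hosts per DC.
INSTANCES_PER_HOST = 40
HOSTS_PER_DC = 4


def pods_needed(total_instances: int, instances_per_pod: int) -> int:
    """Return how many pods are needed to host total_instances Thumbor processes."""
    if instances_per_pod < 1:
        raise ValueError("instances_per_pod must be at least 1")
    return math.ceil(total_instances / instances_per_pod)


total = INSTANCES_PER_HOST * HOSTS_PER_DC  # instances per DC today

# Hypothetical packing options (assumed values, for comparison only).
for per_pod in (1, 4, 8, 16):
    print(f"{per_pod:>2} instance(s) per pod -> {pods_needed(total, per_pod)} pods per DC")
```

This only matches today's process count one for one; actual sizing would also depend on per-pod CPU and memory limits, which are not settled yet.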
Things I see as outstanding concerns:
- We will need an alternative strategy for fc-list as we can't reliably use the current approach within pods. Could we just use a CronJob in Kubernetes?
- Scaling/capacity
- PID limits in Kubernetes given the number of system calls Thumbor itself uses. Not sure if this is an issue at all
- Thumbor currently runs in firejail, do we lose anything by dropping it in k8s?
There's an existing nutcracker sharding configuration in hieradata/role/eqiad/thumbor/mediawiki.yaml