
Migrate thumbor to Kubernetes
Closed, Resolved, Public

Description

Thumbor's servers are getting older, and since Thumbor is a stateless service, it makes sense to move it to Kubernetes. There are a few challenges to figure out first. The general goal is to have Thumbor running under Bullseye in Kubernetes in both datacentres.

  • Create Thumbor Helm chart and helmfile definition (will need nutcracker sidecar, network rules and passwords for Swift and memcached)
  • Create Thumbor LVS service or use ingress - will leverage existing Thumbor LVS service.
  • Make decision on using statsd gateway or whether the Prometheus plugin is suitable
  • Plan for number of instances in each DC - currently Thumbor runs 160 instances per DC (40 instances of Thumbor on each of 4 physical hosts). 160 pods is probably not suitable or correct.
  • Cutover plan between new instances and old
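
For the instance-count item above, the arithmetic can be sketched as follows. The 40 × 4 = 160 figure comes from this task; `workers_per_pod` is a hypothetical parameter (the task does not say how many Thumbor processes one pod would run), so the grouped result is illustrative only.

```python
import math


def pods_needed(instances_per_host: int, hosts: int, workers_per_pod: int) -> int:
    """Pods required to match the current per-DC Thumbor instance count."""
    if workers_per_pod < 1:
        raise ValueError("workers_per_pod must be at least 1")
    total_instances = instances_per_host * hosts
    # Round up so that a partially filled pod still counts as a pod.
    return math.ceil(total_instances / workers_per_pod)


# Figures from the task: 40 instances on each of 4 hosts per DC.
one_to_one = pods_needed(40, 4, workers_per_pod=1)  # one pod per instance
grouped = pods_needed(40, 4, workers_per_pod=8)     # hypothetical grouping
print(one_to_one, grouped)
```

Running several workers in one pod is only one possible way to avoid a 160-pod deployment; the right number would still depend on measured CPU and memory use per worker.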

Things I see as outstanding concerns:

  • We will need an alternative strategy for fc-list as we can't reliably use the current approach within pods. Could we just use a CronJob in Kubernetes?
  • Scaling/capacity
  • PID limits in Kubernetes, given the number of system calls Thumbor itself uses. Not sure if this is an issue at all.
  • Thumbor currently runs in firejail; do we lose anything by dropping it in k8s?
  • There's an existing nutcracker sharding configuration in hieradata/role/eqiad/thumbor/mediawiki.yaml

Details

Repo | Branch | Lines +/-
operations/deployment-charts | master | +2 -2
operations/deployment-charts | master | +8 -1
operations/puppet | production | +15 -2
operations/deployment-charts | master | +8 -3
operations/deployment-charts | master | +16 -2
operations/deployment-charts | master | +1 -1
operations/puppet | production | +1 -0
operations/puppet | production | +36 -0
operations/deployment-charts | master | +16 -16
operations/deployment-charts | master | +4 -4
operations/deployment-charts | master | +24 -2
operations/deployment-charts | master | +6 -6
operations/deployment-charts | master | +30 -30
operations/deployment-charts | master | +1 -1
operations/deployment-charts | master | +1 -1
operations/software/thumbor-plugins | master | +4 -2
operations/deployment-charts | master | +1 -1
operations/deployment-charts | master | +4 -16
operations/deployment-charts | master | +5 -2
operations/deployment-charts | master | +9 -4
operations/software/thumbor-plugins | master | +3 -1
operations/software/thumbor-plugins | master | +4 -3
operations/deployment-charts | master | +1 -1
operations/software/thumbor-plugins | master | +6 -5
operations/deployment-charts | master | +16 -3
operations/software/thumbor-plugins | master | +1 -1
operations/deployment-charts | master | +8 -8
operations/deployment-charts | master | +3 -34
operations/software/thumbor-plugins | master | +14 -1
operations/deployment-charts | master | +3 -1
operations/puppet | production | +4 -0
operations/deployment-charts | master | +251 -0
operations/deployment-charts | master | +1 -0
operations/deployment-charts | master | +1 K -0
operations/docker-images/production-images | master | +7 -1
operations/software/thumbor-plugins | master | +2 -0
operations/docker-images/production-images | master | +9 -2
operations/deployment-charts | master | +9 -2
operations/docker-images/production-images | master | +20 -0

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 832289 abandoned by Hnowlan:

[operations/deployment-charts@master] changeprop: add num_workers support for jobqueue

Reason:

Dupe change, wrong bug.

https://gerrit.wikimedia.org/r/832289

Change 832235 merged by Hnowlan:

[operations/docker-images/production-images@master] haproxy: use haproxy24 component

https://gerrit.wikimedia.org/r/832235

Change 839548 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/software/thumbor-plugins@master] Add missing prod dependencies

https://gerrit.wikimedia.org/r/839548

Change 839548 merged by jenkins-bot:

[operations/software/thumbor-plugins@master] Add missing prod dependencies

https://gerrit.wikimedia.org/r/839548

Change 841477 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/docker-images/production-images@master] haproxy: fix apt repository path

https://gerrit.wikimedia.org/r/841477

Change 841477 merged by Hnowlan:

[operations/docker-images/production-images@master] haproxy: fix apt repository path

https://gerrit.wikimedia.org/r/841477

Change 823143 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: new service chart

https://gerrit.wikimedia.org/r/823143

Change 824473 merged by jenkins-bot:

[operations/deployment-charts@master] admin: add thumbor namespace

https://gerrit.wikimedia.org/r/824473

Change 824519 merged by jenkins-bot:

[operations/deployment-charts@master] helmfile.d: add thumbor configuration

https://gerrit.wikimedia.org/r/824519

Change 849591 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] kubernetes: add deployment_services entry for thumbor

https://gerrit.wikimedia.org/r/849591

Change 849591 merged by Hnowlan:

[operations/puppet@production] kubernetes: add deployment_services entry for thumbor

https://gerrit.wikimedia.org/r/849591

Change 849605 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: disable TLS for now

https://gerrit.wikimedia.org/r/849605

Change 849605 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: disable TLS for now

https://gerrit.wikimedia.org/r/849605

Change 851608 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/software/thumbor-plugins@master] Generate thumbor.key via prod entrypoint script

https://gerrit.wikimedia.org/r/851608

Change 851609 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: don't manage thumbor.key within Helm

https://gerrit.wikimedia.org/r/851609

Change 851608 merged by jenkins-bot:

[operations/software/thumbor-plugins@master] Generate thumbor.key via prod entrypoint script

https://gerrit.wikimedia.org/r/851608

Change 851609 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: don't manage thumbor.key within Helm

https://gerrit.wikimedia.org/r/851609

Change 852240 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] kask, thumbor: update invalid base requests

https://gerrit.wikimedia.org/r/852240

Change 852240 merged by jenkins-bot:

[operations/deployment-charts@master] kask, thumbor: update invalid base requests

https://gerrit.wikimedia.org/r/852240

Change 852911 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/software/thumbor-plugins@master] poolcounter: Await connect coroutine

https://gerrit.wikimedia.org/r/852911

Change 852911 merged by jenkins-bot:

[operations/software/thumbor-plugins@master] poolcounter: Await connect coroutine

https://gerrit.wikimedia.org/r/852911
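
The patch title above points at a common asyncio mistake: calling a coroutine function without awaiting it. The patch itself is not reproduced in this task, so the following is only a minimal sketch of that class of bug; `FakePoolCounterClient` and both `acquire_*` functions are hypothetical names, not the plugin's real API.

```python
import asyncio


class FakePoolCounterClient:
    """Hypothetical stand-in for a poolcounter client (not the plugin's class)."""

    def __init__(self) -> None:
        self.connected = False

    async def connect(self) -> None:
        await asyncio.sleep(0)  # placeholder for opening the real connection
        self.connected = True


async def acquire_buggy(client: FakePoolCounterClient) -> bool:
    coro = client.connect()  # creates a coroutine object but never runs it
    coro.close()             # closed only to avoid a "never awaited" warning
    return client.connected


async def acquire_fixed(client: FakePoolCounterClient) -> bool:
    await client.connect()   # runs connect() to completion before continuing
    return client.connected


buggy_result = asyncio.run(acquire_buggy(FakePoolCounterClient()))
fixed_result = asyncio.run(acquire_fixed(FakePoolCounterClient()))
print(buggy_result, fixed_result)
```

Without the `await`, the caller carries on as if the connection existed, which usually surfaces later as a failure at the first read or write.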

Change 852953 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: reduce memory limit, add service nodeport

https://gerrit.wikimedia.org/r/852953

Change 852953 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: reduce memory limit, add service nodeport

https://gerrit.wikimedia.org/r/852953

Change 852958 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/software/thumbor-plugins@master] Encode messages written to poolcounter stream

https://gerrit.wikimedia.org/r/852958

Change 852958 merged by jenkins-bot:

[operations/software/thumbor-plugins@master] Encode messages written to poolcounter stream

https://gerrit.wikimedia.org/r/852958

Change 853271 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: image bump

https://gerrit.wikimedia.org/r/853271

Change 853271 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: image bump

https://gerrit.wikimedia.org/r/853271

Change 853944 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/software/thumbor-plugins@master] Allow additional parameters to be passed to prod entrypoint

https://gerrit.wikimedia.org/r/853944

Change 853944 merged by jenkins-bot:

[operations/software/thumbor-plugins@master] Allow additional parameters to be passed to prod entrypoint

https://gerrit.wikimedia.org/r/853944

Change 854026 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: enable setting log level, set staging to debug

https://gerrit.wikimedia.org/r/854026

Change 854029 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/software/thumbor-plugins@master] Encode before using hashlib

https://gerrit.wikimedia.org/r/854029

Change 854029 merged by jenkins-bot:

[operations/software/thumbor-plugins@master] Encode before using hashlib

https://gerrit.wikimedia.org/r/854029
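
As background for the patch title above: on Python 3, `hashlib` constructors accept bytes and raise `TypeError` when given `str`. The sketch below illustrates that behaviour only; the choice of SHA-1 and the helper names are assumptions, since the task does not show which hash or call site the plugin uses.

```python
import hashlib


def digest_of(name: str) -> str:
    """Return the hex SHA-1 of a text value, encoding it to bytes first."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def passing_str_fails(name: str) -> bool:
    """Report whether hashlib rejects an unencoded str, as it does on Python 3."""
    try:
        hashlib.sha1(name)  # type: ignore[arg-type]
    except TypeError:
        return True
    return False


print(passing_str_fails("Example.jpg"))
print(digest_of("Example.jpg"))
```

The same str-versus-bytes distinction applies when writing to a socket or stream, which is presumably what the earlier "Encode messages written to poolcounter stream" change addressed.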

Change 854026 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: enable setting log level, set staging to debug

https://gerrit.wikimedia.org/r/854026

Change 854512 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: Use environment variables for config

https://gerrit.wikimedia.org/r/854512

Change 854512 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: Use environment variables for config

https://gerrit.wikimedia.org/r/854512
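
Reading settings from environment variables lets one container image serve several environments, with the chart setting values per deployment. The chart's actual variable names are not listed in this task, so the sketch below uses hypothetical ones (`THUMBOR_LOG_LEVEL`, `THUMBOR_EXAMPLE_TIMEOUT`) and a plain dict in place of the real environment.

```python
import os
from typing import Mapping


def env_str(name: str, default: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return a string setting, falling back to the default when unset."""
    return environ.get(name, default)


def env_int(name: str, default: int, environ: Mapping[str, str] = os.environ) -> int:
    """Return an integer setting, falling back to the default when unset or empty."""
    raw = environ.get(name)
    if raw is None or raw == "":
        return default
    return int(raw)


# Hypothetical variable names and values, for illustration only.
fake_env = {"THUMBOR_LOG_LEVEL": "debug", "THUMBOR_EXAMPLE_TIMEOUT": "30"}

LOG_LEVEL = env_str("THUMBOR_LOG_LEVEL", "info", fake_env)
TIMEOUT = env_int("THUMBOR_EXAMPLE_TIMEOUT", 10, fake_env)
RETRIES = env_int("THUMBOR_EXAMPLE_RETRIES", 3, fake_env)  # unset, so default
print(LOG_LEVEL, TIMEOUT, RETRIES)
```

Passing the mapping as a parameter keeps the helpers testable without mutating the real process environment.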

Change 855024 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: log according to the configured level

https://gerrit.wikimedia.org/r/855024

Change 855024 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: log according to the configured level

https://gerrit.wikimedia.org/r/855024

Change 855977 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: bump version number

https://gerrit.wikimedia.org/r/855977

Change 855977 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: bump version number

https://gerrit.wikimedia.org/r/855977

Change 856515 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/software/thumbor-plugins@master] result_storage.swift: re-enable logging of statements during swift requests

https://gerrit.wikimedia.org/r/856515

Change 856515 merged by jenkins-bot:

[operations/software/thumbor-plugins@master] result_storage.swift: re-enable logging of statements during swift requests

https://gerrit.wikimedia.org/r/856515

Change 856526 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: bump image version

https://gerrit.wikimedia.org/r/856526

Change 856526 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: bump image version

https://gerrit.wikimedia.org/r/856526

Change 856941 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: version bump

https://gerrit.wikimedia.org/r/856941

Change 856941 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: version bump

https://gerrit.wikimedia.org/r/856941

Change 859106 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: fix metrics prefix

https://gerrit.wikimedia.org/r/859106

Change 859106 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: fix metrics prefix

https://gerrit.wikimedia.org/r/859106

Change 860072 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: lower memory limits and requests

https://gerrit.wikimedia.org/r/860072

Change 860072 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: lower memory limits and requests

https://gerrit.wikimedia.org/r/860072

Change 860580 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/software/thumbor-plugins@master] Add tinyrgb colour profile

https://gerrit.wikimedia.org/r/860580

Change 862230 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] admin_ng: set thumbor max memory limit higher

https://gerrit.wikimedia.org/r/862230

Change 862230 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: set thumbor max memory limit higher

https://gerrit.wikimedia.org/r/862230

Change 865054 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: change exposed port to 8800

https://gerrit.wikimedia.org/r/865054

Change 865054 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: change exposed port to 8800

https://gerrit.wikimedia.org/r/865054

Change 866445 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] conftool: add kubernetes nodes as thumbor nodes

https://gerrit.wikimedia.org/r/866445

Change 867186 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: fix metric labels

https://gerrit.wikimedia.org/r/867186

Change 867186 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: fix metric labels

https://gerrit.wikimedia.org/r/867186

Change 866445 merged by Hnowlan:

[operations/puppet@production] conftool: add kubernetes nodes as thumbor nodes

https://gerrit.wikimedia.org/r/866445

Change 867681 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] kubernetes: add thumbor to lvs pools for workers

https://gerrit.wikimedia.org/r/867681

Change 867681 merged by Hnowlan:

[operations/puppet@production] kubernetes: add thumbor to lvs pools for workers

https://gerrit.wikimedia.org/r/867681

Change 868075 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: increase cpu limit to 1.5 per instance

https://gerrit.wikimedia.org/r/868075

Change 868075 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: increase cpu limit to 1.5 per instance

https://gerrit.wikimedia.org/r/868075

Change 878957 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: set maxSurge

https://gerrit.wikimedia.org/r/878957

Change 878957 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: set maxSurge

https://gerrit.wikimedia.org/r/878957

Change 880498 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: move liveness check to hit haproxy

https://gerrit.wikimedia.org/r/880498

Change 880498 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: move liveness check to hit haproxy

https://gerrit.wikimedia.org/r/880498

Change 880898 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/puppet@production] thumbor: add and use haproxy healthz lvs check

https://gerrit.wikimedia.org/r/880898

Change 881635 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] thumbor: add failure condition to health check

https://gerrit.wikimedia.org/r/881635

Change 880898 merged by Hnowlan:

[operations/puppet@production] thumbor: add and use haproxy healthz lvs check

https://gerrit.wikimedia.org/r/880898

Change 881635 merged by jenkins-bot:

[operations/deployment-charts@master] thumbor: add failure condition to health check

https://gerrit.wikimedia.org/r/881635

Joe added a parent task: Restricted Task. Mar 20 2023, 12:26 PM

Thumbor-k8s is now pooled in both datacentres and, barring some kind of major issue, will remain pooled. Given the sheer age and size of this ticket, I have moved the final stages of work to T334488.