add thumbor to production infrastructure
Closed, Resolved (Public)

Description

Tracking here what needs to happen on the production infrastructure for thumbor:

  • hardware provisioning
  • LVS setup
  • modify rewrite.py to duplicate a fraction of thumbnail requests to thumbor (some of it already in mediawiki-vagrant) T139484
  • thumbor puppetization (includes module, role, etc)
  • grant thumbor-admins access https://gerrit.wikimedia.org/r/#/c/302471/
  • provision swift account for thumbor and provide write access to relevant containers
  • monitoring and alarming
  • setup firejail for thumbor, similar to imagescaler

HW provisioning

thumbor is expected to be purely CPU/network bound, with requirements that are ideally lower than, and otherwise similar to, those of the current image scalers. We can order dedicated hardware and, in the meantime, re-use some of the recently decommissioned hardware.

rewrite.py integration

The implementation is currently part of mediawiki-vagrant; it should be ported to production and its configuration puppetized: T139484
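
As a rough illustration of what "duplicating a fraction of thumbnail requests" could look like, here is a minimal Python sketch. It is not the actual rewrite.py code: the function names (`should_shadow`, `shadow_request`, `handle_thumbnail`), the `fraction` and `thumbor_base` parameters, and the fire-and-forget thread are all assumptions made for the example. The idea is that the client is always served by the existing path, while a sampled copy of the request is sent to thumbor and its response discarded.

```python
import random
import threading
import urllib.request


def should_shadow(fraction, rng=random.random):
    """Return True for roughly `fraction` of calls (0.0 never, 1.0 always)."""
    return rng() < fraction


def shadow_request(url, timeout=1.0, opener=urllib.request.urlopen):
    """Send a copy of the request to `url` and discard the response."""
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read()
    except Exception:
        # Shadow traffic must never affect what the client receives,
        # so every failure is swallowed here.
        pass


def handle_thumbnail(path, fraction, thumbor_base, serve,
                     rng=random.random, opener=urllib.request.urlopen):
    """Serve `path` through the existing handler `serve`; for a sampled
    fraction of requests, also send a copy to thumbor in the background."""
    if should_shadow(fraction, rng):
        worker = threading.Thread(
            target=shadow_request,
            args=(thumbor_base + path,),
            kwargs={"opener": opener},
            daemon=True,
        )
        worker.start()
    # The client-facing response always comes from the existing path.
    return serve(path)
```

Keeping the sampling decision in its own function means the fraction can come from configuration (for example a puppetized setting) and be raised gradually, wiki by wiki, without touching the request handling itself.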

Details

Related Gerrit Patches:
operations/puppet : production | swift: enable thumbor on commons
operations/puppet : production | thumbor: add prometheus::node_exporter
operations/puppet : production | thumbor: enable most wikis from top30, excluding commons
operations/puppet : production | Revert "swift: disable thumbor shadow traffic"
operations/puppet : production | thumbor: tune nginx next_upstream behaviour
operations/puppet : production | introduce thumbor-admins group
operations/puppet : production | swift: enable shadow thumb requests for small wikis
operations/puppet : production | thumbor: don't double log timestamps
mediawiki/vagrant : master | Wrap Thumbor in firejail
operations/puppet : production | thumbor: add firejail profile
operations/puppet : production | thumbor: use 'mw' as thumbor account
operations/puppet : production | hieradata: add thumbor swift account
operations/puppet : production | lvs: add thumbor to lvs
operations/puppet : production | site: add thumbor100[12]
operations/puppet : production | puppetization for thumbor
operations/dns : master | add thumbor service IPs
operations/puppet : production | claim mw129[12] for thumbor
operations/dns : master | claim mw129[12] for thumbor

Event Timeline

"monitoring and alarming" is still unchecked on this task. Is that true? Is there still something to do there? I thought we had paging alarms raised before when disk usage filled up.

Gilles moved this task from Backlog to Doing on the Thumbor board. (Oct 13 2016, 5:54 AM)
Dzahn removed a subscriber: Dzahn. (Oct 18 2016, 5:56 PM)
elukey removed a subscriber: elukey. (Nov 24 2016, 11:30 AM)
Gilles removed fgiunchedi as the assignee of this task. (Nov 24 2016, 1:42 PM)
Gilles moved this task from Doing to Backlog on the Thumbor board. (Nov 24 2016, 4:00 PM)
Gilles moved this task from Backlog to Doing on the Thumbor board. (Dec 6 2016, 2:25 PM)
Gilles added a comment. (Dec 6 2016, 2:29 PM)

I'm going to move all the remaining tasks to the parent, since for all intents and purposes, Thumbor has been running in production for some time now, just not serving its results yet.