
Production Dispatch Infrastructure
Closed, Declined (Public)

Description

Checklist for deployment of initial "production" dispatch infrastructure

  • Obtain dedicated WMCS horizon project space/quota (we'll use onfire project)
  • Containers (or puppetized service with VMs)
  • Postgres backend
  • DNS entries
  • Service/host monitoring T326842
  • Data protection (e.g. data exported and backed up) T326843
  • HA/maintenance considerations
20220823

Status update: with https://gerrit.wikimedia.org/r/c/operations/puppet/+/824449 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/824448 we can stand up a database primary and dispatch's frontend (behind SSO) and scheduler. Items pending:

  • Write a dispatch plugin to read username from a request header.
  • Ship said plugin with dispatch container
  • Write a sync script between LDAP and the dispatch API to set user permissions based on group membership (e.g. members of ops become dispatch admins)
  • Make sure an admin user is created for API access for the script above
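
The first and third pending items above can be sketched roughly as follows. This is a minimal illustration, not the plugin or script that was eventually written: the header name `X-Remote-User`, the role names `Admin` and `Member`, the `wmf` group, and all function names are assumptions made for the example. Only the "ops is dispatch admins" mapping comes from the task. No LDAP or dispatch API calls are shown, since the task does not describe them.

```python
from typing import Dict, Iterable, Mapping, Optional

# Hypothetical header name: the task only says "a request header".
USERNAME_HEADER = "X-Remote-User"

# Hypothetical group-to-role mapping, ordered from most to least privileged.
# Only "ops -> admin" is taken from the task; the rest is illustrative.
GROUP_TO_ROLE = [("ops", "Admin"), ("wmf", "Member")]


def username_from_headers(
    headers: Mapping[str, str], header_name: str = USERNAME_HEADER
) -> Optional[str]:
    """Return the username set by the SSO proxy, or None if absent or empty."""
    # HTTP header names are case-insensitive, so compare lowercased names.
    wanted = header_name.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value.strip() or None
    return None


def role_for_groups(groups: Iterable[str]) -> Optional[str]:
    """Pick the most privileged role granted by any of the user's groups."""
    member_of = set(groups)
    for group, role in GROUP_TO_ROLE:
        if group in member_of:
            return role
    return None


def plan_role_changes(
    ldap_groups: Mapping[str, Iterable[str]],
    current_roles: Mapping[str, str],
) -> Dict[str, str]:
    """Return {username: desired_role} for users whose role must change."""
    changes: Dict[str, str] = {}
    for user, groups in ldap_groups.items():
        desired = role_for_groups(groups)
        # Users with no mapped group are left untouched in this sketch.
        if desired is not None and current_roles.get(user) != desired:
            changes[user] = desired
    return changes
```

Keeping the planning step free of network calls means the mapping logic can be tested on its own. A real sync script would fetch `ldap_groups` from LDAP, fetch `current_roles` from dispatch, and apply the resulting plan using the admin API user mentioned in the last item.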

Details

Repo                                          Branch       Lines +/-
operations/docker-images/production-images    master       +532 -1
operations/docker-images/production-images    master       +531 -2
operations/puppet                             production   +11 -0
operations/puppet                             production   +413 -0
operations/puppet                             production   +2 -1
operations/puppet                             production   +5 -0
operations/puppet                             production   +22 -37
operations/puppet                             production   +83 -41
operations/puppet                             production   +7 -5
operations/docker-images/production-images    master       +7 -1
operations/puppet                             production   +1 -1
operations/dns                                master       +2 -0
operations/puppet                             production   +4 -0
operations/puppet                             production   +9 -0
operations/puppet                             production   +2 -0
operations/puppet                             production   +3 -1
operations/puppet                             production   +7 -0
operations/puppet                             production   +2 -2
operations/puppet                             production   +1 -0
operations/puppet                             production   +2 -0
operations/puppet                             production   +159 -0
operations/docker-images/production-images    master       +10 -1
operations/puppet                             production   +1 -1
operations/puppet                             production   +222 -0
operations/puppet                             production   +5 -0
operations/puppet                             production   +6 -0
operations/puppet                             production   +24 -2
operations/cookbooks                          master       +2 -2
operations/puppet                             production   +11 -3
operations/puppet                             production   +4 -1
operations/puppet                             production   +11 -13
operations/puppet                             production   +14 -0
operations/docker-images/production-images    master       +64 -0

Event Timeline


Change 824447 merged by Filippo Giunchedi:

[operations/puppet@production] postgresql: resync_replica improvements and fixes

https://gerrit.wikimedia.org/r/824447

Change 824486 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/cookbooks@master] postgresql: default to autodetecting pg version

https://gerrit.wikimedia.org/r/824486

Change 824450 merged by Filippo Giunchedi:

[operations/puppet@production] docker: use ExecStartPre to implement --pull=always

https://gerrit.wikimedia.org/r/824450

Change 824451 merged by Filippo Giunchedi:

[operations/puppet@production] service: use --env-file for docker

https://gerrit.wikimedia.org/r/824451

Change 824486 merged by Filippo Giunchedi:

[operations/cookbooks@master] postgresql: default to autodetecting pg version

https://gerrit.wikimedia.org/r/824486

Change 825253 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] wmflib: introduce pythonloglevel type

https://gerrit.wikimedia.org/r/825253

Change 825253 merged by Filippo Giunchedi:

[operations/puppet@production] wmflib: introduce pythonloglevel type

https://gerrit.wikimedia.org/r/825253

Change 832263 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] install_server: add dhcp/netboot records for dispatch-be1001

https://gerrit.wikimedia.org/r/832263

Change 832263 merged by Herron:

[operations/puppet@production] install_server: add dhcp/netboot records for dispatch-be1001

https://gerrit.wikimedia.org/r/832263

Change 832505 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] dispatch-be1001: apply role::insetup

https://gerrit.wikimedia.org/r/832505

Change 832505 merged by Herron:

[operations/puppet@production] dispatch-be1001: apply role::insetup

https://gerrit.wikimedia.org/r/832505

Change 848191 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] dispatch: assign backend role

https://gerrit.wikimedia.org/r/848191

Change 824448 merged by Filippo Giunchedi:

[operations/puppet@production] dispatch: add backend role

https://gerrit.wikimedia.org/r/824448

Change 848228 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/docker-images/production-images@master] dispatch: update to latest upstream

https://gerrit.wikimedia.org/r/848228

Change 848191 merged by Filippo Giunchedi:

[operations/puppet@production] dispatch: assign backend role

https://gerrit.wikimedia.org/r/848191

Change 848228 merged by Filippo Giunchedi:

[operations/docker-images/production-images@master] dispatch: update to latest upstream

https://gerrit.wikimedia.org/r/848228

Change 849021 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] alerting_host: include dispatch profile

https://gerrit.wikimedia.org/r/849021

Change 824449 merged by Filippo Giunchedi:

[operations/puppet@production] dispatch: introduce profile

https://gerrit.wikimedia.org/r/824449

Change 849021 merged by Filippo Giunchedi:

[operations/puppet@production] alerting_host: include dispatch profile

https://gerrit.wikimedia.org/r/849021

Change 850178 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] hieradata: add dispatch db_hostname

https://gerrit.wikimedia.org/r/850178

Change 850178 merged by Filippo Giunchedi:

[operations/puppet@production] hieradata: add dispatch db_hostname

https://gerrit.wikimedia.org/r/850178

Change 850993 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] hieradata: don't monitor /run/docker on alerting_host

https://gerrit.wikimedia.org/r/850993

Change 850999 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] dispatch: enforce ssl for dispatch DB user

https://gerrit.wikimedia.org/r/850999

Change 851000 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] dispatch: run the scheduler on active host only

https://gerrit.wikimedia.org/r/851000

Change 851001 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] idp: fix vhost_settings for dispatch_port

https://gerrit.wikimedia.org/r/851001

Change 851001 merged by Filippo Giunchedi:

[operations/puppet@production] idp: fix vhost_settings for dispatch_port

https://gerrit.wikimedia.org/r/851001

Change 851003 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] hieradata: add dispatch.w.o to IDP

https://gerrit.wikimedia.org/r/851003

Change 851004 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] hieradata: fix Prometheus IDP entry

https://gerrit.wikimedia.org/r/851004

Change 851003 merged by Filippo Giunchedi:

[operations/puppet@production] hieradata: add dispatch.w.o to IDP

https://gerrit.wikimedia.org/r/851003

Change 851004 merged by Filippo Giunchedi:

[operations/puppet@production] hieradata: fix Prometheus IDP entry

https://gerrit.wikimedia.org/r/851004

Change 850999 merged by Filippo Giunchedi:

[operations/puppet@production] dispatch: enforce ssl for dispatch DB user

https://gerrit.wikimedia.org/r/850999

Change 851000 merged by Filippo Giunchedi:

[operations/puppet@production] dispatch: run the scheduler on active host only

https://gerrit.wikimedia.org/r/851000

Change 850993 merged by Filippo Giunchedi:

[operations/puppet@production] hieradata: don't monitor /run/docker on alerting_host

https://gerrit.wikimedia.org/r/850993

Change 851619 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/docker-images/production-images@master] dispatch: add ipython for 'dispatch server shell'

https://gerrit.wikimedia.org/r/851619

Change 851620 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] dispatch: run wrapper with interactive/tty support

https://gerrit.wikimedia.org/r/851620

Change 851632 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/dns@master] wikimedia.org: add dispatch.w.o

https://gerrit.wikimedia.org/r/851632

Change 851632 merged by Filippo Giunchedi:

[operations/dns@master] wikimedia.org: add dispatch.w.o

https://gerrit.wikimedia.org/r/851632

Change 851645 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] dispatch: configure http header auth provider

https://gerrit.wikimedia.org/r/851645

Change 851620 merged by Filippo Giunchedi:

[operations/puppet@production] dispatch: run wrapper with interactive/tty support

https://gerrit.wikimedia.org/r/851620

Change 851619 merged by Filippo Giunchedi:

[operations/docker-images/production-images@master] dispatch: add ipython for 'dispatch server shell'

https://gerrit.wikimedia.org/r/851619

Change 851645 merged by Filippo Giunchedi:

[operations/puppet@production] dispatch: configure http header auth provider

https://gerrit.wikimedia.org/r/851645

Change 851672 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] dispatch: move frontend to its own module

https://gerrit.wikimedia.org/r/851672

Change 851693 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] dispatch: refactor/simplify db profile

https://gerrit.wikimedia.org/r/851693

Change 851672 merged by Filippo Giunchedi:

[operations/puppet@production] dispatch: move frontend to its own module

https://gerrit.wikimedia.org/r/851672

Change 851693 merged by Filippo Giunchedi:

[operations/puppet@production] dispatch: refactor/simplify db profile

https://gerrit.wikimedia.org/r/851693

Change 852992 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] dispatch: sync user role and info from LDAP

https://gerrit.wikimedia.org/r/852992

Change 853268 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] bacula: Add missing fileset definition dispatch-postgres

https://gerrit.wikimedia.org/r/853268

Change 853268 merged by Jcrespo:

[operations/puppet@production] bacula: Add missing fileset definition dispatch-postgres

https://gerrit.wikimedia.org/r/853268

I have started https://wikitech.wikimedia.org/wiki/Dispatch with some initialization steps; the page will need expanding, of course!

Change 854486 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] hieradata: set dispatch-be2001 as replica

https://gerrit.wikimedia.org/r/854486

Change 854486 merged by Filippo Giunchedi:

[operations/puppet@production] hieradata: set dispatch-be2001 as replica

https://gerrit.wikimedia.org/r/854486

Change 852992 merged by Filippo Giunchedi:

[operations/puppet@production] dispatch: sync user role and info from LDAP

https://gerrit.wikimedia.org/r/852992

Change 856612 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] dispatch: add apache redirect from default org to wikimedia org

https://gerrit.wikimedia.org/r/856612

Change 856612 merged by Herron:

[operations/puppet@production] dispatch: add apache redirect from default org to wikimedia org

https://gerrit.wikimedia.org/r/856612

Change 857781 had a related patch set uploaded (by Herron; author: Herron):

[operations/docker-images/production-images@master] dispatch: upgrade to 20221110 and build with local config.js

https://gerrit.wikimedia.org/r/857781

Is there a reason these hosts are still being set up under Buster even though it's a new service? The designated end date for Buster is September 2023, so let's not build a new service on an OS that is at the end of its support life cycle.

Change 857781 merged by Herron:

[operations/docker-images/production-images@master] dispatch: upgrade to 20221110 and build with local config.js

https://gerrit.wikimedia.org/r/857781

> Is there a reason these hosts are still being set up under Buster even though it's a new service? The designated end date for Buster is September 2023, so let's not build a new service on an OS that is at the end of its support life cycle.

The hosts are paired with the alerting_host frontends, which are on Buster. Agreed though, I'll have a closer look at running the backends on Bullseye. The alert frontends will have to be upgraded in the relatively near future too.

Change 858616 had a related patch set uploaded (by Herron; author: Herron):

[operations/docker-images/production-images@master] dispatch: manage config.js locally

https://gerrit.wikimedia.org/r/858616

Change 858616 merged by Herron:

[operations/docker-images/production-images@master] dispatch: manage config.js locally

https://gerrit.wikimedia.org/r/858616

herron updated the task description.
lmata moved this task from Backlog to Prioritized on the Incident Tooling board.

Should this be closed since AFAIK dispatch has been ruled out?

@BCornwall I'm good with that. Do you want to do the honors?

Closing, as dispatch has been ruled out as an option. See T308467 for follow-up discussion of where we're going.

We still have two dispatch-be* hosts running in production. Shouldn't these be properly decommissioned if dispatch is not the solution we'll end up using?

Agreed, although let's create a decom task for that, as there are some services on the alert hosts to clean up as well.

I've created T344937 to track decom efforts.

herron changed the task status from Resolved to Declined.