migrate a role
The suggested way to migrate a role is to first pick a canary host and migrate it with the sre.puppet.migrate-host cookbook. The cookbook is fairly straightforward but requires a manual puppet change.
Once a host in the role has been migrated and puppet runs successfully on it, you should be able to use the sre.puppet.migrate-role cookbook to migrate the entire role. This runs a similar process to the one above.
fix forward
The following steps are needed to fix forward. Which of these steps still need to be completed depends on when the cookbook failed. Most often the failure happens because puppet has run after the hiera change was merged but before the cookbook had a chance to run. In that case the following change should already have been made to puppet.conf, and puppet7 should be available to install:
--- puppet.conf 2023-11-21 09:10:39.027384617 +0000
+++ /etc/puppet/puppet.conf 2023-11-16 12:27:55.442286598 +0000
@@ -9,10 +9,11 @@
 ssldir = /var/lib/puppet/ssl
 rundir = /var/run/puppet
 factpath = $vardir/lib/facter
+certificate_revocation = leaf
 [agent]
-server = puppet
-ca_server = puppetmaster1001.eqiad.wmnet
+use_srv_records = true
+srv_domain = codfw.wmnet
 daemonize = false
 http_connect_timeout = 60
 http_read_timeout = 960
- check that /etc/puppet/puppet.conf has the changes above
- install the puppet agent: sudo apt-get install puppet-agent
- delete the old key material: rm -rf /var/lib/puppet/ssl
- run puppet to regenerate the certs: puppet agent -tw1
- on puppetserver1001, sign the new cert: puppetserver ca sign --certname $fqdn
- on the old puppetmaster, clean the node: puppet node clean $fqdn
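The first step above (checking that puppet.conf carries the expected changes) can be sketched in Python. This is a minimal illustration, not part of the cookbooks: the function names `parse_puppet_conf` and `missing_puppet7_settings` are hypothetical, the sample text is hand-written, and the check only looks for the keys shown in the diff. It assumes `srv_domain` varies per site, so it only tests that the key is present.

```python
def parse_puppet_conf(text):
    """Parse puppet.conf-style text into {section: {key: value}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        if line.startswith('[') and line.endswith(']'):
            current = line[1:-1]
            sections.setdefault(current, {})
        elif '=' in line and current is not None:
            key, value = line.split('=', 1)
            sections[current][key.strip()] = value.strip()
    return sections


def missing_puppet7_settings(text):
    """Return a list of problems; an empty list means the diff looks applied."""
    conf = parse_puppet_conf(text)
    agent = conf.get('agent', {})
    problems = []
    # Assumption: certificate_revocation may live in any section;
    # in the diff it appears just before [agent].
    if not any(s.get('certificate_revocation') == 'leaf' for s in conf.values()):
        problems.append('certificate_revocation = leaf is missing')
    if agent.get('use_srv_records') != 'true':
        problems.append('[agent] use_srv_records = true is missing')
    if 'srv_domain' not in agent:
        problems.append('[agent] srv_domain is missing')
    # The old settings removed by the diff should be gone.
    for old_key in ('server', 'ca_server'):
        if old_key in agent:
            problems.append(f'[agent] still sets {old_key}')
    return problems


# Hand-written sample resembling a host that has not picked up the change yet.
SAMPLE_OLD = """
[main]
ssldir = /var/lib/puppet/ssl
[agent]
server = puppet
ca_server = puppetmaster1001.eqiad.wmnet
"""

for problem in missing_puppet7_settings(SAMPLE_OLD):
    print(problem)
```

On a real host the text would come from reading /etc/puppet/puppet.conf. An empty result suggests the remaining fix-forward steps can proceed.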
rollback
Ideally, users should try to fix forward on a canary host. If a rollback is needed, the following steps are required:
on puppetserver1001 run: sudo puppetserver ca clean --certname $fqdn
on the puppetmaster run: sudo puppet ca destroy $fqdn
#!/bin/sh
# Run on the host being rolled back; expects the pre-migration puppet.conf in the current directory.
sudo cp puppet.conf /etc/puppet/puppet.conf
sudo rm -rf /var/lib/puppet/ssl
# Remove the puppet apt pin and repository, then downgrade the agent.
sudo rm /etc/apt/preferences.d/apt_pin_puppet.pref
sudo rm /etc/apt/sources.list.d/repository_puppet.list
sudo apt-get update -y
sudo apt-get install --allow-downgrades -y puppet=5.5.22-2
# Keep puppet.conf immutable while the agent runs.
sudo chattr +i /etc/puppet/puppet.conf
printf "run the following on puppetmaster1001:\n\tsudo puppet cert sign %s\n" "$(hostname -f)"
sudo puppet agent -tw 1
sudo chattr -i /etc/puppet/puppet.conf
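Because the rollback touches three machines, it can help to print the commands per host before starting. The sketch below only substitutes the host name into the commands already listed in this section; `rollback_plan` and the example FQDN are hypothetical, and nothing is executed.

```python
def rollback_plan(fqdn):
    """Return (where, command) pairs in the order described in this section."""
    return [
        ('puppetserver1001', f'sudo puppetserver ca clean --certname {fqdn}'),
        ('puppetmaster', f'sudo puppet ca destroy {fqdn}'),
        (fqdn, 'run the rollback script'),
        # The script prints this command and then waits for the cert to be signed.
        ('puppetmaster1001', f'sudo puppet cert sign {fqdn}'),
    ]


# Example host name, made up for illustration.
for where, command in rollback_plan('host1001.example.wmnet'):
    print(f'{where}: {command}')
```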
Core Platform
- restbase::production
- sessionstore
Data Engineering
- analytics_cluster::airflow::analytics_product
- analytics_cluster::airflow::platform_eng
- analytics_cluster::airflow::research
- analytics_cluster::airflow::search
- analytics_cluster::airflow::wmde
- analytics_cluster::coordinator
- analytics_cluster::coordinator::replica (buster) (role has been removed)
- analytics_cluster::datahub::opensearch
- analytics_cluster::hadoop::client (role removed)
- analytics_cluster::hadoop::master
- analytics_cluster::hadoop::standby
- analytics_cluster::hadoop::ui (buster) (role removed)
- analytics_cluster::hadoop::worker
- analytics_cluster::hadoop::yarn
- analytics_cluster::launcher
- analytics_cluster::mariadb
- analytics_cluster::postgresql
- analytics_cluster::presto::server
- analytics_cluster::turnilo
- analytics_cluster::turnilo::staging
- analytics_cluster::ui::superset
- analytics_cluster::ui::superset::staging
- analytics_cluster::webserver
- analytics_cluster::zookeeper
- analytics_test_cluster::client
- analytics_test_cluster::coordinator
- analytics_test_cluster::hadoop::master
- analytics_test_cluster::hadoop::standby
- analytics_test_cluster::hadoop::ui
- analytics_test_cluster::hadoop::worker
- analytics_test_cluster::presto::server
- aqs
- archiva
- ceph::server
- druid::analytics::worker
- druid::public::worker
- druid::test_analytics::worker
- dse_k8s::master
- dse_k8s::worker
- dumps::generation::server::misccrons
- dumps::generation::server::spare
- dumps::generation::server::xmldumps
- dumps::generation::server::xmlfallback
- dumps::generation::worker::dumper
- dumps::generation::worker::dumper_misc_crons_only
- dumps::generation::worker::dumper_monitor
- dumps::generation::worker::testbed
- dumps::web::htmldumps
- etcd::v3::dse_k8s_etcd
- eventlogging::analytics
- eventschemas::service
- insetup::data_engineering
- kafka::jumbo::broker
- kafka::test::broker
- karapace
- mariadb::analytics_replica
- mariadb::misc::analytics::backup
- matomo (renamed from piwik)
- statistics::explorer
- zookeeper::flink
- zookeeper::test
Data Persistence
- backup
- backup::databases
- backup::es
- backup::offsite
- backup::production
- cassandra_dev
- dbbackups::content
- dbbackups::metadata
- dbbackups::monitoring
- insetup::data_persistence
- mariadb::backup_source
- mariadb::core
  - es1
  - es2
  - es3
  - es4
  - es5
  - s1
  - s2 (except db1246 which is down with broken hardware)
  - s3
  - s4
  - s5
  - s6
  - s7
  - s8
  - x1
  - x2
- mariadb::core_multiinstance
- mariadb::core_test
- mariadb::misc
- mariadb::misc::db_inventory
- mariadb::misc::multiinstance
- mariadb::misc::phabricator
- mariadb::objectstash
- mariadb::parsercache
- mariadb::proxy::master
- mariadb::proxy::replicas (role has been retired)
- mariadb::sanitarium_master
- mariadb::sanitarium_multiinstance
- mediabackup::storage
- mediabackup::worker
- orchestrator
- swift::proxy
- swift::storage
- thanos::backend
- thanos::frontend
Infrastructure Foundations
- apt_repo
- aux_k8s::master
- aux_k8s::worker
- bastionhost
- builder
- cluster::management (blockers (resolved): https://phabricator.wikimedia.org/T350686 , https://phabricator.wikimedia.org/T352974 )
- cluster::unprivmanagement
- config_master
- debmonitor::server
- etcd::v3::aux_k8s_etcd
- failoid
- ganeti
- ganeti_test
- idm
- idm_test
- idp
- idp_test
- insetup::infrastructure_foundations
- installserver
- kerberos::kdc
- mail::mx
- mirrors
- netbox::database
- netbox::frontend
- netbox::standalone
- netinsights
- netmon
- openldap::replica
- openldap::rw
- ping_offload
- pki::multirootca
- pki::root
- puppetboard
- puppetdb
- puppetmaster::backend - won't migrate
- puppetmaster::frontend - won't migrate
- puppetserver
- rpkivalidator
- sretest
- test
- url_downloader
- apt_staging
Machine Learning
- etcd::v3::ml_etcd
- etcd::v3::ml_etcd::staging
- ml_cache::storage
- ml_k8s::master
- ml_k8s::staging::master
- ml_k8s::staging::worker
- ml_k8s::worker
Observability
- alerting_host (blocker: T358506)
- arclamp
- grafana
- graphite::production
- kafka::logging
- kafka::monitoring_bullseye
- logging::mediawiki::udp2log
- logging::opensearch::collector
- logging::opensearch::data
- prometheus
- prometheus::pop
- syslog::centralserver
- titan
- webperf
Search Platform
- apifeatureusage::logstash
- elasticsearch::cirrus
- elasticsearch::cloudelastic
- elasticsearch::relforge
- insetup::search_platform
- search::loader
- wcqs::public
- wdqs::internal
- wdqs::public
- wdqs::test
ServiceOps
- chartmuseum
- configcluster (blocked by https://phabricator.wikimedia.org/T352245)
- deployment_server::kubernetes
- docker_registry_ha::registry
- dragonfly::supernode
- etcd::v3::kubernetes
- etcd::v3::kubernetes::staging
- insetup::serviceops
- kafka::main
- kubernetes::master
- kubernetes::staging::master
- kubernetes::staging::worker
- kubernetes::worker
- mediawiki::appserver (buster)
- mediawiki::appserver::api (role no longer in use)
- mediawiki::jobrunner (buster)
- mediawiki::maintenance (buster)
- mediawiki::memcached
- mediawiki::memcached::gutter
- memcached
- parsoid (role no longer in use)
- parsoid::testing
- parsoid::testreduce
- poolcounter::server
- redis::misc::master
- redis::misc::slave
ServiceOps-Collab
- aphlict
- ci
- doc
- etherpad
- gerrit
- gitlab
- gitlab_runner
- insetup::serviceops_collab
- microsites::peopleweb
- miscweb
- phabricator (buster)
- planet
- releases
- requesttracker
- vrts
- stewards
- lists
Traffic
- acme_chief - won't migrate (https://phabricator.wikimedia.org/T352242 for setting up second acmechief/Puppet 7 host)
- cache::text
- cache::upload
- dnsbox
- durum
- insetup_noferm
- lvs::balancer
- ncredir
- pybaltest
- wikidough
Unowned
- mw_rc_irc
- insetup::unowned
- maps::master (buster)
- maps::replica (buster)
WMCS
- cluster::cloud_management
- dumps::distribution::server
- insetup::wmcs
- wmcs::ceph::mon (buster)
- wmcs::ceph::osd (buster)
- wmcs::cloudgw
- wmcs::cloudlb
- wmcs::db::wikireplicas::analytics_multiinstance
- wmcs::db::wikireplicas::dedicated::analytics_multiinstance
- wmcs::db::wikireplicas::web_multiinstance
- wmcs::openstack::codfw1dev::backups
- wmcs::openstack::codfw1dev::cloudweb
- wmcs::openstack::codfw1dev::control
- wmcs::openstack::codfw1dev::db
- wmcs::openstack::codfw1dev::net
- wmcs::openstack::codfw1dev::services
- wmcs::openstack::codfw1dev::virt_ceph
- wmcs::openstack::codfw1dev::cinder_backups
- wmcs::openstack::eqiad1::cinder_backups
- wmcs::openstack::eqiad1::cloudweb
- wmcs::openstack::eqiad1::control
- wmcs::openstack::eqiad1::instance_backups
- wmcs::openstack::eqiad1::net
- wmcs::openstack::eqiad1::rabbitmq
- wmcs::openstack::eqiad1::services
- wmcs::openstack::eqiad1::virt
- wmcs::openstack::eqiad1::virt_ceph
This page was generated with the following script (also on puppetdb1003:/home/jbond/pql/role_owners.py)
#!/usr/bin/python3
from collections import defaultdict
from pypuppetdb import connect

db = connect()
pql = """
resources [parameters, tags]{
    type = 'Class' and title = 'Profile::Contacts'
}
"""
roll_exclusions = {
    'apt_repo': 'buster',
    'debmonitor::server': 'buster',
    'puppetmaster::backend': "won't migrate",
    'puppetmaster::frontend': "won't migrate",
    'acme_chief': "won't migrate",
}
puppet7_pql = "resources [tags, parameters]{ type = 'Class' and title = 'Profile::Puppet::Agent' }"
puppet7_roles = set()
owners = defaultdict(set)
# Collect the roles that already force Puppet 7.
for resource in db.pql(puppet7_pql):
    if not resource['parameters']['force_puppet7']:
        continue
    role = [r for r in resource['tags'] if r.startswith('role::')][0]
    puppet7_roles.add(role)
# Group every role under its first listed contact.
resources = db.pql(pql)
for resource in resources:
    try:
        owner = resource['parameters']['role_contacts'][0]
    except IndexError:
        owner = 'Unknown'
    role = [r for r in resource['tags'] if r.startswith('role::')][0]
    owners[owner].add(role)
# Print one wiki section per owner with a checkbox per role.
for owner, roles in dict(sorted(owners.items())).items():
    print(f'= {owner} =')
    for role in sorted(roles):
        completed = 'x' if role in puppet7_roles else ''
        role = role[6:]
        role = f"~~{role}~~ - {roll_exclusions[role]} " if role in roll_exclusions else role
        print(f'[{completed}] {role}')
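The formatting step of the script can be tried without a PuppetDB connection. The sketch below is an offline approximation under stated assumptions: `render_role` and `render` are hypothetical helper names, the sample dictionaries are hand-written stand-ins for the query results, and the trailing space the original adds after an exclusion note is dropped.

```python
# Hypothetical sample data standing in for the PuppetDB query results.
ROLL_EXCLUSIONS = {'acme_chief': "won't migrate"}
PUPPET7_ROLES = {'role::cache::text'}
OWNERS = {'Traffic': {'role::acme_chief', 'role::cache::text', 'role::dnsbox'}}


def render_role(tag, puppet7_roles, exclusions):
    """Format one role tag as a checkbox line."""
    completed = 'x' if tag in puppet7_roles else ''
    role = tag[len('role::'):]  # strip the 'role::' prefix from the tag
    if role in exclusions:
        # Excluded roles are struck through and annotated with the reason.
        role = f"~~{role}~~ - {exclusions[role]}"
    return f'[{completed}] {role}'


def render(owners, puppet7_roles, exclusions):
    """Return the page lines: one heading per owner, one line per role."""
    lines = []
    for owner, roles in sorted(owners.items()):
        lines.append(f'= {owner} =')
        lines.extend(render_role(r, puppet7_roles, exclusions) for r in sorted(roles))
    return lines


print('\n'.join(render(OWNERS, PUPPET7_ROLES, ROLL_EXCLUSIONS)))
```

Splitting the formatting into functions that take plain dictionaries keeps it testable separately from the `db.pql` calls.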