Move db1176 to m1
Closed, Resolved, Public

Assigned To

	• Marostegui

Authored By

	• Marostegui
	Jan 24 2023, 10:45 AM

Description

m1 needs a switchover.
We need to reclone db1176 so it becomes an m1 replica, and install MariaDB 10.4 on it again (it is running MariaDB 11 at the moment).
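After the reimage, one quick way to confirm the host is back on the 10.4 series rather than 11 is to compare the numeric major and minor components of the server version string (for example the value returned by `SELECT VERSION()`). The helper names below are hypothetical and not part of any WMF tooling, and the sample version strings are illustrative only, not output captured from db1176.

```python
import re


def mariadb_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a string such as '10.4.27-MariaDB-log'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_expected_series(version: str, expected: tuple[int, int] = (10, 4)) -> bool:
    """True if the server reports the expected major.minor series (10.4 by default)."""
    return mariadb_major_minor(version) == expected


if __name__ == "__main__":
    # Illustrative strings only; not taken from db1176.
    for sample in ("10.4.27-MariaDB-log", "11.0.0-MariaDB"):
        print(sample, is_expected_series(sample))
```

The first sample passes the check and the second does not. Comparing integers rather than doing a string prefix match avoids treating a hypothetical "10.40" as "10.4".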

Details

	Subject	Repo	Branch	Lines +/-
	mariadb: Move db1176 to m1	operations/puppet	production	+8 -2
	mariadb: Install MariaDB 11 on db1106	operations/puppet	production	+2 -3


Related Objects

Status	Assigned	Task
		Restricted Task
		Restricted Task
Open	• Marostegui	T326116 Package and test MariaDB 11
Resolved	• Marostegui	T327762 Move db1176 to m1

Event Timeline

• Marostegui created this task. Jan 24 2023, 10:45 AM

Restricted Application added a subscriber: Aklapper. Jan 24 2023, 10:45 AM

• Marostegui triaged this task as Medium priority. Jan 24 2023, 10:46 AM

• Marostegui added a parent task: T326116: Package and test MariaDB 11.

• Marostegui moved this task from Triage to In progress on the DBA board.

Change 883133 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Install MariaDB 11 on db1106

https://gerrit.wikimedia.org/r/883133

Change 883133 merged by Marostegui:

[operations/puppet@production] mariadb: Install MariaDB 11 on db1106

https://gerrit.wikimedia.org/r/883133

Change 883136 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Move db1176 to m1

https://gerrit.wikimedia.org/r/883136

Change 883136 merged by Marostegui:

[operations/puppet@production] mariadb: Move db1176 to m1

https://gerrit.wikimedia.org/r/883136

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1001 for host db1176.eqiad.wmnet with OS bullseye

P43309 db1117:3321 -> db1176

*************************** 1. row ***************************
                Slave_IO_State:
                   Master_Host: db1195.eqiad.wmnet
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: db1195-bin.001769
           Read_Master_Log_Pos: 159907434
                Relay_Log_File: db1117-relay-bin.000002
                 Relay_Log_Pos: 43866087
         Relay_Master_Log_File: db1195-bin.001769
              Slave_IO_Running: No
             Slave_SQL_Running: No
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 159907434
               Relay_Log_Space: 43866397
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: Yes
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: NULL
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 172001292
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 0-171966484-2731336216,171966484-171966484-7582228474,171974733-171974733-2008457625,171966562-171966562-962004828,171970746-171970746-808478946,171966512-171966512-1959889139,171974884-171974884-9104192396,171966556-171966556-1824632116,171978763-171978763-83528410,172001292-172001292-1589344544
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State:
              Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 63429

ops-monitoring-bot added a comment. Jan 24 2023, 11:25 AM

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1001 for host db1176.eqiad.wmnet with OS bullseye completed:

db1176 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202301241054_marostegui_57917_db1176.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB

Maintenance_bot removed a project: Patch-For-Review. Jan 24 2023, 11:31 AM

• Marostegui closed this task as Resolved. Jan 24 2023, 4:22 PM

This is done

Maintenance_bot moved this task from In progress to Done on the DBA board. Jan 24 2023, 4:29 PM
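The ticket does not say how db1176 was attached to the m1 primary after the clone, so what follows is only one possible reading. Going by its title, the P43309 paste records the replication state of db1117:3321 (apparently the clone source): both threads stopped, no errors, executed up to db1195-bin.001769:159907434, with Using_Gtid: Slave_Pos. The minimal sketch below assumes the vertical `\G` output format shown there. It parses that output, checks that the threads stopped cleanly, splits Gtid_IO_Pos into MariaDB's domain_id-server_id-seq_no triples, and builds a file/position CHANGE MASTER TO statement. The function names, the trimmed SAMPLE excerpt and the password placeholder are hypothetical, and none of this is WMF tooling.

```python
import re

# Excerpt of the P43309 output: a subset of fields, Gtid_IO_Pos shortened.
SAMPLE = """\
*************************** 1. row ***************************
                   Master_Host: db1195.eqiad.wmnet
                   Master_User: repl
                   Master_Port: 3306
         Relay_Master_Log_File: db1195-bin.001769
              Slave_IO_Running: No
             Slave_SQL_Running: No
           Exec_Master_Log_Pos: 159907434
                 Last_IO_Errno: 0
                Last_SQL_Errno: 0
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 0-171966484-2731336216,172001292-172001292-1589344544
"""


def parse_vertical_status(text: str) -> dict[str, str]:
    """Turn '   Field: value' lines from SHOW SLAVE STATUS\\G into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # The '*** 1. row ***' banner and blank lines do not match and are skipped.
        match = re.match(r"\s*([A-Za-z_]+):(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def stopped_cleanly(status: dict[str, str]) -> bool:
    """Both replication threads stopped and no IO/SQL error recorded."""
    return (
        status["Slave_IO_Running"] == "No"
        and status["Slave_SQL_Running"] == "No"
        and status["Last_IO_Errno"] == "0"
        and status["Last_SQL_Errno"] == "0"
    )


def parse_gtid_list(value: str) -> dict[int, tuple[int, int]]:
    """Split a comma-separated MariaDB GTID list (domain-server_id-seq_no)
    into {domain_id: (server_id, seq_no)}."""
    result: dict[int, tuple[int, int]] = {}
    for gtid in filter(None, (part.strip() for part in value.split(","))):
        domain, server, seq = (int(n) for n in gtid.split("-"))
        result[domain] = (server, seq)
    return result


def change_master_sql(status: dict[str, str]) -> str:
    """File/position CHANGE MASTER TO built from the executed coordinates
    (Relay_Master_Log_File + Exec_Master_Log_Pos). Password is a placeholder."""
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST='{status['Master_Host']}', "
        f"MASTER_PORT={int(status['Master_Port'])}, "
        f"MASTER_USER='{status['Master_User']}', "
        "MASTER_PASSWORD='<placeholder>', "
        f"MASTER_LOG_FILE='{status['Relay_Master_Log_File']}', "
        f"MASTER_LOG_POS={int(status['Exec_Master_Log_Pos'])}, "
        # Master_SSL_Allowed: Yes in P43309, hence MASTER_SSL=1.
        "MASTER_SSL=1;"
    )


if __name__ == "__main__":
    status = parse_vertical_status(SAMPLE)
    print("stopped cleanly:", stopped_cleanly(status))
    print(parse_gtid_list(status["Gtid_IO_Pos"]))
    print(change_master_sql(status))
```

The statement uses Relay_Master_Log_File and Exec_Master_Log_Pos, which describe what the SQL thread had applied, rather than Read_Master_Log_Pos (how far the IO thread had read). In this paste the two positions happen to be equal. Because Using_Gtid is Slave_Pos, reconnecting with MASTER_USE_GTID=slave_pos is an alternative to explicit file/position coordinates.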
