CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1062, Errmsg: Error 'Duplicate entry '275496282' for key 'PRIMARY'' on query. Default database: 'frwiki'. [Query snipped]
Repository | Branch | Subject
---|---|---
operations/software | master | s6.hosts: Decommission db1022
operations/mediawiki-config | master | db-codfw,db-eqiad.php: Decommission db1022
operations/puppet | production | mariadb: Get ready to decomission db1022
Status | Assigned | Task
---|---|---
Resolved | Joe | T154658 Prepare and improve the datacenter switchover procedure
Resolved | None | T155099 Database maintenance scheduled while eqiad datacenter is non primary (after the DC switchover)
Resolved | None | T134476 Decommission old coredb machines (<=db1050)
Resolved | None | T162699 Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036)
Resolved | jcrespo | T162133 Replace some masters in eqiad while it is not active
Resolved | Cmjohnson | T163778 Decommission db1022 (Was: db1022 broke while changing topology on s6- evaluate if to fix or directly decommission)
Resolved | jcrespo | T154485 run pt-table-checksum on s2 (WAS: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038)
Resolved | Cmjohnson | T164702 Decommission db1024
Resolved | Cmjohnson | T176215 decommission db1018
Resolved | Cmjohnson | T176311 decommission db1036
Event Timeline
Change 351813 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Get ready to decomission db1022
Change 351814 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw,db-eqiad.php: Decommission db1022
Change 351815 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s6.hosts: Decommission db1022
Change 351813 merged by Marostegui:
[operations/puppet@production] mariadb: Get ready to decomission db1022
Change 351814 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw,db-eqiad.php: Decommission db1022
Mentioned in SAL (#wikimedia-operations) [2017-05-04T11:36:56Z] <marostegui@naos> Synchronized wmf-config/db-codfw.php: Remove db1022 from config files as it will be decommissioned - T163778 (duration: 01m 25s)
Mentioned in SAL (#wikimedia-operations) [2017-05-04T11:38:36Z] <marostegui@naos> Synchronized wmf-config/db-eqiad.php: Remove db1022 from config files as it will be decommissioned - T163778 (duration: 01m 06s)
Change 351815 merged by jenkins-bot:
[operations/software@master] s6.hosts: Decommission db1022
The host is ready to be decommissioned.
What I have done:
- Removed it from Prometheus, added it as a spare in site.pp, and removed it from the DHCP list: https://gerrit.wikimedia.org/r/#/c/351813
- Removed it from the PHP config files: https://gerrit.wikimedia.org/r/#/c/351814/
- Removed it from the software repo: https://gerrit.wikimedia.org/r/#/c/351815/
- Stopped MySQL.
@Cmjohnson the host is all yours now.
Thanks!
This is probably not fully decommissioned yet (DNS, Puppet, etc.), but I am going to try to remove it from Puppet and Icinga so it does not generate noise in the alerts.
Actually, I cannot do all the steps (network changes) without coordinating with DC Ops. Nothing on my side blocks this from being fully decommissioned soon, but I would like to remove it from Icinga now. Let me know whether removing only db1022 and db1023 from Puppet and Icinga is good enough for you, with DNS and the network ports handled later.
db1022 is no longer: removed from site.pp, all DNS entries removed, removed from Puppet, Racktables updated, disks wiped, and removed from the rack.
JFTR: The host was still showing up in PuppetDB (e.g. via https://servermon.wikimedia.org/hosts/). I ran "puppet node deactivate db1022.eqiad.wmnet" on puppetmaster1001, which should remove it properly.