Productionize db21[88-95]
Closed, Resolved (Public)

Description

These hosts are new:

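| reimage | productionize | hostname | section | source host | note |
| --- | --- | --- | --- | --- | --- |
| [x] | [x] | db2188 | s1 | db2146 | repooled |
| [x] | [x] | db2189 | s2 | db2175 | repooled |
| [x] | [x] | db2190 | s3 | db2149 | repooled |
| [x] | [x] | db2191 | x1 | db2131 | repooled |
| [x] | [x] | db2192 | s5 | db2178 | repooled |
| [x] | [x] | db2193 | s6 | db2180 | repooled |
| [x] | [x] | db2194 | s7 | db2169 | repooled |
| [x] | [x] | db2195 | s8 | db2181 | repooled |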
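
The `db21[88-95]` in the task title is shorthand for the eight hosts in the table above. A minimal sketch of how such a range expands, assuming a single bracketed numeric range per pattern; `expand_host_range` is a hypothetical helper written for illustration, not a function from Cumin or Spicerack:

```python
import re


def expand_host_range(pattern: str) -> list[str]:
    """Expand a pattern such as 'db21[88-95]' into individual hostnames.

    Hypothetical helper: handles exactly one '[start-end]' numeric range
    and returns the pattern unchanged (as a one-item list) otherwise.
    """
    match = re.fullmatch(
        r"(?P<prefix>[^\[\]]*)\[(?P<start>\d+)-(?P<end>\d+)\](?P<suffix>[^\[\]]*)",
        pattern,
    )
    if match is None:
        return [pattern]
    start, end = match["start"], match["end"]
    width = len(start)  # keep zero padding if the range uses it
    return [
        f"{match['prefix']}{n:0{width}d}{match['suffix']}"
        for n in range(int(start), int(end) + 1)
    ]


if __name__ == "__main__":
    for host in expand_host_range("db21[88-95]"):
        print(host)
```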

The multi-instance patch was done under this task.

Details

| Repo | Branch | Lines +/- |
| --- | --- | --- |
| operations/cookbooks | master | +1 K -95 |
| operations/software/spicerack | master | +322 -9 |
| operations/puppet | production | +1 -2 |
| operations/puppet | production | +2 -1 |
| operations/puppet | production | +1 -2 |
| operations/puppet | production | +1 -1 |
| operations/puppet | production | +1 -0 |
| operations/puppet | production | +5 -9 |
| operations/puppet | production | +10 -6 |
| operations/puppet | production | +5 -7 |
| operations/puppet | production | +2 -0 |
| operations/puppet | production | +1 -5 |
| operations/puppet | production | +6 -8 |
| operations/puppet | production | +1 -2 |
| operations/puppet | production | +3 -7 |
| operations/puppet | production | +11 -13 |
| operations/puppet | production | +6 -4 |
| operations/puppet | production | +6 -4 |
| operations/puppet | production | +6 -4 |
| operations/puppet | production | +5 -2 |
| operations/puppet | production | +1 -1 |
| operations/puppet | production | +2 -4 |
| operations/cookbooks | master | +1 -1 |
| operations/puppet | production | +0 -1 |
| operations/puppet | production | +8 -5 |
| operations/puppet | production | +2 -1 |
| operations/puppet | production | +3 -2 |
| operations/cookbooks | master | +3 -1 |
| operations/puppet | production | +8 -0 |

Event Timeline

Change 976953 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: new host on S8

https://gerrit.wikimedia.org/r/976953

Change 976953 merged by Arnaudb:

[operations/puppet@production] mariadb: new host on S8

https://gerrit.wikimedia.org/r/976953

Mentioned in SAL (#wikimedia-operations) [2023-11-23T10:45:12Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674

Icinga downtime and Alertmanager silence (ID=7b8e73e7-63a7-4538-94c9-5f95e518e31c) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2195.codfw.wmnet - T343674

db2181.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-23T10:45:38Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674

Mentioned in SAL (#wikimedia-operations) [2023-11-23T10:45:42Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674

Icinga downtime and Alertmanager silence (ID=33ee920c-86dd-4bea-9b97-10cf56a4ecbf) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2195.codfw.wmnet - T343674

db2195.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-23T10:45:56Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: provisionning db2195.codfw.wmnet - T343674

Mentioned in SAL (#wikimedia-operations) [2023-11-23T10:47:25Z] <arnaudb@cumin1001> dbctl commit (dc=all): 'Cloning db2181 in db2195 for T343674', diff saved to https://phabricator.wikimedia.org/P53739 and previous config saved to /var/cache/conftool/dbconfig/20231123-104724-arnaudb.json
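
The SAL entries in this timeline share one shape: channel, UTC timestamp, `user@host`, then a free-form message. A minimal sketch of splitting an entry into those fields, assuming every entry follows the layout seen above; `SAL_RE` and `parse_sal_entry` are hypothetical names for illustration and not part of any Wikimedia tooling:

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Pattern inferred from the entries in this timeline (an assumption,
# not an official format specification).
SAL_RE = re.compile(
    r"Mentioned in SAL \((?P<channel>#[\w-]+)\) "
    r"\[(?P<timestamp>[^\]]+)\] "
    r"<(?P<user>[^@>]+)@(?P<host>[^>]+)> "
    r"(?P<message>.*)"
)


def parse_sal_entry(line: str) -> Optional[dict]:
    """Return the fields of one SAL entry, or None if the line does not match."""
    m = SAL_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # Timestamps in the log are UTC ("Z" suffix).
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    # Collect Phabricator task IDs such as T343674 mentioned in the message.
    entry["tasks"] = re.findall(r"\bT\d+\b", entry["message"])
    return entry


if __name__ == "__main__":
    sample = (
        "Mentioned in SAL (#wikimedia-operations) [2023-11-23T10:47:25Z] "
        "<arnaudb@cumin1001> dbctl commit (dc=all): "
        "'Cloning db2181 in db2195 for T343674', diff saved to "
        "https://phabricator.wikimedia.org/P53739 and previous config saved to "
        "/var/cache/conftool/dbconfig/20231123-104724-arnaudb.json"
    )
    print(parse_sal_entry(sample))
```

Grouping parsed entries by the host named in the message would give a per-host view of the downtime and dbctl steps recorded here.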

Change 976954 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: notification toggle on dbhosts

https://gerrit.wikimedia.org/r/976954

Change 976954 merged by Arnaudb:

[operations/puppet@production] mariadb: notification toggle on dbhosts

https://gerrit.wikimedia.org/r/976954

Mentioned in SAL (#wikimedia-operations) [2023-11-23T13:41:09Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674

Icinga downtime and Alertmanager silence (ID=f43b9f93-e532-491b-b9ec-940b42776b6b) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2191.codfw.wmnet - T343674

db2131.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-23T13:41:24Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674

Mentioned in SAL (#wikimedia-operations) [2023-11-23T13:41:31Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674

Icinga downtime and Alertmanager silence (ID=7c2310cd-204c-420e-8cb7-a6c3696681de) set by arnaudb@cumin1001 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2191.codfw.wmnet - T343674

db2191.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-11-23T13:41:42Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: provisionning db2191.codfw.wmnet - T343674

Mentioned in SAL (#wikimedia-operations) [2023-11-23T13:43:16Z] <arnaudb@cumin1001> dbctl commit (dc=all): 'Cloning db2131 in db2191 for T343674', diff saved to https://phabricator.wikimedia.org/P53766 and previous config saved to /var/cache/conftool/dbconfig/20231123-134316-arnaudb.json

ABran-WMF updated the task description.

Change 976962 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: toggle notifications on cloned hosts

https://gerrit.wikimedia.org/r/976962

Change 976962 merged by Arnaudb:

[operations/puppet@production] mariadb: toggle notifications on cloned hosts

https://gerrit.wikimedia.org/r/976962

Change 978652 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: toggle notifications

https://gerrit.wikimedia.org/r/978652

Change 978652 merged by Arnaudb:

[operations/puppet@production] mariadb: toggle notifications

https://gerrit.wikimedia.org/r/978652

Change 979946 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: add db2194 to multiinstance pool

https://gerrit.wikimedia.org/r/979946

Change 979946 merged by Arnaudb:

[operations/puppet@production] mariadb: add db2194 to multiinstance pool

https://gerrit.wikimedia.org/r/979946

Change 980984 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Remove package declaration

https://gerrit.wikimedia.org/r/980984

Change 980984 merged by Marostegui:

[operations/puppet@production] mariadb: Remove package declaration

https://gerrit.wikimedia.org/r/980984

Icinga downtime and Alertmanager silence (ID=b6032e35-b5fd-4e0f-abc0-37ca0f189387) set by arnaudb@cumin1001 for 20 days, 0:00:00 on 1 host(s) and their services with reason: production freeze will occur before cookbook is finished

db2194.codfw.wmnet

ABran-WMF changed the task status from Open to In Progress. (Dec 20 2023, 3:43 PM)

Mentioned in SAL (#wikimedia-operations) [2024-01-17T08:22:45Z] <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on db2194.codfw.wmnet with reason: debugging something before T343674

Icinga downtime and Alertmanager silence (ID=28664b07-7979-4bc4-92c2-0d4a571ba56d) set by arnaudb@cumin1001 for 20 days, 0:00:00 on 1 host(s) and their services with reason: debugging something before T343674

db2194.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-01-17T08:23:00Z] <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on db2194.codfw.wmnet with reason: debugging something before T343674

Change 992651 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: preparing cloning db2169 to db2194

https://gerrit.wikimedia.org/r/992651

Change 992651 merged by Arnaudb:

[operations/puppet@production] mariadb: preparing cloning db2169 to db2194

https://gerrit.wikimedia.org/r/992651

Mentioned in SAL (#wikimedia-operations) [2024-01-26T10:25:51Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'Cloning db2169 in db2194 for T343674', diff saved to https://phabricator.wikimedia.org/P55737 and previous config saved to /var/cache/conftool/dbconfig/20240126-102550-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-01-26T16:30:58Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'Cloning db2169 in db2194 for T343674', diff saved to https://phabricator.wikimedia.org/P55740 and previous config saved to /var/cache/conftool/dbconfig/20240126-163057-arnaudb.json

Change 995365 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: pooling db1244 db1246, prepare db1235

https://gerrit.wikimedia.org/r/995365

Change 995365 merged by Arnaudb:

[operations/puppet@production] mariadb: pooling db1244 db1246, prepare db1235

https://gerrit.wikimedia.org/r/995365

Change 997490 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: will test converting instances

https://gerrit.wikimedia.org/r/997490

Change 997490 merged by Arnaudb:

[operations/puppet@production] mariadb: will test converting instances

https://gerrit.wikimedia.org/r/997490

Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db2194.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db2194.codfw.wmnet with OS bookworm completed:

  • db2194 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402061340_arnaudb_2445833_db2194.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 999015 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: revert db2194

https://gerrit.wikimedia.org/r/999015

Change 999015 merged by Arnaudb:

[operations/puppet@production] mariadb: revert db2194

https://gerrit.wikimedia.org/r/999015

Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db2194.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db2194.codfw.wmnet with OS bookworm completed:

  • db2194 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402090932_arnaudb_3073348_db2194.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 1000299 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: last test for new clone cookbook

https://gerrit.wikimedia.org/r/1000299

Change 1000299 abandoned by Arnaudb:

[operations/puppet@production] mariadb: last test for new clone cookbook

Reason:

not needed, will downtime instead

https://gerrit.wikimedia.org/r/1000299

Mentioned in SAL (#wikimedia-operations) [2024-02-12T10:19:51Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: T343674 testing cloning a single instance node to a multi-instance one

Mentioned in SAL (#wikimedia-operations) [2024-02-12T10:20:05Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: T343674 testing cloning a single instance node to a multi-instance one

Mentioned in SAL (#wikimedia-operations) [2024-02-12T10:20:46Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'Depool db2190 T343674', diff saved to https://phabricator.wikimedia.org/P56677 and previous config saved to /var/cache/conftool/dbconfig/20240212-102046-arnaudb.json

Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db2194.codfw.wmnet with OS bookworm

Change 1002410 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: disable systematic formatting of /srv

https://gerrit.wikimedia.org/r/1002410

Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db2194.codfw.wmnet with OS bookworm completed:

  • db2194 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402121514_arnaudb_3707980_db2194.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 1002410 merged by Arnaudb:

[operations/puppet@production] mariadb: disable systematic wiping of /srv on db2194

https://gerrit.wikimedia.org/r/1002410

Change 1003864 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: toggle notifications on db2169

https://gerrit.wikimedia.org/r/1003864

Change 1003864 merged by Arnaudb:

[operations/puppet@production] mariadb: toggle notifications on db2169

https://gerrit.wikimedia.org/r/1003864

Change 1004690 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: fix db2194 instances.yaml

https://gerrit.wikimedia.org/r/1004690

Change 1004690 merged by Arnaudb:

[operations/puppet@production] mariadb: fix db2194 instances.yaml

https://gerrit.wikimedia.org/r/1004690

Mentioned in SAL (#wikimedia-operations) [2024-02-19T15:41:51Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'T343674 - db2194 missing config', diff saved to https://phabricator.wikimedia.org/P57099 and previous config saved to /var/cache/conftool/dbconfig/20240219-154148-arnaudb.json

Change 1004691 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: toggle notifications for db2194

https://gerrit.wikimedia.org/r/1004691

Change 1004691 merged by Arnaudb:

[operations/puppet@production] mariadb: toggle notifications for db2194

https://gerrit.wikimedia.org/r/1004691

ABran-WMF updated the task description.

Change 1005531 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/software/spicerack@master] mariadb: add some logic to allow instance conversion

https://gerrit.wikimedia.org/r/1005531

Change #1005531 merged by jenkins-bot:

[operations/software/spicerack@master] mariadb: rework mariadb_legacy

https://gerrit.wikimedia.org/r/1005531

Change #976709 abandoned by Arnaudb:

[operations/cookbooks@master] mariadb: cookbook draft to clone multiinstance

Reason:

the clone part is handled at https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1071155 and the reboot/restart sanitarium part: https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1063167

https://gerrit.wikimedia.org/r/976709