
Productionize db2196-db2220
Closed, Resolved (Public)

Description

These hosts will replace db2[096-120]

  • db2196
  • db2197 - backup source
  • db2198 - backup source
  • db2199 - backup source
  • db2200 - backup source
  • db2201 - backup source
  • db2202 - test-s1
  • db2203
  • db2204
  • db2205
  • db2206
  • db2207 - repooling
  • db2208
  • db2209
  • db2210
  • db2211
  • db2212
  • db2213
  • db2214
  • db2215
  • db2216
  • db2217
  • db2218
  • db2219
  • db2220
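
The pairing between old and new hosts is not spelled out in the description, but the timeline entries further down (db2107 cloned into db2207, db2113 into db2213, and so on) suggest that each replacement is the old host number plus 100. A minimal sketch under that assumption; the function name `replacement_for` and the fixed offset are illustrative, not part of any existing tooling:

```python
import re


def replacement_for(old_host: str, offset: int = 100) -> str:
    """Return the assumed replacement name for an old host (db2115 -> db2215).

    The +100 offset is an assumption inferred from the clone pairs in the
    timeline; it is not stated in the task description.
    """
    match = re.fullmatch(r"(db)(\d+)", old_host)
    if match is None:
        raise ValueError(f"unexpected host name: {old_host!r}")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + offset}"


# db2[096-120] expands to db2096 .. db2120, i.e. 25 hosts.
old_hosts = [f"db2{n:03d}" for n in range(96, 121)]
mapping = {old: replacement_for(old) for old in old_hosts}

for old, new in mapping.items():
    print(f"{old} -> {new}")
```

The roles noted in the list above (backup source, test-s1) are not covered by this mapping and would still need to be checked per host.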

Details

Repo               Branch      Lines +/-
operations/puppet  production  +2 -2
operations/puppet  production  +7 -6
operations/puppet  production  +0 -3
operations/puppet  production  +0 -3
operations/puppet  production  +1 -2
operations/puppet  production  +8 -8
operations/puppet  production  +0 -1
operations/puppet  production  +1 -0
operations/puppet  production  +1 -2
operations/puppet  production  +0 -1
operations/puppet  production  +3 -4
operations/puppet  production  +0 -1
operations/puppet  production  +0 -3
operations/puppet  production  +1 -1
operations/puppet  production  +0 -1
operations/puppet  production  +1 -0
operations/puppet  production  +0 -3
operations/puppet  production  +0 -3
operations/puppet  production  +0 -1
operations/puppet  production  +0 -2
operations/puppet  production  +0 -1
operations/puppet  production  +0 -6
operations/puppet  production  +0 -1
operations/puppet  production  +95 -42
operations/puppet  production  +4 -3
operations/puppet  production  +1 -0
operations/puppet  production  +5 -3

Event Timeline

There are a very large number of changes, so older changes are hidden.

@Marostegui I stumbled upon:

$ sudo dbctl --scope codfw instance db2115 get
{
    "db2115": {
        "host_ip": "10.192.32.134",
        "note": "",
        "port": 3306,
        "sections": {
            "x1": {
                "candidate_master": false,  #### this
                "percentage": 100,
                "pooled": false,
                "weight": 100
            }
        }
    },
    "tags": "datacenter=codfw"
}

Should I toggle candidate_master for db2215 (its replacement)?
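
For anyone who wants to check this flag across several hosts, the output above can be read programmatically. The sketch below assumes the output is otherwise valid JSON and that the `#### this` marker was added by hand when pasting; the helper names are hypothetical. It only reads the flag and does not change anything in dbctl:

```python
import json
import re

# Output pasted from the comment above, including the hand-added marker.
RAW_OUTPUT = """
{
    "db2115": {
        "host_ip": "10.192.32.134",
        "note": "",
        "port": 3306,
        "sections": {
            "x1": {
                "candidate_master": false,  #### this
                "percentage": 100,
                "pooled": false,
                "weight": 100
            }
        }
    },
    "tags": "datacenter=codfw"
}
"""


def strip_annotations(text: str) -> str:
    """Drop trailing '#### ...' markers, which are not part of the JSON."""
    return re.sub(r"[ \t]*####.*$", "", text, flags=re.MULTILINE)


def candidate_master_flags(text: str) -> dict:
    """Map (host, section) to the candidate_master value found in the output."""
    data = json.loads(strip_annotations(text))
    flags = {}
    for host, entry in data.items():
        if not isinstance(entry, dict):
            continue  # skip non-host keys such as "tags"
        for section, settings in entry.get("sections", {}).items():
            flags[(host, section)] = settings["candidate_master"]
    return flags


flags = candidate_master_flags(RAW_OUTPUT)
print(flags)
```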

Change #1015694 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2215: Clarify status

https://gerrit.wikimedia.org/r/1015694

Change #1015694 merged by Marostegui:

[operations/puppet@production] db2215: Clarify status

https://gerrit.wikimedia.org/r/1015694

Icinga downtime and Alertmanager silence (ID=eaedc8ac-a960-4479-a0a5-721ae11647fb) set by arnaudb@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2215.codfw.wmnet - T355422

db2115.codfw.wmnet

Icinga downtime and Alertmanager silence (ID=6002b358-65a1-4b7e-8e15-c436d8e9735b) set by arnaudb@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2215.codfw.wmnet - T355422

db2215.codfw.wmnet

Change #1016366 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: toggle notifications for db2215

https://gerrit.wikimedia.org/r/1016366

Change #1016367 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: toggle notifications for db2214, db2219, db2220

https://gerrit.wikimedia.org/r/1016367

Change #1016367 merged by Arnaudb:

[operations/puppet@production] mariadb: toggle notifications for db2214, db2219, db2220

https://gerrit.wikimedia.org/r/1016367

Change #1016366 merged by Arnaudb:

[operations/puppet@production] mariadb: toggle notifications for db2215

https://gerrit.wikimedia.org/r/1016366

andrea.denisse subscribed.

Hello, we've been receiving several email alerts for this host since March 27th. I opened T361604 to track it.

I've started to provision db2198 now due to T361037.

Mentioned in SAL (#wikimedia-operations) [2024-04-04T12:08:26Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: provisionning db2207.codfw.wmnet - T355422

Icinga downtime and Alertmanager silence (ID=d4b4922d-2c7e-4e6c-bd87-17e66dd0f4f7) set by arnaudb@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2207.codfw.wmnet - T355422

db2107.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-04-04T12:08:40Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: provisionning db2207.codfw.wmnet - T355422

Mentioned in SAL (#wikimedia-operations) [2024-04-04T12:08:44Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: provisionning db2207.codfw.wmnet - T355422

Icinga downtime and Alertmanager silence (ID=c35f5aae-9045-4a1e-ac6d-762cce62236d) set by arnaudb@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2207.codfw.wmnet - T355422

db2207.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-04-04T12:08:46Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: provisionning db2207.codfw.wmnet - T355422

Mentioned in SAL (#wikimedia-operations) [2024-04-04T12:10:09Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'Cloning db2107 in db2207 for T355422', diff saved to https://phabricator.wikimedia.org/P59460 and previous config saved to /var/cache/conftool/dbconfig/20240404-121008-arnaudb.json
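
The dbctl commit entries in this timeline all follow the same shape, so they can be parsed when auditing which host was cloned into which. A rough sketch, assuming the line format stays exactly as logged here; the regular expressions and field names are illustrative and not part of any SAL or dbctl API:

```python
import re
from datetime import datetime, timezone

SAL_LINE = (
    "[2024-04-04T12:10:09Z] <arnaudb@cumin1002> dbctl commit (dc=all): "
    "'Cloning db2107 in db2207 for T355422', diff saved to "
    "https://phabricator.wikimedia.org/P59460 and previous config saved to "
    "/var/cache/conftool/dbconfig/20240404-121008-arnaudb.json"
)

COMMIT_RE = re.compile(
    r"\[(?P<logged>[^\]]+)\] <(?P<user>[^@>]+)@(?P<host>[^>]+)> dbctl commit "
    r"\(dc=(?P<dc>[^)]+)\): '(?P<message>[^']*)', diff saved to (?P<diff>\S+) "
    r"and previous config saved to (?P<backup>\S+)"
)
CLONE_RE = re.compile(r"Cloning (?P<source>\S+) in (?P<target>\S+)")


def parse_dbctl_commit(line: str):
    """Return the fields of a dbctl commit log line, or None if it does not match."""
    match = COMMIT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The logged timestamp is UTC, as indicated by the trailing "Z".
    fields["logged"] = datetime.strptime(
        fields["logged"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    clone = CLONE_RE.search(fields["message"])
    if clone is not None:
        fields.update(clone.groupdict())
    return fields


entry = parse_dbctl_commit(SAL_LINE)
print(entry["source"], "->", entry["target"], "by", entry["user"])
```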

Mentioned in SAL (#wikimedia-operations) [2024-04-04T12:15:59Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: provisionning db2213.codfw.wmnet - T355422

Icinga downtime and Alertmanager silence (ID=5a04a524-b95d-42ea-ba51-0f8074dcca50) set by arnaudb@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2213.codfw.wmnet - T355422

db2113.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-04-04T12:16:13Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: provisionning db2213.codfw.wmnet - T355422

Mentioned in SAL (#wikimedia-operations) [2024-04-04T12:16:16Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: provisionning db2213.codfw.wmnet - T355422

Icinga downtime and Alertmanager silence (ID=2531689f-1ab7-4fb3-ba77-68a6fbfa7aba) set by arnaudb@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2213.codfw.wmnet - T355422

db2213.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-04-04T12:16:30Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: provisionning db2213.codfw.wmnet - T355422

Mentioned in SAL (#wikimedia-operations) [2024-04-04T12:17:23Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'Cloning db2113 in db2213 for T355422', diff saved to https://phabricator.wikimedia.org/P59463 and previous config saved to /var/cache/conftool/dbconfig/20240404-121722-arnaudb.json

Change #1017057 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mariadb: Reenable notifications for backup source host db2198

https://gerrit.wikimedia.org/r/1017057

ABran-WMF updated the task description.

Change #1017066 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: toggle notifications and roles for db2213 db2207

https://gerrit.wikimedia.org/r/1017066

Change #1017066 merged by Arnaudb:

[operations/puppet@production] mariadb: toggle notifications and roles for db2213 db2207

https://gerrit.wikimedia.org/r/1017066

Change #1017057 merged by Jcrespo:

[operations/puppet@production] mariadb: Reenable notifications for backup source host db2198

https://gerrit.wikimedia.org/r/1017057

Mentioned in SAL (#wikimedia-operations) [2024-04-08T07:31:53Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: provisionning db2214.codfw.wmnet - T355422

Icinga downtime and Alertmanager silence (ID=cdf540e8-8a44-45ef-976b-d7b1df46d8ae) set by arnaudb@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2214.codfw.wmnet - T355422

db2114.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-04-08T07:32:07Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: provisionning db2214.codfw.wmnet - T355422

Icinga downtime and Alertmanager silence (ID=a48cdb9b-8d48-4a1a-a09a-48fbd596e0eb) set by arnaudb@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2214.codfw.wmnet - T355422

db2214.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-04-08T07:32:11Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: provisionning db2214.codfw.wmnet - T355422

Mentioned in SAL (#wikimedia-operations) [2024-04-08T07:32:16Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: provisionning db2214.codfw.wmnet - T355422

Mentioned in SAL (#wikimedia-operations) [2024-04-08T07:32:40Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'Cloning db2114 in db2214 for T355422', diff saved to https://phabricator.wikimedia.org/P59780 and previous config saved to /var/cache/conftool/dbconfig/20240408-073239-arnaudb.json

Change #1017779 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mariadb: Reenable notifications for db2197

https://gerrit.wikimedia.org/r/1017779

Change #1017457 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: prepare future new candidate master for s1

https://gerrit.wikimedia.org/r/1017457

Change #1017457 merged by Kormat:

[operations/puppet@production] mariadb: prepare future new candidate master for s1

https://gerrit.wikimedia.org/r/1017457

Change #1017458 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: revert profile::monitoring::notifications_enabled: false

https://gerrit.wikimedia.org/r/1017458

Change #1017458 merged by Arnaudb:

[operations/puppet@production] mariadb: revert profile::monitoring::notifications_enabled: false

https://gerrit.wikimedia.org/r/1017458

Change #1017779 merged by Jcrespo:

[operations/puppet@production] mariadb: Reenable notifications for db2197

https://gerrit.wikimedia.org/r/1017779

Mentioned in SAL (#wikimedia-operations) [2024-04-09T11:40:46Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: provisionning db2212.codfw.wmnet - T355422

Icinga downtime and Alertmanager silence (ID=187aac8c-2351-4eeb-bf95-859748dd81b1) set by arnaudb@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2212.codfw.wmnet - T355422

db2112.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-04-09T11:41:00Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: provisionning db2212.codfw.wmnet - T355422

Mentioned in SAL (#wikimedia-operations) [2024-04-09T11:41:03Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: provisionning db2212.codfw.wmnet - T355422

Icinga downtime and Alertmanager silence (ID=fd2fb113-d629-44cc-aa8f-58372b0e3c67) set by arnaudb@cumin1002 for 1 day, 0:00:00 on 1 host(s) and their services with reason: provisionning db2212.codfw.wmnet - T355422

db2212.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-04-09T11:41:17Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: provisionning db2212.codfw.wmnet - T355422

Mentioned in SAL (#wikimedia-operations) [2024-04-09T11:43:03Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'Cloning db2112 in db2212 for T355422', diff saved to https://phabricator.wikimedia.org/P60053 and previous config saved to /var/cache/conftool/dbconfig/20240409-114302-arnaudb.json

Change #1018247 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mariadb: Migrate db2097 backups to db2197

https://gerrit.wikimedia.org/r/1018247

Change #1018276 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mariadb: Migrate db2098 backups to db2198 and upgrade dbprov2002 to 10.6

https://gerrit.wikimedia.org/r/1018276

Change #1018407 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mariadb: toggle notifications for db2212

https://gerrit.wikimedia.org/r/1018407

Change #1018407 merged by Arnaudb:

[operations/puppet@production] mariadb: toggle notifications for db2212

https://gerrit.wikimedia.org/r/1018407

The remaining node has been pooled. @jcrespo, please let me know if I can help you with the remaining ones :-)

Thanks, db2199 and db2200 are almost finished (currently catching up and about to add them to tendril zarcillo and later reenable notifications).

The other 2 go next.

Data loading should be finished this week; it is the service migration that may take some time, as I may use the soon-to-be-decommissioned hosts to help with the 10.6 upgrade/reimage progress.

Change #1018247 merged by Jcrespo:

[operations/puppet@production] mariadb: Migrate db2097 backups to db2197

https://gerrit.wikimedia.org/r/1018247

Change #1018744 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mariadb: Reenable notifications for db2199, db2200 after setup

https://gerrit.wikimedia.org/r/1018744

> Thanks, db2199 and db2200 are almost finished (currently catching up and about to add them to tendril zarcillo and later reenable notifications).
>
> The other 2 go next.
>
> Data loading should be finished this week; it is the service migration that may take some time, as I may use the soon-to-be-decommissioned hosts to help with the 10.6 upgrade/reimage progress.

No rush here; just add a comment on T358741 for those you want to keep a bit longer.

Change #1018744 merged by Jcrespo:

[operations/puppet@production] mariadb: Reenable notifications for db2199, db2200 after setup

https://gerrit.wikimedia.org/r/1018744

Change #1019077 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mariadb: Reenable notifications for db2201 & db2202

https://gerrit.wikimedia.org/r/1019077

Change #1019077 merged by Jcrespo:

[operations/puppet@production] mariadb: Reenable notifications for db2201 & db2202

https://gerrit.wikimedia.org/r/1019077

jcrespo added a project: database-backups.

This is now done, although that depends on the definition of "productionize": some of the backup sources have exactly the same data and config as the original ones but have not yet taken over the service, and some backups still use the old hosts.

Thanks @jcrespo! I'm OK with waiting for the full service takeover.

I think we can resolve this and track that at T358741, as long as everybody is aware.

Change #1019816 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] installserver: Setup db and dbprov hosts back to reuse recipe

https://gerrit.wikimedia.org/r/1019816

Change #1019816 merged by Jcrespo:

[operations/puppet@production] installserver: Setup db and dbprov hosts back to reuse recipe

https://gerrit.wikimedia.org/r/1019816