rdb101[56] implementation tracking
Closed, Resolved | Public

Description

This task tracks the service implementation of the serviceops hosts rdb101[56].

Pair 2

| Port | DB | Usage | PoC/tag |
| ---- | -- | ----- | ------- |
| 6378 | 2, 3 | Netbox tasks (db 2) and Netbox caching (db 3) - *(rdb1015 only) | netbox Infrastructure-Foundations |
| 6379 | 0 | changeprop / cpjobqueue | MW-Interfaces-Team |
| 6379 | 1 | rest-gateway | MW-Interfaces-Team |
| 6380 | 0 | redioscope | MW-Interfaces-Team |
| 6380 | 0 | Ratelimit | MW-Interfaces-Team |
| 6381 | 0 | filebackend.php (redisLockManager) | MediaWiki-Platform-Team |
| 6382 | 0 | docker-registry | ServiceOps new |
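
As a rough illustration of how the port/DB assignments in the table above could be represented, here is a minimal Python sketch. The `RedisUse` type, the `PAIR2` list and both helper functions are hypothetical names introduced for this example. The `redis://host:port/db` URL form is the generic Redis URI scheme; whether each service is actually configured through such a URL is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class RedisUse(NamedTuple):
    """One row of the Pair 2 table (hypothetical helper type)."""
    port: int
    db: int
    usage: str


# Rows transcribed from the table; Netbox uses two databases on one port.
PAIR2 = [
    RedisUse(6378, 2, "netbox tasks"),
    RedisUse(6378, 3, "netbox caching"),
    RedisUse(6379, 0, "changeprop / cpjobqueue"),
    RedisUse(6379, 1, "rest-gateway"),
    RedisUse(6380, 0, "redioscope"),
    RedisUse(6380, 0, "ratelimit"),
    RedisUse(6381, 0, "filebackend.php (redisLockManager)"),
    RedisUse(6382, 0, "docker-registry"),
]


def redis_url(host: str, use: RedisUse) -> str:
    """Build a generic redis:// URI for one table row."""
    return f"redis://{host}:{use.port}/{use.db}"


def shared_databases(uses):
    """Return {(port, db): [usages]} for databases listed more than once."""
    grouped = defaultdict(list)
    for use in uses:
        grouped[(use.port, use.db)].append(use.usage)
    return {key: names for key, names in grouped.items() if len(names) > 1}


if __name__ == "__main__":
    for use in PAIR2:
        print(use.usage, "->", redis_url("rdb1015.eqiad.wmnet", use))
    print("shared:", shared_databases(PAIR2))
```

Grouping by `(port, db)` makes it easy to spot rows that point at the same database, such as redioscope and Ratelimit, which the table both lists under port 6380, db 0.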

IP addresses

| Hostname | IPv4 | IPv6 |
| -------- | ---- | ---- |
| rdb1011.eqiad.wmnet | 10.64.0.36 | 2620:0:861:101:10:64:0:36 |
| rdb1012.eqiad.wmnet | 10.64.48.49 | 2620:0:861:107:10:64:48:49 |
| rdb1015.eqiad.wmnet | 10.64.0.9 | 2620:0:861:101:10:64:0:9 |
| rdb1016.eqiad.wmnet | 10.64.133.6 | 2620:0:861:10c:10:64:133:6 |
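
In every row listed above, the last four groups of the IPv6 address repeat the IPv4 octets as decimal digits. The following Python sketch checks that pattern against the listed addresses. The `HOSTS` mapping and the `embeds_ipv4` function are hypothetical names for this example, and the pattern is only an observation about these four rows, not a statement of any addressing policy.

```python
import ipaddress

# Addresses transcribed from the table.
HOSTS = {
    "rdb1011.eqiad.wmnet": ("10.64.0.36", "2620:0:861:101:10:64:0:36"),
    "rdb1012.eqiad.wmnet": ("10.64.48.49", "2620:0:861:107:10:64:48:49"),
    "rdb1015.eqiad.wmnet": ("10.64.0.9", "2620:0:861:101:10:64:0:9"),
    "rdb1016.eqiad.wmnet": ("10.64.133.6", "2620:0:861:10c:10:64:133:6"),
}


def embeds_ipv4(ipv4: str, ipv6: str) -> bool:
    """True if the last four IPv6 groups spell out the IPv4 octets as text."""
    # Parse both addresses first so malformed input raises ValueError.
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")
    groups = ipaddress.IPv6Address(ipv6).exploded.split(":")[-4:]
    # Exploded groups are zero-padded ("0036"); strip the padding and
    # compare as text, because "36" here means decimal 36, not hex 0x36.
    return [g.lstrip("0") or "0" for g in groups] == octets


if __name__ == "__main__":
    for name, (v4, v6) in HOSTS.items():
        print(name, "OK" if embeds_ipv4(v4, v6) else "MISMATCH")
```

A check like this can catch a copy-paste slip between the IPv4 and IPv6 columns when a table of this kind is edited by hand.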

Event Timeline

Raine changed the task status from Open to Stalled. (Mar 4 2026, 12:30 PM)
Raine triaged this task as Medium priority.
Raine subscribed.

Un-stall when the racking task is done.

MLechvien-WMF changed the task status from Stalled to Open. (Fri, May 22, 10:34 AM)
MLechvien-WMF subscribed.

This is unstalled now and ready to be picked up.

I will see if I can work on it next week.

Change #1299455 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] site.pp: reimage rdb1015 and rdb1016 as redis servers

https://gerrit.wikimedia.org/r/1299455

Change #1299458 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] changeprop: switch to rdb1015 (staging)

https://gerrit.wikimedia.org/r/1299458

Change #1299459 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] redioscope: switch to rdb1015

https://gerrit.wikimedia.org/r/1299459

Change #1299460 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] changeprop: switch to rdb1015 (staging)

https://gerrit.wikimedia.org/r/1299460

Change #1299462 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] changeprop-jobqueue: switch to rdb1015

https://gerrit.wikimedia.org/r/1299462

Change #1299455 merged by Effie Mouzeli:

[operations/puppet@production] site.pp: reimage rdb1015 and rdb1016 as redis servers

https://gerrit.wikimedia.org/r/1299455

Change #1299464 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] ratelimit: switch to rdb1015

https://gerrit.wikimedia.org/r/1299464

Change #1299466 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] rest-gateway: switch to rdb1015

https://gerrit.wikimedia.org/r/1299466

Change #1299467 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] docker-registry: switch to rdb1015

https://gerrit.wikimedia.org/r/1299467

Change #1299468 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/mediawiki-config@master] ProductionServices.php: switch filebackend.php to rdb2015:6381

https://gerrit.wikimedia.org/r/1299468

Change #1299470 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] mediawiki-common: add rdb1015 rdb1016

https://gerrit.wikimedia.org/r/1299470

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1003 for host rdb1015.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1003 for host rdb1016.eqiad.wmnet with OS trixie

Change #1299458 abandoned by Effie Mouzeli:

[operations/deployment-charts@master] changeprop: switch to rdb1015 (staging)

https://gerrit.wikimedia.org/r/1299458

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1003 for host rdb1016.eqiad.wmnet with OS trixie completed:

  • rdb1016 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606091240_jiji_2357571_rdb1016.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1003 for host rdb1015.eqiad.wmnet with OS trixie completed:

  • rdb1015 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606091245_jiji_2356908_rdb1015.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
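
The first-Puppet-run log paths in the two reimage reports above share a visible pattern: a 12-digit timestamp, the username, a number, and the short hostname. A minimal Python sketch for splitting such a file name follows. The `parse_reimage_log` function and the `run_id` field name are hypothetical, and what the numeric field represents is not stated in the report, so it is treated as an opaque identifier.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Pattern inferred from the two log paths in the reports, not from any spec.
LOG_NAME = re.compile(
    r"^(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<run_id>\d+)_(?P<host>[^.]+)\.out$"
)


def parse_reimage_log(path: str) -> dict:
    """Split a reimage log path into timestamp, user, opaque id and host."""
    match = LOG_NAME.match(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"unrecognised log name: {path}")
    return {
        "started": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        "run_id": int(match["run_id"]),  # meaning not documented in the report
        "host": match["host"],
    }


if __name__ == "__main__":
    for path in (
        "/var/log/spicerack/sre/hosts/reimage/202606091240_jiji_2357571_rdb1016.out",
        "/var/log/spicerack/sre/hosts/reimage/202606091245_jiji_2356908_rdb1015.out",
    ):
        print(parse_reimage_log(path))
```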

Change #1299519 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] netbox: switch to rdb1015

https://gerrit.wikimedia.org/r/1299519

Change #1299519 merged by Effie Mouzeli:

[operations/puppet@production] netbox: switch to rdb1015

https://gerrit.wikimedia.org/r/1299519

Change #1299470 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki-common: add rdb1015 rdb1016 #1

https://gerrit.wikimedia.org/r/1299470

Change #1299468 merged by jenkins-bot:

[operations/mediawiki-config@master] ProductionServices.php: switch filebackend.php to rdb2015:6381 #2

https://gerrit.wikimedia.org/r/1299468

Mentioned in SAL (#wikimedia-operations) [2026-06-09T15:25:25Z] <jiji@deploy1003> Started scap sync-world: Backport for [[gerrit:1299468|ProductionServices.php: switch filebackend.php to rdb2015:6381 #2 (T418918 T291916)]]

Mentioned in SAL (#wikimedia-operations) [2026-06-09T15:32:47Z] <jiji@deploy1003> Finished scap sync-world: Backport for [[gerrit:1299468|ProductionServices.php: switch filebackend.php to rdb2015:6381 #2 (T418918 T291916)]] (duration: 07m 21s)

Change #1299460 merged by jenkins-bot:

[operations/deployment-charts@master] changeprop: switch to rdb1015 #4

https://gerrit.wikimedia.org/r/1299460

Change #1299462 merged by jenkins-bot:

[operations/deployment-charts@master] changeprop-jobqueue: switch to rdb1015 #5

https://gerrit.wikimedia.org/r/1299462

Change #1299464 merged by jenkins-bot:

[operations/deployment-charts@master] ratelimit: switch to rdb1015 #6

https://gerrit.wikimedia.org/r/1299464

Change #1299459 merged by jenkins-bot:

[operations/deployment-charts@master] redioscope: switch to rdb1015 #7

https://gerrit.wikimedia.org/r/1299459

Change #1299466 merged by jenkins-bot:

[operations/deployment-charts@master] rest-gateway: switch to rdb1015 #8

https://gerrit.wikimedia.org/r/1299466

Change #1299467 merged by Effie Mouzeli:

[operations/puppet@production] docker-registry: switch to rdb1015 #3

https://gerrit.wikimedia.org/r/1299467

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1003 for host rdb1014.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1003 for host rdb1014.eqiad.wmnet with OS trixie completed:

  • rdb1014 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202606101245_jiji_2797735_rdb1014.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

jijiki updated the task description.