
Initial production deployment of the IDM
Closed, Resolved (Public)

Description

Productionising the IDM:

  • Create Ganeti instances and apply the Puppet roles
  • Set up monitoring for basic availability of the endpoint
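A basic availability check of the kind described in the second item could look roughly like the sketch below. It is only an illustration, not the monitoring that was deployed for this task: the function name `is_available` and the URL `https://idm.example.org/` are hypothetical placeholders, and a production check would normally be defined in the existing monitoring stack rather than in a standalone script.

```python
import urllib.error
import urllib.request


def is_available(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with a 2xx or 3xx status.

    Redirects are followed by urlopen; 4xx/5xx responses raise HTTPError
    (a URLError subclass) and are therefore reported as unavailable.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # URLError: DNS failure, refused connection, HTTP error status.
        # OSError: socket-level timeout. ValueError: malformed URL.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the real service URL.
    endpoint = "https://idm.example.org/"
    print(f"{endpoint} available: {is_available(endpoint)}")
```

The check deliberately treats every failure mode the same way, since "basic availability" only asks whether the endpoint responds at all.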

Related Objects

Status       Subtype           Assigned          Task
Open                           None
Open                           None
Open                           None
Open                           SLyngshede-WMF
Resolved                       None
Open                           None
Resolved                       Marostegui
Resolved                       Andrew
Resolved                       Marostegui
Resolved                       Andrew
Declined                       Andrew
Resolved                       Andrew
Resolved                       Andrew
Resolved                       Ladsgroup
Duplicate                      None
Resolved                       Bstorm
Declined                       None
Resolved                       taavi
Resolved                       Jdforrester-WMF
Declined                       None
Open                           jijiki
Open                           None
Open         Feature           None
Stalled      Feature           None
Open         Feature           SLyngshede-WMF
Open                           None
Open                           Andrew
Open                           SLyngshede-WMF
Resolved                       ABran-WMF
Resolved                       taavi
Open                           None
In Progress                    SLyngshede-WMF
Resolved     PRODUCTION ERROR  Tgr
Open                           None
Resolved                       bd808
Resolved                       yuvipanda
Resolved                       bd808
Resolved                       bd808
Resolved                       bd808
Open                           taavi
Resolved                       taavi
Declined                       None
Open                           None
Resolved                       SLyngshede-WMF
Resolved                       SLyngshede-WMF
Open                           None
Open                           taavi

Event Timeline

Dzahn renamed this task from Initial production deployment to Initial production deployment of the IDM. (Nov 4 2022, 7:41 PM)

Cookbook cookbooks.sre.ganeti.reimage was started by slyngshede@cumin1001 for host idm1001.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.ganeti.reimage started by slyngshede@cumin1001 for host idm1001.wikimedia.org with OS bullseye completed:

  • idm1001 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202302161020_slyngshede_2708067_idm1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed

Cookbook cookbooks.sre.ganeti.reimage was started by slyngshede@cumin1001 for host idm2001.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.ganeti.reimage started by slyngshede@cumin1001 for host idm2001.wikimedia.org with OS bullseye completed:

  • idm2001 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Set boot to disk
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202302171000_slyngshede_2978437_idm2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed

Change 890801 had a related patch set uploaded (by Slyngshede; author: Slyngshede):

[operations/puppet@production] C:idm::deployment use Redis password

https://gerrit.wikimedia.org/r/890801

Change 890801 merged by Slyngshede:

[operations/puppet@production] C:idm::deployment use Redis password

https://gerrit.wikimedia.org/r/890801

Change 890815 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Set ferm access for redis

https://gerrit.wikimedia.org/r/890815

Change 890815 merged by Slyngshede:

[operations/puppet@production] Set ferm access for redis

https://gerrit.wikimedia.org/r/890815

Change 891318 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] idm::jobs: Adapt auto restart to only run of idm-rq is active/present

https://gerrit.wikimedia.org/r/891318

Change 891318 merged by Muehlenhoff:

[operations/puppet@production] idm::jobs: Adapt auto restart to only run of idm-rq is active/present

https://gerrit.wikimedia.org/r/891318

Mentioned in SAL (#wikimedia-operations) [2023-02-27T14:33:02Z] <jbond@cumin2002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on idm2001.wikimedia.org with reason: host still been configuered - T320797

Mentioned in SAL (#wikimedia-operations) [2023-02-27T14:33:06Z] <jbond@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on idm2001.wikimedia.org with reason: host still been configuered - T320797

Mentioned in SAL (#wikimedia-operations) [2023-02-27T14:33:13Z] <jbond@cumin2002> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on idm2001.wikimedia.org with reason: host still been configuered - T320797

Mentioned in SAL (#wikimedia-operations) [2023-02-27T14:33:19Z] <jbond@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on idm2001.wikimedia.org with reason: host still been configuered - T320797
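The durations in the SAL entries above ("1 day, 0:00:00", "7 days, 0:00:00") match the string form that Python produces for a `datetime.timedelta`. A minimal sketch for pulling the host and downtime length back out of such a line follows. The regular expression and the helper name `parse_downtime` are assumptions made for illustration, based only on the four lines quoted here, and are not part of the cookbook tooling.

```python
import re
from datetime import timedelta

# Pattern inferred from the SAL lines quoted in this task; other cookbook
# messages may be worded differently.
SAL_DOWNTIME = re.compile(
    r"Cookbook sre\.hosts\.downtime(?: \(exit_code=\d+\))? for "
    r"(?:(?P<days>\d+) days?, )?(?P<hours>\d+):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
    r" on (?P<host>\S+)"
)


def parse_downtime(line: str):
    """Return (host, duration) for a downtime SAL line, or None if no match."""
    match = SAL_DOWNTIME.search(line)
    if match is None:
        return None
    duration = timedelta(
        days=int(match["days"] or 0),  # the day part is absent below 24 hours
        hours=int(match["hours"]),
        minutes=int(match["minutes"]),
        seconds=int(match["seconds"]),
    )
    return match["host"], duration


if __name__ == "__main__":
    line = (
        "START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on "
        "idm2001.wikimedia.org with reason: host still been configuered - T320797"
    )
    print(parse_downtime(line))
```

Read this way, the first pair of entries set a one-day downtime on idm2001 and the second pair, a few seconds later, set a seven-day one.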

Change 896112 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add Cumin aliases for IDM

https://gerrit.wikimedia.org/r/896112

Change 896112 merged by Muehlenhoff:

[operations/puppet@production] Add Cumin aliases for IDM

https://gerrit.wikimedia.org/r/896112