Reallocate former image scalers
Open, High, Public

Description

We have two hosts in eqiad (mw1297 and mw1298) and four hosts in codfw (mw2150, mw2151, mw2244 and mw2245) that were formerly used as image scalers. Once the current HHVM/stretch migration (and ideally the merge of job runners and video scalers) is complete, we can repurpose them for other mw* roles. Since they are currently unused, this may also be an opportunity to move them to other racks if that helps balance rows.

  • mw1297 reinstalled
  • mw1297 reallocated
  • mw1298 reinstalled
  • mw1298 reallocated
  • mw2151 reinstalled (was jessie unlike others)
  • mw2151 reallocated
  • mw2244 reinstalled
  • mw2244 reallocated
  • mw2245 reinstalled
  • mw2245 reallocated
Restricted Application added a subscriber: Aklapper. Apr 18 2018, 2:45 PM
RobH triaged this task as High priority. May 1 2018, 2:36 PM

Change 430518 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] rename wmf6936 from mw1297 to mwmaint1001

https://gerrit.wikimedia.org/r/430518

Change 430518 merged by Dzahn:
[operations/dns@master] rename wmf6936 from mw1297 to mwmaint1001

https://gerrit.wikimedia.org/r/430518

Dzahn claimed this task. Oct 10 2018, 3:51 PM

mwmaint1001 should be reinstalled as mw1297 and go back into the pool.

This should happen only after https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/461492/ (and https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/465645/).

Change 465685 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mariadb: remove mwmaint1001 from prod-m5 SQL grants

https://gerrit.wikimedia.org/r/465685

Change 465686 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] network::constants: remove mwmaint1001

https://gerrit.wikimedia.org/r/465686

Change 465689 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] Revert "rename wmf6936 from mw1297 to mwmaint1001"

https://gerrit.wikimedia.org/r/465689

Change 465689 abandoned by Dzahn:
Revert "rename wmf6936 from mw1297 to mwmaint1001"

Reason:
can't rebase cleanly, and for some reason I'm getting "fatal: Couldn't find remote ref refs/changes/89/465689/2" right now

https://gerrit.wikimedia.org/r/465689

Change 465689 restored by Dzahn:
Revert "rename wmf6936 from mw1297 to mwmaint1001"

https://gerrit.wikimedia.org/r/465689

Change 466773 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] re-add mw1297 to site.pp and DHCP, formerly mwmaint1001

https://gerrit.wikimedia.org/r/466773

Change 466773 merged by Dzahn:
[operations/puppet@production] re-add mw1297 to site.pp and DHCP, remove mwmaint1001

https://gerrit.wikimedia.org/r/466773

Mentioned in SAL (#wikimedia-operations) [2018-10-11T22:30:02Z] <mutante> mwmaint1001 - shutting down after final backup of /home, renaming back to mw1297 in DNS and DHCP, and reinstalling (T192457)

Change 465689 merged by Dzahn:
[operations/dns@master] Revert "rename wmf6936 from mw1297 to mwmaint1001"

https://gerrit.wikimedia.org/r/465689

Mentioned in SAL (#wikimedia-operations) [2018-10-11T22:50:35Z] <mutante> netbox - renamed mwmaint1001 to mw1297, changed status to inventory, renamed in DNS - T192457

Mentioned in SAL (#wikimedia-operations) [2018-10-11T22:53:17Z] <mutante> netbox - correction, mwmaint1001 to status "Staged", following new lifecycle docs T192457

Script wmf-auto-reimage was launched by dzahn on neodymium.eqiad.wmnet for hosts:

['mw1297.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201810112309_dzahn_14010.log.

Script wmf-auto-reimage was launched by dzahn on neodymium.eqiad.wmnet for hosts:

['mw1297.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201810112318_dzahn_16644.log.

mw1297: done, renamed in DNS/DHCP, reinstalled, in Icinga again, renamed in netbox, changed netbox status to "Staged" per new lifecycle docs

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=mw1297

https://netbox.wikimedia.org/dcim/devices/653/

[mw1297:~] $ uptime
23:51:06 up 1 min,

Dzahn updated the task description. Oct 11 2018, 11:52 PM

Completed auto-reimage of hosts:

['mw1297.eqiad.wmnet']

and were ALL successful.

Dzahn updated the task description. Oct 11 2018, 11:53 PM

Change 465686 merged by Dzahn:
[operations/puppet@production] network::constants: remove mwmaint1001

https://gerrit.wikimedia.org/r/465686

Change 466947 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] DHCP: fix mwmaint1001 -> mw1297 fixed address

https://gerrit.wikimedia.org/r/466947

Change 466947 merged by Dzahn:
[operations/puppet@production] DHCP: fix mwmaint1001 -> mw1297 fixed address

https://gerrit.wikimedia.org/r/466947

Mentioned in SAL (#wikimedia-operations) [2018-10-16T08:42:03Z] <moritzm> removed mwmaint1001 from debmonitor (T192457)

Change 465685 merged by Marostegui:
[operations/puppet@production] mariadb: remove mwmaint1001 from prod-m5 SQL grants

https://gerrit.wikimedia.org/r/465685

Dzahn added a comment. Oct 27 2018, 1:09 AM

@Joe I think you already have a preference for what these should be used for, right?

Dzahn reassigned this task from Dzahn to Joe. Oct 27 2018, 1:13 AM
Dzahn added a subscriber: Dzahn.

I had claimed this to get the former mwmaint1001 back into the "spare" pool. That is done. I'm happy to help reinstall the others too, but you know which role you wanted them for. Feel free to assign it back to me after commenting.

jijiki added a subscriber: jijiki. Nov 2 2018, 1:44 PM

Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts:

['mw2151.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901092334_dzahn_58052.log.

Mentioned in SAL (#wikimedia-operations) [2019-01-09T23:39:58Z] <mutante> mw2151 - change netbox status from active to staged - it's not actually active, it's role(spare) and was jessie (T192457)

Completed auto-reimage of hosts:

['mw2151.codfw.wmnet']

and were ALL successful.

Dzahn updated the task description. Thu, Jan 10, 5:41 PM

Change 483476 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add mw2151 as another jobrunner host

https://gerrit.wikimedia.org/r/483476

Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts:

['mw1298.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901102157_dzahn_89266.log.

Completed auto-reimage of hosts:

['mw1298.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts:

['mw2244.codfw.wmnet', 'mw2245.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901102307_dzahn_106668.log.

Completed auto-reimage of hosts:

['mw2244.codfw.wmnet', 'mw2245.codfw.wmnet']

and were ALL successful.

Dzahn updated the task description. Fri, Jan 11, 12:12 AM