request to assign spare systems as terbium equivalent
Closed, ResolvedPublic

Description

We want to convert the MediaWiki maintenance servers to stretch (T192092).

We can't just take down terbium and upgrade it in place.

So, per the Service Operations meeting, we need to spin up a replacement maintenance server running stretch, have it run in parallel with terbium for a while, and ultimately decom terbium.

Checking racktables, I see there is an unassigned server in the same rack, right above terbium: WMF3565.

https://racktables.wikimedia.org/index.php?page=object&object_id=1775

They are both Dell PowerEdge R420 and have the same purchase date. (Though one of them is listed as out of warranty and the other isn't?)

Requesting to assign WMF3565 as the terbium replacement.

Event Timeline

Dzahn updated the task description.
Dzahn added a subscriber: Joe.

The name would be nihonium, element 113.

Dzahn triaged this task as High priority. Apr 17 2018, 11:54 PM

WMF3565 is > 5 years old, so there's really no point in setting up hardware that old right now.

How urgent is this task? We have a task open for procuring new hardware for (among other servers) terbium (T189317), but we could figure out a different solution if this needs to happen sooner than that.

Dzahn removed Dzahn as the assignee of this task. Apr 18 2018, 12:38 AM

I think it kind of blocks the "never use PHP5" / "switch to PHP7" migration, which also affects appservers and deployment servers. In the last Service Operations meeting it was brought up that we need to replace terbium as part of that. But let me confirm the urgency with @Joe.

I'd rather use a functional name here, e.g. mwmaint1001.eqiad.wmnet. These hosts have a functional equivalent in codfw, so using a functional name is more obvious. At some point I would also like to work on adding a mwmaint1002/2002 stand-in (e.g. in Ganeti) which we can easily switch to in case of maintenance on the primary script runner (reboots of terbium are currently really painful to coordinate and implement).
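To illustrate what a functional name carries compared to an element name, here is a minimal Python sketch that splits such a name into its parts. The mapping of the leading digit to a site (1 for eqiad, 2 for codfw) is an assumption inferred from the mwmaint1002/2002 pairing mentioned above, and `parse_functional_name` is a hypothetical helper, not part of any existing tooling.

```python
import re

# Assumption: the first digit of the numeric suffix identifies the site.
SITE_BY_DIGIT = {"1": "eqiad", "2": "codfw"}

# A functional short name: lowercase role followed by a four-digit number.
HOST_RE = re.compile(r"^(?P<role>[a-z]+)(?P<number>\d{4})$")


def parse_functional_name(name):
    """Split a name like 'mwmaint1001.eqiad.wmnet' into role, number and site."""
    short = name.split(".", 1)[0]  # drop the domain part, if any
    match = HOST_RE.match(short)
    if match is None:
        raise ValueError(f"not a functional host name: {name!r}")
    number = match.group("number")
    site = SITE_BY_DIGIT.get(number[0])
    if site is None:
        raise ValueError(f"unknown site digit in: {name!r}")
    return {"role": match.group("role"), "number": number, "site": site}
```

A name such as terbium raises `ValueError` under this sketch, because it encodes neither a role nor a site.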

We also have the two former image scalers, which are under warranty (and even more powerful than terbium). The plan is to allocate them as mw* hosts later on (T192457), but we could just as well repurpose one of them as the stretch-based stand-in for terbium (since we need to have both in parallel for a short time window).

I fully agree, let's do that.

Let's just use both of them to also set up the stand-in that you mentioned above?

(approved)

Dzahn lowered the priority of this task from High to Medium.

Change 430518 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] rename wmf6936 from mw1297 to mwmaint1001

https://gerrit.wikimedia.org/r/430518

Change 430519 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] rename mw1297 to mwmaint1001, assign mw-maint role

https://gerrit.wikimedia.org/r/430519

RobH renamed this task from "request to assign WMF3565 as terbium equivalent" to "request to assign spare systems as terbium equivalent". May 3 2018, 3:07 PM

Change 430519 merged by Dzahn:
[operations/puppet@production] rename mw1297 to mwmaint1001, partman for mwmaint*

https://gerrit.wikimedia.org/r/430519

Change 430518 merged by Dzahn:
[operations/dns@master] rename wmf6936 from mw1297 to mwmaint1001

https://gerrit.wikimedia.org/r/430518

Mentioned in SAL (#wikimedia-operations) [2018-05-03T19:18:42Z] <mutante> mw1297 - puppet node clean, puppet node deactivate - renaming to mwmaint1001 (T192185)

wmf6936 (mw1297) assigned and renamed to mwmaint1001

I renamed it in racktables and left a comment there too

racktables object 3003

https://racktables.wikimedia.org/index.php?page=object&tab=edit&object_id=3003

From here, setup will continue on T192092.

@RobH I think this hardware request can be closed as resolved, unless there is another place where you are keeping track of those?

Change 465689 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] Revert "rename wmf6936 from mw1297 to mwmaint1001"

https://gerrit.wikimedia.org/r/465689

Change 465689 abandoned by Dzahn:
Revert "rename wmf6936 from mw1297 to mwmaint1001"

Reason:
cant rebase cleanly and for some reason "fatal: Couldn't find remote ref refs/changes/89/465689/2" for me right now

https://gerrit.wikimedia.org/r/465689

Change 465689 restored by Dzahn:
Revert "rename wmf6936 from mw1297 to mwmaint1001"

https://gerrit.wikimedia.org/r/465689

Change 466773 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] re-add mw1297 to site.pp and DHCP, formerly mwmaint1001

https://gerrit.wikimedia.org/r/466773

Change 466773 merged by Dzahn:
[operations/puppet@production] re-add mw1297 to site.pp and DHCP, remove mwmaint1001

https://gerrit.wikimedia.org/r/466773

Change 465689 merged by Dzahn:
[operations/dns@master] Revert "rename wmf6936 from mw1297 to mwmaint1001"

https://gerrit.wikimedia.org/r/465689