request to assign wmf6937 (mw1298, former imagescaler) (now: wmf4727) as phab1002
Closed, Resolved (Public)

Description

In T190568 we want to upgrade Phabricator servers to stretch.

In order to upgrade the eqiad server, phab1001, we would have to take production Phabricator down because we can't fail over to codfw. (That is blocked on the lack of a database cluster in codfw; T137928 is still open.)

To avoid downtime for Phab users and the risk that something goes wrong with the stretch upgrade, let's instead bring up phab1002 with stretch, test it, switch over, then upgrade phab1001 and switch back.

As suggested by Moritz on T190568#4230710 this is a request to take mw1298 and rename it since the hardware specs are pretty close.

Event Timeline

Change 435211 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] rename wmf6937 from mw1298 to phab1002

https://gerrit.wikimedia.org/r/435211

Dzahn triaged this task as Medium priority. May 25 2018, 8:25 PM
RobH moved this task from Backlog to Pending Approval on the hardware-requests board.
RobH added subscribers: mark, RobH.

@mark: Is this something you would want to approve? If it was a permanent allocation, I know it would be. Since it is a temp allocation, I'm not certain.

Please advise.

(yes)

I approve the usage of a temporary server for this migration. However I dislike the use of a server from the existing MediaWiki appserver cluster for this... Isn't there perhaps a spare that would work? I assume the hardware specs are not that critical here.

It was suggested by Moritz because the hardware specs are quite similar to the existing phab server. ("The specs are roughly the same, phab1001 has a slightly more powerful CPU than mw1298, but both have 64 GB RAM and looking at Prometheus CPU usage is usually ~ 25% so that should be fine.").

But yes, they are not that critical.

We don't have any spare systems in eqiad with 64GB of RAM.

WMF4727 has 32GB of RAM, dual 3GHz/4C and 4*4TB, so more than enough storage. I'm just not sure whether 32GB of RAM is acceptable.

However I dislike the use of a server from the existing MediaWiki appserver cluster for this...

It's just a former imagescaler, not taking one of the currently used appservers. It already uses the spare role. Does this make it a spare?

# Row B (B6)
node /^mw129[8]\.eqiad\.wmnet$/ {
    role(spare::system)
}

IRC sync update:

Ok, the non-mw spare pool system with 32GB for @Dzahn's use is WMF4727. I'll create the setup task now.

Dzahn renamed this task from request to assign wmf6937 (mw1298, former imagescaler) as phab1002 to request to assign wmf6937 (mw1298, former imagescaler) (now: wmf4727) as phab1002. May 30 2018, 10:19 PM

Change 435211 abandoned by Dzahn:
assign wmf4727 as phab1002

https://gerrit.wikimedia.org/r/435211

Vvjjkkii renamed this task from request to assign wmf6937 (mw1298, former imagescaler) (now: wmf4727) as phab1002 to c9baaaaaaa. Jul 1 2018, 1:07 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed mark as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: gerritbot.
CommunityTechBot renamed this task from c9baaaaaaa to request to assign wmf6937 (mw1298, former imagescaler) (now: wmf4727) as phab1002. Jul 2 2018, 4:16 AM
CommunityTechBot closed this task as Resolved.
CommunityTechBot assigned this task to mark.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: gerritbot.
Dzahn removed mark as the assignee of this task.

@RobH We noticed this server has 32GB of RAM, whereas phab1001 has 64GB. Mukunda said that with only 32GB, Phab would not run for more than a day before it had to be restarted.

We would like to request either an upgrade of this server to 64GB RAM or a different server that already has 64GB RAM. I am not sure which option is easier or which would complicate things. I checked netbox and the support expiry date for this system was in December 2018.

Should I reopen this or make a new one? Is it possible to upgrade at all? Thanks!

Just assigning since I have questions for you.

Ok, I reviewed this in IRC with @Dzahn and have the following action items:

  • We don't upgrade memory on existing systems, and this host is about to exit its warranty coverage next month (so it is not really worth doing a lot of upgrades to it, with over half its lifespan gone).
  • Create a sub-task to track the decommission of wmf6937 as phab1002 and its reimage as mw1298. Also link this new sub-task to T192457.
  • Create a sub-task for the hardware request of a new spare pool system in eqiad for phab1002 use.

T215332 and T215335 filed as followup, resolving this task.

Change 496116 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] set phab1002 as a spare::system

https://gerrit.wikimedia.org/r/496116

Change 496116 abandoned by Dzahn:
set phab1002 as a spare::system

Reason:
duplicate, done in https://gerrit.wikimedia.org/r/c/operations/puppet/+/504959

https://gerrit.wikimedia.org/r/496116