Site: eqiad | hardware request for a dedicated stat analytics host for the Research team
Closed, Declined · Public

Description

During the last Analytics/Research offsite we discussed the possibility of having a separate host for the Research team, dedicated to their workloads (since all the stat boxes are currently very busy).

Another thing we'd like to finally get working is the GPU installed on stat1005 (context in T148843), which for a number of reasons (drivers, etc.) has not been productionized yet. To make it work we'd need to experiment with drivers and various settings, which involves reboots and invasive actions on the host, something we can't afford on stat1005 right now since a lot of users rely on it daily. So the idea would be, if feasible for you, to move the GPU from stat1005 to the new stat box (stat1007?) and test it there with the Research team's help.

We don't have a clear idea of the hardware specs for this new host, so I'd say that using stat1005/6 as a baseline is fine.

Event Timeline

elukey created this task. May 31 2018, 3:25 PM
Restricted Application added a project: Operations. May 31 2018, 3:25 PM
Nuria triaged this task as Normal priority. May 31 2018, 4:33 PM
Nuria moved this task from Incoming to Operational Excellence on the Analytics board.
RobH assigned this task to elukey. Jun 1 2018, 6:04 PM
RobH added a subscriber: RobH.

We CANNOT move the GPU between hosts. The chassis (stat1005) was ordered specifically to house it, and the GPU has to stay in that host for both warranty and chassis form-factor reasons. It simply won't fit into any chassis other than this single R730, and we have no other R730s.

@elukey: I'd suggest we modify this task to move the stat1005 services used by users onto a new host, and perhaps make this request for that new host?

Assigning to you for feedback.

That is not a bad idea. Although moving folks between stat boxes is not the easiest thing to do... :)

RobH added a comment. Edited · Jun 3 2018, 4:39 PM

That is not a bad idea. Although moving folks between stat boxes is not the easiest thing to do... :)

Understood; however, there is no other way to give you a GPU-enabled host short of buying a new one. That is quite expensive compared to buying a non-GPU server to take over the non-GPU duties of stat1005.

Edit addition: Unfortunately, rack-mount servers aren't easily customized after the fact, since aftermarket GPUs tend to require full height/length bays and additional power. So we had to purchase the R730 specifically to house the GPU.

Would Analytics like to create a new task with the specifications needed for a new stat box to migrate user services from stat1005? We'll then decline this task in favor of that one. (We could just update this task instead, but it already has a discussion about the GPU machine stat1005 and GPU movement; either works, though.)

elukey closed this task as Declined. Jun 4 2018, 10:24 AM

Sure, we can decline and start another one. For the specs we don't have specific requirements; anything similar to stat1005/6 would be enough. I think that moving people away from stat1005 might be tricky, but I completely get your point, Rob, and I'd really love to see that GPU running and crunching data eventually (rather than sitting there gathering dust :)).

I'll open a new task asap!

elukey mentioned this in Unknown Object (Task). Jun 4 2018, 10:27 AM
Vvjjkkii renamed this task from Site: eqiad | hardware request for a dedicated stat analytics host for the Research team to nwbaaaaaaa. Jul 1 2018, 1:06 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed elukey as the assignee of this task.
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii updated the task description.
CommunityTechBot renamed this task from nwbaaaaaaa to Site: eqiad | hardware request for a dedicated stat analytics host for the Research team. Jul 1 2018, 3:19 PM
CommunityTechBot closed this task as Declined.
CommunityTechBot assigned this task to elukey.
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot triaged this task as Normal priority. Jul 3 2018, 2:24 AM