
additional graphite machines request, 1x per DC
Closed, Resolved (Public)

Description

We'll need to add one additional graphite machine per primary datacenter (codfw/eqiad) to expand the existing graphite deployment. We should also look into expanding the 600GB SSDs to 1TB SSDs. Specifications can be similar to or greater than those of graphite[1-2]001. Those systems were purchased on RT 9105. Please note that the quote on that task alone isn't the full purchase, as we also had to purchase both Intel SSDs and drive sleds.

graphite[1-2]001 specifications:

  • Dual Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz/8 cores
  • 64GB RAM
  • Quad Intel S3500 600GB SSDSC2BB60
  • 1GbE NIC

There are potential spare allocations in eqiad, but we lack the same options in codfw.

In EQIAD, the upcoming restbase1001-1006 systems will go into our spares pool at the resolution of T125842. When they do, they will lack SSDs in their 4 bays (but will have the sleds). The pricing for these 4 SSDs can be viewed on the procurement task XXXXX.

Those restbase1001-1006 systems have the following:

  • Dual Intel® Xeon® Processor E5-2630 v3 2.4GHz/8 cores
  • 64GB RAM
  • 4 disk bays to house SSDs (see above note regarding SSDs for these systems)
  • Quad Intel S3610 SSDs on procurement task T127361
  • 1GbE NIC

In CODFW, we don't have any in-warranty spare systems that can house SSDs. The only in-warranty spares are our recent dual-CPU misc systems, but they use 4 * 4TB SATA disks in LFF format. So the system for CODFW will need to be quoted out with the following:

  • Dual Intel® Xeon® Processor E5-2630 v3 2.4GHz/8 cores
  • 64 & 128GB RAM options
  • 4 * Intel S3610 SSD (both 600GB and 1TB options)
  • 1GbE NIC

The above codfw quote request will be made to both Dell and HP on their respective tasks. These procurement tasks will all be blockers to this hardware-requests task. The server quote could be used to place orders for both sites if we don't want to allocate one of the upcoming restbase1001-1006 spares to this.

Event Timeline

fgiunchedi claimed this task.
fgiunchedi raised the priority of this task from to Medium.
fgiunchedi updated the task description.
fgiunchedi added projects: SRE, hardware-requests.
fgiunchedi added subscribers: fgiunchedi, mark, Aklapper.

Not sure why this got self-assigned; anyway, it's up for grabs.

Perhaps we can use one of restbase1001-1006 for this?

@RobH: can you see what's needed to move this forward?

I wasn't triaging it since it was assigned to the person who requested it; I assumed @fgiunchedi was holding onto it to add more info (bad assumption.) Sorry about that.

I'm working on this now. It will involve some task description updates to spell out the specifications on the existing systems, potential onsite spares, and then link into any procurement tasks for pricing/quotes.

RobH set Security to None.
RobH mentioned this in Unknown Object (Task). Feb 18 2016, 8:20 PM
RobH added a subtask: Unknown Object (Task).
RobH updated the task description.

IIRC the restbase systems we'll get as spares have 64GB RAM and 10Gbit Ethernet. The RAM is fine for graphite, and 10Gbit Ethernet is overkill, but it would work anyway of course.

I'm assigning this task to @mark for his approval to allocate one of the upcoming 6 restbase spare systems (1001-1006) for this. Once approved, we can move on the SSD purchase sub-task, T127361. (Please assign back to @RobH at that time.)

This task will also soon have sub-tasks linked in for the codfw ordering of a system to meet this requirement.

mark removed mark as the assignee of this task. Mar 15 2016, 11:07 AM

Approved.

RobH mentioned this in Unknown Object (Task). Mar 15 2016, 5:01 PM
RobH added a subtask: Unknown Object (Task).

I've added T128910 as a blocker, as the codfw allocation will require one of these proposed spare pool systems.

RobH changed the task status from Open to Stalled. Mar 24 2016, 8:23 PM
RobH edited subtasks, added: Unknown Object (Task); removed: Unknown Object (Task).

Summary: The codfw allocation will use one of the spare systems being ordered on T130743. The eqiad allocation will use one of the old restbase1001-1006 systems being reclaimed on T130752.

I'm setting this task to stalled until those tasks are advanced.

mark closed subtask Unknown Object (Task) as Resolved. Apr 14 2016, 10:27 AM

resolving, as all new graphite hosts have setup tasks and have been allocated.

RobH closed subtask Unknown Object (Task) as Resolved. Oct 12 2016, 5:48 PM
RobH closed subtask Unknown Object (Task) as Resolved.