
Estimate hardware requirements for WDQS upgrade
Closed, ResolvedPublic

Description

In order to be able to integrate WDQS with more on-wiki data sources and visualisations, @Smalyshev informs me that some scaling up of the hardware is required. He estimated that we needed one extra node. This task is for us to put that estimate together so it can be passed on to Katie Horn.

Hardware estimate:

Hardware should be similar to the current wdqs200[12] configuration:

CPU: dual Intel(R) Xeon(R) CPU E5-2620 v3
Disk: 800GB raw SSD space, RAIDed (we could make do with 400GB)
RAM: 96GB

Number of servers:
eqiad: 1
codfw: 1 (we might be able to do without the codfw server in the short term if that is an issue)

Related Objects

Status    Assigned
Resolved  Deskana
Resolved  Cmjohnson
Resolved  Gehel
Resolved  RobH

Event Timeline

Restricted Application added a subscriber: Aklapper.

Side note: at this point, the need to increase hardware is more for availability than for scalability.

Constraints:

  • We want to be able to continue operations, including normal maintenance operations, if we lose a datacenter.
  • Each datacenter serves traffic in complete isolation; users are sent to a specific datacenter based on network proximity. The implication is that we don't want to rely on the secondary datacenter for the reliability of the main cluster.
  • WDQS sometimes needs a full data reload, during which the server being reloaded has to be taken out of traffic for multiple days. At the moment, our clusters are composed of 2 nodes, which means that during such maintenance we are running on a single node.

Goal:

  • Increase the size of both eqiad and codfw WDQS clusters to 3 nodes (adding 1 node to each cluster)
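
The arithmetic behind this goal can be sketched as below. This is only an illustration: the `Cluster` class and both helper functions are hypothetical names, and the sketch assumes that a single node is enough to keep serving traffic, which the discussion above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical model of one WDQS cluster in a single datacenter."""
    name: str
    nodes: int  # total nodes in the cluster


def serving_during_reload(cluster: Cluster, reloading: int = 1) -> int:
    """Nodes still serving traffic while `reloading` nodes are out for a data reload."""
    return max(cluster.nodes - reloading, 0)


def spare_during_reload(cluster: Cluster, reloading: int = 1) -> int:
    """Extra node failures the cluster can absorb during a reload.

    Assumption: one serving node is the minimum needed to stay up.
    """
    return max(serving_during_reload(cluster, reloading) - 1, 0)


for cluster in (Cluster("current", 2), Cluster("proposed", 3)):
    print(
        f"{cluster.name}: {serving_during_reload(cluster)} serving, "
        f"{spare_during_reload(cluster)} spare during a reload"
    )
```

With 2 nodes, a multi-day reload leaves one serving node and no spare; with 3 nodes, the cluster can lose one more node during a reload and still serve traffic.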

Notes:

  • If budget is an issue, adding a node to eqiad should be the priority. That way, we can operate the eqiad cluster comfortably, and if we need to switch to codfw we accept a degraded situation in which major maintenance operations might be delayed. This is only a transitional solution: once we have active/active clusters, codfw will need to be just as operable as eqiad.
  • This request is similar to T139482

Sizing:

Hardware should be similar to the current wdqs200[12] configuration:

CPU: dual Intel(R) Xeon(R) CPU E5-2620 v3
Disk: 800GB raw SSD space, RAIDed
RAM: 96GB

Number of servers:
eqiad: 1
codfw: 1 (we might be able to do without the codfw server in the short term if that is an issue)

@Deskana I'm reassigning this to you. Let me know if you need more details or if you want me to move forward and open a hardware request for this.

Note: I don't think we need 800GB of disk space there. Somewhere around 400GB would be enough, at least for some time.

@Gehel Thanks! I'll ask Katie where she's at with this.

RobH created subtask Unknown Object (Task). Oct 27 2016, 7:04 PM
RobH changed the task status from Open to Stalled. Oct 31 2016, 8:14 PM
RobH added subscribers: K4-713, mark, RobH.

I received an IRC notice from Mark to start working on this, following an out-of-band conversation between @mark and @K4-713.

I've created the sub-tasks in the procurement S4 space, due to private pricing data. All quotes will need to be reviewed on sub-task T149351.

Since that seems to be all the info needed here, I'm going to set this task to stalled while quote selection is in progress on the sub-task.

mark closed subtask Unknown Object (Task) as Resolved. Dec 13 2016, 1:41 PM
RobH reopened subtask Unknown Object (Task) as Open. Dec 14 2016, 7:16 PM
Cmjohnson closed subtask Unknown Object (Task) as Resolved. Dec 14 2016, 7:16 PM

As both wdqs expansion systems have been ordered and have setup tasks, I'm resolving this hardware-request.