
Estimate hardware requirements for WDQS upgrade
Closed, Resolved, Public

Description

To integrate WDQS with more on-wiki data sources and visualisations, the hardware needs to be scaled up; @Smalyshev estimates that one extra node is needed. This task is for putting that estimate together so it can be passed on to Katie Horn.

Hardware estimate:

Hardware should be similar to the current wdqs200[12] configuration:

CPU: dual Intel(R) Xeon(R) CPU E5-2620 v3
Disk: 800GB of raw RAIDed SSD space (we could make do with 400GB)
RAM: 96GB

Number of servers:
eqiad: 1
codfw: 1 (we might be able to do without the codfw server in the short term if that is an issue)

Related Objects

Status    Assigned
Resolved  Deskana
Resolved  Cmjohnson
Resolved  Gehel
Resolved  RobH

Event Timeline

Deskana created this task. Oct 20 2016, 3:27 PM
Restricted Application added a project: Wikidata. Oct 20 2016, 3:27 PM
Restricted Application added a subscriber: Aklapper.
Deskana added a subscriber: Gehel. Oct 20 2016, 3:28 PM
Addshore moved this task from incoming to monitoring on the Wikidata board. Oct 22 2016, 2:40 PM
Gehel claimed this task. Oct 25 2016, 1:51 PM

Side note: at this point, the need to increase hardware is more for availability than for scalability.

Constraints:

  • We want to be able to continue operations, including normal maintenance operations, if we lose a datacenter.
  • Each datacenter serves traffic in a completely isolated manner: users are sent to a specific datacenter based on network proximity. This means we don't want to rely on the secondary datacenter for the reliability of the main cluster.
  • WDQS sometimes needs a full data reload, during which the server being reloaded has to be taken out of traffic for multiple days. At the moment, our clusters are composed of 2 nodes, which means that during such maintenance we are running on a single node.

Goal:

  • Increase the size of both eqiad and codfw WDQS clusters to 3 nodes (adding 1 node to each cluster)
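As a rough illustration of why 3 nodes helps, here is a minimal Python sketch. It assumes that only one node is depooled at a time for a full data reload and that a single node can carry its cluster's traffic on its own (as it does today during maintenance). The helper `serving_nodes` is hypothetical and not part of any WDQS tooling.

```python
def serving_nodes(cluster_size: int, reloading: int = 1, failed: int = 0) -> int:
    """Hypothetical helper: nodes still in traffic while `reloading` nodes are
    depooled for a full data reload and `failed` nodes are unexpectedly down."""
    return max(cluster_size - reloading - failed, 0)


# Compare the current 2-node clusters with the proposed 3-node clusters.
for size in (2, 3):
    during_reload = serving_nodes(size)
    reload_plus_failure = serving_nodes(size, failed=1)
    print(f"{size} nodes: {during_reload} serving during a reload, "
          f"{reload_plus_failure} if another node also fails")
```

Under these assumptions, a single additional failure during a multi-day reload leaves a 2-node cluster with nothing in traffic, while a 3-node cluster still has one node serving.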

Notes:

  • If budget is an issue, adding a node to eqiad should be the priority. That way, we can operate the eqiad cluster comfortably, and if we need to switch to codfw we accept running in a degraded state where major maintenance operations might be delayed. This is only a transitional solution: once we have active/active clusters, codfw will need to be just as operable as eqiad.
  • This request is similar to T139482

Sizing:

Hardware should be similar to the current wdqs200[12] configuration:

CPU: dual Intel(R) Xeon(R) CPU E5-2620 v3
Disk: 800GB of raw RAIDed SSD space
RAM: 96GB

Number of servers:
eqiad: 1
codfw: 1 (we might be able to do without the codfw server in the short term if that is an issue)

Gehel reassigned this task from Gehel to Deskana. Oct 25 2016, 1:52 PM

@Deskana I reassign this to you. Let me know if you need more details or if you want me to move forward and open a hardware request for this.

Note: I don't think we need 800GB of disk space there. Around 400GB would be enough, at least for some time.

MaxSem moved this task from Needs triage to Ops on the Discovery board. Oct 26 2016, 9:26 PM
Deskana updated the task description. Oct 27 2016, 4:17 PM

@Gehel Thanks! I'll ask Katie where she's at with this.

Restricted Application added a project: Operations. Oct 27 2016, 5:53 PM
RobH created subtask Unknown Object (Task). Oct 27 2016, 7:04 PM
RobH changed the task status from Open to Stalled. Oct 31 2016, 8:14 PM
RobH added subscribers: K4-713, mark, RobH.

Mark notified me on IRC to start working on this, following an out-of-band conversation between @mark and @K4-713.

I've created the sub-tasks in the procurement S4 space because they contain private pricing data. All quotes will need to be reviewed on sub-task T149351.

Since that seems to be all the information needed here, I'm setting this task to Stalled while quote selection is in progress on the sub-task.

Smalyshev reopened this task as Stalled. Nov 1 2016, 1:17 AM
OakleyAlways1 closed this task as Invalid. Nov 1 2016, 1:18 AM

Really stop hacking me

Smalyshev reopened this task as Stalled. Nov 1 2016, 1:21 AM
mark closed subtask Unknown Object (Task) as Resolved. Dec 13 2016, 1:41 PM
RobH reopened subtask Unknown Object (Task) as Open. Dec 14 2016, 7:16 PM
Cmjohnson closed subtask Unknown Object (Task) as Resolved. Dec 14 2016, 7:16 PM
RobH closed this task as Resolved. Dec 16 2016, 5:37 PM

Since both WDQS expansion systems have been ordered and have setup tasks, I'm resolving this hardware request.