
Wikidata Query Service hardware
Closed, Resolved · Public

Description

To deploy the upcoming Wikidata Query Service, we'll need some hardware.

Initially, we will probably need one primary machine and one spare for disaster recovery. A full recovery by loading from a dump takes several days, so unless we want to live dangerously, we need a hot spare. Exact specs TBD soon.

Event Timeline

GWicke raised the priority of this task to Medium.
GWicke updated the task description.

Do we want to have the Cassandra and Titan nodes in the same rack? I assume query performance is very latency-sensitive.

Joe claimed this task. Jan 12 2015, 7:37 PM
Joe set Security to None.
GWicke added a comment. (Edited) Jan 12 2015, 7:47 PM

> Do we want to have the Cassandra and Titan nodes in the same rack? I assume query performance is very latency-sensitive.

I don't think the latency difference between rows is large enough to justify compromising availability. Cassandra nodes are spread across racks (rows, in our case) to improve availability by avoiding correlated failures that affect a single row. I think we should do the same for the Titan nodes.
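A minimal sketch of the row-diversity argument, assuming a hypothetical mapping of host names to rows (the names and rows are illustrative, not the actual allocation): the number of hosts still up after the worst single-row outage is the total host count minus the largest number of hosts sharing one row.

```python
from collections import Counter


def worst_case_survivors(placement: dict[str, str]) -> int:
    """Return how many hosts stay up after the worst single-row failure.

    `placement` is a hypothetical mapping of host name -> row name.
    """
    hosts_per_row = Counter(placement.values())
    # Losing the most populated row is the worst correlated failure.
    return len(placement) - max(hosts_per_row.values(), default=0)


# Illustrative placements (hypothetical host and row names).
same_row = {"host-a": "D", "host-b": "D"}
split_rows = {"host-a": "D", "host-b": "C"}

print(worst_case_survivors(same_row))    # 0: one row outage takes out both hosts
print(worst_case_survivors(split_rows))  # 1: one host survives any single-row outage
```

With only two hosts, putting them in different rows is the only way to keep any capacity through a row-level outage.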

mark added a subscriber: mark. Jan 15 2015, 2:04 PM

Different rows for availability is one thing. We need to think about how to distribute the service to two different data centers as well, and should build this out in both from the start. Latency certainly is a significant factor there.

faidon changed the task status from Open to Stalled. Feb 9 2015, 10:59 AM
faidon added a subscriber: faidon.

Based on what we heard regarding Titan/WQS lately, I think we can safely put this on hold and mark it as stalled, correct?

Joe added a comment. Feb 9 2015, 11:01 AM

Yes, I was waiting for this evening's WQS meeting before reassessing priority/status, but marking it stalled is fair.

Restricted Application added a project: Discovery. Jul 2 2015, 5:04 PM
Restricted Application added a subscriber: Matanya.
Smalyshev changed the task status from Stalled to Open. Jul 6 2015, 5:15 PM
Smalyshev updated the task description.
Smalyshev removed subscribers: Eloquence, GWicke.
Smalyshev moved this task from Needs triage to WDQS on the Discovery board. Jul 7 2015, 8:55 PM

Based on T104879, I think what we need is:

  • 64 GB memory
  • 300 GB SSD
  • 4-8 cores, 2.5 GHz minimum

For starters, 2 servers.

RobH added a subscriber: RobH.
RobH added a comment. Jul 15 2015, 4:33 PM

Update from IRC discussion:

We'll be allocating two systems for this:

Dell PowerEdge R420, dual Intel Xeon E5-2440, 64 GB memory, dual 300 GB SSDs, H310 Mini RAID card

wmf3543 - currently in d3-eqiad, and staying there
wmf3544 - currently in d3-eqiad, and relocating into another row

As such, completing this task is blocked on relocating the second host and setting up its network, etc.

Joe added a comment. Jul 20 2015, 7:30 AM

Imaging wmf3543 as wdqs1001.

Change 225848 had a related patch set uploaded (by Giuseppe Lavagetto):
wdqs: install as jessie, add partman recipe

https://gerrit.wikimedia.org/r/225848

Change 225848 merged by Giuseppe Lavagetto:
wdqs: install as jessie, add partman recipe

https://gerrit.wikimedia.org/r/225848

Joe added a comment. Jul 31 2015, 10:36 AM

I am currently installing wdqs1001; upon validation of the install, I'll add wdqs1002 as well.

wdqs1001 installed just fine (after I figured out I needed at least one deploy for trebuchet to work).

Installing wdqs1002 now, then I'll set up the LVS config.

Joe closed this task as Resolved. Jul 31 2015, 4:54 PM
Joe added a comment. Jul 31 2015, 4:56 PM

Both servers are up, the wdqs-blazegraph service is running, and the banner page is served on port 80.

I will finish the remaining work (adding LVS and, eventually, Varnish support).

Stas and James can now log into the machines and try them out; they are not yet exposed to the public internet.