
Wikidata Query Service hardware
Closed, ResolvedPublic

Description

For a deployment of the upcoming wikidata query service we'll need some hardware.

Initially, we probably need one work machine and one spare for disaster recovery (a full recovery by loading from a dump takes several days, so unless we want to live dangerously, we need a hot spare). Exact specs TBD soon.

Event Timeline

GWicke raised the priority of this task from to Medium.
GWicke updated the task description. (Show Details)

Do we want to have the Cassandra and Titan nodes in the same rack, as I assume that query performance is very latency-sensitive?

Joe set Security to None.

> Do we want to have the Cassandra and Titan nodes in the same rack, as I assume that query performance is very latency-sensitive?

I don't think the latency difference between rows is large enough to make it worth compromising availability. Cassandra nodes are spread across racks (rows, in our case) to improve availability by avoiding correlated failures that affect a single row. I think we should do the same for the Titan nodes.
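
For context, Cassandra derives this rack/row placement from each node's snitch configuration; a minimal sketch using GossipingPropertyFileSnitch (datacenter and row names here are illustrative, not taken from this task) would be:

  # Written on each Cassandra node; the snitch reads this file and
  # treats each eqiad row as a "rack" for replica placement.
  printf 'dc=eqiad\nrack=row-d\n' > /etc/cassandra/cassandra-rackdc.properties

With distinct rack values per row, NetworkTopologyStrategy will avoid placing two replicas of the same range in one row.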

Different rows for availability is one thing. We also need to think about how to distribute the service across two data centers, and should build this out in both from the start. Latency is certainly a significant factor there.
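
If we go multi-datacenter, the usual Cassandra approach is a keyspace replicated per-DC via NetworkTopologyStrategy; a rough sketch (keyspace name and replication factors are assumptions, not decisions from this task):

  # Run via cqlsh from any node; keeps full replica sets in both DCs.
  cqlsh -e "CREATE KEYSPACE titan
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'eqiad': 3, 'codfw': 3};"

Reads can then be served DC-locally (e.g. at LOCAL_QUORUM), which is where the cross-DC latency concern gets addressed.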

faidon changed the task status from Open to Stalled. Feb 9 2015, 10:59 AM
faidon subscribed.

Based on what we heard regarding Titan/WQS lately, I think we can safely put this on hold and mark it as stalled, correct?

Yes, I was waiting for this evening's WQS meeting before reassessing priority/status, but marking it stalled is fair.

Restricted Application added a subscriber: Matanya.
Smalyshev changed the task status from Stalled to Open. Jul 6 2015, 5:15 PM
Smalyshev updated the task description. (Show Details)
Smalyshev removed subscribers: Eloquence, GWicke.

Based on T104879, I think what we need is:

  • 64 GB memory
  • 300 GB SSD
  • 4–8 cores at 2.5 GHz minimum

For starters, 2 servers.

Update from IRC discussion:

We'll be allocating two systems for this:

Dell PowerEdge R420, dual Intel Xeon E5-2440, 64 GB memory, dual 300 GB SSD, H310 Mini RAID card

wmf3543 - currently in d3-eqiad, and staying there
wmf3544 - currently in d3-eqiad, and relocating to another row

As such, completion of this task is blocked on the relocation of the second host, its network setup, and so on.

Imaging wmf3543 as wdqs1001

Change 225848 had a related patch set uploaded (by Giuseppe Lavagetto):
wdqs: install as jessie, add partman recipe

https://gerrit.wikimedia.org/r/225848

Change 225848 merged by Giuseppe Lavagetto:
wdqs: install as jessie, add partman recipe

https://gerrit.wikimedia.org/r/225848
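
For readers unfamiliar with partman recipes: they are debian-installer preseed fragments kept on the install server. A rough sketch of the kind of single-disk layout such a recipe describes (the actual recipe in the change above may differ considerably; sizes and names here are made up):

  # Hypothetical d-i preseed fragment (sizes/names invented); the H310
  # controller is assumed to present the SSD pair as a single /dev/sda.
  d-i partman-auto/method string lvm
  d-i partman-auto/disk string /dev/sda
  d-i partman-auto/expert_recipe string wdqs :: \
      40000 40000 40000 ext4 $primary{ } $bootable{ } \
          method{ format } format{ } use_filesystem{ } \
          filesystem{ ext4 } mountpoint{ / } . \
      4000 4000 8000 linux-swap method{ swap } format{ } .
  d-i partman/confirm boolean true
  d-i partman/confirm_nooverwrite boolean true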

I am currently installing wdqs1001; upon validation of the install, I'll add wdqs1002 as well.

wdqs1001 installed just fine (after I figured out I needed at least one deploy for Trebuchet to work).
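
For reference, the Trebuchet bootstrap mentioned here is driven from the deployment host with git-deploy; roughly (the repository path is an assumption, not confirmed in this task):

  # On the deployment host; path assumed for illustration.
  cd /srv/deployment/wdqs/wdqs
  git deploy start   # open a deployment window
  git deploy sync    # push the current checkout out to the targets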

Installing wdqs1002 now, then I'll set up the LVS config.
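
For the curious: in production, PyBal manages the LVS state, but the IPVS configuration it produces is conceptually equivalent to something like the following (all addresses here are made up):

  # Virtual service on the service IP, weighted round-robin, with
  # direct-routing real servers; purely illustrative values.
  ipvsadm -A -t 10.2.2.32:80 -s wrr
  ipvsadm -a -t 10.2.2.32:80 -r 10.64.0.10:80 -g -w 10    # wdqs1001
  ipvsadm -a -t 10.2.2.32:80 -r 10.64.32.10:80 -g -w 10   # wdqs1002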

Both servers are up, the wdqs-blazegraph service is running, and the banner page is served on port 80.
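
A quick way to verify the banner page from inside the cluster (the hostname follows the naming used in this task, but treat it as an assumption):

  # Expect an HTTP 200 for the banner page.
  curl -sI http://wdqs1001.eqiad.wmnet/ | head -1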

I will finish this work (adding LVS and eventually Varnish support).

Stas and James can now log into the machines and play with them; they are still not exposed to the public internet.
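
A minimal smoke test once logged in, against the local SPARQL endpoint (the /sparql path assumes the standard WDQS proxy setup; the raw Blazegraph endpoint sits under /bigdata/):

  # POSTs a trivial query and asks for JSON results.
  curl -s 'http://localhost/sparql' \
       --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 1' \
       -H 'Accept: application/sparql-results+json'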