
BlazeGraph Finalization: Scale out plans
Closed, Resolved · Public

Description

BlazeGraph doesn't support clustering - only high availability. So we need to think more up front about scaling out to more users and more data. Are federated queries an answer?

Event Timeline

Manybubbles raised the priority of this task to Medium.
Manybubbles updated the task description.

One option might be to keep the "truthy" dump on nice SSDs and make sure those are super duper fast, but support the more reified forms only on other machines. Maybe those machines could use spinning disks and allow more time between writes?

BlazeGraph doesn't support clustering - only high availability

Does this refer to update scaling, or just query scaling? Blazegraph supports replication clustering, which allows horizontal/linear query scaling.

Are federated queries an answer?

Blazegraph's scale-out architecture supports dynamic partitioning and service discovery, which could help to distribute data in a way that promotes distributed querying.
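For illustration, a query federated across a split deployment could use standard SPARQL 1.1 SERVICE. This is only a sketch - the second endpoint URL is made up, and the truthy/reified split is just an idea from this thread, not a committed design:

PREFIX : <http://example.org/>

# Truthy claims are answered by the local instance; the reified forms
# live on a second, hypothetical endpoint and are joined in via SERVICE.
SELECT ?person ?birth WHERE {
  ?person a :Human .
  SERVICE <http://wdqs-reified.example.org/sparql> {
    ?person :dateOfBirth ?birth .
  }
}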


Which problem are we more concerned about: supporting lots and lots of data, or supporting lots and lots of queries?

  • If the former, note that Systap recommends a scale-up architecture (i.e. single-node or replication cluster) for data sets with less than 50B statements.
  • If the latter: can we dig into Wikidata's statistics to get concrete numbers for current usage?

Already asked to get statistics on current usage, but I imagine we can add new nodes if we need to. I think we'll have to see what load is like once we get it for WikiGrok, and see what it's like to run in labs. For more data, scaling up looks to be the thing. I think maybe we can have two instances - one with just truthy claims, which is small, and one with fully reified claims plus truthy claims. The small one might be faster because it'll have a better cache hit rate. Maybe not needed. Just an idea.

This depends on how you model the reified RDF data. However, the inlined statements about statements are not in the same part of the statement indices as the ground statements. This is because the IVs all have a prefix byte that indicates whether the IV is a Literal, URI, Blank Node, or Statement (the Statement type is what supports inlined statements about statements). So the statement indices are partitioned on each component of the key, according to whether that key component is a Literal, URI, Statement, etc.

The way this works normally, you have a ground statement like:

:a :knows :b .

You then have statements about that statement:

<< :a :knows :b >> :source :facebook .
<< :a :knows :b >> :date "2015-01-12T00:00:00Z"^^xsd:dateTime .

Those statements about statements have the inlined statement as the first component of the key for the SPO(C) index. Thus they are clustered together, but not clustered with the original ground statement.
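To see the practical effect of that clustering, here is a sketch of a reverse lookup using our RDR syntax in SPARQL (the prefix is illustrative):

PREFIX : <http://example.org/>

# All statements about the ground statement. Because they share the
# inlined statement as their key prefix, this resolves to a single
# contiguous range scan in the SPO index rather than scattered lookups.
SELECT ?p ?o WHERE {
  << :a :knows :b >> ?p ?o .
}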

Thanks,
Bryan

@Thompsonbry.systap currently our reification model is pretty close to the one described here: http://korrekt.org/papers/Wikidata-RDF-export-2014.pdf except that we also have a direct link between entity and value, in addition to the fully reified representation.

An additional complication is that we have more than two levels, if you look in the paper - i.e. we have entity -> value, and then <<entity -> value>> -> qualifiers, but both value and qualifiers can be compound values - i.e. a value may be:

:value a wikidata:Value ;
  wikidata:amount 1234 ;
  wikidata:precision 10 ;
  wikidata:unit "1" .

etc., and the qualifiers may each be composed of values too. So I wonder if that would still be fine.
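To make the nesting concrete, here is a loose sketch of the shape I mean, in RDR-style Turtle (the prefix URIs and property names are illustrative, not the real export vocabulary):

@prefix wikidata: <http://example.org/ontology#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Level 1: the direct entity -> value link; the value is compound.
wikidata:Q1 wikidata:population wikidata:value1 .
wikidata:value1 a wikidata:Value ;
  wikidata:amount 1234 ;
  wikidata:precision 10 ;
  wikidata:unit "1" .

# Level 2: a statement about that link, whose qualifier is itself
# another compound value.
<< wikidata:Q1 wikidata:population wikidata:value1 >> wikidata:qualifier wikidata:value2 .
wikidata:value2 a wikidata:Value ;
  wikidata:pointInTime "2014-01-01T00:00:00Z"^^xsd:dateTime .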

BlazeGraph supports arbitrary nesting of statements on statements, so, yes, that would be fine.
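For example (names illustrative), an embedded statement can itself be the subject of another embedded statement:

<< << :a :knows :b >> :source :facebook >> :recordedOn "2015-01-13T00:00:00Z"^^xsd:dateTime .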

This isn't really a scale-out thing but I'm putting it here anyway: one option for installation is to keep all the BlazeGraph servers independently up to date using the same mechanism that we'd use to sync data to a cluster. In this case we don't need HA at all - just one copy of the change poller per server. No zookeeper. No quorum. Just independent machines keeping themselves up to date. It's nice in its simplicity. I dunno if it's the right way for us to go, though. @Joe, what do you think of that?
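Roughly, each server's poller would read the wiki's recent changes and apply them to its local store as a plain SPARQL update - something like this sketch (the entity and vocabulary are made up):

PREFIX : <http://example.org/>

# Drop the old triples for the changed entity, then write the new ones.
# Every server applies the same change stream on its own, so there's no
# need for zookeeper or a quorum.
DELETE WHERE { :Q1 ?p ?o } ;
INSERT DATA {
  :Q1 :population :value1 .
}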

Manybubbles claimed this task.

Resolving this. Our official scale out plans are as follows:

  1. Each node will hold all data. We should have tons of spare room to grow here.
  2. We may or may not use HA. If we don't use HA, we'll just keep each one up to date with its own poller.

When it comes to sharding-style stuff, we'll cross that bridge when we get closer - BlazeGraph has something for this. It'll need more review.