[Epic] Evaluate alternatives to Blazegraph
Open, Medium · Public

Description

Since the Blazegraph project no longer seems to be active (the last commit at https://github.com/blazegraph/database was 2 years ago), we need to evaluate whether we want to switch to a graph database project that is more actively supported and developed.

The requirements should be:

  • Full SPARQL 1.1 support, including SPARQL Update
  • Open source
  • Can load and run queries on full Wikidata database
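
As a rough illustration of the first requirement, a candidate store would have to accept both query and update operations over the SPARQL 1.1 Protocol. The sketch below only builds the two HTTP requests and does not send them. The endpoint URL, the function names, and the example triple are assumptions made for illustration; some stores expose updates on a separate URL.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint of a candidate store (an assumption, not a real service).
ENDPOINT = "http://localhost:9999/sparql"


def build_query_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query request (form-encoded POST, 'query' field)."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def build_update_request(endpoint: str, update: str) -> Request:
    """Build a SPARQL 1.1 Protocol update request (form-encoded POST, 'update' field)."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Example operations: a read-only query and a write via SPARQL Update.
ask = build_query_request(ENDPOINT, "ASK { ?s ?p ?o }")
insert = build_update_request(
    ENDPOINT,
    'INSERT DATA { <http://example.org/s> <http://example.org/p> "o" }',
)
```

Passing either request to `urllib.request.urlopen` would send it. A store that rejects the update request would not meet the SPARQL Update requirement, however well it handles queries.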

Event Timeline

Smalyshev created this task. Oct 9 2018, 6:27 PM
Restricted Application added a project: Wikidata. Oct 9 2018, 6:27 PM
Restricted Application added a subscriber: Aklapper.

I would clarify the requirements to “SPARQL support, including SPARQL Update”. For example, Sage boasts stable response times and general responsiveness, which would be useful for us, but its backing store is HDT, a read-only RDF serialization format: since HDT files cannot be efficiently updated, Sage is read-only, so we can’t use it for a live-updating query service.

Smalyshev updated the task description. Oct 10 2018, 3:29 PM
Smalyshev triaged this task as Medium priority. Oct 17 2018, 4:25 PM
Gehel added a subscriber: Gehel. Oct 19 2018, 1:42 PM

From an operations point of view, I have a few wishes for any replacement. These are not necessarily mandatory, but we should evaluate them at some point:

  • ability to scale both read and write load across multiple nodes
  • ability to limit resource consumption to fail gracefully
Akuckartz added a subscriber: Akuckartz.
Gehel moved this task from Scaling to Epics on the Wikidata-Query-Service board. Jun 24 2020, 12:53 PM
TomT0m added a subscriber: TomT0m. Jun 26 2020, 10:02 AM

What are the requirements?

Akuckartz added a comment. Edited · Aug 16 2020, 1:55 PM

I think it is important to keep in mind that significant efforts are being made to unite the RDF and Property Graph communities. One aspect of this is the development of "RDF*" and "SPARQL*" (SPARQL star). Blazegraph has played, and continues to play, a positive role in this. This is the main paper explaining the concepts behind RDF* and SPARQL*: https://arxiv.org/pdf/1406.3399.pdf

A more recent position paper by Olaf Hartig:
https://blog.liu.se/olafhartig/2019/01/10/position-statement-rdf-star-and-sparql-star/

I represent a research group at the Computer Science and Artificial Intelligence Laboratory at MIT. For the past year, we've been doing a lot of work with local copies of Wikidata and have experienced our own share of frustrations with Blazegraph. We are currently looking for alternatives as well. We'd love to hear about the directions being taken here, give our own input as to what capabilities we would hope to find in an alternative, and, perhaps, volunteer our services to help with the development/transition. Our work is a little niche, so our recommendations may not be representative of the general need, and our group is manned mostly by undergraduate researchers, but we'd love to help if we can.

Please let me know who I can talk to specifically about this: jecummin@csail.mit.edu

Gehel added a comment. Thu, Aug 27, 7:05 PM

We're always interested in collaborations! We don't have documented formal requirements (that's part of what we need to do), but what comes to mind right now:

  • horizontal scaling
  • supports SPARQL (this might be a constraint that we could drop if we can't find a solution, but this would mean a world of pain for our users and for the whole ecosystem around WDQS)
  • supports SPARQL services, or a way to emulate them (we might want to review this requirement as well: a backend service that depends on external services is problematic, so we might want to implement services in a frontend instead)
  • open source (obviously)
  • good performance in a context with both heavy reads and heavy writes
  • probably a lot of other things as well (it's late here, we need more time to formalize)

Feel free to join the Search Platform office hours to discuss this more synchronously!