
Hannah_Bast
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 8 2021, 4:13 AM (5 w, 3 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Hannah Bast

Recent Activity

Yesterday

Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

I imported the Wikidata database into Neo4j and it works quite well.

Fri, Oct 15, 3:07 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service

Fri, Oct 8

Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

@DD063520: You can find some details at https://github.com/ad-freiburg/qlever/blob/master/docs/quickstart.md.

Fri, Oct 8, 12:50 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

Oh, ok. Could you give an example of a query that has no "highly selective triples" so I can test it on QLever vs. BG?

Fri, Oct 8, 6:16 AM · Wikidata, Wikidata-Query-Service

Sat, Oct 2

Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

@Justin0x2004 Thanks, Justin. QLever already supports something like named subqueries. You can simply have the same subquery in multiple places: it will be evaluated only once, and for the other occurrences the result will be reused.
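The reuse described above is essentially memoization of subquery results. Here is a minimal sketch of the idea in Python; this is an illustration only, not QLever's actual implementation, and `evaluate_subquery` and its cache are hypothetical names.

```python
# Sketch: identical subqueries are evaluated once; later occurrences
# reuse the cached result. Not QLever's real implementation.

cache = {}
evaluation_count = 0  # counts actual (expensive) evaluations

def evaluate_subquery(subquery: str):
    """Evaluate a subquery, reusing a cached result for repeats."""
    global evaluation_count
    key = " ".join(subquery.split())  # normalize whitespace
    if key not in cache:
        evaluation_count += 1         # the expensive evaluation happens here
        cache[key] = f"result-of({key})"
    return cache[key]

# The same subquery appearing in two places is evaluated only once.
q = "SELECT ?x WHERE { ?x wdt:P31 wd:Q5 }"
r1 = evaluate_subquery(q)
r2 = evaluate_subquery(q)
```

After both calls, `evaluation_count` is still 1 and `r1` and `r2` are the same object.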

Sat, Oct 2, 2:46 AM · Wikidata, Wikidata-Query-Service

Thu, Sep 30

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

@So9q I have commented on your comments concerning Rya in the "Evaluate Apache Rya as alternative to Blazegraph": https://phabricator.wikimedia.org/T289561#7321936

Thu, Sep 30, 11:53 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

I have now revised QLever's Quickstart page: https://github.com/ad-freiburg/qlever

Thu, Sep 30, 11:43 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T289561: Evaluate Apache Rya as alternative to Blazegraph.

We looked a bit into Apache Rya. A couple of observations:

Thu, Sep 30, 11:25 PM · Wikidata, Wikidata-Query-Service

Tue, Sep 28

Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

I will provide a detailed reply later today, also in the other thread. Four things for now:

Tue, Sep 28, 8:51 AM · Wikidata, Wikidata-Query-Service

Fri, Sep 17

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

It's of course up to you (the Wikidata team) to decide this. But I wouldn't dismiss this idea so easily.

Fri, Sep 17, 12:08 AM · Wikidata, Wikidata-Query-Service

Sep 15 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Wikibase doesn’t store data in RDF, so dumping the data set means parsing the native representation (JSON) and writing it out again as RDF, including some metadata for each page.
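The JSON-to-RDF step can be illustrated with a toy converter. This is a deliberately simplified sketch: real Wikibase JSON and the RDF mapping are far richer (statement ranks, qualifiers, references, per-page metadata), and the entity structure below is hypothetical.

```python
import json

def entity_to_ntriples(entity_json: str) -> list[str]:
    """Convert a (highly simplified) Wikibase-style JSON entity
    to N-Triples lines. Illustration only."""
    entity = json.loads(entity_json)
    subject = f"<http://www.wikidata.org/entity/{entity['id']}>"
    triples = []
    for prop, values in entity["claims"].items():
        predicate = f"<http://www.wikidata.org/prop/direct/{prop}>"
        for value in values:
            obj = f"<http://www.wikidata.org/entity/{value}>"
            triples.append(f"{subject} {predicate} {obj} .")
    return triples

# Douglas Adams (Q42) is an instance of (P31) human (Q5).
doc = '{"id": "Q42", "claims": {"P31": ["Q5"]}}'
lines = entity_to_ntriples(doc)
```

The point of the comment stands either way: the dump is a parse-and-serialize pass over the native JSON, not a copy of stored RDF.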

Sep 15 2021, 7:08 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Can you or anyone else explain why the data dump takes so long, Lukas? One would expect that it is much easier to dump a (snapshot of a) dataset than to build a complex data structure from it. Also, dumping and compression are easily parallelized. And the sheer volume isn't that large (< 100 GB compressed).
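The parallelization argument can be sketched as follows: split the dump into independent chunks and compress each one separately. This is a toy sketch with thread workers; a real dump pipeline would use processes and write each compressed chunk to its own file.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

def compress_chunk(chunk: bytes) -> bytes:
    """Compress one independent chunk of the dump."""
    return gzip.compress(chunk)

def parallel_dump(chunks: list[bytes], workers: int = 4) -> list[bytes]:
    """Chunks are independent, so compression parallelizes trivially.
    (A real pipeline would use processes and separate output files.)"""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_chunk, chunks))

chunks = [b"<s> <p> <o1> .\n" * 1000, b"<s> <p> <o2> .\n" * 1000]
compressed = parallel_dump(chunks, workers=2)
```

Decompressing each result recovers the original chunks, so nothing is lost by splitting the work.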

Sep 15 2021, 12:49 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T206561: Evaluate Virtuoso as alternative to Blazegraph.

I agree with Kingsley that you don't need a distributed SPARQL engine when the knowledge graph fits on a single machine and will continue to fit in the future. That is clearly the case for Wikidata, since it even holds for the ten-times-larger UniProt (which at the time of this writing already contains over 90 billion triples).

Sep 15 2021, 5:41 AM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

For whoever is interested, I wrote more about the QLever SPARQL engine on this thread: https://phabricator.wikimedia.org/T290839 .

Sep 15 2021, 2:51 AM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

PS: Note that high query throughput is not a problem for a SPARQL engine that runs on a single standard PC or server. Depending on the overall demand, you can simply run multiple instances on separate machines and trivially distribute the queries among them. What is more important, I think, is the processing time for individual queries, because the processing of an individual query cannot easily be distributed. And it makes quite a difference for the user experience whether a query takes seconds, minutes, or hours. The current SPARQL endpoint for Wikidata (realized using Blazegraph) times out a lot when the queries get a bit harder.
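The "multiple instances, trivially distribute" point amounts to stateless load balancing: since every instance holds the complete index, any instance can answer any query. A minimal round-robin sketch (the endpoint URLs and the dispatcher are hypothetical):

```python
import itertools

# Hypothetical endpoints; each would run a full copy of the index.
ENDPOINTS = ["http://host-a:7001", "http://host-b:7001", "http://host-c:7001"]
_round_robin = itertools.cycle(ENDPOINTS)

def pick_endpoint() -> str:
    """Round-robin dispatch: every instance can answer every query,
    so throughput scales simply by adding instances."""
    return next(_round_robin)

picked = [pick_endpoint() for _ in range(6)]
```

Six queries land evenly on the three instances, two each; latency of a single hard query, by contrast, is not helped by adding instances.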

Sep 15 2021, 2:43 AM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Yes, QLever is developed in our group at the University of Freiburg. I presented it to the Wikidata team in March. You can try out a demo on the complete Wikidata at https://qlever.cs.uni-freiburg.de/wikidata . You can also select other interesting large knowledge graphs there, for example, the complete OpenStreetMap data.

Sep 15 2021, 2:35 AM · Wikidata, Wikidata-Query-Service

Sep 14 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

I already talked about Sage with Lukas last November. I don't think that Sage is an option for Wikidata. The focus of Sage is on the ability to pause and resume SPARQL queries (which is a very useful feature), not on efficiency. For example, if you run the people-professions query from https://phabricator.wikimedia.org/T206560 on their Wikidata demo instance at http://sage.univ-nantes.fr/#query (which has only 2.3B triples), it takes forever. Even simple queries are quite slow. For example, the following query (all humans) produces results at a rate of around a thousand rows per second:
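The "all humans" query itself was truncated in this feed excerpt. On Wikidata it is typically written as below (this is the standard formulation, instances of Q5 via P31, not necessarily the exact text from the comment):

```python
# Standard "all humans" query on Wikidata:
# instances (wdt:P31) of human (wd:Q5).
ALL_HUMANS = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?person WHERE { ?person wdt:P31 wd:Q5 }
"""
```

The result set has on the order of ten million rows, which is why a rate of a thousand rows per second means hours for the full answer.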

Sep 14 2021, 12:40 PM · Wikidata, Wikidata-Query-Service

Sep 13 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Would it be an option that one of these two backends uses a SPARQL engine that does not support the SPARQL Update operation, but instead rebuilds its index periodically, for example, every 24 hours?
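The rebuild-instead-of-update idea can be sketched as a build-then-swap loop: serve queries from an immutable index and replace it wholesale each cycle, rather than applying SPARQL Update operations incrementally. All names below are hypothetical; a real setup would rebuild from the latest dump and repoint the serving instance.

```python
class PeriodicallyRebuiltIndex:
    """Serve from an immutable index; replace it wholesale every cycle
    (e.g. every 24 hours) instead of applying incremental updates."""

    def __init__(self, build_index):
        self.build_index = build_index   # e.g. index the latest dump
        self.current = build_index()     # initial build

    def rebuild(self):
        fresh = self.build_index()       # built offline; may take hours
        self.current = fresh             # atomic pointer swap; no downtime

# Toy build function that records how often a full build happened.
builds = []
def build_index():
    builds.append(1)
    return {"version": len(builds)}

idx = PeriodicallyRebuiltIndex(build_index)
idx.rebuild()  # one scheduled rebuild cycle
```

After the initial build plus one rebuild, the served index is version 2; queries between rebuilds see a consistent (if slightly stale) snapshot, which is exactly the "time lagging" trade-off the task title describes.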

Sep 13 2021, 6:00 AM · Wikidata, Wikidata-Query-Service

Sep 9 2021

Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

Thanks, Kingsley, that explains it!

Sep 9 2021, 6:54 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service

Sep 8 2021

Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

[1] Our Live Wikidata SPARQL Query Endpoint

Sep 8 2021, 4:37 AM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service