
Hannah_Bast
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 8 2021, 4:13 AM (231 w, 6 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Hannah Bast [ Global Accounts ]

Recent Activity

Dec 15 2025

Hannah_Bast added a comment to T406429: No Wikidata dumps for Week 40 of 2025 (recurring issue).

@BTullis @Jakob_WMDE Thank you for the workaround. I wasn't aware that these were always symlinks to files in one of these folders.

Dec 15 2025, 12:04 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering, Essential-Work, Wikibase Reuse Team, Wikidata data dumps, Wikidata, Dumps-Generation

Dec 14 2025

Hannah_Bast added a comment to T406429: No Wikidata dumps for Week 40 of 2025 (recurring issue).

@BTullis Thanks a lot for fixing this, and it has been running smoothly for a while now. But this week, most of the dumps on https://dumps.wikimedia.org/wikidatawiki/entities/ are broken again:

dcatap.rdf                                         26-Sep-2025 08:15               89494
latest-all.json.bz2                                10-Dec-2025 07:32         99149780869
latest-all.json.gz                                 10-Dec-2025 01:15        150450422786
latest-all.nt.bz2                                  11-Dec-2025 04:29                  90
latest-all.nt.gz                                   10-Dec-2025 20:24                  89
latest-all.ttl.bz2                                 10-Dec-2025 20:14                  91
latest-all.ttl.gz                                  10-Dec-2025 17:23                  90
latest-lexemes.json.bz2                            10-Dec-2025 03:46                  91
latest-lexemes.json.gz                             10-Dec-2025 03:44                  90
latest-lexemes.nt.bz2                              12-Dec-2025 23:48                  94
latest-lexemes.nt.gz                               12-Dec-2025 23:43                  93
latest-lexemes.ttl.bz2                             12-Dec-2025 23:43                  95
latest-lexemes.ttl.gz                              12-Dec-2025 23:42                  94
latest-truthy.nt.bz2                               12-Dec-2025 21:44                  93
latest-truthy.nt.gz                                12-Dec-2025 18:35                  92
Dec 14 2025, 2:22 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering, Essential-Work, Wikibase Reuse Team, Wikidata data dumps, Wikidata, Dumps-Generation
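
The listing above shows most `latest-*` entries at around 90 bytes, far too small for a compressed Wikidata dump. As a rough sketch of how such a listing could be checked automatically, the snippet below parses lines in that format and flags implausibly small entries. The 1 MB threshold, the restriction to names starting with `latest-`, and the function name are assumptions made for illustration; they are not part of any Wikimedia tooling.

```python
import re

# Sample lines copied from the listing above (name, date, time, size in bytes).
LISTING = """\
dcatap.rdf                                         26-Sep-2025 08:15               89494
latest-all.json.bz2                                10-Dec-2025 07:32         99149780869
latest-all.nt.bz2                                  11-Dec-2025 04:29                  90
latest-truthy.nt.gz                                12-Dec-2025 18:35                  92
"""

# Assumption: a real compressed dump is never smaller than 1 MB.
MIN_PLAUSIBLE_BYTES = 1_000_000

# name, whitespace, "DD-Mon-YYYY HH:MM", whitespace, size
LINE_RE = re.compile(r"^(\S+)\s+(\d{2}-\w{3}-\d{4} \d{2}:\d{2})\s+(\d+)$")


def find_suspicious(listing, min_bytes=MIN_PLAUSIBLE_BYTES):
    """Return (name, size) pairs for latest-* entries smaller than min_bytes."""
    suspicious = []
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a file entry
        name, size = match.group(1), int(match.group(3))
        # Only the dump files are checked; small metadata files are expected.
        if name.startswith("latest-") and size < min_bytes:
            suspicious.append((name, size))
    return suspicious


if __name__ == "__main__":
    for name, size in find_suspicious(LISTING):
        print(f"{name}: only {size} bytes")
```

The check only looks at sizes; it says nothing about why an entry is small.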

Oct 16 2025

Hannah_Bast updated subscribers of T406429: No Wikidata dumps for Week 40 of 2025 (recurring issue).

@BTullis @dcausse In the meantime, would it be an option to just ask Blazegraph the query CONSTRUCT WHERE { ?s ?p ?o } without timeout? I understand that this would probably take a long time in Blazegraph, but (a) it would finish eventually, and (b) assuming that Blazegraph implements the SPARQL standard correctly, it would actually provide a snapshot of the data, unlike the weekly dumps.

Oct 16 2025, 3:55 AM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering, Essential-Work, Wikibase Reuse Team, Wikidata data dumps, Wikidata, Dumps-Generation
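
For illustration, the request suggested in the comment above could be issued roughly as follows, using the standard SPARQL protocol (a form-encoded POST with a `query` parameter). This is only a sketch: the endpoint URL is a placeholder, and whether the server honours the `application/n-triples` Accept header or allows a query to run without a server-side timeout are assumptions, not facts from the thread.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint URL; the actual Blazegraph address is not given in the thread.
ENDPOINT = "https://example.org/sparql"
QUERY = "CONSTRUCT WHERE { ?s ?p ?o }"


def build_dump_request(endpoint=ENDPOINT, query=QUERY):
    """Build (but do not send) a SPARQL-protocol POST request for the query."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Assumption: the server can serialize the result as N-Triples.
            "Accept": "application/n-triples",
        },
        method="POST",
    )


def stream_to_file(request, path, chunk_size=1 << 20):
    """Send the request and stream the response to disk in chunks.

    No client-side timeout is passed to urlopen; server-side limits still apply.
    """
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(chunk_size):
            out.write(chunk)
```

Streaming in chunks matters here because the full result would be far too large to hold in memory.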

Oct 12 2025

Hannah_Bast added a comment to T406429: No Wikidata dumps for Week 40 of 2025 (recurring issue).

Dear all, any update on this? In particular, is there a chance that there will be a new dump this week?

Oct 12 2025, 10:28 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering, Essential-Work, Wikibase Reuse Team, Wikidata data dumps, Wikidata, Dumps-Generation

Oct 7 2025

Hannah_Bast added a comment to T406436: `rdf:type` triples for references missing in messages from Wikidata update stream.

@matej_suchanek @dcausse Thank you for the explanations + it would indeed be very useful if you provided the munged RDF dumps. Is there anything that speaks against it from your side?

Oct 7 2025, 4:17 AM · Data-Engineering, Wikidata-Query-Service, EventStreams, Wikidata

Oct 6 2025

Hannah_Bast created T406437: Skolemization vs. blank nodes in Wikidata triples.
Oct 6 2025, 1:56 AM · Wikidata-Query-Service, Wikidata
Hannah_Bast created T406436: `rdf:type` triples for references missing in messages from Wikidata update stream.
Oct 6 2025, 1:46 AM · Data-Engineering, Wikidata-Query-Service, EventStreams, Wikidata

Oct 5 2025

Hannah_Bast added a project to T406429: No Wikidata dumps for Week 40 of 2025 (recurring issue): Wikidata.
Oct 5 2025, 3:47 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering, Essential-Work, Wikibase Reuse Team, Wikidata data dumps, Wikidata, Dumps-Generation
Hannah_Bast created T406429: No Wikidata dumps for Week 40 of 2025 (recurring issue).
Oct 5 2025, 3:41 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering, Essential-Work, Wikibase Reuse Team, Wikidata data dumps, Wikidata, Dumps-Generation

Jul 26 2025

Hannah_Bast added a comment to T398756: No wikidata dumps this week (20250630).

It's the same thing again this week: https://dumps.wikimedia.org/wikidatawiki/entities

Jul 26 2025, 3:16 PM · Data-Platform-SRE (2025.07.05 - 2025.07.25), Data-Engineering, Dumps-Generation, Wikidata

Jul 5 2025

Hannah_Bast added a comment to T386255: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20.

@Aklapper As a "Bug Report" or as a "Production Error"?

Jul 5 2025, 3:42 PM · GrowthExperiments-NewcomerTasks, Data-Platform-SRE (2025.03.01 - 2025.03.21), Wikidata, Data-Engineering (Q3 2025 January 1st - March 31th), Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Section-Level-Image-Suggestions, Image-Suggestions, Growth-Structured-Tasks, Structured-Data-Backlog, Growth-Team
Hannah_Bast added a comment to T386255: wmf.wikidata_item_page_link and wmf.wikidata_entity snapshots stuck at 2025-01-20.

The latest dump is overdue: https://dumps.wikimedia.org/wikidatawiki/entities . Any ideas what is going on?

Jul 5 2025, 10:47 AM · GrowthExperiments-NewcomerTasks, Data-Platform-SRE (2025.03.01 - 2025.03.21), Wikidata, Data-Engineering (Q3 2025 January 1st - March 31th), Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Section-Level-Image-Suggestions, Image-Suggestions, Growth-Structured-Tasks, Structured-Data-Backlog, Growth-Team

Mar 28 2025

Hannah_Bast added a comment to T389787: The latest wikidata entity dump (latest-all.ttl.bz2) contains each triple twice.

Good news, the latest version of latest-all.ttl.bz2 in https://dumps.wikimedia.org/wikidatawiki/entities, which is from 27-Mar-2025 06:55, has a normal size and number of triples again!

Mar 28 2025, 6:54 PM · Wikidata, Data-Engineering (Q3 2025 January 1st - March 31th)

Mar 24 2025

Hannah_Bast added a comment to T389787: The latest wikidata entity dump (latest-all.ttl.bz2) contains each triple twice.

I agree and will check as soon as the new dump is there!

Mar 24 2025, 10:43 PM · Wikidata, Data-Engineering (Q3 2025 January 1st - March 31th)

Mar 23 2025

Hannah_Bast added a comment to T386401: No wikidata dumps last week (20250203).

I have updated https://qlever.cs.uni-freiburg.de/wikidata based on the latest data from https://dumps.wikimedia.org/wikidatawiki/entities and it worked, thanks a lot. However, I noticed the following:

Mar 23 2025, 12:36 PM · Data-Platform-SRE (2025.03.01 - 2025.03.21), Wikibase Reuse Team, Dumps-Generation, Wikidata

Mar 20 2025

Hannah_Bast added a comment to T386401: No wikidata dumps last week (20250203).

I see a new latest-all.json.gz (from 20-Mar-2025 04:42).

Mar 20 2025, 11:44 AM · Data-Platform-SRE (2025.03.01 - 2025.03.21), Wikibase Reuse Team, Dumps-Generation, Wikidata

Mar 13 2025

Hannah_Bast added a comment to T386401: No wikidata dumps last week (20250203).

@Ahoelzl The latest RDF dump for latest-all is from 29.01.2025, so now one-and-a-half months old already, see https://dumps.wikimedia.org/wikidatawiki/entities

Mar 13 2025, 5:51 PM · Data-Platform-SRE (2025.03.01 - 2025.03.21), Wikibase Reuse Team, Dumps-Generation, Wikidata

Mar 2 2025

Hannah_Bast added a comment to T330525: Migrate Wikidata off of Blazegraph.

@dcausse Quick question regarding the weekly Wikidata dumps on https://dumps.wikimedia.org/wikidatawiki/entities . The last dump of latest-all.ttl.bz2 is from 29.01.2025, that is, over a month ago. Did something go wrong or are these dumps no longer supported?

Mar 2 2025, 6:07 PM · Wikidata, Wikidata-Query-Service

Dec 13 2024

Hannah_Bast added a comment to T330525: Migrate Wikidata off of Blazegraph.

@Sj and @Pfps Thank you very much for conducting this benchmark. The detailed account of how to install and run the various engines is very useful. Here are some comments regarding the current benchmark queries and benchmarking in general:

Dec 13 2024, 7:56 AM · Wikidata, Wikidata-Query-Service

Nov 9 2024

Hannah_Bast added a comment to T294133: [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams.

@dcausse Thank you for your reply. Do I understand you correctly that the current best way, or at least a feasible and correct way, for us to perform updates would be:

Nov 9 2024, 10:06 AM · Discovery-Search, Data-Engineering-Icebox, Data-Engineering, Epic, Event-Platform, Analytics, Wikidata, EventStreams

Sep 13 2024

Hannah_Bast added a comment to T330525: Migrate Wikidata off of Blazegraph.

@Sj Getting the updates in batches would be perfectly fine. But how do you want to verify that it works without having a reference endpoint to compare to?

Sep 13 2024, 12:06 AM · Wikidata, Wikidata-Query-Service

Sep 6 2024

Hannah_Bast added a comment to T330525: Migrate Wikidata off of Blazegraph.

@tfmorris It was a showstopper for using it as a drop-in replacement for Blazegraph two years ago. SPARQL 1.1 Update was always on QLever's agenda (already two years ago), a first proof of concept was implemented in March 2023, a functional version has been available since May 2024, and we are currently in the process of fully integrating it into the main branch. Unfortunately, Wikidata still does not provide a publicly accessible update stream (this is difficult for a variety of reasons). As soon as that is available, we could provide a SPARQL endpoint that is in sync with the public Wikidata SPARQL endpoint.

Sep 6 2024, 6:15 PM · Wikidata, Wikidata-Query-Service

Feb 21 2024

Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

@RKemper Is your point that the queries should return a result? Neither DBLP nor Wikidata has the predicate foaf:name, so it's clear that both SERVICE queries return an empty result. Here is an example of a query that gives a result:

Feb 21 2024, 11:48 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata

Feb 11 2024

Hannah_Bast added a comment to T294133: [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams.

@Harej Thanks for the quick reply, James! Are you saying that the script is scraping + parsing https://www.wikidata.org/wiki/Special:RecentChanges to obtain the triples to be added and deleted? Or is there a different way to access that page, which gives you the added and deleted triples in a more machine-friendly format?

Feb 11 2024, 6:22 AM · Discovery-Search, Data-Engineering-Icebox, Data-Engineering, Epic, Event-Platform, Analytics, Wikidata, EventStreams
Hannah_Bast added a comment to T294133: [EPIC] Expose rdf-streaming-updater.mutation content through EventStreams.

I just looked into https://github.com/wikimedia/wikidata-query-rdf, which provides a tool runUpdate.sh. When I run it for a Blazegraph instance with exactly one triple of the form <http://www.wikidata.org> <http://schema.org/dateModified> "2024-02-11T05:42Z"^^xsd:dateTime, it will continuously update the instance with all changes since that date. I have two questions:

Feb 11 2024, 6:01 AM · Discovery-Search, Data-Engineering-Icebox, Data-Engineering, Epic, Event-Platform, Analytics, Wikidata, EventStreams

Feb 1 2024

Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

Yes, https://qlever.cs.uni-freiburg.de/api/dblp is the URL for API calls, whereas https://qlever.cs.uni-freiburg.de/dblp (without the /api) is the URL of the QLever UI. Same for all the other endpoints.

Feb 1 2024, 12:12 AM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata
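
The URL convention described in the comment above (UI at `/<dataset>`, API at `/api/<dataset>`) can be captured in a few lines. The helper names below are made up for illustration, and passing the query as a `query` GET parameter follows the generic SPARQL protocol rather than anything stated in the comment.

```python
import urllib.parse

BASE = "https://qlever.cs.uni-freiburg.de"


def ui_url(dataset):
    """URL of the QLever UI for a dataset, e.g. 'dblp' or 'wikidata'."""
    return f"{BASE}/{dataset}"


def api_url(dataset):
    """URL for API calls: same as the UI URL, but with an /api prefix."""
    return f"{BASE}/api/{dataset}"


def query_url(dataset, sparql):
    """API URL with the SPARQL query attached as a 'query' parameter."""
    return api_url(dataset) + "?" + urllib.parse.urlencode({"query": sparql})


if __name__ == "__main__":
    print(ui_url("dblp"))
    print(api_url("dblp"))
    print(query_url("wikidata", "SELECT * WHERE { ?s ?p ?o } LIMIT 1"))
```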

Sep 27 2023

Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

@dcausse @Gehel @WolfgangFahl QLever can now also produce application/sparql-results+xml. Here is an example:

Sep 27 2023, 9:24 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata

Sep 8 2023

Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

@dcausse I am confused, where does https://data.nlg.gr/sparql come from? I thought the endpoints in question were https://qlever.cs.uni-freiburg.de/api/dblp and https://qlever.cs.uni-freiburg.de/api/wikidata, where the following command lines work just fine:

Sep 8 2023, 2:11 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata
Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

@Gehel Thanks for the reply! But to clarify, what I am asking is not to do something different for different federation endpoints. It's the same for every federation endpoint, namely sending the header

Sep 8 2023, 8:13 AM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata

Aug 30 2023

Hannah_Bast added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

Is it possible to configure Blazegraph to send the following Accept header:

Aug 30 2023, 1:06 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata

Feb 26 2022

Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

To add to this, the two-index approach has another rather beautiful property:

Feb 26 2022, 7:34 PM · Wikidata, Wikidata-Query-Service

Feb 5 2022

Hannah_Bast added a comment to T289621: Evaluate Halyard as alternative to Blazegraph.

@Pulquero AFAIK the two main problems with Blazegraph are:

Feb 5 2022, 4:14 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
Hannah_Bast added a comment to T289621: Evaluate Halyard as alternative to Blazegraph.

@Hannah_Bast maybe this interests you? Do you think this system would perform well considering the load on WDQS and the type of queries we have?

Feb 5 2022, 2:14 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service
Hannah_Bast added a comment to T289621: Evaluate Halyard as alternative to Blazegraph.

@Pulquero Thank you for this interesting piece of information. I have a few questions:

Feb 5 2022, 1:53 PM · MediaWiki-Stakeholders-Group, Wikidata, Epic, Wikidata-Query-Service

Dec 11 2021

Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

I am taking the liberty to pollute the thread with a reference to "MillenniumDB: A Persistent, Open-Source, Graph Database" https://arxiv.org/pdf/2111.01540.pdf from November 2021. Millennium may have some serious limitations in terms of requirements that can be set up, but interestingly they write "However, MillenniumDB was designed with the complete version of Wikidata – including qualifiers, references, etc. – in mind." and their benchmarks seem strong. They compare against Blazegraph, Jena, Virtuoso and Neo4J.

Dec 11 2021, 10:12 AM · Wikidata, Epic, Wikidata-Query-Service

Oct 15 2021

Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

I imported the wikidata-DB into neo4j and it works quite well.

Oct 15 2021, 3:07 PM · Wikidata, Epic, Wikidata-Query-Service

Oct 8 2021

Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

@DD063520: You find some details at https://github.com/ad-freiburg/qlever/blob/master/docs/quickstart.md .

Oct 8 2021, 12:50 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

Oh, ok. Could you give an example of a query that has no "highly selective triples" so I can test it on QLever vs. BG?

Oct 8 2021, 6:16 AM · Wikidata, Wikidata-Query-Service

Oct 2 2021

Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

@Justin0x2004 Thanks, Justin. QLever already supports something like named subqueries. You can simply have the same subquery in multiple places and it will be evaluated only once and for the other occurrences, the result will be reused.

Oct 2 2021, 2:46 AM · Wikidata, Wikidata-Query-Service

Sep 30 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

@So9q I have commented on your comments concerning Rya in the "Evaluate Apache Rya as alternative to Blazegraph": https://phabricator.wikimedia.org/T289561#7393732

Sep 30 2021, 11:53 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

I have now revised QLever's Quickstart page: https://github.com/ad-freiburg/qlever

Sep 30 2021, 11:43 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T289561: Evaluate Apache Rya as alternative to Blazegraph.

We looked a bit into Apache Rya. A couple of observations:

Sep 30 2021, 11:25 PM · Wikidata, Wikidata-Query-Service

Sep 28 2021

Hannah_Bast added a comment to T291903: Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster.

I will provide a detailed reply later today, also to the other thread. Four things for now:

Sep 28 2021, 8:51 AM · Wikidata, Wikidata-Query-Service

Sep 17 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

It's of course up to you (the Wikidata team) to decide this. But I wouldn't dismiss this idea so easily.

Sep 17 2021, 12:08 AM · Wikidata, Wikidata-Query-Service

Sep 15 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Wikibase doesn’t store data in RDF, so dumping the data set means parsing the native representation (JSON) and writing it out again as RDF, including some metadata for each page.

Sep 15 2021, 7:08 PM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Can you or anyone else explain why the data dump takes so long, Lukas? One would expect that it is much easier to dump a (snapshot of a) dataset than to build a complex data structure from it. Also, dumping and compression are easily parallelized. And the pure volume isn't that large (< 100 GB compressed).

Sep 15 2021, 12:49 PM · Wikidata, Wikidata-Query-Service
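
To make the remark above about parallel compression concrete, here is a minimal sketch that splits serialized data at line boundaries, compresses the chunks concurrently, and concatenates the resulting bzip2 streams. It is not how the Wikidata dumps are actually produced; the chunk size, worker count, and function names are assumptions chosen for illustration. It relies on the fact that a concatenation of bzip2 streams is itself valid bzip2 input, which Python's `bz2.decompress` accepts.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def split_on_lines(data, chunk_size):
    """Split bytes into chunks of at least chunk_size, each ending at a newline
    (except possibly the last one)."""
    chunks, start = [], 0
    while start < len(data):
        # Look for the first newline at or after the target chunk boundary.
        end = data.find(b"\n", min(start + chunk_size, len(data)))
        end = len(data) if end == -1 else end + 1
        chunks.append(data[start:end])
        start = end
    return chunks


def compress_parallel(data, chunk_size=1 << 20, workers=4):
    """Compress chunks concurrently and concatenate the bzip2 streams in order."""
    chunks = split_on_lines(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the output stays in sequence.
        compressed = list(pool.map(bz2.compress, chunks))
    return b"".join(compressed)


if __name__ == "__main__":
    triples = b"".join(b"<s%d> <p> <o> .\n" % i for i in range(100_000))
    packed = compress_parallel(triples, chunk_size=64 * 1024)
    assert bz2.decompress(packed) == triples
    print(len(triples), "->", len(packed), "bytes")
```

Threads are used here for simplicity; a process pool would follow the same pattern. Compressing chunks independently typically costs a little compression ratio compared with a single stream.
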
Hannah_Bast added a comment to T206561: Evaluate Virtuoso as alternative to Blazegraph.

I agree with Kingsley that you don't need a distributed SPARQL engine when the knowledge graph fits on a single machine and will continue to do so in the future. This is clearly the case for Wikidata, since it's even the case for the ten times larger UniProt (which at the time of this writing already contains over 90 billion triples).

Sep 15 2021, 5:41 AM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

For whoever is interested, I wrote more about the QLever SPARQL engine on this thread: https://phabricator.wikimedia.org/T290839 .

Sep 15 2021, 2:51 AM · Wikidata, Epic, Wikidata-Query-Service
Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

PS: Note that large query throughputs are not a problem for a SPARQL engine that runs on a single standard PC or server. Depending on the overall demand, you can just run multiple instances on separate machines and trivially distribute the queries. What's more important, I think, is the processing time for individual queries because you cannot easily distribute the processing of an individual query. And it does make quite a difference for the user experience whether a query takes seconds, minutes, or hours. The current SPARQL endpoint for Wikidata (realized using Blazegraph) times out a lot when the queries are a bit harder.

Sep 15 2021, 2:43 AM · Wikidata, Wikidata-Query-Service
Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Yes, QLever is developed in our group at the University of Freiburg. I presented it to the Wikidata team in March. You can try out a demo on the complete Wikidata on https://qlever.cs.uni-freiburg.de/wikidata . You can also select other interesting large knowledge graphs there, for example, the complete OpenStreetMap data.

Sep 15 2021, 2:35 AM · Wikidata, Wikidata-Query-Service

Sep 14 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

I have already talked about Sage with Lukas last November. I don't think that Sage is an option for Wikidata. The focus of Sage is on the ability to pause and resume SPARQL queries (which is a very useful feature), not on efficiency. For example, if you run the people-professions query from https://phabricator.wikimedia.org/T206560 on their demo instance of Wikidata http://sage.univ-nantes.fr/#query (which has only 2.3B triples), it takes forever. Also simple queries are quite slow. For example, the following query (all humans) produces results at a rate of around a thousand rows per second:

Sep 14 2021, 12:40 PM · Wikidata, Wikidata-Query-Service

Sep 13 2021

Hannah_Bast added a comment to T290839: Evaluate a double backend strategy for WDQS.

Would it be an option that one of these two backends uses a SPARQL engine that does not support the SPARQL Update operation, but instead rebuilds its index periodically, for example, every 24 hours?

Sep 13 2021, 6:00 AM · Wikidata, Wikidata-Query-Service

Sep 9 2021

Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

Thanks, Kingsley, that explains it!

Sep 9 2021, 6:54 PM · Wikidata, Epic, Wikidata-Query-Service

Sep 8 2021

Hannah_Bast added a comment to T206560: [Epic] Evaluate alternatives to Blazegraph.

[1] Our Live Wikidata SPARQL Query Endpoint

Sep 8 2021, 4:37 AM · Wikidata, Epic, Wikidata-Query-Service