
Epic: Wikidata Query Service stabilization
Open, High, Public

Description

We are working to improve the stability and scalability of the Wikidata Query Service. This epic tracks the larger pieces of work. To stay up to date, please follow the regular updates.

We are working on:

We are exploring (further tickets will be created as we dive into these):

  • Decoupling SERVICE from SPARQL: The Blazegraph-specific SERVICE extensions cause query performance issues and tie us to Blazegraph. We will look into optimizing and changing them.
    • Optimizing the LABEL Service (T212933)
  • Introducing upper limits: To prevent commercial and other high-volume users from overwhelming the Query Service, we can put limits in place to ensure the Query Service remains available for everyone.
  • Sharding Strategy aka "splitting" the WDQS graph: One way to help the Query Service scale is by splitting the graph inside Blazegraph. We will look into potential splits and their implications and then plan the actual split. (T337013)
  • Making it easier for people to run their own Query Service: Not all queries need to be run against the public Wikidata Query Service. Some people already run their own Query Services today, which takes load off the public endpoint. We can make these easier to set up and run.
  • Warning when running queries that are ill-suited for the Query Service: There are a number of queries that are better served by other systems that provide access to Wikidata’s data. We can detect these queries and recommend using other systems when the Query Service is not needed to get the specific data requested.
  • Moving off of Blazegraph: Because Blazegraph is unmaintained, we need to decide on an alternative system and migrate to it. The initial exploration for this has been done. (T330525)

Event Timeline

An update re: moving off of Blazegraph: one early 2025 benchmark of different backends suggests QLever and possibly Millennium are solid candidates for replacing Blazegraph, which should stabilize WDQS more reliably than the temporary delay tactic of splitting the graph.

Per https://www.wikidata.org/wiki/File:WDQS_Backend_Alternatives_working_paper.pdf, as of 2022 MillenniumDB did not have full support for SPARQL 1.1, so it cannot be used to replace WDQS.