We are working to improve the stability and scalability of the Wikidata Query Service. This epic tracks the larger pieces of work. To stay up to date, please follow the regular updates.
We are working on:
- Better suited APIs: We are improving the existing APIs and creating new ones to reduce the number of queries sent to the Query Service that it is not well suited for.
- Creation and build-out of the Wikibase REST API (tickets are at https://phabricator.wikimedia.org/tag/wikibase_rest_api/)
- Focusing Wikidata on general purpose data and lexicographical data: Wikidata cannot hold all open data. We are providing better alternatives for specialized and niche data.
- Reducing redundant data: Wikidata contains duplicated and redundant data. We want to minimize the need for it, which reduces both the amount of data and the maintenance required.
- Introducing the mul language code (T285156)
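To illustrate the "better suited APIs" item above: a simple lookup such as "the English label of one item" does not need SPARQL at all. The following is a minimal sketch, not an official client. The base path (including the `v1` version segment) and the label route are assumptions that should be checked against the Wikibase REST API documentation, and `item_label_url` and `fetch_label` are hypothetical helper names.

```python
import json
import re
import urllib.request
from urllib.parse import quote

# Assumed base path of the Wikibase REST API on Wikidata; the version
# segment may differ, so verify it against the API documentation.
REST_BASE = "https://www.wikidata.org/w/rest.php/wikibase/v1"

# The SPARQL query this lookup would otherwise cost the Query Service.
EQUIVALENT_SPARQL = """
SELECT ?label WHERE {
  wd:Q42 rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}
"""


def item_label_url(item_id: str, language: str, base: str = REST_BASE) -> str:
    """Build the URL for one label of one item (hypothetical helper)."""
    if not re.fullmatch(r"Q[1-9][0-9]*", item_id):
        raise ValueError(f"not an item ID: {item_id!r}")
    return f"{base}/entities/items/{item_id}/labels/{quote(language, safe='')}"


def fetch_label(item_id: str, language: str = "en"):
    """Fetch the label over HTTP and decode the JSON response body."""
    request = urllib.request.Request(
        item_label_url(item_id, language),
        # Placeholder User-Agent; replace with real contact details.
        headers={"User-Agent": "wdqs-example/0.1 (hypothetical contact)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `fetch_label("Q42")` performs a single HTTP GET against the REST API, so the lookup never reaches the Query Service.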
We are exploring (further tickets will be created as we dive into these):
- Decoupling SERVICE from SPARQL: The Blazegraph-specific SERVICEs hurt query performance and tie us to Blazegraph. We will look into optimizing and changing them.
- Optimizing the LABEL Service (T212933)
- Introducing upper limits: To prevent commercial and other high-volume users from overwhelming the Query Service, we can put usage limits in place so that the Query Service remains available for everyone.
- Sharding Strategy aka "splitting" the WDQS graph: One way to help the Query Service scale is by splitting the graph inside Blazegraph. We will look into potential splits and their implications and then plan the actual split. (T337013)
- Making it easier for people to run their own Query Service: Not all queries need to be run against the public Wikidata Query Service. People are already running their own Query Services today, which takes load off the public endpoint. We can make it easier to set one up and run it.
- Warning when running queries that are ill-suited for the Query Service: There are a number of queries that are better served by other systems that provide access to Wikidata’s data. We can detect these queries and recommend using other systems when the Query Service is not needed to get the specific data requested.
- Moving off of Blazegraph: Because Blazegraph is unmaintained, we need to decide on an alternative system and migrate to it. The initial exploration for this has been done. (T330525)
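As an illustration of what decoupling from the Blazegraph-specific label SERVICE can look like on the query side, the sketch below builds a query that fetches labels with standard SPARQL 1.1 (`rdfs:label` plus a language filter) instead of `SERVICE wikibase:label`. This is a commonly used workaround, not necessarily the approach the tickets above will take. `portable_label_clause` and `cats_query` are hypothetical helper names.

```python
import re

_VAR = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_LANG = re.compile(r"[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*")


def portable_label_clause(var: str, language: str = "en") -> str:
    """Return an OPTIONAL block binding ?<var>Label with standard SPARQL only."""
    if not _VAR.fullmatch(var):
        raise ValueError(f"invalid variable name: {var!r}")
    if not _LANG.fullmatch(language):
        raise ValueError(f"invalid language code: {language!r}")
    return (
        f"OPTIONAL {{ ?{var} rdfs:label ?{var}Label . "
        f'FILTER(LANG(?{var}Label) = "{language}") }}'
    )


def cats_query(language: str = "en") -> str:
    """Example query: ten instances of house cat (Q146) with their labels."""
    return "\n".join([
        "PREFIX wd: <http://www.wikidata.org/entity/>",
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>",
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        "SELECT ?item ?itemLabel WHERE {",
        "  ?item wdt:P31 wd:Q146 .",
        "  " + portable_label_clause("item", language),
        "}",
        "LIMIT 10",
    ])


if __name__ == "__main__":
    print(cats_query())
```

One behavioral difference to keep in mind: the label SERVICE falls back to the entity ID when no label exists, while the OPTIONAL pattern leaves `?itemLabel` unbound in that case.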