
Introduce Named Queries and Named Query Middleware to Wikidata
Open, In Progress, Medium, Public

Description

see also https://github.com/WDscholia/scholia/issues/2412

Description:
The Wikimedia Foundation's Blazegraph database behind the Wikidata Query Service is nearing its 4TB limit. A graph split is planned for Q1/2024, which makes it necessary to explore federated queries, multiple SPARQL endpoints, and potentially different query languages. The current 1-minute timeout on the official Wikidata Query Service (WDQS) compounds the issue and makes efficient query management crucial.

Problem:
As Blazegraph approaches its storage limit and the graph split is being tested, our ability to handle queries efficiently is becoming strained. Clients may be forced to adopt federated queries, switch between SPARQL endpoints, or make other technical adjustments, which complicates writing and maintaining queries.

Proposed Solution:

  • Conversion to Named Queries: Convert all relevant queries to named queries with parameters. Queries can then be managed and modified without changing the core application code.
  • Introduction of a black-box, SPARQL-compatible middleware: Implement a middleware layer that executes these named queries. It acts as a broker between client requests and the backend data stores: it routes each query to the correct store, translates it where necessary, and returns the results efficiently. The actual SPARQL query stays hidden from the user, so the complexity of the backends is abstracted away from end users. Support for named queries with parameters adds flexibility and scalability.
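A minimal sketch of the proposed broker, assuming a registry of named queries whose SPARQL text contains `{{ param }}` placeholders and a mapping from each query to an endpoint. `NamedQuery`, `NamedQueryMiddleware`, the "scholarly" endpoint and its URL are hypothetical; only the WDQS URL is real. Parameter values are restricted to Wikidata item IDs, so clients pass values but never raw SPARQL. The sketch only resolves the endpoint and the query text; it does not send the HTTP request.

```python
import re
from dataclasses import dataclass

# Hypothetical endpoint registry: the WDQS URL is real, the "scholarly"
# entry is a placeholder for a second store after the graph split.
ENDPOINTS = {
    "wikidata": "https://query.wikidata.org/sparql",
    "scholarly": "https://example.org/scholarly/sparql",
}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
QID = re.compile(r"Q[1-9][0-9]*")


@dataclass
class NamedQuery:
    name: str
    sparql: str                 # SPARQL text with {{ param }} placeholders
    params: list[str]           # parameter names the caller must supply
    endpoint: str = "wikidata"  # key into ENDPOINTS


class NamedQueryMiddleware:
    """Hypothetical broker: clients send a query name and parameters,
    never raw SPARQL; the middleware renders and routes the query."""

    def __init__(self) -> None:
        self.queries: dict[str, NamedQuery] = {}

    def register(self, query: NamedQuery) -> None:
        self.queries[query.name] = query

    def resolve(self, name: str, **params: str) -> tuple[str, str]:
        """Return (endpoint URL, rendered SPARQL) for a named query."""
        query = self.queries[name]  # raises KeyError for unknown names
        if set(params) != set(query.params):
            raise ValueError(f"{name} expects parameters {sorted(query.params)}")
        for key, value in params.items():
            # Accept only item IDs such as Q42, so no SPARQL can be injected.
            if not QID.fullmatch(value):
                raise ValueError(f"{key}={value!r} is not a Wikidata item ID")
        sparql = PLACEHOLDER.sub(lambda m: params[m.group(1)], query.sparql)
        return ENDPOINTS[query.endpoint], sparql


middleware = NamedQueryMiddleware()
middleware.register(NamedQuery(
    name="positions-held",
    sparql="SELECT ?position WHERE { wd:{{ person }} wdt:P39 ?position }",
    params=["person"],
))
endpoint, sparql = middleware.resolve("positions-held", person="Q567")
```

Because routing lives in the registry, moving a query to another store after the graph split would only change its `endpoint` entry; clients keep calling it by name.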

Alternatives Considered:
Setting up a private instance of Wikidata as described in the CEUR-WS Vol-3262 paper; however, this is resource-intensive and may not be feasible for all users.

Additional Context:
Recent discussions in the Search Platform Office Hours and proposals for handling named queries point to a growing need for more sophisticated query management. Existing examples include the short URLs supported by the Wikidata Query Service and QLever's internal handling of queries; our middleware could extend these further.


Event Timeline

Seppl2013 updated the task description.

Example:
ChatGPT-4 prompt:
"Create a SPARQL query for Wikidata that gives a list of German chancellors and the prior ministries they have been head of in the past."
ChatGPT's answer:

To create a SPARQL query that retrieves a list of German Chancellors along with the ministries they headed prior to becoming Chancellor, you can use the Wikidata Query Service. Here’s a basic query to achieve this:

```sparql
SELECT ?chancellorLabel ?ministryLabel ?positionStartDate ?positionEndDate WHERE {
  # Get instances of the position "Chancellor of Germany"
  ?chancellor p:P39 ?statement.
  ?statement ps:P39 wd:Q4970706;  # wd:Q4970706 is the Wikidata item for "Chancellor of Germany"
             pq:P580 ?positionStartDate.

  OPTIONAL {
    ?statement pq:P582 ?positionEndDate.
  }

  # Get other positions held by the same person
  ?chancellor p:P39 ?ministryStatement.
  ?ministryStatement ps:P39 ?ministry;
                     pq:P580 ?ministryStartDate.

  # Exclude the Chancellorship itself and keep only positions that started before it
  FILTER(?ministry != wd:Q4970706)
  FILTER(?ministryStartDate < ?positionStartDate)

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?positionStartDate)
```

Explanation:

Select Chancellor and Ministry Information: The query retrieves the label of the Chancellor (?chancellorLabel) and the label of the ministry (?ministryLabel) they headed, as well as the start (?positionStartDate) and end dates (?positionEndDate) of their Chancellorship.

Identify the Chancellors: It filters for people who have held the position of "Chancellor of Germany" (item Q4970706 on Wikidata).

Fetch Previous Ministries: It looks for other positions held by the same person (?ministry) before they became Chancellor, filtering out any positions that start after the Chancellorship began. The filter also ensures the position is not the Chancellorship itself. Note that ?ministry is not restricted to ministerial offices, so other earlier positions can appear in the results as well.
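To connect the example with the proposal, here is a hedged sketch of how such a generated query could be stored as a named query, with the office item as a parameter instead of the hard-coded wd:Q4970706. The template is a simplified variant written for illustration (it compares start dates and does not restrict results to ministerial offices); `PRIOR_POSITIONS`, `render` and the `{{ office }}` placeholder syntax are assumptions, not part of any existing tool.

```python
import re

# Hypothetical named query derived from the chancellor example: the office
# is a {{ office }} parameter instead of the hard-coded wd:Q4970706.
PRIOR_POSITIONS = """\
SELECT ?personLabel ?positionLabel ?officeStart WHERE {
  ?person p:P39 ?officeStatement.
  ?officeStatement ps:P39 wd:{{ office }};
                   pq:P580 ?officeStart.
  ?person p:P39 ?positionStatement.
  ?positionStatement ps:P39 ?position;
                     pq:P580 ?positionStart.
  FILTER(?position != wd:{{ office }} && ?positionStart < ?officeStart)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?officeStart)
"""

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
QID = re.compile(r"Q[1-9][0-9]*")


def render(template: str, **params: str) -> str:
    """Fill {{ name }} placeholders, accepting only Wikidata item IDs."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        value = params[name]  # KeyError if the caller omitted a parameter
        if not QID.fullmatch(value):
            raise ValueError(f"{name}={value!r} is not a Wikidata item ID")
        return value

    return PLACEHOLDER.sub(substitute, template)


query = render(PRIOR_POSITIONS, office="Q4970706")  # Chancellor of Germany
```

Calling `render` with another office's item ID reuses the same query text unchanged, which is what would let a middleware expose it under a single name.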

A parser for the Scholia queries is available at https://github.com/tholzheim/named-queries/blob/master/tests/test_import_queries.py. It converts the queries into YAML files and might be adapted for snapquery.
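The linked parser's code is not reproduced here. As a rough, hypothetical sketch of the same idea, the code below collects `{{ name }}` placeholders from a query template as its parameters and writes the result as a YAML list using only the Python standard library. It assumes Jinja-style placeholders such as those used in Scholia's query templates; `to_named_query` and `to_yaml` are illustrative names, and the hand-written YAML does no quoting, so query names containing `:` or `#` would need escaping.

```python
import re

# Assumption: parameters are marked with Jinja-style {{ name }} placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def to_named_query(name: str, template: str) -> dict:
    """Describe one query template as a named query with its parameters."""
    # dict.fromkeys removes duplicates while keeping first-appearance order.
    params = list(dict.fromkeys(PLACEHOLDER.findall(template)))
    return {"name": name, "params": params, "sparql": template.strip()}


def to_yaml(queries: list[dict]) -> str:
    """Emit a minimal YAML list; the SPARQL goes into a literal block scalar."""
    lines = []
    for query in queries:
        lines.append(f"- name: {query['name']}")
        lines.append(f"  params: [{', '.join(query['params'])}]")
        lines.append("  sparql: |")
        lines.extend(f"    {line}" for line in query["sparql"].splitlines())
    return "\n".join(lines) + "\n"


template = """
SELECT ?work ?workLabel WHERE {
  ?work wdt:P50 wd:{{ q }} .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
print(to_yaml([to_named_query("author-works", template)]))
```

Using a literal block scalar (`|`) keeps the SPARQL text byte-for-byte readable in the YAML file, which matters if the named queries are later reviewed or edited by hand.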

Sat next to @WolfgangFahl and just showed them https://github.com/wmde/queripulator, which might be of interest for query manipulation, and also for providing labels without needing the label service.

WolfgangFahl changed the task status from Open to In Progress. Sun, May 5, 6:14 PM
WolfgangFahl triaged this task as Medium priority.

@Seppl2013: Thanks for participating in the Hackathon! We hope you had a great time.

  • If this task was being worked on and resolved at the Hackathon: Please change the task status to resolved via the Add Action...Change Status dropdown, and make sure that this task has a link to the public codebase.
  • If this task is still valid and should stay open: Please add another active project tag to this task, so others can find this task (as likely nobody in the future will look back at the Hackathon workboard when trying to find something they are interested in).
  • In case there is nothing else to do for this task, or nobody plans to work on this task anymore: Please set the task status to declined.

Thank you,
Phabricator housekeeping service