
Explore the feasibility of using SPARQL federation for scholia queries
Closed, Resolved (Public)

Description

The purpose of this ticket is to explore how federation could be used to rewrite scholia queries in the context of the WDQS graph split, using the naive rule ?e wdt:P31 wd:Q13442814.

Queries to analyze/explore (please add more):

Note that early experiments can be done by federating wdqs with itself, e.g. https://w.wiki/7vE9.

Event Timeline

@Daniel_Mietchen @Fnielsen @EgonWillighagen as discussed in our previous meeting, here is the task to coordinate the efforts around exploring federation for scholia queries. The ticket description is very minimal but should evolve as we make progress, thanks!

Note that early experiments can be done by federating wdqs with itself, e.g. https://w.wiki/7vE9.

Thanks for the example. Before I can experiment, I need to know which item types end up in which SPARQL endpoint. The example query suggests the author information will also go into the split. I am looking forward to the first experimental split endpoint becoming available.

@EgonWillighagen thanks for the question!
The set of triples that will be part of the split are the triples that we consider owned by the item, in other words these are the triples listed by Special:EntityData using the dump flavor, e.g. https://www.wikidata.org/wiki/Special:EntityData/Q59239844.ttl?flavor=dump.
A scholarly article item will be part of the scholarly item subgraph if it matches this constraint: ?item wdt:P31 wd:Q13442814.
All its corresponding triples will also be part of the split. P53121 is a relatively painful query that demonstrates which triples can be considered owned by an entity and thus moved alongside the scholarly article to the same subgraph.
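
To inspect the triples owned by a given item, the dump-flavor URL can be built programmatically. The following is a minimal sketch; the helper name entity_data_url and its defaults are illustrative assumptions, and only the URL shape is taken from the example above.

```python
from urllib.parse import urlencode

ENTITY_DATA_BASE = "https://www.wikidata.org/wiki/Special:EntityData"


def entity_data_url(item_id: str, fmt: str = "ttl", flavor: str = "dump") -> str:
    """Return the Special:EntityData URL for an item (hypothetical helper).

    The URL shape mirrors the Q59239844 example given in this comment.
    """
    # Accept only item IDs such as "Q59239844"; properties ("P50") are rejected.
    if not (item_id.startswith("Q") and item_id[1:].isdigit()):
        raise ValueError(f"not an item ID: {item_id!r}")
    return f"{ENTITY_DATA_BASE}/{item_id}.{fmt}?{urlencode({'flavor': flavor})}"


print(entity_data_url("Q59239844"))
```

Fetching that URL lists the triples that would move with the item, which can help when deciding whether a given triple pattern is answerable from the split.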

For instance in my query the BGP ?article wdt:P50 wd:Q1042470 matches a triple owned by the article and thus is queryable from the split.
On the other hand, everything requiring access to the triples owned by the author wd:Q1042470 is not queryable from the split, and thus the BGP:

?article wdt:P50 ?author .
?author wdt:P213 "0000 0001 2124 7940"

won't be possible and would require federation like:

# all papers by ISNI 0000 0001 2124 7940 (Carlo Rovelli)
SELECT ?article ?articleLabel {
  ?author wdt:P213 "0000 0001 2124 7940"
  SERVICE <https://query.wikidata.org/sparql> {
    # Querying the scholarly article split
    ?article wdt:P50 ?author ;
             wdt:P31 wd:Q13442814 .
    BIND(?articleLabel as ?articleLabel) .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }
}

Here the target endpoint is the main graph and the federated one is the scholarly article split.
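
The ownership reasoning above can be sketched as a small partition step. This is a toy model only: it assumes a triple is owned by its subject, which ignores statement nodes, references and the other triples that the dump flavor attributes to an item. The names Pattern and partition_by_owner are hypothetical.

```python
from typing import NamedTuple


class Pattern(NamedTuple):
    subject: str
    predicate: str
    obj: str


def partition_by_owner(patterns, article_subjects):
    """Split triple patterns into (scholarly split, main graph) lists.

    Assumption: a pattern is answerable from the split only when its
    subject is a variable or item known to be a scholarly article.
    """
    in_split, in_main = [], []
    for p in patterns:
        (in_split if p.subject in article_subjects else in_main).append(p)
    return in_split, in_main


patterns = [
    Pattern("?article", "wdt:P50", "?author"),
    Pattern("?article", "wdt:P31", "wd:Q13442814"),
    Pattern("?author", "wdt:P213", '"0000 0001 2124 7940"'),
]
split, main = partition_by_owner(patterns, {"?article"})
print("split:", split)
print("main:", main)
```

Under this simplification the two ?article patterns stay in the split, while the ?author wdt:P213 pattern has to be answered by the main graph, which is why the query needs a SERVICE clause.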
I suppose federation can be done the other way around with:

# all papers by ISNI 0000 0001 2124 7940 (Carlo Rovelli)
SELECT ?article ?articleLabel {
  SERVICE <https://query.wikidata.org/sparql> {
    # Querying the wikidata main graph split
    ?author wdt:P213 "0000 0001 2124 7940"
  }
  hint:Prior hint:runFirst true . # Tell blazegraph to first collect ?author
  ?article wdt:P50 ?author ;
           wdt:P31 wd:Q13442814 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Here the target endpoint is the scholarly split and the federated one is the main Wikidata graph.
In the latter example we already see that we have to help Blazegraph by telling it what to run first (here, collect the author information first).
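
For early experiments it may help to keep the federated endpoint configurable, so that the same query can first federate WDQS with itself and later point at a real split endpoint. The sketch below only builds the query text for the second form; the function name papers_by_isni_query and the example.org endpoint are assumptions, not existing services or APIs.

```python
WDQS = "https://query.wikidata.org/sparql"


def papers_by_isni_query(isni: str, main_endpoint: str = WDQS,
                         run_first: bool = True) -> str:
    """Build the query sent to the scholarly split (hypothetical helper).

    The author lookup is federated to main_endpoint; the optional
    Blazegraph hint asks for that SERVICE block to be evaluated first.
    """
    # Refuse characters that would break out of the SPARQL string literal.
    if '"' in isni or "\\" in isni:
        raise ValueError("ISNI must not contain quotes or backslashes")
    hint = "  hint:Prior hint:runFirst true .\n" if run_first else ""
    return (
        "SELECT ?article ?articleLabel {\n"
        f"  SERVICE <{main_endpoint}> {{\n"
        f'    ?author wdt:P213 "{isni}"\n'
        "  }\n"
        f"{hint}"
        "  ?article wdt:P50 ?author ;\n"
        "           wdt:P31 wd:Q13442814 .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }\n'
        "}\n"
    )


# The endpoint below is a placeholder, not a real service.
print(papers_by_isni_query("0000 0001 2124 7940", "https://example.org/main/sparql"))
```

With the default main_endpoint the generated query federates WDQS with itself, as in the examples above.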

I agree that using the current WDQS endpoint to federate with itself can be error-prone, but it is in theory possible to use it if someone is interested in doing early experiments.

Two scholia queries were rewritten:

The pages also contain some documentation about how to approach such rewrites.
I'm boldly moving this ticket to our Needs Reporting column (prior to being closed), as I believe further exploration of how to rewrite scholia queries to support the split could perhaps be better handled in https://github.com/WDscholia/scholia.

But please feel free to re-open this ticket if you believe it has some value.

Gehel claimed this task.