qlever dblp endpoint for wikidata federated query nomination
Closed, Resolved | Public | 1 Estimated Story Points

Description

As started in https://phabricator.wikimedia.org/T197530, I was asked to create a separate ticket.

Please make https://w.wiki/6m6B work by adding both https://qlever.cs.uni-freiburg.de/dblp and https://qlever.cs.uni-freiburg.de/wikidata.

according to https://github.com/ad-freiburg/qlever/discussions/588#discussioncomment-6018412

However, note that the IRIs in your example query are wrong, they should be https://qlever.cs.uni-freiburg.de/api/dblp and https://qlever.cs.uni-freiburg.de/api/wikidata

the query

https://w.wiki/6q2i is the query that should work

I am trying to find out what the "official procedure" for adding an endpoint to the allowed SPARQL Federation endpoints list is and found:

I tried the "Nominate new endpoint" button and got the result
This page is currently fully protected and can be edited only by administrators. ....

So hopefully via this task the situation may be resolved

Event Timeline

Gehel triaged this task as High priority. Jun 27 2023, 3:45 PM

Looks like we lost track of this a bit. @bking and I can work on this this week.

Change 946996 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[wikidata/query/deploy@master] query_service: whitelist qlever endpoint

https://gerrit.wikimedia.org/r/946996

Change 946996 merged by Ryan Kemper:

[wikidata/query/deploy@master] query_service: whitelist qlever endpoint

https://gerrit.wikimedia.org/r/946996

Change 947000 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[wikidata/query/deploy@master] whitelist: fix uni-freiburg.de links

https://gerrit.wikimedia.org/r/947000

Change 947000 merged by Ryan Kemper:

[wikidata/query/deploy@master] whitelist: fix uni-freiburg.de links

https://gerrit.wikimedia.org/r/947000

Mentioned in SAL (#wikimedia-operations) [2023-08-08T20:13:18Z] <ryankemper@deploy1002> Started deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 3 T339347

Mentioned in SAL (#wikimedia-operations) [2023-08-08T20:16:12Z] <ryankemper@deploy1002> Finished deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 3 T339347 (duration: 02m 54s)

Change 947001 had a related patch set uploaded (by Bking; author: Bking):

[wikidata/query/deploy@master] whitelist: add freiburg.de endpoints

https://gerrit.wikimedia.org/r/947001

Change 947001 abandoned by Bking:

[wikidata/query/deploy@master] whitelist: add freiburg.de endpoints

Reason:

Changes already completed in 947000

https://gerrit.wikimedia.org/r/947001

Mentioned in SAL (#wikimedia-operations) [2023-08-08T20:30:32Z] <ryankemper@deploy1002> Started deploy [wdqs/wdqs@f1a6177]: whitelist new qlever endpoints take 4 (forgot git pull) T339347

Mentioned in SAL (#wikimedia-operations) [2023-08-08T20:41:12Z] <ryankemper@deploy1002> Finished deploy [wdqs/wdqs@f1a6177]: whitelist new qlever endpoints take 4 (forgot git pull) T339347 (duration: 10m 44s)

@WolfgangFahl We've whitelisted the endpoints, but the query you linked above still does not work. Can you verify that it is working as expected? My teammate mentioned "it's returning application/sparql-results+xml but we only know how to process application/sparql-results+json, application/qlever-results+json." So maybe if we use a different Accept header? Let us know if we can assist.

I had this slightly backwards. After looking closer, I think what is happening is:

  • Blazegraph is submitting (afaict) Accept: application/sparql-results+xml to qlever as part of the federated query
  • qlever is responding that it doesn't know how to respond in that format.
  • Blazegraph knows how to write application/sparql-results+json for normal api responses, but I'm not sure if it can read that format or how to tell it to use that here
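The exchange described above can be reproduced outside Blazegraph. Here is a minimal Python sketch using only the standard library. The function names build_sparql_request and probe are hypothetical helpers, not part of Blazegraph or QLever, and the Accept value is the one Blazegraph appears to send according to the analysis above.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query, accept):
    """Build a form-encoded SPARQL POST request with an explicit Accept header."""
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": accept,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def probe(endpoint, accept, query="SELECT * WHERE { ?s ?p ?o } LIMIT 1"):
    """Send the request and return (HTTP status, response Content-Type)."""
    request = build_sparql_request(endpoint, query, accept)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, response.headers.get("Content-Type")
    except urllib.error.HTTPError as error:
        # Error responses still carry a status code and headers worth inspecting.
        return error.code, error.headers.get("Content-Type")


# Example call (needs network access, so it is not executed here):
# probe("https://qlever.cs.uni-freiburg.de/api/dblp", "application/sparql-results+xml")
```

Comparing the returned status and Content-Type for different Accept values shows which formats an endpoint is willing to produce.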
Gehel claimed this task.
Gehel subscribed.

It looks like the federation is configured on our side, but there is an issue with how QLever responds to queries. I'm closing this for now, feel free to re-open if the QLever issue is fixed and things still don't work.

Is it possible to configure Blazegraph to send the following Accept header:

Accept: application/sparql-results+xml, application/sparql-results+json

That would ask for results in SPARQL XML format (like before), and as a fallback in SPARQL JSON format. The HTTP standard explicitly supports this, see https://www.rfc-editor.org/rfc/rfc9110.html#name-accept .

@Hannah_Bast: anything is possible :)

That being said, that's not a feature that is possible out of the box. It would require some customization on our side to be able to configure different HTTP headers for different federation endpoints. I don't think this is something we should do.

@dcausse might have a different opinion.

@Gehel Thanks for the reply! But to clarify, what I am asking is not to do something different for different federation endpoints. It's the same for every federation endpoint, namely sending the header

Accept: application/sparql-results+xml, application/sparql-results+json

That's the standard HTTP way of telling the endpoint: if you can, give me media type X, if not, media type Y (and you could continue this list with yet more alternative media types).
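For illustration, here is a minimal Python sketch of how a server might resolve such a header. It is not QLever's or Blazegraph's actual code: parse_accept and choose_media_type are hypothetical names, wildcards such as */* are ignored, and breaking ties by listing order is an assumption of this sketch. RFC 9110 expresses explicit preference through q values, and a missing q counts as q=1.

```python
def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs, in listing order."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # RFC 9110: a missing q parameter means q=1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((media_type, q))
    return entries


def choose_media_type(header, supported):
    """Return the supported type with the highest q, or None if nothing matches.

    Ties go to the type listed first (an assumption of this sketch).
    """
    best, best_q = None, 0.0
    for media_type, q in parse_accept(header):
        if media_type in supported and q > best_q:
            best, best_q = media_type, q
    return best


header = "application/sparql-results+xml, application/sparql-results+json"
# A server that can only produce JSON falls back to the second entry.
print(choose_media_type(header, {"application/sparql-results+json"}))
```

With this header, a JSON-only endpoint can still answer, whereas a request for XML alone leaves it with no acceptable format.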

@Hannah_Bast Blazegraph does properly send the header Accept: application/sparql-results+xml, but it seems that this endpoint only works when requesting application/sparql-results+json; anything else produces an HTTP 500 error:

curl -k -XPOST -H"Accept:application/sparql-results+xml" --data-urlencode "query=select * { <iri1> <iri2> <iri3> } LIMIT 1"  https://data.nlg.gr/sparql

produces:

{"statusCode":500,"message":"Internal server error"}

Note that even if we changed blazegraph to accept multiple formats for all endpoints by setting the header that you suggest (Accept: application/sparql-results+xml, application/sparql-results+json) the https://data.nlg.gr/sparql endpoint still produces an http 500 error.

@dcausse I am confused, where does https://data.nlg.gr/sparql come from? I thought the endpoints in question were https://qlever.cs.uni-freiburg.de/api/dblp and https://qlever.cs.uni-freiburg.de/api/wikidata, where the following command lines work just fine:

curl -k -XPOST https://qlever.cs.uni-freiburg.de/api/dblp -H "Accept: application/sparql-results+xml, application/sparql-results+json" --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 1"
curl -k -XPOST https://qlever.cs.uni-freiburg.de/api/wikidata -H "Accept: application/sparql-results+xml, application/sparql-results+json" --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 1"

And to clarify my point, the exactly analogous query works fine for the other SPARQL endpoints on your whitelist, too. For example:

curl -k -XPOST http://opencitations.net/sparql -H "Accept: application/sparql-results+xml, application/sparql-results+json" --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 1"

@Hannah_Bast Sorry about this, I mixed this ticket up with another one. Supporting https://qlever.cs.uni-freiburg.de/api/dblp would require changing the Accept header that Blazegraph sends during federation requests, and that does not appear to be something that can be done without patching Blazegraph (which we'd like to avoid unless really necessary). This seems to be the first time we have encountered an endpoint that refuses to produce application/sparql-results+xml, and we have added almost 80 of them so far, so it sounds to me like it would be nice to implement this on your side.

@dcausse @Gehel @WolfgangFahl QLever can now also produce application/sparql-results+xml. Here is an example:

curl -s https://qlever.cs.uni-freiburg.de/api/wikidata -H "Accept: application/sparql-results+xml" -H "Content-type: application/sparql-query" --data "PREFIX wd: <http://www.wikidata.org/entity/> SELECT * WHERE { VALUES (?country ?country_name ?mountain ?mountain_name ?height) { (wd:Q837 \"Nepal\"@en wd:Q513 \"Mount Everest\"@de 8850) } }"

For comparison, here is the analogous request to the WDQS, which gives an identical result (modulo formatting):

curl -s https://query.wikidata.org/sparql -H "Accept: application/sparql-results+xml" -H "Content-type: application/sparql-query" --data "PREFIX wd: <http://www.wikidata.org/entity/> SELECT * WHERE { VALUES (?country ?country_name ?mountain ?mountain_name ?height) { (wd:Q837 \"Nepal\"@en wd:Q513 \"Mount Everest\"@de 8850) } }"
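One way to check "identical modulo formatting" is to parse both responses into plain rows and compare those instead of the raw text. Below is a sketch that assumes the W3C SPARQL Query Results XML Format. parse_bindings is a hypothetical helper, and the two sample documents are hand-written illustrations, not actual output from QLever or the WDQS.

```python
import xml.etree.ElementTree as ET

SPARQL_NS = "{http://www.w3.org/2005/sparql-results#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def parse_bindings(xml_text):
    """Return result rows as a list of {variable: (kind, value, lang, datatype)}."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for result in root.iter(SPARQL_NS + "result"):
        row = {}
        for binding in result.findall(SPARQL_NS + "binding"):
            term = binding[0]  # the single child: <uri>, <literal>, or <bnode>
            kind = term.tag[len(SPARQL_NS):]
            row[binding.get("name")] = (
                kind,
                term.text or "",
                term.get(XML_LANG),
                term.get("datatype"),
            )
        rows.append(row)
    return rows


# Hand-written sample: everything on one line.
compact = (
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    '<head><variable name="country"/><variable name="country_name"/></head>'
    '<results><result>'
    '<binding name="country"><uri>http://www.wikidata.org/entity/Q837</uri></binding>'
    '<binding name="country_name"><literal xml:lang="en">Nepal</literal></binding>'
    '</result></results></sparql>'
)

# Hand-written sample: same data, indented and with bindings in another order.
pretty = """
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="country"/>
    <variable name="country_name"/>
  </head>
  <results>
    <result>
      <binding name="country_name"><literal xml:lang="en">Nepal</literal></binding>
      <binding name="country"><uri>http://www.wikidata.org/entity/Q837</uri></binding>
    </result>
  </results>
</sparql>
"""

print(parse_bindings(compact) == parse_bindings(pretty))
```

The comparison is order-sensitive across rows, which fits a single-row VALUES query like the one above; for larger results without ORDER BY, the rows would need to be sorted first.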

@Hannah_Bast thanks for making such a change! I did a quick test locally and everything seems to work fine now, re-opening this accordingly.

Gehel removed Gehel as the assignee of this task. Oct 18 2023, 8:24 AM
Gehel lowered the priority of this task from High to Medium. Nov 15 2023, 9:48 AM

Change 994793 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: allow further federation to freiburg

https://gerrit.wikimedia.org/r/994793

Change 994793 merged by Ryan Kemper:

[operations/puppet@production] wdqs: allow further federation to freiburg

https://gerrit.wikimedia.org/r/994793

@Nikki We added https://qlever.cs.uni-freiburg.de/api/dblp as well as https://qlever.cs.uni-freiburg.de/api/wikimedia-commons. Can you confirm the wikimedia-commons endpoint is working as intended?

EDIT: Interestingly I'm getting java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Service URI https://qlever.cs.uni-freiburg.de/api/dblp is not allowed with https://w.wiki/6m6B. Am I wrong to be using /api/dblp instead of just /dblp?

Yes, https://qlever.cs.uni-freiburg.de/api/dblp is the URL for API calls, whereas https://qlever.cs.uni-freiburg.de/dblp (without the /api) is the URL of the QLever UI. Same for all the other endpoints.

For example, https://qlever.cs.uni-freiburg.de/api/dblp?query=SELECT+%2A+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D+LIMIT+10 gives you the results for SELECT * WHERE { ?s ?p ?o } LIMIT 10 as application/sparql-results+json .
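The encoded URL above can be built programmatically. Here is a small Python sketch using only the standard library; api_url is a hypothetical helper name, and the base URL is the one given in this comment.

```python
from urllib.parse import urlencode

# Base of the QLever API URLs as described above (the UI lives at the same path without /api).
API_BASE = "https://qlever.cs.uni-freiburg.de/api/"


def api_url(dataset, query):
    """Return the GET URL that submits `query` to the API endpoint of `dataset`."""
    return API_BASE + dataset + "?" + urlencode({"query": query})


print(api_url("dblp", "SELECT * WHERE { ?s ?p ?o } LIMIT 10"))
```

The same helper covers the other datasets mentioned in this task by swapping the first argument, for example "wikidata".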

I'm probably missing something obvious here, but wrt https://qlever.cs.uni-freiburg.de/api/dblp we're getting the aforementioned Service URI https://qlever.cs.uni-freiburg.de/api/dblp is not allowed, but that URI is present in https://gerrit.wikimedia.org/g/operations/puppet/+/17cce9790f418d4d34c93932a39a45aea63e35c1/modules/query_service/files/allowlist.txt#84 (see https://w.wiki/93Nh for the query I'm using).

I suspect I'm missing something obvious but haven't figured it out yet.
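To illustrate why the exact URI matters here, consider the following minimal Python sketch. It assumes the allowlist holds one endpoint URI per line and is matched by exact string comparison; how the query service actually loads and checks allowlist.txt is not shown in this thread. The names load_allowlist and is_allowed, and the sample file content, are hypothetical.

```python
def load_allowlist(text):
    """Return the set of endpoint URIs in `text`, one per line, skipping blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def is_allowed(service_uri, allowlist):
    """Exact-match check (an assumption): the UI URL and the API URL are distinct entries."""
    return service_uri in allowlist


# Hypothetical excerpt, not the real allowlist.txt.
sample = """
https://qlever.cs.uni-freiburg.de/api/dblp
https://qlever.cs.uni-freiburg.de/api/wikidata
"""

allowlist = load_allowlist(sample)
for uri in (
    "https://qlever.cs.uni-freiburg.de/api/dblp",
    "https://qlever.cs.uni-freiburg.de/dblp",
):
    print(uri, is_allowed(uri, allowlist))
```

Under that assumption, a SERVICE clause naming the UI URL would be rejected even though the API URL is listed. A list that is correct in the repository but not yet loaded by the running service would produce the same "is not allowed" error.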

@Hannah_Bast Okay, we figured out what was preventing the allowed endpoints from being updated properly. https://w.wiki/6q2i doesn't get an error message anymore, although the query itself returns no results.

@RKemper Is your point that the queries should return a result? Neither DBLP nor Wikidata has the predicate foaf:name, so it's clear that both SERVICE queries return an empty result. Here is an example of a query that gives a result:

PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?editor ?editorName
WHERE {
  SERVICE <https://qlever.cs.uni-freiburg.de/api/wikidata> {
    wd:Q113544723 wdt:P179 ?editor.
    ?editor schema:name ?editorName.
  }
}

@Hannah_Bast Following up here: this query looks like it's working fine. Does everything look right on your end so I can mark this ticket as resolved?