
Also add RKD schema.org Knowledge Graph to the WDQS allowlist
Closed, Resolved (Public)

Description

Many thanks for adding the first RKD knowledge graph to the WDQS allowlist (see parent task). We have noticed that RKD maintains a second graph, which contains more valuable information about artists specifically, and we would like to be able to include it in federated queries too.

Request to allow federation with a second knowledge graph of the RKD (Netherlands Institute for Art History)

Event Timeline

Hello @RKemper! Tagging you, but feel free to tag someone else. I'm bringing this task to your attention because it is a new one and may have slipped under the radar, since it's a subtask of another (already finished) one...

Let me know if I can help with something.

Change #1193362 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] wdqs: add RKD schema.org Knowledge Graph to allow-list

https://gerrit.wikimedia.org/r/1193362

brouberol changed the task status from Open to In Progress. Oct 3 2025, 8:08 AM

Change #1193362 merged by Brouberol:

[operations/puppet@production] wdqs: add RKD schema.org Knowledge Graph to allow-list

https://gerrit.wikimedia.org/r/1193362

I think based off making a request with browser tools -> network the correct URL is going to be https://rkd.triply.cc/_api/datasets/rkd/RKD-SDO-Knowledge-Graph/sparql

Change #1193942 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: Fix 3 federation endpoint URLs

https://gerrit.wikimedia.org/r/1193942

Change #1193942 merged by Ryan Kemper:

[operations/puppet@production] wdqs: Fix 3 federation endpoint URLs

https://gerrit.wikimedia.org/r/1193942

I think based off making a request with browser tools -> network the correct URL is going to be https://rkd.triply.cc/_api/datasets/rkd/RKD-SDO-Knowledge-Graph/sparql

Hmm, this might not be correct. We have allowlisted this URL but test queries fail with upstream request timeout (see https://query.wikidata.org/#SELECT%20%3Fs%20%3Fp%20%3Fo%20%7B%0A%20%20SERVICE%20%3Chttps%3A%2F%2Frkd.triply.cc%2F_api%2Fdatasets%2Frkd%2FRKD-SDO-Knowledge-Graph%2Fsparql%3E%20%7B%0A%20%20%20%20%3Fs%20%3Fp%20%3Fo%0A%20%20%7D%0A%7D%0ALIMIT%2010)

@dcausse any guesses why this federation isn't working here?

When I make a request through the rkd.triply UI, the request ends up going to https://rkd.triply.cc/_api/datasets/rkd/RKD-SDO-Knowledge-Graph/sparql which is the exact URL we have in the allowlist.

The following curl requests sent from my local machine return correctly:

(GET)
curl -G --data-urlencode query@simple.query -H"Accept:application/sparql-results+xml" https://rkd.triply.cc/_api/datasets/rkd/RKD-SDO-Knowledge-Graph/sparql

(POST)
curl -vvv -k -XPOST -H"Accept:application/sparql-results+xml" --data-urlencode "query=select * where {?s ?p ?o.} limit 5" https://rkd.triply.cc/_api/datasets/rkd/RKD-SDO-Knowledge-Graph/sparql

However the actual federation query from query.wikidata.org doesn't work (https://query.wikidata.org/#SELECT%20%3Fs%20%3Fp%20%3Fo%20%7B%0A%20%20SERVICE%20%3Chttps%3A%2F%2Frkd.triply.cc%2F_api%2Fdatasets%2Frkd%2FRKD-SDO-Knowledge-Graph%2Fsparql%3E%20%7B%0A%20%20%20%20%3Fs%20%3Fp%20%3Fo%0A%20%20%7D%0A%7D%0ALIMIT%2010)

And when I run that same query from a wdqs host directly like so:
ryankemper@wdqs1022:~$ curl -G --data-urlencode query@/tmp/federated_query.sparql -H"Accept:application/sparql-results+xml" localhost:80/sparql

we get a 504 gateway timeout. Here's the full stack trace: https://phabricator.wikimedia.org/P84132 (I don't think it contains PII, but out of paranoia I made this paste visible to WMF-NDA only; for those without access, the important part is that we get a 504 gateway timeout).


I'm not quite sure what the best next debugging step is. In the past we had issues with endpoints that didn't accept Accept:application/sparql-results+xml, but I think the first two curl tests I listed above (GET and POST) rule that out. For context, here's an old comment about that issue: https://phabricator.wikimedia.org/T339347#9151770

cc @bking, who I paired with on this today

@dcausse any guesses why this federation isn't working here?

I think this is due to your federated query, which asks for all the triples on the federated endpoint:

SELECT ?s ?p ?o {
  SERVICE <https://rkd.triply.cc/_api/datasets/rkd/RKD-SDO-Knowledge-Graph/sparql> {
    ?s ?p ?o
  }
}
LIMIT 10

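This is very costly because LIMIT 10 is applied too late: it sits outside the SERVICE block, so it only takes effect after the federated endpoint has been asked for every triple.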

You should use a query like this (moving the limit inside the federated query):

SELECT * {
  SERVICE <https://rkd.triply.cc/_api/datasets/rkd/RKD-SDO-Knowledge-Graph/sparql> {
    SELECT * {
      ?s ?p ?o
    } LIMIT 10
  }
}
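
To check the rewritten query outside the WDQS UI, it can be saved to a file and sent with the same curl pattern used earlier in this thread. This is only a minimal sketch, assuming a bash shell: the file path /tmp/federated_query_limit.sparql and the helper name run_federated_query are made up for illustration, and the default target is assumed to be the public https://query.wikidata.org/sparql endpoint (on a wdqs host, localhost:80/sparql would be passed instead, as in the earlier test).

```shell
#!/usr/bin/env bash
# Hypothetical path for illustration; any writable location works.
QUERY_FILE="${QUERY_FILE:-/tmp/federated_query_limit.sparql}"

# Write the query with the LIMIT inside the SERVICE block, so the remote
# endpoint is only asked for 10 triples rather than all of them.
cat > "$QUERY_FILE" <<'EOF'
SELECT * {
  SERVICE <https://rkd.triply.cc/_api/datasets/rkd/RKD-SDO-Knowledge-Graph/sparql> {
    SELECT * {
      ?s ?p ?o
    } LIMIT 10
  }
}
EOF

# Hypothetical helper: send the saved query to a WDQS endpoint.
# $1 defaults to the public endpoint (an assumption); on a wdqs host,
# pass localhost:80/sparql instead.
run_federated_query() {
  local endpoint="${1:-https://query.wikidata.org/sparql}"
  curl -sS -G \
    --data-urlencode "query@${QUERY_FILE}" \
    -H 'Accept: application/sparql-results+xml' \
    "$endpoint"
}
```

Nothing is sent until the helper is called, for example with `run_federated_query localhost:80/sparql` on a wdqs host or `run_federated_query` with no argument for the assumed public endpoint.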