
Implement the update service
Status: Closed, Resolved (Public)

Description

Implement the update service as a REST endpoint, /update, which should perform the following steps:

  1. validate the RDF data model;
  2. update the given dataset in Blazegraph.

Event Timeline

The request below adds data to an existing dataset:
curl --data-urlencode "update=LOAD <${UPDATE_URL}> INTO GRAPH <${DATASET_URI}> ;" "${HOST}/bigdata/sparql"
where ${UPDATE_URL} is the location of the new data to add and ${DATASET_URI} is the named graph URI of the existing dataset. Note the double quotes: with single quotes, the shell would not expand the two variables.
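The LOAD request above can be wrapped in a small POSIX shell helper so the update string is built in one place. This is a minimal sketch: the function names are hypothetical, and it assumes HOST points at the Blazegraph server as in the command above.

```shell
# Hypothetical helper: print the SPARQL LOAD update that appends the data
# found at URL $1 to the named graph $2. Existing triples are not touched.
build_load_update() {
  printf 'LOAD <%s> INTO GRAPH <%s> ;' "$1" "$2"
}

# Hypothetical wrapper: POST the update to the Blazegraph SPARQL endpoint.
# Assumes HOST is set, e.g. to the Blazegraph base URL.
load_dataset() {
  curl --fail --data-urlencode "update=$(build_load_update "$1" "$2")" \
    "${HOST}/bigdata/sparql"
}
```

Usage: `load_dataset "$UPDATE_URL" "$DATASET_URI"`. Keeping the string builder separate from the curl call makes the generated update easy to inspect before sending it.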

HEADS UP: this does not update existing triples; it only adds new ones.

To update existing triples, we need to choose one of the following DELETE/INSERT approaches:

  1. via SPARQL, as per https://www.w3.org/TR/2013/REC-sparql11-update-20130321/#deleteInsert;
  2. via the Blazegraph API, as per https://wiki.blazegraph.com/wiki/index.php/REST_API#UPDATE_.28POST_with_Multi-Part_Request_Body.29

Option 2 doesn't seem to work as expected.

The curl command below runs without errors but, contrary to the Blazegraph documentation, it does not take the graph URI into account:
curl -v -F "remove=@${DATASET_TO_REMOVE};type=text/turtle" -F "add=@${DATASET_TO_ADD};type=text/turtle" --form-string "context-uri=<${DATASET_URI}>" "${HOST}/bigdata/sparql?updatePost"

The context-uri field seems to be ignored, resulting in triples added to the bd:nullGraph URI.

Option 1 (SPARQL DELETE/INSERT, as per https://www.w3.org/TR/2013/REC-sparql11-update-20130321/#deleteInsert) seems to work, but it is expensive for the data provider, since it requires building full SPARQL update queries.
For instance, the following request just updates Chuck Berry's image URL in the http://chuck-berry dataset graph:

curl --data-urlencode "update=WITH <http://chuck-berry> DELETE { ?s ps:P18 <http://commons.wikimedia.org/wiki/Special:FilePath/Chuck-berry-2007-07-18.jpg> } INSERT { ?s ps:P18 <https://commons.wikimedia.org/wiki/Special:FilePath/Chuck_Berry_1957.jpg> } WHERE { ?s ps:P18 <http://commons.wikimedia.org/wiki/Special:FilePath/Chuck-berry-2007-07-18.jpg> }" ${HOST}/bigdata/sparql
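To show what the data provider has to produce for every single change under option 1, here is a minimal sketch of a hypothetical helper that generates the same WITH / DELETE / INSERT / WHERE shape as the request above. Arguments are inserted verbatim, so IRIs must be wrapped in `<...>`, and prefixed names such as `ps:P18` must be known to the server, as they are in the request above.

```shell
# Hypothetical helper: print a SPARQL 1.1 DELETE/INSERT update that, inside
# graph $1, replaces object $3 with object $4 for every subject that has
# predicate $2 with object $3.
#   $1 = graph IRI (without angle brackets)
#   $2 = predicate, e.g. ps:P18
#   $3 = old object, e.g. '<http://example.org/old.jpg>'
#   $4 = new object
build_replace_object() {
  printf 'WITH <%s> DELETE { ?s %s %s } INSERT { ?s %s %s } WHERE { ?s %s %s }' \
    "$1" "$2" "$3" "$2" "$4" "$2" "$3"
}
```

Usage: `curl --data-urlencode "update=$(build_replace_object http://chuck-berry ps:P18 '<OLD_IRI>' '<NEW_IRI>')" "${HOST}/bigdata/sparql"`. Even with such a helper, each kind of change needs its own WHERE pattern, which is the cost mentioned above.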

Also, the data model validation should be run again.

The context-uri field being ignored, which results in triples being added to the bd:nullGraph URI, needs investigation in the source code.

Running the data model validation again fits well into solution 2 (the Blazegraph API).
Solution 1 may not be ideal, for two reasons:

  1. the data provider must know how to write SPARQL update queries, which is not trivial;
  2. it is hard to implement the RDF data model validation for triple patterns, i.e., triples that contain variables.
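To make reason 2 concrete: a triple such as `<s> ps:P18 <o>` can be validated as-is, while a pattern such as `?s ps:P18 <o>` has an unbound subject that a validator cannot check. The sketch below is a crude, hypothetical pre-check that flags input containing SPARQL variables (`?name` or `$name` at the start of a term); it does not parse SPARQL, so it is only an illustration of where such a validator would have to branch.

```shell
# Hypothetical check: succeed (exit status 0) if the text in $1 contains
# something that looks like a SPARQL variable, i.e. '?' or '$' followed by
# a name character, at the start of the text or after whitespace, '{' or '('.
has_variables() {
  printf '%s\n' "$1" | grep -Eq '(^|[[:space:]{(])[?$][A-Za-z0-9_]'
}
```

For example, `has_variables '?s ps:P18 <http://example.org/x.jpg>'` succeeds, whereas the same line with a concrete subject IRI fails.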


Spotted the right parameters, i.e., context-uri-delete and context-uri-insert.

Contrary to what one would expect for a POST request, both must be passed as URL query parameters, not in the request body.

Here is the final working curl command:

curl -v -F "remove=@${DATASET_TO_REMOVE};type=text/turtle" -F "add=@${DATASET_TO_ADD};type=text/turtle" "${HOST}/bigdata/sparql?updatePost&context-uri-delete=${URL_ENCODED_DATASET_URI}&context-uri-insert=${URL_ENCODED_DATASET_URI}"
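The working command needs the dataset URI percent-encoded before it goes into the query string. A minimal POSIX shell sketch, assuming `awk` is available; `urlencode`, `build_update_post_url` and `replace_triples` are hypothetical names, not part of Blazegraph or of the patch below.

```shell
# Percent-encode $1 byte by byte, keeping only RFC 3986 unreserved
# characters. The input is passed via the environment (S) rather than
# awk -v, so that backslashes are not interpreted as escapes.
urlencode() {
  S=$1 LC_ALL=C awk 'BEGIN {
    s = ENVIRON["S"]
    for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i   # byte -> code
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if (c ~ /[A-Za-z0-9._~-]/) printf "%s", c
      else printf "%%%02X", ord[c]
    }
  }'
}

# Hypothetical helper: build the updatePost URL that targets the named
# graph $1 for both the removed and the added triples. Assumes HOST is set.
build_update_post_url() {
  printf '%s/bigdata/sparql?updatePost&context-uri-delete=%s&context-uri-insert=%s' \
    "$HOST" "$(urlencode "$1")" "$(urlencode "$1")"
}

# Hypothetical wrapper around the working command:
#   $1 = dataset (named graph) URI
#   $2 = Turtle file with the triples to remove
#   $3 = Turtle file with the triples to add
replace_triples() {
  curl --fail -F "remove=@$2;type=text/turtle" -F "add=@$3;type=text/turtle" \
    "$(build_update_post_url "$1")"
}
```

Usage: `replace_triples "$DATASET_URI" "$DATASET_TO_REMOVE" "$DATASET_TO_ADD"`. Both files stay plain Turtle, which is what makes re-running the data model validation on them straightforward.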

Change 368166 had a related patch set uploaded (by Hjfocs; owner: Hjfocs):
[wikidata/query/rdf@master] T170682: Implement the update service

https://gerrit.wikimedia.org/r/368166