Implement the update service
Closed, Resolved, Public

Description

Implement the service as a REST endpoint, /update, which should perform the following steps:

  1. validate the RDF data model;
  2. update the given dataset in Blazegraph.

Event Timeline

Hjfocs created this task. Jul 14 2017, 2:40 PM
Restricted Application added a subscriber: Aklapper. Jul 14 2017, 2:40 PM

The request below adds data to an existing dataset:
curl --data-urlencode "update=LOAD <${UPDATE_URL}> INTO GRAPH <${DATASET_URI}> ;" "${HOST}/bigdata/sparql"
where ${UPDATE_URL} is the location of the new data to load and ${DATASET_URI} is the named graph URI of the existing dataset.

HEADS UP: this doesn't update existing triples, it just adds new ones.
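
The LOAD request can be wrapped in a small shell helper. This is only a sketch: the function names build_load_update and load_into_graph are hypothetical, and the endpoint path is the one used in the request above.

```shell
# Build the SPARQL LOAD update string.
# $1: URL of the data to load, $2: named graph URI of the existing dataset.
# (Hypothetical helper, not part of Blazegraph.)
build_load_update() {
  printf 'LOAD <%s> INTO GRAPH <%s> ;' "$1" "$2"
}

# Send the update to the SPARQL endpoint with curl.
# $1: host, $2: URL of the data to load, $3: named graph URI.
load_into_graph() {
  curl --data-urlencode "update=$(build_load_update "$2" "$3")" \
    "$1/bigdata/sparql"
}
```

Calling load_into_graph "${HOST}" "${UPDATE_URL}" "${DATASET_URI}" should send the same request as above. Like that request, it only adds triples.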

Hjfocs added a comment. Edited Jul 17 2017, 4:24 PM

To update existing triples, we need to choose one of the following DELETE/INSERT operations:

  1. via SPARQL, as per https://www.w3.org/TR/2013/REC-sparql11-update-20130321/#deleteInsert;
  2. via the Blazegraph API, as per https://wiki.blazegraph.com/wiki/index.php/REST_API#UPDATE_.28POST_with_Multi-Part_Request_Body.29 (doesn't seem to work as expected).

Hjfocs added a comment. Edited Jul 17 2017, 5:09 PM
  1. via SPARQL, as per https://www.w3.org/TR/2013/REC-sparql11-update-20130321/#deleteInsert;
  2. via the Blazegraph API, as per https://wiki.blazegraph.com/wiki/index.php/REST_API#UPDATE_.28POST_with_Multi-Part_Request_Body.29

The curl command below works, but it doesn't take the default graph URI into account, contrary to what the Blazegraph documentation states:
curl -v -F "remove=@${DATASET_TO_REMOVE};type=text/turtle" -F "add=@${DATASET_TO_ADD};type=text/turtle" --form-string "context-uri=<${DATASET_URI}>" "${HOST}/bigdata/sparql?updatePost"

The context-uri field seems to be ignored, resulting in triples added to the bd:nullGraph URI.

  1. via SPARQL, as per https://www.w3.org/TR/2013/REC-sparql11-update-20130321/#deleteInsert;

This solution seems to work, but it is expensive for the data provider, since it requires building full SPARQL update queries.
For instance, the following request just updates Chuck Berry's image URL in the http://chuck-berry dataset graph:

curl --data-urlencode "update=WITH <http://chuck-berry> DELETE { ?s ps:P18 <http://commons.wikimedia.org/wiki/Special:FilePath/Chuck-berry-2007-07-18.jpg> } INSERT { ?s ps:P18 <https://commons.wikimedia.org/wiki/Special:FilePath/Chuck_Berry_1957.jpg> } WHERE { ?s ps:P18 <http://commons.wikimedia.org/wiki/Special:FilePath/Chuck-berry-2007-07-18.jpg> }" "${HOST}/bigdata/sparql"
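
To give an idea of what the data provider would have to build for each changed value, here is a minimal sketch of a helper that generates such an update. The function name build_replace_object_update is hypothetical, and it assumes, as in the request above, that prefixes like ps: are known to the endpoint.

```shell
# Build a DELETE/INSERT update that replaces one object URI with another
# for a given predicate, scoped to a named graph via WITH.
# $1: graph URI, $2: predicate (e.g. ps:P18), $3: old object URI, $4: new object URI.
# (Hypothetical helper, for illustration only.)
build_replace_object_update() {
  printf 'WITH <%s> DELETE { ?s %s <%s> } INSERT { ?s %s <%s> } WHERE { ?s %s <%s> }' \
    "$1" "$2" "$3" "$2" "$4" "$2" "$3"
}
```

The result can be sent with curl --data-urlencode "update=$(build_replace_object_update ...)" "${HOST}/bigdata/sparql". One such query is needed per replaced value, which is where the cost for the data provider comes from.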

Also, data model validation should be run again

The context-uri field seems to be ignored, resulting in triples added to the bd:nullGraph URI.

Needs investigation in the source code.

To update existing triples, we need to choose one of the following DELETE/INSERT operations:

  1. via SPARQL, as per https://www.w3.org/TR/2013/REC-sparql11-update-20130321/#deleteInsert;
  2. via the Blazegraph API, as per https://wiki.blazegraph.com/wiki/index.php/REST_API#UPDATE_.28POST_with_Multi-Part_Request_Body.29 (doesn't seem to work as expected).

Also, data model validation should be run again

This fits well into solution 2.
Solution 1 may not be ideal, for two reasons:

  1. the data provider is required to know how to write and run SPARQL update queries, which is not trivial;
  2. it is hard to implement the RDF data model validation for triple patterns, i.e., when the triples contain variables.

The context-uri field seems to be ignored, resulting in triples added to the bd:nullGraph URI.

Needs investigation in the source code.

Spotted the right parameters, i.e., context-uri-delete and context-uri-insert.

Both should be passed as query parameters, not in the body as one would expect in a POST request.

Here is the final working curl command:

curl -v -F "remove=@${DATASET_TO_REMOVE};type=text/turtle" -F "add=@${DATASET_TO_ADD};type=text/turtle" "${HOST}/bigdata/sparql?updatePost&context-uri-delete=${URL_ENCODED_DATASET_URI}&context-uri-insert=${URL_ENCODED_DATASET_URI}"
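
The command above expects the dataset URI to be URL-encoded already. A minimal sketch of how ${URL_ENCODED_DATASET_URI} and the full request URL could be produced in plain shell follows. The function names urlencode and build_update_post_url are hypothetical, and the encoder assumes ASCII-only URIs.

```shell
# Percent-encode a string, keeping only RFC 3986 unreserved characters.
# Assumes ASCII input. The function body runs in a subshell, so its
# variables do not leak into the caller.
urlencode() (
  LC_ALL=C
  rest="$1"
  out=""
  while [ -n "$rest" ]; do
    tail="${rest#?}"        # everything after the first character
    c="${rest%"$tail"}"     # the first character
    case "$c" in
      [a-zA-Z0-9.~_-]) out="${out}${c}" ;;
      *) out="${out}$(printf '%%%02X' "'$c")" ;;
    esac
    rest="$tail"
  done
  printf '%s' "$out"
)

# Build the updatePost URL with both graph parameters in the query string.
# $1: host, $2: dataset (named graph) URI, not yet encoded.
build_update_post_url() (
  encoded="$(urlencode "$2")"
  printf '%s/bigdata/sparql?updatePost&context-uri-delete=%s&context-uri-insert=%s' \
    "$1" "$encoded" "$encoded"
)
```

The last argument of the curl command above could then be written as "$(build_update_post_url "${HOST}" "${DATASET_URI}")".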

Hjfocs moved this task from Doing to Done on the Wikidata-primary-sources board. Jul 26 2017, 3:14 PM

Change 368166 had a related patch set uploaded (by Hjfocs; owner: Hjfocs):
[wikidata/query/rdf@master] T170682: Implement the update service

https://gerrit.wikimedia.org/r/368166

Hjfocs closed this task as Resolved. May 31 2018, 9:02 AM