Implement the update service as a REST endpoint /update, which should perform the following steps:
- validate the given dataset against the RDF data model;
- update the given dataset in Blazegraph.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Hjfocs | T169044 Implement the Ingestion API
Resolved | | Hjfocs | T170682 Implement the update service
The request below adds data to an existing dataset:
curl --data-urlencode "update=LOAD <${UPDATE_URL}> INTO GRAPH <${DATASET_URI}> ;" "${HOST}/bigdata/sparql"
where ${UPDATE_URL} is the location of the new data to load and ${DATASET_URI} is the named graph URI of the existing dataset.
HEADS UP: this doesn't update existing triples; it only adds new ones.
To update existing triples, we need to choose one of the following DELETE/INSERT operations:
- via SPARQL, as per https://www.w3.org/TR/2013/REC-sparql11-update-20130321/#deleteInsert;
- via the Blazegraph API, as per https://wiki.blazegraph.com/wiki/index.php/REST_API#UPDATE_.28POST_with_Multi-Part_Request_Body.29
The curl command below works but, contrary to the Blazegraph documentation, it doesn't take the default graph URI into account:
curl -v -F "remove=@${DATASET_TO_REMOVE};type=text/turtle" -F "add=@${DATASET_TO_ADD};type=text/turtle" --form-string "context-uri=<${DATASET_URI}>" "${HOST}/bigdata/sparql?updatePost"
The context-uri field seems to be ignored, resulting in triples added to the bd:nullGraph URI.
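One way to check where the triples actually ended up is to count triples per graph with a standard SPARQL query. The sketch below is only an illustration: the variable GRAPH_COUNT_QUERY and the helper count_triples_per_graph are hypothetical names, and it assumes the endpoint accepts a regular SPARQL query on the same /bigdata/sparql path and can return CSV results.

```shell
# SPARQL query listing every graph together with its triple count.
GRAPH_COUNT_QUERY='SELECT ?g (COUNT(*) AS ?triples) WHERE { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g'

# Hypothetical helper: runs the query against the host passed as $1.
# Assumption: the SPARQL endpoint lives at <host>/bigdata/sparql, as in the
# commands above, and honors the text/csv Accept header.
count_triples_per_graph() {
  curl -s -H 'Accept: text/csv' \
    --data-urlencode "query=${GRAPH_COUNT_QUERY}" \
    "$1/bigdata/sparql"
}

# Usage (not executed here): count_triples_per_graph "${HOST}"
```

If the multi-part update ignored context-uri, the added triples should show up under a graph other than ${DATASET_URI}.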
- via SPARQL, as per https://www.w3.org/TR/2013/REC-sparql11-update-20130321/#deleteInsert;
This solution seems to work, but is expensive for the data provider, since it requires building full SPARQL update queries.
For instance, the following request just updates Chuck Berry's image URL in the http://chuck-berry dataset graph:
curl --data-urlencode "update=WITH <http://chuck-berry> DELETE { ?s ps:P18 <http://commons.wikimedia.org/wiki/Special:FilePath/Chuck-berry-2007-07-18.jpg> } INSERT { ?s ps:P18 <https://commons.wikimedia.org/wiki/Special:FilePath/Chuck_Berry_1957.jpg> } WHERE { ?s ps:P18 <http://commons.wikimedia.org/wiki/Special:FilePath/Chuck-berry-2007-07-18.jpg> }" "${HOST}/bigdata/sparql"
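To give an idea of what the data provider would have to build for every changed value, here is a minimal sketch of a helper that assembles such a query. The function name build_replace_update and its argument order are hypothetical; it only covers the simple case shown above, where one object IRI is swapped for another on a single predicate.

```shell
# Hypothetical helper: prints a SPARQL DELETE/INSERT update that replaces one
# object IRI with another for a given predicate inside a named graph.
#   $1 = named graph URI      $2 = predicate (prefixed name, e.g. ps:P18)
#   $3 = old object IRI       $4 = new object IRI
build_replace_update() {
  printf 'WITH <%s> DELETE { ?s %s <%s> } INSERT { ?s %s <%s> } WHERE { ?s %s <%s> }' \
    "$1" "$2" "$3" "$2" "$4" "$2" "$3"
}

# Usage (not executed here):
#   curl --data-urlencode "update=$(build_replace_update "$GRAPH" ps:P18 "$OLD" "$NEW")" \
#     "${HOST}/bigdata/sparql"
```

Even in this simplest case the provider has to know both the old and the new value of each statement, which is where the cost comes from.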
The context-uri field seems to be ignored, resulting in triples added to the bd:nullGraph URI.
Needs investigation in the source code.
This fits well into solution 2.
Solution 1 may not be ideal, for 2 reasons:
Spotted the right parameters, i.e.:
Both must be passed as URL query parameters rather than in the request body, where one would expect them for a POST request.
Here is the final working curl command:
curl -v -F "remove=@${DATASET_TO_REMOVE};type=text/turtle" -F "add=@${DATASET_TO_ADD};type=text/turtle" "${HOST}/bigdata/sparql?updatePost&context-uri-delete=${URL_ENCODED_DATASET_URI}&context-uri-insert=${URL_ENCODED_DATASET_URI}"
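Since the graph URI travels in the query string, it has to be percent-encoded first. A minimal sketch of how ${URL_ENCODED_DATASET_URI} could be derived from ${DATASET_URI} in plain POSIX shell follows; the function name urlencode is an assumption, not something taken from the patch.

```shell
# Hypothetical helper: percent-encodes its first argument, keeping only the
# RFC 3986 unreserved characters (letters, digits, '.', '_', '~', '-') as-is.
# The body runs in a subshell so the locale change does not leak out.
urlencode() (
  LC_ALL=C
  export LC_ALL
  # Dump the input as one hex byte per line, then re-emit each byte either
  # literally or as %XX.
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
    [ -n "$hex" ] || continue
    # Turn the hex byte back into a character via an octal printf escape.
    char=$(printf "\\$(printf '%03o' "0x$hex")")
    case "$char" in
      [A-Za-z0-9._~-]) printf '%s' "$char" ;;
      *) printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
    esac
  done
)

# Usage (not executed here):
#   URL_ENCODED_DATASET_URI=$(urlencode "${DATASET_URI}")
```

For example, urlencode 'http://chuck-berry' yields http%3A%2F%2Fchuck-berry, which can then be used for both context-uri-delete and context-uri-insert.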
Change 368166 had a related patch set uploaded (by Hjfocs; owner: Hjfocs):
[wikidata/query/rdf@master] T170682: Implement the update service