Right now, Updater sends SPARQL UPDATE queries to Blazegraph to update items. Instead, we could create a separate service that would receive:
- List of items to update (URIs)
- For each item to update, list of triples associated with the item (as collected by Updater)
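
As a rough illustration of that input, the payload could be modelled as below. This is only a sketch: the names `ItemUpdate`, `UpdateRequest` and `item_uris`, and the choice to represent triples as plain string tuples, are assumptions made for the example and are not part of any existing Updater or Blazegraph API.

```python
from dataclasses import dataclass, field

# A triple as (subject, predicate, object); plain strings are an assumption
# made to keep the sketch small. A real service would likely use RDF terms.
Triple = tuple[str, str, str]


@dataclass
class ItemUpdate:
    """One item to update, with the triples collected by Updater (hypothetical)."""
    uri: str
    triples: set[Triple] = field(default_factory=set)


@dataclass
class UpdateRequest:
    """Everything the service would receive in one call (hypothetical)."""
    items: list[ItemUpdate] = field(default_factory=list)

    def item_uris(self) -> list[str]:
        # The "list of items to update (URIs)" part of the payload.
        return [item.uri for item in self.items]


# Example payload with a single item; the ex: names are placeholders.
request = UpdateRequest(items=[
    ItemUpdate(
        uri="ex:item1",
        triples={("ex:item1", "ex:label", "new label")},
    ),
])
```
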
The service would then perform the same work that Updater does now, but without the overhead of creating/parsing SPARQL and sending extra data over the wire. For each item, it would:
- Fetch the triples that currently belong to the item.
  - This includes direct connections to the item, statement triples, forms/senses (for lexemes), and form statements.
  - It also collects all reference and value URIs mentioned by any of the statements.
- Diff the data above against the data received from Updater.
  - Triples that are in the fetched set but not in the incoming set are deleted.
  - Triples that are in the incoming set but not in the fetched set are inserted.
  - Values/references that were mentioned in the existing data but not in the incoming data are checked to see whether anything still links to them. If not, they are deleted.
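
The per-item steps above can be sketched with plain set operations. This is a minimal sketch under stated assumptions, not the actual implementation: `diff_item` is a hypothetical name, triples are plain string tuples, and `is_linked` is a hypothetical callback standing in for whatever lookup the service would run against the store to see whether anything still links to a value or reference.

```python
from typing import Callable

Triple = tuple[str, str, str]  # (subject, predicate, object), an assumption


def diff_item(
    existing: set[Triple],
    incoming: set[Triple],
    existing_refs: set[str],
    incoming_refs: set[str],
    is_linked: Callable[[str], bool],
) -> tuple[set[Triple], set[Triple], set[str]]:
    """Return (triples to delete, triples to insert, orphaned URIs to delete)."""
    # Stored triples that are absent from the incoming set get deleted.
    to_delete = existing - incoming
    # Incoming triples that are not stored yet get inserted.
    to_insert = incoming - existing
    # Values/references no longer mentioned are only candidates for removal;
    # each one is kept if something still links to it.
    candidates = existing_refs - incoming_refs
    orphans = {uri for uri in candidates if not is_linked(uri)}
    return to_delete, to_insert, orphans


# Example: the label changes and one reference is no longer mentioned.
existing = {("ex:item1", "ex:label", "old"), ("ex:item1", "ex:p1", "ex:stmt1")}
incoming = {("ex:item1", "ex:label", "new"), ("ex:item1", "ex:p1", "ex:stmt1")}

to_delete, to_insert, orphans = diff_item(
    existing,
    incoming,
    existing_refs={"ex:ref1", "ex:ref2"},
    incoming_refs={"ex:ref2"},
    is_linked=lambda uri: False,  # pretend nothing else links to ex:ref1
)
```

Unchanged triples, such as the `ex:p1` statement in the example, appear in neither result set, so they are never touched.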
Having such a service would allow us to perform faster updates and thus deliver a more scalable service.