
Create dedicated Updater service in Blazegraph
Open, Medium, Public

Description

Right now, Updater sends SPARQL UPDATE queries to Blazegraph in order to update items. Instead, we could create a separate service that would receive:

  1. List of items to update (URIs)
  2. For each item to update, list of triples associated with the item (as collected by Updater)
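As an illustration only, the input described above could be modelled roughly as follows. The names `ItemUpdate` and `UpdateRequest`, and the choice of plain string tuples for triples, are assumptions made for this sketch; the task does not specify a wire format.

```python
from dataclasses import dataclass, field

# A triple as (subject, predicate, object) strings; an assumed representation.
Triple = tuple[str, str, str]


@dataclass
class ItemUpdate:
    """One item to update: its URI plus the triples collected by Updater."""
    uri: str
    triples: set[Triple] = field(default_factory=set)


@dataclass
class UpdateRequest:
    """Hypothetical payload: the list of items and their triples."""
    items: list[ItemUpdate] = field(default_factory=list)

    def item_uris(self) -> list[str]:
        # The "list of items to update (URIs)" from point 1.
        return [item.uri for item in self.items]
```

Keeping the triples grouped per item means the service can process each item independently.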

The service would then perform the same work as Updater did but without the overhead of creating/parsing SPARQL and sending extra data over the wire. For each item, it would:

  1. Fetch the triples that currently belong to the item
    1. Including direct connectors to the item, statement triples, forms/senses (for lexemes) and form statements.
    2. This also collects all reference and value URIs mentioned by any of the statements.
  2. Diff the data above against the data received from Updater.
  3. Triples that are not in the incoming set are deleted.
  4. Triples that are in the incoming set but not in the existing set (the result of step 1) are inserted.
  5. Values/references that were mentioned in the existing data but not in the incoming data are checked to see whether anything still links to them. If not, they are deleted.
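The per-item steps above amount to set differences plus an orphan check. A minimal sketch, assuming triples are held in memory as string tuples and that an `is_linked` callback (hypothetical, standing in for a lookup against the store) reports whether anything still points at a value or reference URI:

```python
from typing import Callable

# A triple as (subject, predicate, object) strings; an assumed representation.
Triple = tuple[str, str, str]


def diff_item(existing: set[Triple],
              incoming: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Return (to_delete, to_insert) for one item (steps 2 to 4)."""
    to_delete = existing - incoming   # in the store, not sent by Updater
    to_insert = incoming - existing   # sent by Updater, not yet in the store
    return to_delete, to_insert


def orphans_to_delete(existing_refs: set[str],
                      incoming_refs: set[str],
                      is_linked: Callable[[str], bool]) -> set[str]:
    """Return value/reference URIs that can be removed (step 5).

    Only URIs dropped by this update are candidates, and a candidate is
    deleted only if nothing else links to it.
    """
    candidates = existing_refs - incoming_refs
    return {uri for uri in candidates if not is_linked(uri)}
```

Triples present in both sets are left untouched, so only the actual changes are written.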

Having such a service would allow us to perform faster updates and thus deliver a more scalable service.

Details

Related Gerrit Patches:
wikidata/query/rdf : master, Merging updater

Event Timeline

Smalyshev triaged this task as Medium priority. (Jan 3 2019, 12:36 AM)
Smalyshev created this task.
Restricted Application added a subscriber: Aklapper. (Jan 3 2019, 12:36 AM)
Smalyshev updated the task description. (Jan 3 2019, 12:38 AM)
Addshore moved this task from incoming to monitoring on the Wikidata board. (Jan 3 2019, 9:27 AM)
Abbe98 added a subscriber: Abbe98. (Jun 7 2019, 7:46 PM)

Change 518760 had a related patch set uploaded (by Smalyshev; owner: Igor Kim):
[wikidata/query/rdf@master] Merging updater (work in progress)

https://gerrit.wikimedia.org/r/518760

Testing on wdqs-test shows new Updater is 2x faster than old one. Didn't verify validity yet but speed looks good :)

Change 518760 merged by jenkins-bot:
[wikidata/query/rdf@master] Merging updater

https://gerrit.wikimedia.org/r/518760

Mentioned in SAL (#wikimedia-operations) [2019-11-19T16:15:16Z] <gehel> reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - T212826

Mentioned in SAL (#wikimedia-operations) [2019-11-19T20:17:46Z] <gehel> completed reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - T212826