Create dedicated Updater service in Blazegraph
Closed, Declined · Public

Assigned To

Authored By

	Smalyshev
	Jan 3 2019, 12:36 AM

Description

Right now, Updater sends SPARQL UPDATE queries to Blazegraph in order to update items. Instead, we could create a separate service that would receive:

List of items to update (URIs)
For each item to update, list of triples associated with the item (as collected by Updater)
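As a rough illustration of the input described above, the sketch below packs a list of item URIs and their triples into one request body. The JSON layout, the field names `items`, `uri`, and `triples`, and the helper `build_update_request` are all hypothetical; the task does not specify a wire format.

```python
import json
from typing import Dict, List, Tuple

# A triple is modelled as (subject, predicate, object) strings.
Triple = Tuple[str, str, str]


def build_update_request(items: Dict[str, List[Triple]]) -> str:
    """Serialize items and their triples into a JSON body.

    The payload shape is an assumption made for illustration only.
    Duplicate triples are dropped and output is sorted so that the
    same input always produces the same request body.
    """
    payload = {
        "items": [
            {"uri": uri, "triples": [list(t) for t in sorted(set(triples))]}
            for uri, triples in sorted(items.items())
        ]
    }
    return json.dumps(payload)
```

Sending the collected triples directly like this is what would let the service skip building and parsing SPARQL UPDATE text.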

The service would then perform the same work as Updater did but without the overhead of creating/parsing SPARQL and sending extra data over the wire. For each item, it would:

Fetch the triples that currently belong to the item
1. Including direct connectors to the item, statement triples, forms/senses (for lexemes) and form statements.
2. This also collects all reference and value URIs mentioned by any of the statements.
Diff the data above against the data received from Updater.
Triples that are in the fetched (existing) set but not in the incoming set are deleted.
Triples that are in the incoming set but not in the fetched set are inserted.
Values/references that were mentioned in the existing data but not in the incoming data are checked to see whether anything still links to them. If not, they are deleted.
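The per-item steps above can be sketched with plain set operations. This is a minimal in-memory illustration, not the code from the Gerrit change: the function names, the `is_value_or_ref` and `is_linked` callbacks, and the use of object prefixes to spot value/reference nodes are all assumptions.

```python
from typing import Callable, Iterable, Set, Tuple

# A triple is modelled as (subject, predicate, object) strings.
Triple = Tuple[str, str, str]


def diff_item(
    existing: Iterable[Triple], incoming: Iterable[Triple]
) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (to_delete, to_insert) for one item."""
    existing_set, incoming_set = set(existing), set(incoming)
    to_delete = existing_set - incoming_set  # stored but no longer sent
    to_insert = incoming_set - existing_set  # sent but not yet stored
    return to_delete, to_insert


def orphan_candidates(
    existing: Iterable[Triple],
    incoming: Iterable[Triple],
    is_value_or_ref: Callable[[str], bool],
) -> Set[str]:
    """Value/reference URIs mentioned by existing data but not by incoming data."""
    def mentioned(triples: Iterable[Triple]) -> Set[str]:
        return {obj for (_, _, obj) in triples if is_value_or_ref(obj)}

    return mentioned(existing) - mentioned(incoming)


def find_orphans(
    candidates: Iterable[str], is_linked: Callable[[str], bool]
) -> Set[str]:
    """Keep only candidates that nothing links to any more.

    is_linked stands in for a store lookup (hypothetical here) and is
    expected to be evaluated after the item's stale triples are deleted.
    """
    return {uri for uri in candidates if not is_linked(uri)}
```

A caller would apply `to_delete` and `to_insert` for the item first, then remove the triples of every URI returned by `find_orphans`. Value and reference nodes can be shared between items, which is why they are checked for remaining links rather than deleted outright.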

Having such a service would allow us to perform faster updates and thus deliver a more scalable service.

Details

	Subject	Repo	Branch	Lines +/-
	Merging updater	wikidata/query/rdf	master	+1 K -64


Related Objects

Status	Assigned	Task
Invalid	None	T209201 WDQS server/updater performance issues
Resolved	Igorkim78	T235759 [TRACKING] WDQS / Blazegraph optimization / bug fixes
Declined	Igorkim78	T212826 Create dedicated Updater service in Blazegraph
Declined	Igorkim78	T231411 Test new Updater service
Resolved	Gehel	T238557 Allow for logging recently updated entities
Declined	Igorkim78	T238555 Create endpoint to extract low level data for a list of entity IDs.

Event Timeline

Smalyshev triaged this task as Medium priority. Jan 3 2019, 12:36 AM

Smalyshev created this task.

Restricted Application added a subscriber: Aklapper. Jan 3 2019, 12:36 AM

Smalyshev updated the task description. Jan 3 2019, 12:38 AM

Addshore moved this task from incoming to monitoring on the Wikidata board. Jan 3 2019, 9:27 AM

Smalyshev added a project: Epic. Mar 12 2019, 11:27 PM

Pintoch awarded a token. Apr 25 2019, 1:40 PM

Daniel_Mietchen subscribed. Apr 27 2019, 8:45 PM

Fnielsen subscribed. Apr 29 2019, 3:19 PM

EgonWillighagen awarded a token. Apr 29 2019, 3:35 PM

EgonWillighagen subscribed.

Abbe98 subscribed. Jun 7 2019, 7:46 PM

Smalyshev moved this task from Incoming to Current work on the Wikidata-Query-Service board. Jul 30 2019, 11:34 PM

Smalyshev added a project: Discovery-Wikidata-Query-Service-Sprint.

Smalyshev moved this task from Backlog to In progress on the Discovery-Wikidata-Query-Service-Sprint board.

Smalyshev assigned this task to Igorkim78. Aug 20 2019, 5:36 PM

Change 518760 had a related patch set uploaded (by Smalyshev; owner: Igor Kim):
[wikidata/query/rdf@master] Merging updater (work in progress)

https://gerrit.wikimedia.org/r/518760

gerritbot added a project: Patch-For-Review. Aug 27 2019, 5:32 PM

Testing on wdqs-test shows the new Updater is 2x faster than the old one. Validity is not verified yet, but the speed looks good :)

Smalyshev added a subscriber: Gehel. Aug 28 2019, 10:32 PM

Fnielsen mentioned this in T231543: Preview shows that I'd remove other subscribers on a ticket. Aug 29 2019, 11:37 AM

Gehel added a parent task: T235759: [TRACKING] WDQS / Blazegraph optimization / bug fixes. Oct 17 2019, 12:42 PM

Gehel edited projects, added Discovery-Search (Current work); removed Discovery-Wikidata-Query-Service-Sprint. Nov 12 2019, 1:55 PM

Gehel mentioned this in T238229: WDQS is having high update lag for the last week. Nov 14 2019, 9:34 AM

Change 518760 merged by jenkins-bot:
[wikidata/query/rdf@master] Merging updater

https://gerrit.wikimedia.org/r/518760

Maintenance_bot removed a project: Patch-For-Review. Nov 18 2019, 3:10 PM

Ghuron awarded a token. Nov 19 2019, 3:58 AM

Mentioned in SAL (#wikimedia-operations) [2019-11-19T16:15:16Z] <gehel> reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - T212826

Mentioned in SAL (#wikimedia-operations) [2019-11-19T20:17:46Z] <gehel> completed reloading data from wdqs1007 to wdqs1004 - after failed test of merging updater - T212826