Figure out how to detect that claim is updated
Closed, ResolvedPublic

Assigned To

Authored By

	Smalyshev
	Dec 20 2014, 1:29 AM

Description

I was under the impression that each time a claim is updated, a new ID is generated, so I could just match the IDs to find the claims that need updating. However, I have discovered the following: for item Q24517, the old claim in my dump is:

"P279":[{"id":"Q24517$7A855A08-5DD0-41A6-9A36-E3AC3DE24B11","mainsnak":{"snaktype":"value","property":"P279","datatype":"wikibase-item","datavalue":{"value":{"entity-type":"item","numeric-id":2095},"type":"wikibase-entityid"}}

However, in the current dump at https://www.wikidata.org/wiki/Special:EntityData/Q24517.json it is:

"P279":[{"id":"Q24517$7A855A08-5DD0-41A6-9A36-E3AC3DE24B11","mainsnak":{"snaktype":"value","property":"P279","datatype":"wikibase-item","datavalue":{"value":{"entity-type":"item","numeric-id":2207288},"type":"wikibase-entityid"}}

As we can see, the claim ID is the same but it now refers to a different node (numeric-id 2207288 instead of 2095). That makes it hard to recognize when the claim must be updated. So I'd like to figure out:

Is it intentional or a bug?
If it's intentional, can it be changed to generate new IDs on change?
If not, what would be the best way to recognize when a claim changes?

The items can have many claims, so knowing which ones changed and updating only those would greatly speed up the query service function.

Alternatively, if there's some other format that is better suited to loading updates, we may want to use that instead of JSON data.

Related Objects

Status	Subtype	Assigned	Task
Resolved		Smalyshev	T76373 Evaluate Titan as graph storage/query engine for Wikidata Query service
Resolved		Smalyshev	T85045 Figure out how to detect that claim is updated

Event Timeline

Smalyshev created this task. Dec 20 2014, 1:29 AM

Smalyshev claimed this task.

Smalyshev raised the priority of this task to High.

Smalyshev updated the task description. (Show Details)

Smalyshev added projects: Wikidata, Wikidata-Query-Service.

Smalyshev changed Security from none to None.

Smalyshev added subscribers: Smalyshev, Manybubbles, GWicke and 4 others.

Statement IDs are GUIDs (with the Item ID prefixed), and they do not change when the Statement changes (otherwise, they would be hashes, not IDs - References are currently handled by hash). This is intentional and necessary to be able to discuss Statements as such.

Internally we use hashes for this kind of comparison - I suppose including hashes in the JSON dumps might be nice. But for now, you could just keep a map of GUID -> hash somewhere, and when reading the next dump, re-compute and compare the hash for each statement.
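A minimal sketch of the GUID -> hash map idea, in Python. Everything here is an assumption for illustration: `statement_hash`, `index_statements` and `diff_statements` are hypothetical helper names, and the SHA-1 over canonicalized JSON is only a stand-in, not the hash Wikibase computes internally. It assumes the "claims" object has the shape shown in the task description (property ID mapped to a list of statements, each with an "id").

```python
import hashlib
import json


def statement_hash(statement):
    """Hash everything in a statement except its GUID.

    Stand-in hash (SHA-1 of canonical JSON); NOT Wikibase's internal hash.
    """
    content = {key: value for key, value in statement.items() if key != "id"}
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def index_statements(claims):
    """Build a GUID -> hash map from an entity's "claims" object."""
    return {
        statement["id"]: statement_hash(statement)
        for statements in claims.values()
        for statement in statements
    }


def diff_statements(old_index, new_index):
    """Compare two GUID -> hash maps; return (added, changed, removed) GUIDs."""
    added = sorted(set(new_index) - set(old_index))
    removed = sorted(set(old_index) - set(new_index))
    changed = sorted(
        guid
        for guid in set(old_index) & set(new_index)
        if old_index[guid] != new_index[guid]
    )
    return added, changed, removed
```

With the two versions of the P279 claim from the description, the GUID appears in both maps but with different hashes, so it would land in the `changed` list and only that statement would need to be reloaded.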

With regard to the Statement's GUID staying stable: there might be some wiggle room here on the data model level: we might change the GUID when the main Snak or Qualifiers (the "claim") change. But adding a Reference shouldn't change the ID. This needs some thought though - the GUID allows a statement to be referenced and discussed across multiple revisions. This is useful for things like tracking which statement violates which soft-constraint, etc.

We will probably have to switch to using content hashes as the identifier for detecting changes.

We'll use content hash instead of claim ID to detect changes. We'll also use lastrevid on the item to track revisions.
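A rough sketch of how those two checks could fit together, in Python. This is not the query service's actual code: `content_hash` and `plan_update` are hypothetical names, the hash is a stand-in (SHA-1 over canonical JSON of the whole statement, GUID included), and it assumes the entity JSON carries "lastrevid" and "claims" keys and that revision IDs only grow.

```python
import hashlib
import json


def content_hash(statement):
    """Stand-in content hash: SHA-1 of the statement's canonical JSON."""
    canonical = json.dumps(statement, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def plan_update(stored_revid, stored_hashes, entity):
    """Decide what to reload for one entity.

    stored_revid  -- lastrevid recorded at the previous load, or None
    stored_hashes -- set of content hashes recorded at the previous load
    entity        -- freshly fetched entity JSON as a dict

    Returns None when the revision has not advanced, otherwise a pair
    (statements_to_insert, hashes_to_delete).
    """
    # Cheap check first: same or older revision means nothing to do.
    if stored_revid is not None and entity["lastrevid"] <= stored_revid:
        return None

    current = {
        content_hash(statement): statement
        for statements in entity.get("claims", {}).values()
        for statement in statements
    }
    to_insert = [st for digest, st in current.items() if digest not in stored_hashes]
    to_delete = sorted(stored_hashes - set(current))
    return to_insert, to_delete
```

Statements whose hash is already stored are skipped entirely, so an edit to one claim on a large item costs one delete and one insert rather than a full reload.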

Smalyshev moved this task from Incoming to Done on the Wikidata-Query-Service board. Mar 9 2015, 8:51 PM
