Wikidata wdtn:P214 values (VIAF) seem to be corrupt
Closed, Resolved · Public · BUG REPORT

Description

Wikidata wdtn:P214 values for VIAF, as reported through WDQS, appear corrupt. See, for instance, https://w.wiki/48Y

Issue surfaced at https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&oldid=945124862#Strange_wdtn:_forms_for_some_VIAFs_?

To take one example: the wdt:P214 value for https://www.wikidata.org/wiki/Q59362643 is 12148449524915690527, but WDQS reports the wdtn:P214 value as https://viaf.org/viaf/31
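The mismatch in this example can be checked mechanically. The following is a minimal sketch, assuming the normalized value is simply the prefix `https://viaf.org/viaf/` (as it appears in the URIs quoted in this report) followed by the raw wdt:P214 value. The class and method names are hypothetical and are not part of WDQS.

```java
final class ViafConsistencyCheck {
    // Assumed prefix, copied from the URIs quoted in this report.
    static final String VIAF_PREFIX = "https://viaf.org/viaf/";

    // True when the normalized URI is exactly the prefix plus the raw identifier.
    static boolean matches(String wdtValue, String wdtnUri) {
        return wdtnUri.equals(VIAF_PREFIX + wdtValue);
    }

    public static void main(String[] args) {
        String raw = "12148449524915690527";
        // Value reported by WDQS for Q59362643: does not match.
        System.out.println(matches(raw, "https://viaf.org/viaf/31"));
        // Value one would expect under the assumption above: matches.
        System.out.println(matches(raw, "https://viaf.org/viaf/12148449524915690527"));
    }
}
```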

Steps to Reproduce:

  • Query wdtn:P214 values compared to wdt:P214 values

Expected results

Actual Results:

Event Timeline

Restricted Application added a subscriber: Aklapper. May 21 2019, 12:50 AM

Assuming this is about Wikidata, hence adding project tag so someone can find this task.

Tagishsimon added a subscriber: matej_suchanek. Edited · May 21 2019, 9:17 AM

Seems more likely it's a data input issue than a query issue @matej_suchanek, given that WDQS doesn't exhibit the same sort of error for analogous wdtn:, such as wdtn:P244 (Library of Congress authority ID) - see for instance https://w.wiki/48X

Jheald added a subscriber: Jheald. May 21 2019, 9:58 AM

@Tagishsimon the WDQS tag is appropriate though, as it includes the pipeline for getting statements into the WDQS triplestore, and also any corruption issues happening there. Thanks for creating the ticket!

The LoC data does indeed seem to be clean - compare https://w.wiki/4BN versus https://w.wiki/4BP, checking 100,000 cases.

new feature?

Smalyshev added a subscriber: Smalyshev.

I think I know what the problem is there... VIAF is stored as prefix+number, so there might be an overflow. There's code that is supposed to deal with it, but maybe there's a bug in that code.

Smalyshev moved this task from Backlog to Next on the User-Smalyshev board. May 23 2019, 3:47 PM

*wd:Q59362643
*wdtn:P214
*https://viaf.org/viaf/12148449524915690527

The current (above) format of the triples (when it works) seems fine from a Wikidata perspective, but from a LOD perspective, shouldn't the triple just be something like:

*wd:Q59362643
*wikibase:identifier
*https://viaf.org/viaf/12148449524915690527

Smalyshev triaged this task as Medium priority. May 26 2019, 12:25 AM

@Esc3300 adding more triples is possible, but should be discussed in a separate task.

I don't think the wdtn triples are needed. I'd just add the "wikibase:identifier" ones.

Smalyshev added a subscriber: Igorkim78.

This looks like a Blazegraph URI handler bug: when the number fits in an unsigned 64-bit integer but not in a signed one, InlineUnsignedIntegerURIHandler erroneously stores it as a small byte value, because of this check:

		if (value < 256L) {
			return new XSDUnsignedByteIV((byte) (value + Byte.MIN_VALUE));
		}

For 12148449524915690527 the signed long representation is less than zero (-6298294548793861089), so the `value < 256L` comparison succeeds and the bug happens.
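The arithmetic can be reproduced in isolation. This is a minimal sketch, not the Blazegraph code itself. It assumes that the stored byte is decoded by subtracting `Byte.MIN_VALUE` again, and the class and method names are hypothetical. The unsigned comparison is shown only as one possible guard, not necessarily the change that was merged.

```java
final class UnsignedByteOverflowDemo {
    // Mirrors the quoted check: a plain signed comparison against 256.
    static boolean signedCheckPasses(long value) {
        return value < 256L;
    }

    // Hypothetical guard: compare as unsigned, so values above Long.MAX_VALUE
    // are not mistaken for small ones, while 0 still counts as small.
    static boolean unsignedCheckPasses(long value) {
        return Long.compareUnsigned(value, 256L) < 0;
    }

    // Encode as in the quoted snippet, then decode by undoing the offset
    // (the decoding step is an assumption).
    static int roundTripThroughByte(long value) {
        byte stored = (byte) (value + Byte.MIN_VALUE); // keeps only the low 8 bits
        return stored - Byte.MIN_VALUE;
    }

    public static void main(String[] args) {
        long value = Long.parseUnsignedLong("12148449524915690527");
        System.out.println(value);                        // -6298294548793861089
        System.out.println(signedCheckPasses(value));     // true
        System.out.println(roundTripThroughByte(value));  // 31
        System.out.println(unsignedCheckPasses(value));   // false
        System.out.println(unsignedCheckPasses(0L));      // true
    }
}
```

Under these assumptions the round trip yields 31, which matches the https://viaf.org/viaf/31 value reported in the description.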

Change 513244 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[wikidata/query/blazegraph@master] Fix handling of numbers that convert to negative longs

https://gerrit.wikimedia.org/r/513244

Change 513244 merged by Smalyshev:
[wikidata/query/blazegraph@master] Fix handling of numbers that convert to negative longs

https://gerrit.wikimedia.org/r/513244

Smalyshev claimed this task. Jun 3 2019, 7:21 PM
Smalyshev moved this task from Next to Done on the User-Smalyshev board.
Smalyshev closed this task as Resolved. Jun 3 2019, 7:47 PM

Should be fixed now.

Jheald added a comment. Jun 4 2019, 8:17 AM

Hi @Smalyshev. You've closed this as resolved, but a query like https://w.wiki/4bn is still returning corrupt data.
How long should it take for the old corrupt values to disappear?

Doh, my fault, I forgot 0 is also a number. Will fix.

@Jheald now they should all be fine.