User Details
- User Since
- Sep 12 2015, 6:46 PM (543 w, 14 h)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Sebotic
Jan 13 2017
@thiemowmde Agreed, it would bring down the error rate for this specific identifier/string. But currently, that is the only one we have a size distribution for. I think Lydia intended to solve this issue for every property of data type string. So if, for technical reasons (MySQL index field length), it has to be limited to 768 for now, that would be fine for chemistry, but what about other properties?
Jan 12 2017
I calculated the numbers above. They are valid only for the chemical structure property InChI (P234), based on ~68 million InChI values in PubChem, the largest public chemistry database (they are also valid for other chemical structure properties such as canonical and isomeric SMILES). For any other data of Wikidata datatype string/text I cannot provide numbers, because I lack the distribution of string lengths for other data that should be represented as strings in Wikidata. And as you can see from the distribution above, increasing the limit would only affect the representation of the top ~1% of all chemical structure data.
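For anyone who wants to produce comparable numbers for another string property, here is a minimal sketch of how the share of values above a given length limit could be computed. It is not the original calculation; the function name `share_over_limit` and the sample values are hypothetical and only illustrate the idea.

```python
def share_over_limit(values, limit):
    """Return the fraction of strings in `values` longer than `limit` characters."""
    if not values:
        return 0.0
    over = sum(1 for value in values if len(value) > limit)
    return over / len(values)


# Hypothetical sample values, not real PubChem data.
sample = [
    "InChI=1S/CH4/h1H4",
    "InChI=1S/H2O/h1H2",
    "X" * 500,  # stand-in for a very long structure string
]

# Compare a few candidate limits side by side.
for limit in (400, 768):
    print(limit, share_over_limit(sample, limit))
```

Running this over the actual values of a property would show how many statements each candidate limit excludes.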
Jan 6 2017
Thanks Lydia!
Oct 25 2016
Jul 14 2016
Thanks, here are the headers for r1 and r2, respectively:
I have a quick follow-up on this. I made two slightly different SPARQL queries, one accessing values directly and one indirectly. They should return the same values, but it seems that if each query is executed on a different server, the two result sets differ: one returns 54320 values, the other 54315. Irrespective of the counts, some values differ. See my code here: https://gist.github.com/sebotic/a92f9291175f4968ce265ffe31e0e9c2
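The kind of comparison described above can be sketched roughly as follows. This is not the code from the linked gist; `diff_result_sets` and the sample values are hypothetical, and the sketch assumes each result set has already been fetched as a list of value strings.

```python
def diff_result_sets(r1, r2):
    """Compare two query result lists and report values unique to each."""
    s1, s2 = set(r1), set(r2)
    return {
        "only_in_r1": sorted(s1 - s2),
        "only_in_r2": sorted(s2 - s1),
        "common": len(s1 & s2),
    }


# Hypothetical values standing in for the two result sets.
r1 = ["Q1", "Q2", "Q3", "Q4"]
r2 = ["Q2", "Q3", "Q5"]

report = diff_result_sets(r1, r2)
print(report["only_in_r1"])
print(report["only_in_r2"])
print(report["common"])
```

Listing the values unique to each side, rather than only the counts, makes it visible when two result sets of similar size still contain different values.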
Sep 15 2015
@Smalyshev I just tested the query once again. Some of the old data is gone now, but one item still comes up: http://www.wikidata.org/entity/Q402633. I currently do not have other queries to execute, but I will think of some.