Mar 11 2015
Feb 27 2015
Feb 20 2015
Nik tells me that the HA features in Virtuoso are only available in the closed source enterprise version. That basically means WMF is not going to use it in production.
Feb 19 2015
We are, indeed, playing with the idea of a SPARQL endpoint now...
Now my reply was so long that the ticket has already been closed in the meantime :-D Anyway, those are my two (or more) cents on this topic ;-) I don't think the paper goes into these topics very much (as they are not so much technical as philosophical).
Thanks for adding Denny. Long reply, but details matter here.
Our primary goal is to encode the JSON information in RDF, and possibly to enrich this information where it makes sense in an RDF-context (e.g., by adding links to other datasets). The JSON data includes the entity type, so it is clear that we want to encode it in RDF in some way. As I said, my understanding is that Q42 *is* an item for a suitable sense of "item", just as P31 is a "property" in this sense. In neither case are we referring to the HTML page or any other electronic document. The confusion arises from your preconception of the item class referring to a document or "description", which in turn is understandable given our lack of up-to-date documentation for this vocabulary.
The RDF should certainly contain information about the entity type of exported data. This is essential to ensure that the RDF data contains all the information that is found in the JSON (other than the ordering). As I read it, things that are of rdf:type Item are things that are described by an item on Wikidata. If this is not obvious to anybody who uses the data (maybe somebody really thinks that Washington himself is an item?!), we can always emphasize this in the documentation of the Item class. I therefore suggest closing this issue as invalid. It's just a matter of how we document our ontology. In particular, it should not be assumed that any triple in RDF has a self-evident ground truth associated with it that one can grasp just by reading the URIs (or their labels), though I think confusion is very unlikely here since we do not export any RDF data about item documents.
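To make the point about carrying the entity type over from JSON concrete, here is a minimal sketch. It assumes only that the JSON has "id" and "type" fields; the ontology namespace and the function name are hypothetical placeholders, not the vocabulary actually used in the export.

```python
# Sketch: turn the "type" field of an entity's JSON into one rdf:type triple.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ENTITY_NS = "http://www.wikidata.org/entity/"
# Assumption: placeholder namespace, stands in for the real ontology URI.
ONTOLOGY_NS = "http://example.org/ontology#"

# JSON entity type -> class name in the (assumed) ontology.
TYPE_CLASSES = {"item": "Item", "property": "Property"}


def entity_type_triple(entity_json):
    """Return an N-Triples line stating the entity type of the given entity."""
    entity_id = entity_json["id"]
    class_name = TYPE_CLASSES[entity_json["type"]]
    return f"<{ENTITY_NS}{entity_id}> <{RDF_TYPE}> <{ONTOLOGY_NS}{class_name}> ."


print(entity_type_triple({"id": "Q42", "type": "item"}))
print(entity_type_triple({"id": "P31", "type": "property"}))
```

The subject of the triple is the entity URI, not a document URI, so the triple says nothing about any HTML page or description document.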
Feb 16 2015
I think json should be in the path somewhere. It does not have to be at the top-level, but it would not be good if dump files of one type end up in their own directory. The only way for tools to detect and download dumps automatically is to look at the HTML directory listings, and this listing should not change its appearance (again). Note that different types of dumps will be created at different intervals, so a combined directory that contains several types of dumps would look quite messy in the end.
Feb 14 2015
No, I don't think I will work on this anytime soon. Thanks for cleaning up :-)
Jan 12 2015
I don't know about the details of the "import" task discussed here, but for the record: we are happy to support this use of WDTK by helping to update our implementation where necessary.
This is not correct; the original structure can be recovered.
@JanZerebecki I understand what you are saying about what "indexing" means here. Makes sense to me. What you are saying about my example query sounds as if you are planning to implement query execution manually. I hope this is not the case and you can just give the query to Titan to get it answered for you.
@Smalyshev My point is merely that sitelinks and labels can be handled like statements. Since statements must be supported anyway, it would be sensible to reuse the data structures and query expressions defined for them. I don't think that confusion is likely, since the query language will not use colloquial names as my examples do. Properties of Wikidata will always be referred to by their Pid, whereas something like "has badge" would not have an id of this form. So it's not like having a reserved label "has badge" that competes with Wikidata property labels.
Jan 11 2015
@Smalyshev My suggestion was just about the surface appearance, not about the inner workings. I am saying that the following two phrases have the same structure:
On another note, it would be good if the view on all data that is in the system is somewhat uniform. We don't want special query syntax for badges etc. This could be avoided by viewing everything as statements, possibly using some "special" properties (and qualifiers). For example, a label can be structurally represented like a statement for a property of type monolingual text. A sitelink could be represented as a statement with main property pointing to the article title and qualifiers defining the site and badges. Doing this does not change the data, but it would unify the query syntax.
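A rough sketch of what this uniform view could look like, purely for illustration. The Statement class, the "special" property names "label" and "sitelink", the qualifier names, and the badge id are all hypothetical; nothing here reflects the actual query service design.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Statement:
    """One structure for real statements, labels, and sitelinks alike."""
    subject: str
    prop: str
    value: Any
    qualifiers: Dict[str, List[Any]] = field(default_factory=dict)


def label_as_statement(entity_id, language, text):
    # A label looks like a statement with a monolingual text value.
    return Statement(entity_id, "label", {"text": text, "language": language})


def sitelink_as_statement(entity_id, site, title, badges=()):
    # Main value is the article title; site and badges become qualifiers.
    return Statement(entity_id, "sitelink", title,
                     {"site": [site], "badges": list(badges)})


def find(statements, prop, **qualifiers):
    """Return statements for prop whose qualifiers contain all given values."""
    return [s for s in statements
            if s.prop == prop
            and all(v in s.qualifiers.get(k, []) for k, v in qualifiers.items())]


data = [
    Statement("Q42", "P31", "Q5"),
    label_as_statement("Q42", "en", "Douglas Adams"),
    sitelink_as_statement("Q42", "enwiki", "Douglas Adams", ["BADGE_EXAMPLE"]),
]

# The same query function serves ordinary statements and sitelinks with badges.
print(find(data, "P31"))
print(find(data, "sitelink", badges="BADGE_EXAMPLE"))
```

The data itself is unchanged; only the view is unified, so a badge query needs no syntax beyond what qualifiers already require.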
I would like to turn it around. We should support indexing everything:
The fact that we're not creative enough to make up queries for everything doesn't mean it isn't useful.