
Add flavor=basicdump
Closed, Declined · Public

Description

Hello

I am testing the entity dumps via Special:EntityData, and I have two big problems.

First problem: I see GeoSPARQL constructs in the entity dumps. It's nice, but almost no database supports GeoSPARQL, and this sometimes causes bugs.
Example of RDF with GeoSPARQL (POINT): http://www.wikidata.org/wiki/Special:EntityData/Q1000336.ttl?flavor=dump
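
For illustration, the offending triple has roughly this shape (a simplified sketch: the prefixes follow the Wikidata RDF format, but the coordinates here are placeholders, not the real values for Q1000336):

@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .

# Coordinate location (P625) as a GeoSPARQL WKT literal: "Point(longitude latitude)"
wd:Q1000336 wdt:P625 "Point(2.35 48.85)"^^geo:wktLiteral .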

So I tested Blazegraph, with the aim of supporting GeoSPARQL. For the moment, I cannot deploy it in production (again, a lot of problems). That is my second problem: "I cannot use your experimental database."

Conclusion: I cannot use Wikidata's RDF at the moment, because I cannot import your dump into my old triple stores and I cannot yet use Blazegraph.

Question: could you add an export without experimental tools/types?
For example, a "ttl" export with flavor "basicdump": Special:EntityData/QXXXX.ttl?flavor=basicdump

I think perfect RDF is the enemy of a good RDF database. A basic dump without GeoSPARQL would already be very good work, and it would be compatible with the majority of RDF databases. Once it is available, I will quickly open a SPARQL endpoint for Wikidata.

Thanks
Karima

Event Timeline

Karima assigned this task to Smalyshev.
Karima raised the priority of this task from to Medium.
Karima updated the task description.

Please do not overuse/misuse the Epic tag.

Do you have a link to the patch for review?

Bugreporter subscribed.

Cleaning up the task, which was created using the "Create Subtask" function.

Sorry, I'm a newbie.
@Ciencia_Al_Poder: what patch?
@Bugreporter: do you want me to create a subtask of this task with the title "Clean up the task"?

OK, I think I now see what happened here:

So basically you created this task with "Create Subtask", but you didn't remove any of the CCs or projects of the original task.

Sorry, I'm a newbie.
@Ciencia_Al_Poder: what patch?

The task had the Patch-For-Review tag, which should be included only when there's a patch for review. It doesn't apply here; it was copied over from the original task.

@Bugreporter: do you want me to create a subtask of this task with the title "Clean up the task"?

No. Bugreporter said he was "cleaning up the task", which is now done :)

The dump at https://www.wikidata.org/wiki/Special:EntityData/Q1000336.ttl?flavor=dump looks fine, and the database should accept any triples of any type, even if no special handling for them is available. I'm not sure what the point of changing the types or removing GeoSPARQL would be. Which triple store exactly cannot handle these types, and what exactly is the problem there?

the database should accept any triples of any type

@Smalyshev
Yes, in theory the database should accept any triples, but GeoSPARQL, for example, can trigger bugs because the implementations are not finished. No database has a complete GeoSPARQL implementation (except maybe Oracle).
Example with Virtuoso:
https://github.com/openlink/virtuoso-opensource/issues/195

I searched for GeoSPARQL unit tests for my database benchmark (http://sparqlscore.com/) but found none.

GeoSPARQL is a new tool and the implementations are not finished. The SPARQL endpoints of Wikidata need stability and robustness, I think.

@Karima what you referred to is an example of an edge case in a different format, so I'm not sure how it applies to our case. If there are specific problems in the RDF data as we produce it - such as incompatibility with any of the major, commonly used triple stores - I would be glad to hear about them and look for solutions. If not, then I'm not sure what the problem is.

@Smalyshev
I can test type incompatibilities against the triple stores on the market if you want to help all the vendors become compatible with GeoSPARQL. But a quick solution would be to add a filter on your side and hide these exotic triples from basic developers (like me), so I can easily deploy a core Wikidata and share compatible queries with the community.

If Wikidata cannot provide a core of data (with basic/stable types), each RDF database loaded with Wikidata will behave differently, i.e. the same SPARQL query will not give the same result depending on the endpoint. For an end user, that will be incomprehensible. The mirrors of Wikidata have to return the same results.

I can develop a filter to remove these exotic data, but if I do that, other developers like me will do the same thing in different ways (rename the type, or remove the triples with the geo POINT and add two latitude/longitude triples, etc.), and within a year we will have 100 different versions of "mirror" Wikidata depending on the database version. I don't think that is the idea of Wikidata.
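
For instance, such a filter could be a single SPARQL Update statement run after loading (a sketch, assuming the store supports SPARQL 1.1 Update; it deletes from the default graph):

PREFIX geo: <http://www.opengis.net/ont/geosparql#>

# Delete every triple whose object is a WKT-typed literal
DELETE { ?s ?p ?o }
WHERE {
  ?s ?p ?o .
  FILTER(datatype(?o) = geo:wktLiteral)
}

But that is exactly the kind of divergent local fix I am worried about.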

For me, it is not a triple store problem; it is a Wikidata interoperability problem. In the future, we need a process before imposing a new datatype.
Today it's GeoSPARQL; tomorrow it will be "coolgeosparql" (and I will have to buy the latest version of the latest cool database, or wait two years for a patch in a free, robust version before I can reuse Wikidata's data). There is a problem...

So, do I have to develop my own version of Wikidata, or can Wikidata propose an interoperable subset to avoid this inevitable disorder?

I'm sorry, but I'm still not sure which problem you are seeking a solution to. GeoSPARQL is a standard for representing coordinate data. If you don't like it, you can use the component values as described in https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Globecoordinate. Besides that, I'm not sure what exactly you're looking for.

GeoSPARQL is an OGC standard, not a W3C one. Wikidata needs to distinguish between lobbies and Web recommendations. I read this in the documentation of the OGC standard:
http://www.opengeospatial.org/standards/geosparql

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. Open Geospatial Consortium shall not be held responsible for identifying any or all such patent rights.

So vendors don't implement this standard, probably because there are patents behind it. If the patents are not free, GeoSPARQL will never become a real W3C recommendation.

Thanks for your example; I hadn't seen the Globecoordinate triples without GeoSPARQL. That is enough for me and my students:
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Globecoordinate
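
To make it concrete, the component form from that page looks roughly like this (a sketch only; the exact namespace IRIs and the placeholder values should be checked against the linked documentation):

@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Coordinate components as plain xsd:double literals instead of a WKT literal
_:coordinateValue a wikibase:GlobecoordinateValue ;
    wikibase:geoLatitude "48.85"^^xsd:double ;
    wikibase:geoLongitude "2.35"^^xsd:double ;
    wikibase:geoPrecision "0.000277778"^^xsd:double .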

However, I cannot load your ttl without removing the GeoSPARQL-related triples :-/ My cheap triple store probably doesn't like the patents.
So do I have to implement a filter on my side, or will you propose a basic RDF for cheap triple stores?

Smalyshev unsubscribed.
Smalyshev subscribed.

@Karima I cannot advise on how to implement your software, but I really don't see how I can help here, since I still don't understand what the problem is.

@Smalyshev
My problem: I cannot load Wikidata's ttl with a LOAD request when it contains GeoSPARQL triples:

LOAD <http://www.wikidata.org/wiki/Special:EntityData/Q1000336.ttl?flavor=dump> INTO GRAPH <http://www.wikidata.org/entity/Q1000336>

===> ERROR: the database does not support WKT strings, i.e. coordinates typed as geo:wktLiteral

NB: I tested the five examples from the GeoSPARQL documentation on four different triple stores... no database supports them for the moment.
http://sparqlscore.com/

@Karima which database are you using? Blazegraph can certainly load the data just fine, and I don't see why any triple store would not be able to load an arbitrary typed literal. You don't need full GeoSPARQL support for that - in fact, you don't need any GeoSPARQL support at all - you just need to be able to load arbitrary typed literals.
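
To illustrate: in Turtle, a typed literal is just a lexical string paired with a datatype IRI, so a conformant parser can load it without knowing anything about the datatype. A minimal sketch, with a deliberately made-up datatype:

@prefix ex: <http://example.org/> .

# A parser stores this as an opaque literal; it does not need to
# understand ex:madeUpType in order to load the triple.
ex:subject ex:property "some value"^^ex:madeUpType .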

There is indeed a bit of a problem specifically with Virtuoso 7.1.0, which seems to be a bit quirky in loading geo data (see e.g. https://github.com/openlink/virtuoso-opensource/issues/360); I'll look into it. It may be a bug that was recently fixed in 7.2.0, see http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSNews#GEO+functions

Hello,
I developed a filter on my side to remove the problematic triples from the ttl.

Thanks,
Bye
Karima