
BlazeGraph Finalization: Geo
Closed, ResolvedPublic

Description

We'll need filtering and sorting based on distance. We support different globes. Fun times.

Event Timeline

Manybubbles raised the priority of this task to Medium.
Manybubbles updated the task description.

Perhaps some Frankensteinian creation from Blazegraph + Elasticsearch + GeoSPARQL?

It's on their roadmap. Maybe we can influence it somewhat. I'm not sure why,
but they seem to have some people already using it; maybe those users rigged
it up themselves.

We would be interested in both the Elasticsearch and GeoSPARQL integrations.

Full GeoSPARQL support is complex. It involves not only the spatial index but also extensions to the query planning, and there is a large test suite that we would need to pass. A simple way to achieve a spatial index is to use MGRS [1] coding of the coordinates. This turns the coordinates into an outer grid system and an inner z-ordering system. The resulting MGRS codes can be fed into the full text index in Blazegraph. You can then do prefix scans on the coordinates against that full text index, which gives you a spatial restriction. You can then add FILTERs that perform additional kinds of spatial filtering.
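
To illustrate why prefix scans give a spatial restriction, here is a minimal Python sketch. It does not implement MGRS. As a simplified stand-in it uses a plain quadtree (z-order) code over latitude and longitude, and the function name `quadkey` and the `depth` parameter are made up for this example. The point is only that each extra digit narrows the cell, so all points in a cell share that cell's code as a prefix.

```python
def quadkey(lat: float, lon: float, depth: int = 12) -> str:
    """Return a base-4 string naming the grid cell that contains (lat, lon).

    Simplified stand-in for an MGRS-style code (NOT real MGRS): the globe is
    split into four quadrants per level, and each level appends one digit.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    south, north = -90.0, 90.0
    west, east = -180.0, 180.0
    digits = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        digit = 0
        if lon >= mid_lon:      # eastern half sets bit 0
            digit |= 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:      # northern half sets bit 1
            digit |= 2
            south = mid_lat
        else:
            north = mid_lat
        digits.append(str(digit))
    return "".join(digits)


# Two nearby points share a long prefix; a distant one diverges immediately.
berlin = quadkey(52.52, 13.405)
potsdam = quadkey(52.39, 13.06)
sydney = quadkey(-33.87, 151.21)
print(berlin[:8] == potsdam[:8], berlin[0] == sydney[0])
```

A prefix scan for one cell code therefore returns every indexed point inside that cell. Points just across a cell boundary can be close together yet share only a short prefix, which is one reason an additional FILTER step is still needed.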

Peter might be able to comment on what it would take to provide full GeoSPARQL. There is also the opensahara project, which has provided GeoSPARQL support. However, to achieve good GeoSPARQL performance there would have to be significant effort to port the integration to run against our internal indices and handle the rewrites in ASTOptimizers. (opensahara uses Postgres, as I recall, for the spatial support, so the basic spatial join is via an external SERVICE.) I can forward the relevant threads from our developers mailing list if you like and also reach out to the opensahara group if there is interest in pursuing this approach.

[1] https://en.wikipedia.org/wiki/Military_grid_reference_system

@Thompsonbry.systap I don't think we need full GeoSPARQL at least at the beginning stage, but support for something like distance filtering would already cover the most frequent cases.

It is very easy to write and register custom functions for things like
distance filtering. See [1]. This could be combined with the MGRS approach
and prefix scans on the text index to provide a fairly efficient spatial
distance capability.

[1] http://wiki.bigdata.com/wiki/index.php/CustomFunction
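
As a rough idea of what such a distance function would compute, here is a small Python sketch of the haversine great-circle distance. It is not the Blazegraph custom function API (see the wiki page above for that); the names `great_circle_km` and `within_distance` are hypothetical. The sphere radius is a parameter because the task mentions different globes, and a spherical body is assumed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; pass another value for another globe


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def within_distance(lat1, lon1, lat2, lon2, max_km, radius_km=EARTH_RADIUS_KM):
    """Predicate a distance FILTER would evaluate for each candidate point."""
    return great_circle_km(lat1, lon1, lat2, lon2, radius_km) <= max_km


# Berlin to Paris, then the same angular separation on a Mars-sized sphere.
print(great_circle_km(52.52, 13.405, 48.8566, 2.3522))
print(great_circle_km(52.52, 13.405, 48.8566, 2.3522, radius_km=3389.5))
```

Sorting by distance would use the same value as the sort key.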

Bryan


Bryan Thompson
Chief Scientist & Founder
SYSTAP, LLC
4501 Tower Road
Greensboro, NC 27410
bryan@systap.com
http://bigdata.com
http://mapgraph.io


Manybubbles claimed this task.

I'm resolving this issue. The resolution is the following conclusions:

  1. There is no built-in support for geo things.
  2. Building geo-based filtering based on MGRS-style identifiers and prefix searching as bounding boxes is not trivial but not devastatingly hard either.

and the following tentative plan:

  1. Do not implement GEO at all in the first round.
  2. Implement GEO using MGRS-style search for bounding boxes. Probably using custom functions as described above. This should be simpler than implementing it against Elasticsearch or something similar. For reference, my understanding is that this MGRS-style thing is quite similar to how Lucene/Elasticsearch implements geo things.
  3. Potentially implement a whole GeoSPARQL suite at a later date. This is certainly something we'd look to upstream rather than use as a plugin.
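
For step 2, a distance query first has to be turned into a bounding box before any grid cells can be looked up. The following Python sketch shows one way to derive that box from a center point and a radius. It assumes a spherical globe, the names `bounding_box` and `in_box` are hypothetical, and it does not handle boxes that cross the antimeridian (the returned longitudes can fall outside -180..180 in that case).

```python
import math


def bounding_box(lat, lon, dist_km, radius_km=6371.0):
    """Return (south, west, north, east) in degrees enclosing a circle.

    Assumes a sphere of the given radius. Antimeridian wrap is NOT handled.
    """
    ang = dist_km / radius_km            # angular radius of the circle
    dlat = math.degrees(ang)
    south, north = lat - dlat, lat + dlat
    if south <= -90.0 or north >= 90.0:
        # The circle reaches a pole, so every longitude is inside it.
        return max(south, -90.0), -180.0, min(north, 90.0), 180.0
    # Longitude half-width grows with latitude as meridians converge.
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return south, lon - dlon, north, lon + dlon


def in_box(box, lat, lon):
    """Cheap pre-filter; an exact distance check would still follow."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


box = bounding_box(52.52, 13.405, 50.0)   # 50 km around Berlin
print(box, in_box(box, 52.39, 13.06))
```

The box over-approximates the circle, so candidates found this way still need the exact distance test; the box only limits which grid cells have to be scanned.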

It's not super clear if we'd actually do step 2 at all. Depending on user interest we may skip it.

This should not be a blocker for choosing BlazeGraph.

This has an impact on the export format. Right now we export as "Point(12.56 34.78)"^^geo:wktLiteral. Please say if you think this would be incompatible with whatever we may end up doing later.

I have no reason to believe that this would be a problem. The code to translate that into MGRS information could be pushed down into the server when the data are being written onto the text index so it sees a geo:wktLiteral and turns it into the appropriate MGRS coding. Also, the full text analyzers are already pluggable. Something could be done to make this pluggable as well.
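
The first half of that translation, pulling the two numbers out of the literal, could look roughly like the Python sketch below. It only handles the lexical form of a simple two-dimensional point such as the one quoted above, and `parse_wkt_point` is a hypothetical name. It returns the values as (x, y) in the order written; which of them is longitude depends on the exporter's axis order, which this thread does not state.

```python
import re
from typing import Tuple

_NUM = r"[-+]?\d+(?:\.\d+)?"
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(" + _NUM + r")\s+(" + _NUM + r")\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(lexical: str) -> Tuple[float, float]:
    """Parse 'Point(x y)' and return (x, y) in the order written.

    Covers only plain 2D points; other WKT geometries raise ValueError.
    """
    match = _POINT_RE.match(lexical)
    if match is None:
        raise ValueError("not a simple WKT point: %r" % lexical)
    return float(match.group(1)), float(match.group(2))


print(parse_wkt_point("Point(12.56 34.78)"))
```

The resulting pair would then be handed to whatever grid coding is chosen when the value is written to the text index.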