
WDQS: Implement multi-point centroid aggregation function
Open, Low, Public

Description

Given a set of points, one frequently needs to calculate their "average" point - a weighted center of sorts. Simply creating a new point from AVG(longitude(point)) and AVG(latitude(point)) may work in many places, but it can be grossly incorrect near the antimeridian, where longitude jumps between +180 and -180.
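To make the antimeridian problem concrete, here is a small illustrative sketch in Python. The two points are made up for the example and are not taken from Wikidata.

```python
from statistics import mean

# Two hypothetical points, one degree either side of the antimeridian,
# given as (longitude, latitude) in degrees.
points = [(179.0, 10.0), (-179.0, 10.0)]

# Naive approach: average each WGS 84 component independently.
naive_lon = mean(lon for lon, _ in points)
naive_lat = mean(lat for _, lat in points)

# Both inputs sit next to longitude 180, yet the averaged longitude
# comes out on the prime meridian, on the other side of the globe.
print(naive_lon, naive_lat)
```

The latitude average is plausible here; only the longitude is wrong, because the two values wrap around at 180 degrees.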

This Stack Overflow question provides a simple explanation of how it is done: convert each point's WGS 84 (longitude, latitude) into Cartesian coordinates (x, y, z), average each coordinate, and convert the result back. If the Earth is assumed to be a sphere, the calculations are slightly simpler.
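The procedure described above could be sketched roughly as follows, in Python. This assumes a unit sphere rather than the WGS 84 ellipsoid and equal weights for all points. The function names to_xyz, from_xyz and point_centroid are hypothetical; they are not part of JTS, Blazegraph or WDQS.

```python
import math

def to_xyz(lon_deg, lat_deg):
    """Convert (longitude, latitude) in degrees to a point on the unit sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def from_xyz(x, y, z):
    """Convert Cartesian coordinates back to (longitude, latitude) in degrees.

    The vector does not need to be unit length; only its direction matters.
    """
    lon = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(lon), math.degrees(lat)

def point_centroid(points):
    """Average points in Cartesian space, then project back onto the sphere."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    xs, ys, zs = zip(*(to_xyz(lon, lat) for lon, lat in points))
    x, y, z = sum(xs) / n, sum(ys) / n, sum(zs) / n
    # If the mean vector is (almost) zero, e.g. for two antipodal points,
    # no direction is preferred and the centroid is undefined.
    if math.sqrt(x * x + y * y + z * z) < 1e-12:
        raise ValueError("centroid is undefined for this set of points")
    return from_xyz(x, y, z)

# Hypothetical points straddling the antimeridian.
lon, lat = point_centroid([(179.0, 10.0), (-179.0, 10.0)])
print(lon, lat)
```

For the two sample points the longitude comes out at the antimeridian rather than at 0, and the latitude is marginally above 10 because the midpoint follows the great circle between the inputs.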

I found this example that uses JTS (reference library for all geo calcs) - https://gist.github.com/oschrenk/2787570

Usage

SELECT (geof:pointCentroid( ?location ) as ?center) WHERE {
  ?place wdt:P625 ?location .
   ...
}

Event Timeline

Yurik created this task. Aug 29 2017, 8:53 PM
Restricted Application added projects: Wikidata, Discovery. Aug 29 2017, 8:53 PM
Restricted Application added a subscriber: Aklapper.
Yurik updated the task description. Aug 29 2017, 8:56 PM
Yurik updated the task description. Aug 29 2017, 9:18 PM

Hmm, I am not sure whether it is possible to define custom aggregate functions in Blazegraph. Aggregates do not work the same way as regular functions, and it looks like they are defined at the grammar level. If so, defining a custom aggregate may not be possible.

@Smalyshev would it be possible to at least provide a few "helper" functions like convert(point) -> ?x, ?y, ?z and the reverse? This way the user could bind x,y,z variables, do the average aggregation, and use the reverse x,y,z -> point.

Not sure what is meant by x, y and z. You already have geof:latitude and geof:longitude - are those not enough?

Yurik added a comment. (Edited) Aug 30 2017, 8:56 PM

@Smalyshev those functions only get the components in WGS 84 -- latitude and longitude. In order to calculate the center of a set of points, I would need to convert each point to X, Y, Z - geocentric Cartesian coordinates. Afterwards, I would need to average each one, and convert the resulting avgX, avgY, and avgZ back to WGS 84 latitude and longitude.

The whole process is somewhat awkward and error-prone - that's why I would prefer to use the standard JTS library for the centroid calculation, or at least for WGS84->Cartesian and Cartesian->WGS84. Doing it by hand in SPARQL seems like a recipe for mistakes.

Smalyshev triaged this task as Low priority. Aug 30 2017, 9:09 PM

We don't have any module right now that does such complex geodetic calculations. We could add one, but I am not sure whether it is worth developing and how far we should go with it. I'd like to have a more consistent story behind it; just implementing one function doesn't sound like a good idea, unless this function is special. In which case I'd like to hear more about why.

Yurik added a comment. (Edited) Aug 30 2017, 9:37 PM

@Smalyshev I agree we shouldn't implement this without a story.

The overall goal is basic geo support. Since we only support points, we need to provide some fundamental utilities for geo point handling. The basics relate to individual points (e.g. distance) and to aggregates (e.g. calculate the centroid of a set of points, calculate the bounding box that includes all points). The basic bounding box support should offer a union (minimum bbox that includes a set of bboxes), intersect (maximum bbox that is a subset of each given bbox), and centroid (center of a single bbox). If we ever support shapes, the list of functions would greatly increase - but I don't think we are ready for that just yet.
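As an illustration of the three bounding box operations listed above, here is a minimal Python sketch. It assumes boxes are axis-aligned in longitude/latitude degrees and never cross the antimeridian; handling that case would need extra logic. The BBox type and the function names are hypothetical and do not correspond to any existing WDQS or Blazegraph API.

```python
from typing import Iterable, NamedTuple, Optional, Tuple

class BBox(NamedTuple):
    """Axis-aligned box in degrees; assumed not to cross the antimeridian."""
    west: float
    south: float
    east: float
    north: float

def bbox_union(boxes: Iterable[BBox]) -> BBox:
    """Minimum bbox that includes every given bbox (raises on empty input)."""
    boxes = list(boxes)
    return BBox(min(b.west for b in boxes), min(b.south for b in boxes),
                max(b.east for b in boxes), max(b.north for b in boxes))

def bbox_intersect(boxes: Iterable[BBox]) -> Optional[BBox]:
    """Maximum bbox contained in every given bbox, or None if they are disjoint."""
    boxes = list(boxes)
    box = BBox(max(b.west for b in boxes), max(b.south for b in boxes),
               min(b.east for b in boxes), min(b.north for b in boxes))
    if box.west > box.east or box.south > box.north:
        return None
    return box

def bbox_center(box: BBox) -> Tuple[float, float]:
    """Midpoint of the box as (longitude, latitude), in lon/lat space."""
    return ((box.west + box.east) / 2, (box.south + box.north) / 2)

# Hypothetical boxes for illustration.
a = BBox(0.0, 0.0, 10.0, 10.0)
b = BBox(5.0, 5.0, 20.0, 20.0)
print(bbox_union([a, b]))
print(bbox_intersect([a, b]))
print(bbox_center(a))
```

Note that bbox_center takes the midpoint in longitude/latitude space, which is not the same as the spherical centroid of the box corners.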

Aggregates allow users to extract meaning from multiple points. Without that, each point remains a string without much meaning beyond distance from X.

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Mar 5 2018, 4:15 PM