
Multiple overlapping OSM area relations with the same QID are imported as a single polygon with unwanted hole(s)
Open, Needs Triage | Public | BUG REPORT

Description

Event Timeline

Currently multiple relations with the same QID exist:
https://www.openstreetmap.org/relation/51529 (sea included)
https://www.openstreetmap.org/relation/62775 (only the land area, type=land_area)
It looks like geoshapes treats the latter smaller of these two areas as an inner area (hole).

This issue probably surfaced after Wikimedia switched from osm2pgsql to imposm last year, as osm2pgsql doesn't support land_area relations. The imposm docs say that by default relations of types [multipolygon, boundary, land_area] are used to build multi-polygons for tables of type polygon. If I understand correctly, per this config we currently query polygons from the wikidata_relation_polygon table. In our imposm mapping, relation_types aren't specified for this table, so I assume it includes land_area polygons by default.

I think we don't need land_area relations and we could just adjust the mapping to exclude these from the relevant polygon table again. Though, this would still leave us with the example of Norway mentioned in T305312 (one relation with Svalbard and another relation without Svalbard, neither relation of type land_area).

I think it's really important that someone starts writing definitions of WHAT all the various endpoints are supposed to return. There seems to be a mismatch of expectations every single week, both internal and external, and this is creating a lot of back and forth. Without clear definitions, issues like these are inevitable.

Currently multiple relations with the same QID exist:
https://www.openstreetmap.org/relation/51529 (sea included)
https://www.openstreetmap.org/relation/62775 (only the land area, type=land_area)
It looks like geoshapes treats the latter smaller of these two areas as an inner area (hole).

I have not looked at the code so I may be wrong, but this looks to me like a case where XOR is used to fill in the area. We have quite a few cases in Norway where municipalities were changed in real life, the Wikidata items were not replaced but only changed, and on OSM both the old and the new border are tagged with the same QID.

"Land area" will not solve these cases. I thus think this is not a case that can be solved just by tweaking the import, but one where the rendering algorithm should be investigated.

As I said, I may be wrong as I have not made any attempt to investigate the source code myself.

"Land area" will not solve these cases. I thus think this is not a case that can be solved just by tweaking the import, but one where the rendering algorithm should be investigated.

Yes, in any case overlapping polygons and/or multiple area relations with the same QID are handled improperly, regardless of land_area. Returning to the status quo that excluded land_area relations just seems to me a sensible simplification, and it would probably solve at least some of the more prominent cases.

Maybe someone can figure out why a multi-polygon is oddly built using data from two different OSM relations in the first place. Could it be perhaps due to incomplete imposm mapping?

If we manage to build separate (multi-)polygons for separate OSM relations, then the question remains which of these overlapping polygons is returned. I suspect that on the software level there isn't a good way to control this reliably, and in the end the ambiguity needs to be resolved in the source data (e.g. in the case of old Hammerfest and new Hammerfest, figure out whether the old one perhaps needs a separate Wikidata item or whether it could be left without a wikidata tag).
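To see how widespread the ambiguity is, one could list every QID that is attached to more than one relation. The following is only a sketch: it assumes the (relation ID, QID) pairs have already been exported from the database, and the function name `find_shared_qids`, the placeholder QIDs, and relation 999999 are made up for illustration. Only the two relation IDs 51529 and 62775 come from the task description.

```python
from collections import defaultdict


def find_shared_qids(rows):
    """Return {qid: [relation ids]} for every QID used by more than one relation.

    `rows` is an iterable of (relation_id, qid) pairs.
    """
    by_qid = defaultdict(list)
    for relation_id, qid in rows:
        by_qid[qid].append(relation_id)
    return {qid: ids for qid, ids in by_qid.items() if len(ids) > 1}


# Hypothetical export. "Q_EXAMPLE" and "Q_OTHER" are placeholders, not real QIDs.
rows = [
    (51529, "Q_EXAMPLE"),   # sea included
    (62775, "Q_EXAMPLE"),   # land area only
    (999999, "Q_OTHER"),    # made-up relation with a unique QID
]
shared = find_shared_qids(rows)
```

Each entry in `shared` is a candidate for cleanup in the source data or for special handling in the software.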

This is pure speculation, but one possible explanation for the "XOR" effect is that polygons have an orientation: the sequence of points is either clockwise or counter-clockwise. This article has a good discussion of how this works and how holes can be punched out of polygons. Perhaps we're ending up with a stack of polygons with mixed direction, where the land has the opposite orientation to the sea and so is punched out as a negative? This could be confirmed by looking at the points in our local database after they have been imported.
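Checking the orientation of imported rings could be done with the shoelace formula. This is a minimal sketch, assuming each ring has been pulled out of the database as a list of (x, y) tuples in a y-up coordinate system. The function names and the two sample squares are illustrative, not taken from the actual data.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings (y axis up),
    negative for clockwise rings."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return total / 2.0


def orientation(ring):
    area = signed_area(ring)
    if area > 0:
        return "counter-clockwise"
    if area < 0:
        return "clockwise"
    return "degenerate"


# Made-up stand-ins for the two relations: a large ring and a smaller one inside it.
sea = [(0, 0), (10, 0), (10, 10), (0, 10)]
land = [(2, 2), (2, 8), (8, 8), (8, 2)]

print(orientation(sea))   # counter-clockwise
print(orientation(land))  # clockwise
```

If the rings stored for one QID came back with mixed orientations like this, that would support the speculation above. If they all share one orientation, the cause is more likely elsewhere.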

If we manage to build separate (multi-)polygons for separate OSM relations, then the question remains which of these overlapping polygons is returned. I suspect that on the software level there isn't a good way to control this reliably, and in the end the ambiguity needs to be resolved in the source data (e.g. in the case of old Hammerfest and new Hammerfest, figure out whether the old one perhaps needs a separate Wikidata item or whether it could be left without a wikidata tag).

How the two versions of Hammerfest interact can be seen in the map in the infobox on https://no.wikipedia.org/wiki/Hammerfest . It is a mapframe, so it can also be viewed interactively in fullscreen.

Yes, the two versions of Hammerfest could have been given different Wikidata IDs (QIDs), and ideally would then have different tags in OSM. But relying on changes in the OSM database to solve our problem is not a good solution, and would not be even if updates from OSM to our database were quick. (They are not: lots of changes I made in OSM a few months ago are not yet visible in Wikipedia.)

My reason for that opinion is that it is outside of our control, and even under a different copyright licence. (OSM does not allow data from Wikipedia to be used due to copyright, so I limit my edits there significantly.)

On the software level, the overlap between polygons should be calculated and the intersection dealt with explicitly, perhaps by taking the maximum of the two, but not by XOR as I assume happens now.

I can confirm what was said about "XOR" and "punching holes" above. Here is a little experiment you can do: Open the broken map in https://de.wikipedia.org/wiki/Benutzer:Wulf_Data/Sandbox in fullscreen mode. Inspect the <svg> code. Find the broken path. Find the fill-rule="evenodd" attribute. Set it to nonzero or remove it (nonzero is the default).

Unfortunately, this is not an acceptable solution. We need to punch holes. Think of a lake with an island. We want to color the lake in blue, but not the island. It must be evenodd.

As far as I can tell multiple relations are merged into a single <path>. I'm not sure why this is done. The only reason I can think of is performance optimization. It's a little easier for the browser to render a single big path than many small ones. It's also less data because we don't need to repeat the color attributes and such for each path.

I believe this optimization is exactly what causes the issue we see. When two almost identical paths overlap each other and are combined into a single evenodd path, they "XOR" each other and we get holes that can be as big as the entire path. All we see is the outline. You can try this in Inkscape: Draw any shape. Fill it. Duplicate it. Combine the two.
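The effect can also be reproduced without a browser or Inkscape. The following is a minimal sketch of the two SVG fill rules, assuming two concentric squares stand in for the sea-inclusive and the land-only relation. The coordinates and the helper names `crossings` and `is_filled` are made up for illustration and are not taken from Kartotherian or any SVG library.

```python
def crossings(point, ring):
    """Cast a horizontal ray from `point` towards +x and return +1 or -1 for
    every edge of `ring` it crosses (+1 if the edge runs upwards, -1 if downwards)."""
    px, py = point
    hits = []
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 <= py) != (y2 <= py):  # edge straddles the ray's y coordinate
            x_at = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_at > px:
                hits.append(1 if y2 > y1 else -1)
    return hits


def is_filled(point, rings, rule):
    """Decide whether `point` is painted when all `rings` form ONE combined path."""
    hits = [h for ring in rings for h in crossings(point, ring)]
    if rule == "evenodd":
        return len(hits) % 2 == 1   # odd number of crossings means inside
    if rule == "nonzero":
        return sum(hits) != 0       # non-zero winding number means inside
    raise ValueError(f"unknown fill rule: {rule}")


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]   # counter-clockwise
inner_ccw = [(2, 2), (8, 2), (8, 8), (2, 8)]   # same direction as outer
inner_cw = list(reversed(inner_ccw))           # opposite direction

centre = (5, 5)  # lies inside both squares
for name, inner in (("same direction", inner_ccw), ("opposite direction", inner_cw)):
    for rule in ("evenodd", "nonzero"):
        print(name, rule, is_filled(centre, [outer, inner], rule))
```

In this sketch the centre point is unfilled under evenodd whichever way the inner ring runs, while under nonzero it is unfilled only when the two rings run in opposite directions. A point between the two squares, such as (1, 5), is filled in every combination.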

I agree with what @Pikne wrote above: A good first step is to change the code that creates the <svg> to create individual <path> elements for each relation. Saving data can be achieved by grouping the paths in a <g> element and assigning the properties to the group instead of each path. There are also mouse click handlers assigned to the paths. Again, it should be possible to assign these to the group instead.
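As a rough illustration of that first step, the sketch below emits one `<path>` per relation and puts the shared styling on a surrounding `<g>`. It assumes the rings are available per relation, which is exactly what the current query does not provide. The function names, the colour, the `data-relation` attribute, and the coordinates are hypothetical. Only the relation IDs and the `fill-opacity="0.5"` and `fill-rule="evenodd"` values come from this thread.

```python
import xml.etree.ElementTree as ET


def ring_to_path_data(ring):
    """Turn a list of (x, y) points into a closed SVG subpath."""
    points = " L ".join(f"{x} {y}" for x, y in ring)
    return f"M {points} Z"


def build_group(relations, fill="#3366cc", fill_opacity="0.5"):
    """Build a <g> holding one <path> per relation.

    `relations` maps a relation ID to its list of rings. Rings of the same
    relation stay in one path, so intentional holes still work with evenodd.
    """
    group = ET.Element(
        "g",
        {"fill": fill, "fill-opacity": fill_opacity, "fill-rule": "evenodd"},
    )
    for relation_id, rings in relations.items():
        d = " ".join(ring_to_path_data(ring) for ring in rings)
        ET.SubElement(group, "path", {"d": d, "data-relation": str(relation_id)})
    return group


# Made-up coordinates for the two relations from the task description.
relations = {
    51529: [[(0, 0), (10, 0), (10, 10), (0, 10)]],
    62775: [[(2, 2), (8, 2), (8, 8), (2, 8)]],
}
group = build_group(relations)
print(ET.tostring(group, encoding="unicode"))
```

Because the fill attributes are inherited from the group, each path carries only its own geometry.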

Note this will not fully solve the issue but cause another one: The transparent fill color (fill-opacity="0.5") will cause duplicate or otherwise overlapping paths to appear much darker. However, I believe this is much better than a hole. The issue is at least visible. We can try to improve it further when we are there.
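How much darker the overlap gets can be estimated with ordinary source-over alpha compositing. This is a small sketch, assuming identical fills stacked on top of each other. The function name `stacked_opacity` is made up for illustration.

```python
def stacked_opacity(opacity, layers):
    """Effective opacity of `layers` identical semi-transparent fills drawn on
    top of each other with source-over compositing."""
    return 1.0 - (1.0 - opacity) ** layers


print(stacked_opacity(0.5, 1))  # 0.5
print(stacked_opacity(0.5, 2))  # 0.75
```

So with fill-opacity="0.5", an area covered by two relations would be drawn at 75% opacity instead of 50%.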

I tried to find the code responsible for the "merging multiple <path> into one" we talked about above. I couldn't find it. I think it happens very early, close to when the OSM relations are imported into that "postgres-postgis" database table with the name "wikidata_relation_polygon". This import is configured via https://gerrit.wikimedia.org/g/operations/puppet/+/production/modules/osm/files/imposm_mapping.yml#426. The database table does have an index on the "wikidata" column, which is the column that contains the Wikidata Q-id. The SQL query in Kartotherian returns a flat array of coordinates where multiple shapes are already merged.

The merge happens either at the moment the table is created or in the PostGIS SQL query. The query does use ST_Collect, but judging by its documentation that doesn't sound suspicious to me. ST_Union, on the other hand, is exactly what we are looking for. queries.yaml mentions it as "doing the same as ST_Collect".

TL;DR: I think this happens too early and too deep in the stack and we shouldn't try to change it. Not with the remaining resources we have.

... we shouldn't try to change it. Not with the remaining resources we have.

To be clear, it's still a bug that should be fixed, but your team won't work on it, right?

From reading imposm docs I understand it is responsible for building the polygons in the first place. Maybe to more tech-savvy people these docs reveal something useful about this issue?

Correct, we won't pick this up as part of our WMDE-GeoInfo-FocusArea. As said, I don't know if my analysis is correct. If it is, this behavior is just a limitation of how the system is currently designed. It looks like there is just no place in the relation tables to store the information whether a hole in a shape is intentional or not. This entire part of the system would need to be redesigned.

The only possible workaround I can think of is to go to OSM and carefully reverse the direction of some of the problematic paths, making sure they are either all clockwise or all counter-clockwise. This way they shouldn't create holes when combined.

It looks like there is just no place in the relation tables to store the information if a hole in a shape is intentional or not.

In docs it says "Imposm uses geometry operations to verify if a member of a multipolygon is a hole, or if it is a separate polygon." This generally works fine, as long as data from multiple relations don't get mixed up somehow.

... go to OSM and carefully reverse the direction of some of the problematic paths, making sure they are either all clockwise or all counter-clockwise. This way they shouldn't create holes when combined.

To my knowledge OSM doesn't really require users to follow this clockwise and anti-clockwise approach. Some other systems may. OSM relations can also share members, so that one relation uses a member object as an outer ring and another relation uses the same member object as an inner ring (hole). In OSM only coastlines need to be drawn anti-clockwise, but that's unrelated to relations and holes.

The OSM data scheme also has an inner role to determine which member(s) of a relation represent a hole, but per the imposm docs this role is considered less reliable than geometry operations.

I didn't say it's a good workaround. 😋️ I'm sure Imposm can handle this. My current best guess is that the mappings we use can't.

Subscribing. I found this issue on a couple of admin area shapes that I was working on. A number of (not great) workarounds exist, including removing the multiple Wikidata Q numbers or the P402 entries (or giving them deprecated rank), or creating new Q numbers for the multiple shapes. I see some high-profile items such as Q90 (Paris) currently affected.

Pikne renamed this task from "Some geoshapes are shown inverse" to "Multiple overlapping OSM area relations with the same QID are imported as a single polygon with unwanted hole(s)". Nov 20 2023, 3:54 PM
Pikne added a project: Regression.
Pikne updated the task description.