The geoshapes service currently runs a transformation every time a client requests geoshapes:
``` lang=sql
SELECT id, ST_AsGeoJSON(ST_Transform(ST_Simplify(geometry, $3*sqrt(ST_Area(ST_Envelope(geometry)))), 4326)) AS data
FROM (
  SELECT id, ST_Multi(ST_Collect(geometry)) AS geometry
  FROM (
    SELECT wikidata AS id, (ST_Dump(geometry)).geom AS geometry
    FROM $1~
    WHERE wikidata IN ($2:csv)
      AND GeometryType(geometry) != 'POINT'
  ) combq
  GROUP BY id
) subq
```
This seems like an expensive query, and it might be more efficient to run it only once, at import time. The dynamic granularity `$3` might never be changed from the default in production requests.
* [ ] Profile this transformation to determine the resource demands of these queries and the size of the resulting data (see the profiling sketch after this list).
* [ ] Profile the `ST_AsGeoJSON` step separately; it probably expands the data quite a bit. Is this an expensive call? How much bigger does the data get?
* [ ] Verify that Kartographer requests never change the query selection (currently `simplifyarea`, see also `config.allowUserQueries`) or the granularity constant (`$3` in the query above, default 0.001). Either way, deprecate this feature unless it's proven to be useful.
* [ ] Look at the imposm job to see how we can hook into it, processing rows either during or after the synchronization. We only want to simplify the changed entities.
* [ ] Implement this simplification enhancement in the synchronization job. Create a new table with a unique index on wikidata_id, holding the resulting geojson or binary geodata (see the table sketch after this list).
* [ ] Update the service to use the simplified column (possibly serving both old and new data during the migration period).
* [ ] Fully refresh the master database to have simplified data for all wikidata entities.
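A possible starting point for the profiling items above. This assumes the source table behind `$1~` is `wikidata_relation_polygon` (the table used in the size measurements below); the wikidata ids and the 0.001 granularity are only placeholders:
``` lang=sql
-- Plan and timings for the full transformation on a small sample of ids.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, ST_AsGeoJSON(ST_Transform(ST_Simplify(geometry, 0.001*sqrt(ST_Area(ST_Envelope(geometry)))), 4326)) AS data
FROM (
  SELECT id, ST_Multi(ST_Collect(geometry)) AS geometry
  FROM (
    SELECT wikidata AS id, (ST_Dump(geometry)).geom AS geometry
    FROM wikidata_relation_polygon
    WHERE wikidata IN ('Q64', 'Q90')          -- placeholder ids
      AND GeometryType(geometry) != 'POINT'
  ) combq
  GROUP BY id
) subq;

-- Size of the stored binary geometry vs. its GeoJSON serialization,
-- to see how much the ST_AsGeoJSON step expands the data.
SELECT sum(pg_column_size(geometry))       AS binary_size,
       sum(length(ST_AsGeoJSON(geometry))) AS geojson_size
FROM wikidata_relation_polygon;
```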
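And a rough sketch of what the precomputed table and its population could look like, assuming the `simplified_geojson` table with a text `data` column referenced in the measurements below; the fixed 0.001 granularity, the example ids, and the source table are assumptions:
``` lang=sql
-- Hypothetical precomputed table; the unique index on wikidata_id doubles as the primary key.
CREATE TABLE IF NOT EXISTS simplified_geojson (
    wikidata_id text PRIMARY KEY,
    data        text NOT NULL
);

-- (Re)compute the simplified geojson for the entities touched by a synchronization run.
INSERT INTO simplified_geojson (wikidata_id, data)
SELECT id, ST_AsGeoJSON(ST_Transform(ST_Simplify(geometry, 0.001*sqrt(ST_Area(ST_Envelope(geometry)))), 4326))
FROM (
  SELECT id, ST_Multi(ST_Collect(geometry)) AS geometry
  FROM (
    SELECT wikidata AS id, (ST_Dump(geometry)).geom AS geometry
    FROM wikidata_relation_polygon            -- assumed source table
    WHERE wikidata IN ('Q64', 'Q90')          -- placeholder: only the changed entities
      AND GeometryType(geometry) != 'POINT'
  ) combq
  GROUP BY id
) subq
ON CONFLICT (wikidata_id) DO UPDATE SET data = EXCLUDED.data;

-- The service lookup then reduces to a plain indexed read:
SELECT wikidata_id, data FROM simplified_geojson WHERE wikidata_id IN ('Q64', 'Q90');
```
Keeping the unique index on wikidata_id as the primary key makes both the upsert (`ON CONFLICT`) and the service lookup single-index operations.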
Outcome
* simplified geojson takes ~50% of the space of the high-quality data, measured with:
``` lang=sql
-- total size of the original (high-quality) geometries
SELECT sum(pg_column_size(geometry)) FROM wikidata_relation_polygon;
-- total and average size of the simplified geojson
SELECT sum(length(data)), avg(length(data)) FROM public.simplified_geojson;
```
* average size of a simplified geojson entry: 2727 characters
* querying the simplified version of every wikidata item in our example datasets (tested with ~1600 items) took ~2 seconds; normally only one wikidata item is queried at a time, which would take around 1 millisecond
--> adding a new table might not improve much, because the query is already pretty fast