The geoshapes service currently runs a transformation every time a client requests geoshapes:
``` lang=sql
SELECT id, ST_AsGeoJSON(ST_Transform(ST_Simplify(geometry, $3*sqrt(ST_Area(ST_Envelope(geometry)))), 4326)) AS data
FROM (
  SELECT id, ST_Multi(ST_Collect(geometry)) AS geometry
  FROM (
    SELECT wikidata AS id, (ST_Dump(geometry)).geom AS geometry
    FROM $1~
    WHERE wikidata IN ($2:csv)
      AND GeometryType(geometry) != 'POINT'
  ) combq
  GROUP BY id
) subq
```
This seems like an expensive query, and it might be more efficient to run it only once, at import time. The dynamic granularity `$3` might never be changed from the default in production requests.
* [ ] Profile this transformation to determine the resource demands of these queries and the size of the resulting data (see the profiling sketch after this list).
* [ ] Profile the `ST_AsGeoJSON` step separately; it probably expands the data quite a bit. Is this an expensive call? How much bigger does the data get?
* [ ] Verify that Kartographer requests never change the query selection (currently `simplifyarea`, see also `config.allowUserQueries`) or the granularity constant (`$3` in the query above, default 0.001). Either way, deprecate this feature unless it's proven to be useful.
* [ ] Look at the imposm job to see how we can hook into it, processing rows either during or after the synchronization. We only want to simplify the changed entities.
* [ ] Implement this simplification enhancement in the synchronization job. Create a new table with a unique index on wikidata_id, holding the resulting geojson or binary geodata (see the table sketch after this list).
* [ ] Update the service to use the simplified column (possibly serving both old and new data during the migration period).
* [ ] Fully refresh the master database to have simplified data for all wikidata entities.
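A possible starting point for the profiling items above. This assumes the source table behind `$1~` is `wikidata_relation_polygon` (the table used in the size measurements below); the wikidata ids and the 0.001 granularity are only placeholders:
``` lang=sql
-- Plan and timings for the full transformation on a small sample of ids.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, ST_AsGeoJSON(ST_Transform(ST_Simplify(geometry, 0.001*sqrt(ST_Area(ST_Envelope(geometry)))), 4326)) AS data
FROM (
  SELECT id, ST_Multi(ST_Collect(geometry)) AS geometry
  FROM (
    SELECT wikidata AS id, (ST_Dump(geometry)).geom AS geometry
    FROM wikidata_relation_polygon
    WHERE wikidata IN ('Q64', 'Q90')          -- placeholder ids
      AND GeometryType(geometry) != 'POINT'
  ) combq
  GROUP BY id
) subq;

-- Size of the stored binary geometry vs. its GeoJSON serialization,
-- to see how much the ST_AsGeoJSON step expands the data.
SELECT sum(pg_column_size(geometry))       AS binary_size,
       sum(length(ST_AsGeoJSON(geometry))) AS geojson_size
FROM wikidata_relation_polygon;
```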
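And a rough sketch of what the precomputed table and its population could look like, assuming the `simplified_geojson` table with a text `data` column referenced in the measurements below; the fixed 0.001 granularity, the example ids, and the source table are assumptions:
``` lang=sql
-- Hypothetical precomputed table; the unique index on wikidata_id doubles as the primary key.
CREATE TABLE IF NOT EXISTS simplified_geojson (
    wikidata_id text PRIMARY KEY,
    data        text NOT NULL
);

-- (Re)compute the simplified geojson for the entities touched by a synchronization run.
INSERT INTO simplified_geojson (wikidata_id, data)
SELECT id, ST_AsGeoJSON(ST_Transform(ST_Simplify(geometry, 0.001*sqrt(ST_Area(ST_Envelope(geometry)))), 4326))
FROM (
  SELECT id, ST_Multi(ST_Collect(geometry)) AS geometry
  FROM (
    SELECT wikidata AS id, (ST_Dump(geometry)).geom AS geometry
    FROM wikidata_relation_polygon            -- assumed source table
    WHERE wikidata IN ('Q64', 'Q90')          -- placeholder: only the changed entities
      AND GeometryType(geometry) != 'POINT'
  ) combq
  GROUP BY id
) subq
ON CONFLICT (wikidata_id) DO UPDATE SET data = EXCLUDED.data;

-- The service lookup then reduces to a plain indexed read:
SELECT wikidata_id, data FROM simplified_geojson WHERE wikidata_id IN ('Q64', 'Q90');
```
Keeping the unique index on wikidata_id as the primary key makes both the upsert (`ON CONFLICT`) and the service lookup single-index operations.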
Outcome
* simplified geojson takes ~50% of the space of the high-quality data, measured with:
``` lang=sql
-- total size of the original (high-quality) geometries
SELECT sum(pg_column_size(geometry)) FROM wikidata_relation_polygon;
-- total and average size of the simplified geojson
SELECT sum(length(data)), avg(length(data)) FROM public.simplified_geojson;
```
* average size of a simplified geojson entry: 2727 characters
* querying the simplified version of every wikidata item in our example datasets (tested with ~1600 items) took ~2 seconds; normally only one wikidata item is queried at a time, which would take around 1 millisecond
--> adding a new table might not improve much, because the query is already pretty fast