Following up on T300042 and based on the POC developed there, we should add a caching mechanism for the query results.
We could use the same approach as Geoshapes, or something else if there is a better option.
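To make the idea concrete, here is a minimal sketch of one possible shape for such a cache: results are stored per query text and reused until a time-to-live expires. This is only an illustration, not the Geoshapes implementation and not a decided design. The class `QueryResultCache`, its methods, and the `run_query` callback are all hypothetical names, and an in-memory dict stands in for whatever cache layer would actually be used.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Tuple


class QueryResultCache:
    """In-memory cache for query results with a fixed time-to-live.

    Hypothetical helper for illustration only.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key_for(query: str) -> str:
        # Only surrounding whitespace is stripped; inner whitespace is left
        # alone because it can be significant inside string literals.
        return hashlib.sha256(query.strip().encode("utf-8")).hexdigest()

    def get_or_run(self, query: str, run_query: Callable[[str], Any]) -> Any:
        """Return a fresh cached result, or run the query and cache its result."""
        key = self.key_for(query)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        result = run_query(query)
        self._entries[key] = (now, result)
        return result
```

A caller would create one instance, for example `QueryResultCache(ttl_seconds=24 * 60 * 60)` to mirror the roughly 24 hours observed for shape results, and pass in a function that performs the real SPARQL request. Whether an application-level cache like this is needed at all depends on the answers to the open questions below.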
Open questions
- How does SPARQL endpoint caching work? If the same query is made multiple times, is it executed again or not?
- TBD: Ongoing request to the search team
- How does geosearch caching work? (invalid for our use case here)
- Is there caching for the geoshapes endpoint?
- Looks like there is front-end caching of shape results for ~24 hours.
- Shape data is retrieved directly from a local DB.
- If a geoshapes query returns a wikidata item with no geoshape (maybe among a list of other items which do have shapes), is there a visible error? How is this handled?
- Looks like an empty set is returned, https://maps.wikimedia.org/geoshape?getgeojson=1&ids=Q102, https://maps.wikimedia.org/geoshape?getgeojson=1&ids=Q1028,Q102
- Research existing traffic: What proportion of SPARQL requests are coming from the geoshapes service?
- What proportion of these queries return successfully?
- seems super low: a few requests per million SPARQL requests
- How long do geoshapes queries take when SPARQL is being used (in contrast to QID input)?
- might be irrelevant given the low number of requests
- Could/Should we improve the current approach to reduce traffic?
- might be irrelevant given the low number of requests
- How many existing maps already use SPARQL queries for geoshapes? (Maybe interesting for finding good pilot wikis for geopoints.)
- also seems so low that it's probably irrelevant
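On the question above about items with no geoshape: since the endpoint appears to return an empty set rather than an error, a client that wants to surface the problem would have to compare the requested IDs against what came back. A minimal sketch, assuming the response is a GeoJSON FeatureCollection in which each feature carries its Wikidata ID in a top-level `id` field (an assumption that should be checked against a real response). The function name `ids_without_shape` is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def ids_without_shape(requested_ids: Iterable[str],
                      geojson: Dict[str, Any]) -> List[str]:
    """Return the requested IDs that have no feature in the response.

    The result preserves the order of the request.
    """
    # Assumption: each feature exposes its Wikidata ID as a top-level "id".
    returned = {feature.get("id") for feature in geojson.get("features", [])}
    return [qid for qid in requested_ids if qid not in returned]
```

The function takes an already parsed response, so it can be tried against the output of the two example URLs above without committing to a particular HTTP client.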
Notes: https://docs.google.com/document/d/1HM5uo8onOVUws5zAT6taswRW_R0pSF3cf3IosmUcVkM
Relevant links
- Wikidata:SPARQL query service/query optimization
- ST_AsGeoJSON, part of the PostGIS suite of PostgreSQL functions
- Caching overview - Wikitech