#### Description
**Problem we are solving**:
The WMDE Technical Wishes [[ https://meta.wikimedia.org/wiki/WMDE_Technical_Wishes | team ]] is working on improvements to maps, and our current feature focus is serving historical versions of annotated maps. When serving static image thumbnails, as high-traffic sites do, Kartographer can currently render only the latest revision of an article's embedded map. The lack of versioned maps is especially problematic on wikis such as German Wikipedia, which use FlaggedRevs page stabilization. Kartographer mapframe rendering is currently disabled on these wikis because the stabilized pages would otherwise often include blank maps (the latest map version doesn't match the stabilized page contents).
**Planned implementation**:
Each hop in the pipeline for retrieving maps will support a `revid` parameter. In conjunction with other URL parameters, this is enough information to render any historical version of an embedded mapframe.
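As a rough illustration of the parameter described above, the following sketch appends `revid` to a legacy static map URL. The helper name `with_revid` is hypothetical, and both the position of the parameter in the query string and the pairing of this revision ID with this `groups` hash are assumptions made only for the example; the branches linked below are authoritative for the real URL format.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_revid(url: str, revid: int) -> str:
    """Return `url` with exactly one `revid` query parameter (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every existing parameter except a stale revid, preserving order.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "revid"
    ]
    params.append(("revid", str(revid)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Legacy URL taken from the example in this task.
legacy = (
    "https://maps.wikimedia.org/img/osm-intl,6,53.383333,-1.466667,300x400.png"
    "?lang=en&domain=en.wikipedia.org&title=Downton+Abbey"
    "&groups=_39680418b083bf0edb91278ce24d9075eb0496fa"
)
# Revision ID borrowed from the historical page link in this task, for illustration only.
versioned = with_revid(legacy, 939031521)
print(versioned)
```

Calling the helper a second time replaces the existing `revid` rather than appending a duplicate.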
**Examples**:
Example map with annotations: [[ https://maps.wikimedia.org/img/osm-intl,6,53.383333,-1.466667,300x400.png?lang=en&domain=en.wikipedia.org&title=Downton+Abbey&groups=_39680418b083bf0edb91278ce24d9075eb0496fa | here ]]
Example broken map, missing annotations: [[ https://maps.wikimedia.org/img/osm-intl,6,53.383333,-1.466667,300x400.png?lang=en&domain=en.wikipedia.org&title=Downton+Abbey&groups=_1c9960b16c26bad1fc8d874d86ac41ca253a10c2 | here ]]. This can be seen on historical pages: [[ https://en.wikipedia.org/wiki/Downton_Abbey?oldid=939031521#Filming_locations | here ]]
#### Preview environment
Two revisions of a static mapframe, distinguished by marker color:
https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Kartographer%20versioned%20maps%20example&direction=prev&oldid=534876
https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Kartographer%20versioned%20maps%20example&direction=prev&oldid=534875
#### Which code to review
//(Provide links to all proposed changes and/or repositories. It should also describe changes which have not yet been merged or deployed but are planned prior to deployment. E.g. production Puppet, wmf config, or in-flight features expected to complete prior to launch date, etc.).//
* Changes to Kartographer: [[ https://github.com/wikimedia/mediawiki-extensions-Kartographer/compare/master...wmde-maps-revid | deployed ]]
* `mapdata` API responds to `revids`
* Static and dynamic maps are rendered with versioned map URL parameters.
* All new features are behind a config flag. Only the mapdata API change is enabled by default; the rendering changes ship disabled.
* We will merge to the main branch soon and expect this to go quietly.
* mapdata API client library passes through the `revids` parameter: [[ https://github.com/wikimedia/mapdata/compare/wmde-maps-revid | deployed ]]
* Kartotherian passes through revid parameters: [[ https://github.com/wikimedia/mediawiki-services-kartotherian/compare/wmde-maps-revid | deployed ]]
* Setting the new config option `versioned_maps: false` disables pass-through, forcing each request to fall back to a title-only mapdata request. The default is to allow versioned requests. This knob is our main safety feature: it converts versioned image requests into their legacy equivalent.
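The fallback behavior of the `versioned_maps` flag can be sketched roughly as follows. This is not the Kartotherian code; the function name `mapdata_params` and the dict-based request shape are hypothetical, chosen only to show how disabling the flag reduces a versioned request to a title-only one.

```python
def mapdata_params(request_params: dict, versioned_maps: bool = True) -> dict:
    """Build parameters for the mapdata request (hypothetical sketch).

    With versioned_maps disabled, the revid is dropped so the request
    degrades to the legacy, title-only form.
    """
    params = dict(request_params)  # copy, so the caller's dict is untouched
    if not versioned_maps:
        params.pop("revid", None)
    return params


incoming = {"domain": "en.wikipedia.org", "title": "Downton Abbey", "revid": "939031521"}

# Default: the revid is passed through unchanged.
print(mapdata_params(incoming))
# Flag off: same request as before the feature existed.
print(mapdata_params(incoming, versioned_maps=False))
```

Because the flag acts on each request as it arrives, turning it off takes effect without changing the URLs already embedded in cached pages.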
#### Performance assessment
Please initiate the performance assessment by answering the below:
- What work has been done to ensure the best possible performance of the feature?
* Analyzed various caching layers, including the Varnish front-end and the parser cache: T293841, T292049, T295050, T295363
* An overall assessment of how each step might have negative impacts and what a rollback looks like: T293843
* Looking into open and active side-issues which may interact with our changes, such as T269984
* Each code deployment is expected to have no impact, and each configuration change can be rolled back. In particular, the kartotherian pass-through feature flag can transform new, versioned static map image requests into legacy requests. This means we should be able to avoid a mass purge or denial of service in any of the anticipated failure modes.
* If we see the need to make the cache invariant to `revid` (see below), we've already experimented with hiding the parameter from Varnish and have a tentative plan to move the parameters out of the URL and into headers, so that ATS reuses thumbnails more often. If our feature causes any performance regression, we expect this fallback plan to more than compensate.
- What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance?
* The feature itself is not expected to have any weak areas. It changes some parser cache queries, but these would already be happening, just for the wrong revision ID.
* An existing hazard is that, in the worst-case scenario such as after a mass purge of the upload cache, the maps server would go to 4x normal load (cache hit rate is currently at 75%), for at least 1 hour (upload varnish expiry). The cluster would be unable to serve this load, probably resulting in timeouts, possibly resulting in broken map thumbnails that slowly heal over several hours. Anybody purging the entire cache could spin up additional, temporary maps servers to avoid this.
* We may have missed some DoS vectors, for example when receiving malicious mapdata requests.
- Are there potential optimisations that haven't been performed yet?
* Some parameters such as `revid`, `title`, and `domain` should be cache-invariant (T293914), allowing cached maps to be reused across history, pages, and sites, but we haven't evaluated whether this is worth pursuing. We suspect the gain might be mostly negated by low edit velocity relative to cache expiry, and by an unavoidable variation on the `lang` used to translate labels. The most promising case is a map reused through a common template, where `title` + `revid` invariance could collapse all pages onto a single image.
* We're pursuing better cache policies for both performance and functional reasons. For example, mapdata API responses can be cached long-term for a specific revision, medium-term for a legacy title-only request, and very short-term for error responses. (T295604, T295130, T269984)
* We have not looked into the ATS layer at all. Its contribution to the hit rate is unknown and there could be tuning opportunities. A given embedded map thumbnail never changes, so refreshing the base map as rarely as once per month would be fine. Investigation task: T297363.
* After the `revid` parameter is deployed, we can analyze web requests and estimate what impact further optimizations would have.
* Data centers could be balanced. Currently, the main data center takes the full load and codfw takes nothing at all. Maps probably have a strong regional bias, so caching different thumbnails in each location might be efficient.
- Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far.
* We have one-off queries against the MediaWiki API request log to measure cache performance, mapdata API processing time, and responsiveness. For some results, see tasks linked at the top of this section.
* Kartotherian map server performance can be monitored using [[ https://grafana.wikimedia.org/d/000000305/maps-performances?orgId=1 | this dashboard ]] and the webrequest log (using Superset SQL Lab). Any negative impact from the versioned maps feature is expected to show up on the "static snapshot requests" graph as a slow, one-month ramp up to a higher load, because image URLs are replaced incrementally as each cached article expires.
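
The 4x figure in the mass-purge scenario above follows directly from the 75% hit rate: normally only the 25% of requests that miss the cache reach the maps servers, so with an empty cache the backend sees 1 / (1 - 0.75) = 4 times its usual traffic. The sketch below restates that arithmetic; the function name is hypothetical and the model assumes that request volume stays constant and that every request misses immediately after the purge.

```python
def miss_load_multiplier(hit_rate: float) -> float:
    """Backend load after a full cache purge, relative to normal load.

    Assumes constant request volume and a 0% hit rate right after the purge.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 / (1.0 - hit_rate)


print(miss_load_multiplier(0.75))  # 4.0, matching the estimate above
```

The same formula gives a quick way to re-estimate the purge risk if the measured hit rate changes after versioned URLs roll out.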