[Maps] Modernize Vector Tile Infrastructure
Closed, Resolved · Public · 1 Estimated Story Points

Assigned To

Authored By

	• sdkim
	Sep 25 2020, 2:52 PM

Description

Problem

Wikimedia Maps has had more outages than any other service at the Foundation to date. When those outages occur, there is no defined performance metric to indicate success or service degradation; our current monitoring capabilities for maps are poor and unhelpful; and the stack is complex and hard to understand, which makes it difficult to get support and maintenance for it.

Hypothesis

We believe that modernizing the maps infrastructure will reduce complexity, enable monitoring capabilities, and better empower SRE to resolve issues quickly and intuitively.

The overall rationale behind the following phased approach is to make atomic changes without breaking current functionality and with minimal disruption. We also want to allow the changes to be evaluated and feedback to be provided along the way. The plan is to approach this modernization iteratively, starting with replacing our current vector tile server, Tilerator, with the open-source vector tile server Tegola. We hope that moving away from server-side raster rendering to client-side rendering reduces the dependency on SRE and allows this team to be autonomous in supporting and maintaining the maps stack.

By modernizing our maps infrastructure, we empower SREs to support maps-related incidents and maintenance by:

Moving away from static allocation of services on bare metal to services in Kubernetes
Reducing the complexity of the infrastructure by removing legacy/deprecated dependencies
Using technologies in which our SREs have a lot of expertise

Outcomes

Wikimedia users will have a reliable and consistent experience contributing to and learning about geo-information

Maintenance effort on vector-tile-related outages is reduced to 10% of a full-time engineer's time per quarter

SREs will be empowered to maintain and support maps-related incidents without previous experience

Performance metrics can be monitored through Prometheus, with alerting thresholds as defined by the SLOs
SLOs are defined and agreed upon by the Product Infrastructure & SRE teams
Maps documentation can be read and clearly understood by an SRE, giving an overview and actionable remedies for handling problems

Details

	Subject	Repo	Branch	Lines +/-
	Remove a reference to Tilerator	mediawiki/services/kartotherian	master	+1 -1


Related Objects

Status	Subtype	Assigned	Task
Resolved		• ssastry	T263854 [Maps] Modernize Vector Tile Infrastructure
Declined		hnowlan	T263858 Evaluate and Document current state of Maps from SRE perspective
Resolved	Spike	Jgiannelos	T265622 [SPIKE][Maps] Tile Server Replacement/Enhancements
Resolved	Spike	Jgiannelos	T265623 [SPIKE][Maps] Localization
Resolved	Spike	Jgiannelos	T265624 [SPIKE][Maps] Tile Rasterization
Resolved		MSantos	T267339 [EPIC] Address maps level of support issues
Resolved		MSantos	T269884 Empower maps support by providing better documentation
Resolved		MSantos	T269690 View an Example Map (using Tegola vector tile server)
Resolved		MSantos	T270169 Bootstrap Tegola vector-tile server with baseline MVT schema from OSM bright
Resolved		hnowlan	T270170 Generate docker images for tegola
Resolved		Jgiannelos	T270171 Create imposm3 setup for OSM bright
Resolved		Jgiannelos	T270172 Use redis as a tegola cache
Resolved		Jgiannelos	T270174 Benchmark performance of tegola as a tile server
Resolved		Jgiannelos	T270175 Support vector tile pre-generation
Resolved		Jgiannelos	T289771 Add kafka support for tile-pregeneration events
Resolved		Jgiannelos	T290982 Support expired tile deduplication
Resolved		Jgiannelos	T293366 Performance considerations about the current usage of EventPlatform from maps
Resolved		Jgiannelos	T294011 Sending events to `maps.tiles_change` stream is failing
Resolved		Jgiannelos	T270177 Configure tegola to serve tiles
Resolved		Jgiannelos	T277871 Integrate kartodock with the minimal kartotherian branch
Resolved		hnowlan	T271920 Document infrastructure considerations
Resolved		Jgiannelos	T272843 Compare resources needed for Redis and Swift storage in order to replace Cassandra.
Resolved		Jgiannelos	T271630 Get an architecture / services review of planned changes to Maps stack
Resolved		MSantos	T272451 Create the "Decision Statement Overview" for Maps 2.0
Resolved		None	T275063 Maps - rearchitecting of Maps Stack to improve stability and reliability
Resolved		Jgiannelos	T275845 Cleanup deprecated codebase from kartotherian project
Resolved		MSantos	T274378 Extract geoshapes into a standalone service
Resolved		MSantos	T274380 Add CI jobs for geoshapes service
Resolved		Jgiannelos	T275874 Create helm charts for tegola vector tile server
Resolved		Jgiannelos	T276324 OSM pipeline evaluation
Resolved		Jgiannelos	T281976 Adapt helm charts to use the new PostGIS queries structure
Resolved		MSantos	T281978 Adapt kartotherian tm2 source to use the new PostGIS queries structure
Resolved		Jgiannelos	T196474 Externalize tile storage for maps
Resolved		Jgiannelos	T149885 Investigate Swift as a storage backend for maps tiles
Resolved		MSantos	T280767 Maps 2.0 roll-out plan
Resolved		MSantos	T291178 Provide configuration support on Kartographer to enable tiles from Tegola per wiki
Resolved		hnowlan	T298246 Disable unused services on maps nodes
Resolved		Jgiannelos	T298248 Connect kartotherian to tegola as a tile backend per cluster
Resolved		Jgiannelos	T298249 Cleanup kartographer default styles in mediawiki config
Resolved		Jgiannelos	T298251 Investigate cache latency on tegola codfw
Resolved		jijiki	T290149 Configure replication slots on Postgres masters
Resolved		• ssastry	T297408 Install latest package for maps-deduped-tilelist (v0.0.4)

Event Timeline

• sdkim created this task. · Sep 25 2020, 2:52 PM

Restricted Application added a subscriber: Aklapper. · Sep 25 2020, 2:52 PM

• sdkim claimed this task. · Sep 25 2020, 2:53 PM

• Mholloway subscribed. · Sep 25 2020, 4:08 PM

• sdkim added a project: Product Infrastructure Roadmap. · Oct 14 2020, 1:23 PM

• sdkim moved this task from Later to Next on the Product Infrastructure Roadmap board.

• sdkim mentioned this in T267421: Some interactive maps are not clickable. · Nov 17 2020, 3:26 PM

WDoranWMF added a project: Platform Engineering Roadmap. · Nov 23 2020, 6:57 PM

AntiCompositeNumber subscribed. · Nov 23 2020, 9:21 PM

MSantos moved this task from All map-related tasks to Production Infrastructure on the Maps board. · Nov 25 2020, 12:07 PM

WDoranWMF added a project: Code-Health-Objective. · Nov 27 2020, 5:02 PM

Jgiannelos closed subtask T265623: [SPIKE][Maps] Localization as Resolved. · Dec 1 2020, 4:30 PM

Jgiannelos closed subtask T265624: [SPIKE][Maps] Tile Rasterization as Resolved.

Jgiannelos closed subtask T265622: [SPIKE][Maps] Tile Server Replacement/Enhancements as Resolved.

• eprodromou added a project: Tech-Product API Roadmap. · Dec 3 2020, 6:32 PM

Naike moved this task from Later (future inbox) to Next on the Platform Engineering Roadmap board. · Dec 7 2020, 6:40 PM

Naike set the point value for this task to 1.

Naike removed the point value for this task. · Dec 7 2020, 6:42 PM

Naike set the point value for this task to 1.

• Mholloway unsubscribed. · Dec 7 2020, 6:43 PM

WDoranWMF moved this task from Next to Matrixed on the Platform Engineering Roadmap board. · Dec 7 2020, 6:51 PM

Naike reassigned this task from • sdkim to hnowlan. · Dec 7 2020, 7:19 PM

• sdkim updated the task description. · Dec 8 2020, 2:13 PM

• sdkim updated the task description. · Dec 8 2020, 2:17 PM

• sdkim moved this task from Next to Now on the Product Infrastructure Roadmap board.

• sdkim added a subtask: T267339: [EPIC] Address maps level of support issues. · Dec 8 2020, 3:06 PM

• eprodromou moved this task from Untriaged to Next on the Tech-Product API Roadmap board. · Dec 9 2020, 9:44 PM

• ssastry mentioned this in T271630: Get an architecture / services review of planned changes to Maps stack. · Jan 10 2021, 12:33 AM

• sdkim renamed this task from [Maps] Improve Service Consistency & Reduce Maintenance Cost to [Maps] Modernize Vector Tile Infrastructure. · Jan 11 2021, 9:37 PM

• sdkim claimed this task.

• sdkim updated the task description.

• sdkim added a subscriber: hnowlan.

• sdkim added a subtask: T271630: Get an architecture / services review of planned changes to Maps stack. · Jan 19 2021, 8:50 PM

• sdkim reassigned this task from • sdkim to SubrahamanyamVarma. · Jan 22 2021, 2:40 PM

• sdkim moved this task from Next to Now on the Tech-Product API Roadmap board.

MSantos reassigned this task from SubrahamanyamVarma to • ssastry. · Jan 26 2021, 11:01 AM

MSantos added a subscriber: SubrahamanyamVarma.

MSantos mentioned this in T274356: Security Readiness Review For maplibre-gl-js. · Feb 10 2021, 12:38 PM

MSantos mentioned this in T274378: Extract geoshapes into a standalone service. · Feb 10 2021, 3:30 PM

MSantos mentioned this in T274388: New Service Request geoshapes. · Feb 10 2021, 4:31 PM

Naveenpf subscribed. · Feb 10 2021, 5:42 PM

MSantos mentioned this in T274875: Security Readiness Review For mapbox-gl-leaflet. · Feb 16 2021, 11:32 AM

Jgiannelos added a subtask: T275845: Cleanup deprecated codebase from kartotherian project. · Feb 26 2021, 1:19 PM

Naike moved this task from Matrixed to Delivered on the Platform Engineering Roadmap board. · Mar 1 2021, 6:36 PM

MSantos added a subtask: T276324: OSM pipeline evaluation. · Mar 3 2021, 1:17 PM

MSantos added a subtask: T196474: Externalize tile storage for maps. · Mar 4 2021, 7:03 PM

Krinkle updated the task description. · Mar 5 2021, 12:01 AM

Mainframe98 mentioned this in T110223: Move maps code from github to gerrit. · Mar 25 2021, 1:06 PM

MSantos closed subtask T275874: Create helm charts for tegola vector tile server as Resolved. · Apr 7 2021, 2:17 PM

Kozuch awarded a token. · Apr 14 2021, 3:39 PM

MSantos added a subtask: T280767: Maps 2.0 roll-out plan. · Apr 21 2021, 10:40 AM

jijiki mentioned this in T283049: Swift account to store pre-rendered vector-tiles. · May 18 2021, 6:13 AM

hnowlan closed subtask T263858: Evaluate and Document current state of Maps from SRE perspective as Declined. · May 25 2021, 4:36 PM

Jgiannelos closed subtask T271630: Get an architecture / services review of planned changes to Maps stack as Resolved. · Jun 1 2021, 11:13 AM

Jgiannelos closed subtask T196474: Externalize tile storage for maps as Resolved. · Aug 26 2021, 10:57 AM

Jgiannelos closed subtask T276324: OSM pipeline evaluation as Resolved. · Sep 17 2021, 11:19 AM

Jgiannelos closed subtask T271920: Document infrastructure considerations as Resolved. · Nov 12 2021, 3:51 PM

MSantos closed subtask T269690: View an Example Map (using Tegola vector tile server) as Resolved. · Nov 15 2021, 11:37 AM

Jgiannelos closed subtask T297408: Install latest package for maps-deduped-tilelist (v0.0.4) as Resolved. · Jan 13 2022, 4:30 PM

Jgiannelos closed subtask T275845: Cleanup deprecated codebase from kartotherian project as Resolved. · Mar 21 2022, 3:36 PM

Change 776278 had a related patch set uploaded (by Awight; author: Awight):

[mediawiki/services/kartotherian@master] Remove a reference to Tilerator

https://gerrit.wikimedia.org/r/776278

gerritbot added a project: Patch-For-Review. · Apr 2 2022, 9:04 AM

Change 776278 abandoned by WMDE-Fisch:

[mediawiki/services/kartotherian@master] Remove a reference to Tilerator

Reason:

This patch seems to take care of the issue: If62089bfe77f6cae1083017d53b5cffcd0952c5b

https://gerrit.wikimedia.org/r/776278

Maintenance_bot removed a project: Patch-For-Review. · May 6 2022, 12:30 PM

MSantos closed subtask T280767: Maps 2.0 roll-out plan as Resolved. · Sep 16 2022, 8:08 AM

WMDE-Fisch subscribed. · Sep 16 2022, 8:09 AM

Jgiannelos mentioned this in T228497: Review sizing of maps cluster. · Nov 8 2022, 4:40 PM

jijiki closed subtask T290149: Configure replication slots on Postgres masters as Resolved. · Nov 18 2022, 11:40 AM

awight mentioned this in T323981: Document outstanding maintenance for the maps stack. · Jan 6 2023, 12:50 PM

For future reference, I've removed the client-side rendering tasks from this EPIC, since they were removed from the scope a long time ago. With that, we can finally close this task.
