
Performance considerations about the current usage of EventPlatform from maps
Closed, Resolved, Public

Description

After our initial end-to-end tests using EventPlatform for maps.tile_change events, the following considerations came up:

  • Currently the maps.tile_change topic has only 1 partition, and we are planning to have multiple consumers in our consumer group for tile pregeneration. Does it make sense to increase the number of partitions to roughly the level of concurrency we need for the pregeneration workers?
  • The initial design of the maps.tile_change event schema was based on the idea that each tile change would be a single event. It turns out that once a day we would have millions of tile changes, and that information is not valuable enough to warrant one event per tile. Changing the schema to bundle multiple tile changes per event may be useful. The reasons are:
    • In our tile pregeneration setup we need to receive N events to populate a single task for the pregeneration of N tiles. With a new schema we could fetch a single message to populate the tile list per task.
    • In our OpenStreetMap import pipeline we use EventGate to publish events. Bundling multiple tile changes per event would reduce the number of HTTP requests we send and the time needed to finish the data import.
    • We do not expect to need such granular information about map tile changes.
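
To make the bundling idea concrete, here is a minimal sketch of grouping tile changes into batched events. The function name `batch_tiles`, the `tiles` field, and the `z/x/y` tile strings are illustrative assumptions, not the actual schema.

```python
from typing import Iterator


def batch_tiles(tiles: list[str], batch_size: int) -> Iterator[dict]:
    """Yield one event-like dict per group of at most batch_size tiles."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(tiles), batch_size):
        # "tiles" is a hypothetical field name, not the real schema.
        yield {"tiles": tiles[start:start + batch_size]}


# 12 made-up tile identifiers in z/x/y form.
tiles = [f"10/{x}/{y}" for x in range(3) for y in range(4)]
events = list(batch_tiles(tiles, batch_size=5))
print(len(tiles), "tile changes ->", len(events), "events")
```

With one event per task, a pregeneration worker would read a single message instead of collecting N of them, and the import pipeline would send one request per batch.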

Event Timeline

Jgiannelos renamed this task from Performance considerations around on the usage of EventPlatform from maps to Performance considerations about the current usage of EventPlatform from maps. Oct 14 2021, 1:38 PM

Change 730803 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[schemas/event/primary@master] maps.tile-change: Use batched tiles per event instead of single tile

https://gerrit.wikimedia.org/r/730803

For posterity: in a recent test we discovered that one hour's worth of OSM sync can generate 120000 single tiles for a single zoom level. This can reach 5 times that size if we consider regenerating zoom 10 to zoom 15, as we currently do.

In the same test, we submitted 4000 jobs and the script took nearly 5 minutes to finish the requests to EventGate.
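
As a rough back-of-the-envelope check of the numbers above, the following sketch treats "nearly 5 minutes" as 300 seconds (an upper bound, so the real per-request time is slightly lower) and uses the factor of 5 exactly as stated.

```python
jobs = 4000
elapsed_seconds = 5 * 60  # assumption: "nearly 5 minutes" rounded up to 300 s

# Average time spent per request, and the resulting request rate.
per_request_ms = elapsed_seconds * 1000 / jobs
requests_per_second = jobs / elapsed_seconds

# Tile volume for one hour of OSM sync, using the stated factor of 5.
tiles_one_zoom = 120_000
tiles_all_zooms = tiles_one_zoom * 5

print(f"{per_request_ms:.0f} ms per request, {requests_per_second:.1f} requests/s")
print(f"{tiles_all_zooms} tiles per hour across the regenerated zoom levels")
```

At roughly 13 requests per second, publishing several hundred thousand single-tile events one request at a time would take hours, which is the motivation for batching.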

Q: how did you publish to EventGate? You can POST an array of events (although 4000 at once might be too many; there is a POST body byte size limit).

Also, I think we discussed this, but I can't remember what you said. Why are you using EventGate? Can you produce directly to Kafka instead?

Q: how did you publish to EventGate? You can POST an array of events (although 4000 at once might be too many; there is a POST body byte size limit).

Good to know about the array of events. I think we've only sent events one by one.

Also, I think we discussed this, but I can't remember what you said. Why are you using EventGate? Can you produce directly to Kafka instead?

Mostly because of familiarity and the convenience of using HTTP requests from bash scripts. We can always switch to a Kafka client instead. Also, this was the initial suggestion from the SREs when we talked about events.
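
Regarding posting an array of events while staying under a body size limit, a minimal sketch of the chunking step could look like this. The limit value, the event shape, and the function names are assumptions for illustration; the actual EventGate limit is not stated here, and the HTTP request itself is left out.

```python
import json


def encoded_size(event):
    """Size in bytes of one event serialized as compact JSON."""
    return len(json.dumps(event, separators=(",", ":")).encode("utf-8"))


def chunk_events(events, max_body_bytes):
    """Group events into lists whose JSON array encoding fits the limit."""
    chunks = []
    current = []
    current_size = 2  # the enclosing "[" and "]"
    for event in events:
        size = encoded_size(event)
        if size + 2 > max_body_bytes:
            raise ValueError("a single event exceeds the body limit")
        needed = size + (1 if current else 0)  # 1 byte for the comma
        if current_size + needed > max_body_bytes:
            chunks.append(current)
            current, current_size = [], 2
            needed = size
        current.append(event)
        current_size += needed
    if current:
        chunks.append(current)
    return chunks


# Hypothetical events and a deliberately tiny limit to show the splitting.
events = [{"tile": f"10/{i}/{i}"} for i in range(10)]
chunks = chunk_events(events, max_body_bytes=60)
bodies = [json.dumps(chunk, separators=(",", ":")) for chunk in chunks]
print(len(events), "events ->", len(bodies), "request bodies")
```

Each entry in `bodies` would be sent as one POST, so the request count drops from the number of events to the number of chunks.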

Change 730841 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[schemas/event/primary@master] maps: Schema for batched tile changes

https://gerrit.wikimedia.org/r/730841

Change 730803 abandoned by Jgiannelos:

[schemas/event/primary@master] maps.tile-change: Use batched tiles per event instead of single tile

Reason:

https://gerrit.wikimedia.org/r/730803

Change 730841 merged by Ottomata:

[schemas/event/primary@master] maps: Schema for batched tile changes

https://gerrit.wikimedia.org/r/730841

Change 732381 had a related patch set uploaded (by Ottomata; author: Ottomata):

[eventgate-wikimedia@master] Bump schema repo shas

https://gerrit.wikimedia.org/r/732381

Change 732381 merged by Ottomata:

[eventgate-wikimedia@master] Bump schema repo shas

https://gerrit.wikimedia.org/r/732381

Change 732382 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] Bump eventgate-main image version to get maps.tiles_change schema

https://gerrit.wikimedia.org/r/732382

Change 732382 merged by Ottomata:

[operations/deployment-charts@master] Bump eventgate-main image version to get maps.tiles_change schema

https://gerrit.wikimedia.org/r/732382

Change 732767 had a related patch set uploaded (by Ottomata; author: Ottomata):

[schemas/event/primary@master] Fix maps.tiles_change schema required field

https://gerrit.wikimedia.org/r/732767

Change 732767 merged by Ottomata:

[schemas/event/primary@master] Fix maps.tiles_change schema required field

https://gerrit.wikimedia.org/r/732767

Change 732769 had a related patch set uploaded (by Ottomata; author: Ottomata):

[eventgate-wikimedia@master] Bump primary schema repo to get maps tiles changes schema fix

https://gerrit.wikimedia.org/r/732769

Change 732769 merged by Ottomata:

[eventgate-wikimedia@master] Bump primary schema repo to get maps tiles changes schema fix

https://gerrit.wikimedia.org/r/732769

Change 732771 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] eventgate-main - Bump image version to get maps tiles.change fix

https://gerrit.wikimedia.org/r/732771

Change 732771 merged by Ottomata:

[operations/deployment-charts@master] eventgate-main - Bump image version to get maps tiles.change fix

https://gerrit.wikimedia.org/r/732771

Change 734975 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[operations/software/tegola@wmf/v0.14.x] tile-pregeneration: Adapt to new event schema

https://gerrit.wikimedia.org/r/734975

Change 734975 merged by jenkins-bot:

[operations/software/tegola@wmf/v0.14.x] tile-pregeneration: Adapt to new event schema

https://gerrit.wikimedia.org/r/734975

Change 736494 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[operations/deployment-charts@master] tegola-vector-tiles: Use batched tile changes kafka stream

https://gerrit.wikimedia.org/r/736494

Change 736494 merged by jenkins-bot:

[operations/deployment-charts@master] tegola-vector-tiles: Use batched tile changes kafka stream

https://gerrit.wikimedia.org/r/736494

Change 736554 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[operations/deployment-charts@master] tegola-vector-tiles: Setup cronjob parallelism

https://gerrit.wikimedia.org/r/736554

Change 736554 merged by jenkins-bot:

[operations/deployment-charts@master] tegola-vector-tiles: Setup cronjob parallelism

https://gerrit.wikimedia.org/r/736554

After some testing on codfw k8s, it looks like with 1 partition per topic only 1 worker ends up consuming the vast majority of the messages.
I think we need to increase the partitions to the number of workers (currently that's 6 in codfw and 6 in eqiad) so all of them can pregenerate tiles in parallel.

This refers to the following Kafka topics on kafka-main:

  • eqiad.maps.tiles_change
  • codfw.maps.tiles_change
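
The effect of the partition count on parallelism can be illustrated with a simplified model. Within one consumer group, Kafka assigns each partition to at most one consumer at a time; the round-robin function below is an illustrative stand-in for Kafka's real assignors, and the worker names are made up.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partitions round-robin across the consumers of one group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


workers = [f"worker-{i}" for i in range(6)]  # hypothetical worker names

one = assign_partitions(1, workers)
six = assign_partitions(6, workers)

# Count the workers that own at least one partition in each setup.
busy_with_one = sum(1 for parts in one.values() if parts)
busy_with_six = sum(1 for parts in six.values() if parts)
print(busy_with_one, busy_with_six)  # prints "1 6"
```

With a single partition, five of the six workers have nothing to read from; with six partitions, each worker owns one.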

Running the following in Kafka clusters main-eqiad, main-codfw, and jumbo-eqiad:

kafka topics --alter --topic eqiad.maps.tiles_change --partitions 6
kafka topics --alter --topic codfw.maps.tiles_change --partitions 6

...

Mentioned in SAL (#wikimedia-operations) [2021-11-10T19:51:43Z] <ottomata> altering {eqiad,codfw}.maps.tiles_change to increase to 6 partitions in kafka main-eqiad, main-codfw and jumbo-eqiad: https://phabricator.wikimedia.org/T293366#7497076

19:52:22 [@kafka-main1001:/home/otto] $ kafka topics --describe --topic eqiad.maps.tiles_change
kafka-topics --zookeeper conf1004.eqiad.wmnet,conf1005.eqiad.wmnet,conf1006.eqiad.wmnet/kafka/main-eqiad --describe --topic eqiad.maps.tiles_change
Topic:eqiad.maps.tiles_change	PartitionCount:6	ReplicationFactor:3	Configs:
	Topic: eqiad.maps.tiles_change	Partition: 0	Leader: 1005	Replicas: 1005,1001,1002	Isr: 1005,1001,1002
	Topic: eqiad.maps.tiles_change	Partition: 1	Leader: 1001	Replicas: 1001,1002,1003	Isr: 1001,1002,1003
	Topic: eqiad.maps.tiles_change	Partition: 2	Leader: 1002	Replicas: 1002,1003,1004	Isr: 1002,1003,1004
	Topic: eqiad.maps.tiles_change	Partition: 3	Leader: 1003	Replicas: 1003,1004,1001	Isr: 1003,1004,1001
	Topic: eqiad.maps.tiles_change	Partition: 4	Leader: 1004	Replicas: 1004,1001,1002	Isr: 1004,1001,1002
	Topic: eqiad.maps.tiles_change	Partition: 5	Leader: 1005	Replicas: 1005,1001,1002	Isr: 1005,1001,1002
19:52:29 [@kafka-main1001:/home/otto] $ kafka topics --describe --topic codfw.maps.tiles_change
kafka-topics --zookeeper conf1004.eqiad.wmnet,conf1005.eqiad.wmnet,conf1006.eqiad.wmnet/kafka/main-eqiad --describe --topic codfw.maps.tiles_change
Topic:codfw.maps.tiles_change	PartitionCount:6	ReplicationFactor:3	Configs:
	Topic: codfw.maps.tiles_change	Partition: 0	Leader: 1005	Replicas: 1005,1001,1002	Isr: 1005,1001,1002
	Topic: codfw.maps.tiles_change	Partition: 1	Leader: 1001	Replicas: 1001,1002,1003	Isr: 1001,1002,1003
	Topic: codfw.maps.tiles_change	Partition: 2	Leader: 1002	Replicas: 1002,1003,1004	Isr: 1002,1003,1004
	Topic: codfw.maps.tiles_change	Partition: 3	Leader: 1003	Replicas: 1003,1004,1001	Isr: 1003,1004,1001
	Topic: codfw.maps.tiles_change	Partition: 4	Leader: 1004	Replicas: 1004,1001,1002	Isr: 1004,1001,1002
	Topic: codfw.maps.tiles_change	Partition: 5	Leader: 1005	Replicas: 1005,1001,1002	Isr: 1005,1001,1002
19:52:54 [@kafka-main2004:/home/otto] $ kafka topics --describe --topic eqiad.maps.tiles_change
kafka-topics --zookeeper conf2004.codfw.wmnet,conf2005.codfw.wmnet,conf2006.codfw.wmnet/kafka/main-codfw --describe --topic eqiad.maps.tiles_change
Topic:eqiad.maps.tiles_change	PartitionCount:6	ReplicationFactor:3	Configs:
	Topic: eqiad.maps.tiles_change	Partition: 0	Leader: 2005	Replicas: 2005,2001,2002	Isr: 2005,2001,2002
	Topic: eqiad.maps.tiles_change	Partition: 1	Leader: 2001	Replicas: 2001,2002,2003	Isr: 2001,2002,2003
	Topic: eqiad.maps.tiles_change	Partition: 2	Leader: 2002	Replicas: 2002,2003,2004	Isr: 2002,2003,2004
	Topic: eqiad.maps.tiles_change	Partition: 3	Leader: 2003	Replicas: 2003,2004,2001	Isr: 2003,2004,2001
	Topic: eqiad.maps.tiles_change	Partition: 4	Leader: 2004	Replicas: 2004,2001,2002	Isr: 2004,2001,2002
	Topic: eqiad.maps.tiles_change	Partition: 5	Leader: 2005	Replicas: 2005,2001,2002	Isr: 2005,2001,2002
19:53:02 [@kafka-main2004:/home/otto] $ kafka topics --describe --topic codfw.maps.tiles_change
kafka-topics --zookeeper conf2004.codfw.wmnet,conf2005.codfw.wmnet,conf2006.codfw.wmnet/kafka/main-codfw --describe --topic codfw.maps.tiles_change
Topic:codfw.maps.tiles_change	PartitionCount:6	ReplicationFactor:3	Configs:
	Topic: codfw.maps.tiles_change	Partition: 0	Leader: 2005	Replicas: 2005,2001,2002	Isr: 2005,2001,2002
	Topic: codfw.maps.tiles_change	Partition: 1	Leader: 2001	Replicas: 2001,2002,2003	Isr: 2001,2002,2003
	Topic: codfw.maps.tiles_change	Partition: 2	Leader: 2002	Replicas: 2002,2003,2004	Isr: 2002,2003,2004
	Topic: codfw.maps.tiles_change	Partition: 3	Leader: 2003	Replicas: 2003,2004,2001	Isr: 2003,2004,2001
	Topic: codfw.maps.tiles_change	Partition: 4	Leader: 2004	Replicas: 2004,2001,2002	Isr: 2004,2001,2002
	Topic: codfw.maps.tiles_change	Partition: 5	Leader: 2005	Replicas: 2005,2001,2002	Isr: 2005,2001,2002
19:53:29 [@kafka-jumbo1001:/home/otto] $ kafka topics --describe --topic eqiad.maps.tiles_change
kafka-topics --zookeeper conf1004.eqiad.wmnet,conf1005.eqiad.wmnet,conf1006.eqiad.wmnet/kafka/jumbo-eqiad --describe --topic eqiad.maps.tiles_change
Topic:eqiad.maps.tiles_change	PartitionCount:6	ReplicationFactor:3	Configs:
	Topic: eqiad.maps.tiles_change	Partition: 0	Leader: 1008	Replicas: 1008,1007,1001	Isr: 1008,1007,1001
	Topic: eqiad.maps.tiles_change	Partition: 1	Leader: 1009	Replicas: 1009,1002,1005	Isr: 1009,1002,1005
	Topic: eqiad.maps.tiles_change	Partition: 2	Leader: 1001	Replicas: 1001,1005,1008	Isr: 1001,1005,1008
	Topic: eqiad.maps.tiles_change	Partition: 3	Leader: 1003	Replicas: 1003,1008,1007	Isr: 1003,1008,1007
	Topic: eqiad.maps.tiles_change	Partition: 4	Leader: 1004	Replicas: 1004,1009,1001	Isr: 1004,1009,1001
	Topic: eqiad.maps.tiles_change	Partition: 5	Leader: 1006	Replicas: 1006,1001,1003	Isr: 1006,1001,1003
19:53:34 [@kafka-jumbo1001:/home/otto] $ kafka topics --describe --topic codfw.maps.tiles_change
kafka-topics --zookeeper conf1004.eqiad.wmnet,conf1005.eqiad.wmnet,conf1006.eqiad.wmnet/kafka/jumbo-eqiad --describe --topic codfw.maps.tiles_change
Topic:codfw.maps.tiles_change	PartitionCount:6	ReplicationFactor:3	Configs:
	Topic: codfw.maps.tiles_change	Partition: 0	Leader: 1005	Replicas: 1005,1008,1001	Isr: 1005,1008,1001
	Topic: codfw.maps.tiles_change	Partition: 1	Leader: 1005	Replicas: 1005,1008,1001	Isr: 1005,1008,1001
	Topic: codfw.maps.tiles_change	Partition: 2	Leader: 1008	Replicas: 1008,1007,1001	Isr: 1008,1007,1001
	Topic: codfw.maps.tiles_change	Partition: 3	Leader: 1007	Replicas: 1007,1009,1001	Isr: 1007,1009,1001
	Topic: codfw.maps.tiles_change	Partition: 4	Leader: 1009	Replicas: 1009,1001,1003	Isr: 1009,1001,1003
	Topic: codfw.maps.tiles_change	Partition: 5	Leader: 1001	Replicas: 1001,1003,1004	Isr: 1001,1003,1004
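
Checking output like the above by eye across three clusters is error-prone. A small sketch that parses the `--describe` text format shown here could verify the partition count and flag partitions whose in-sync replicas differ from their replicas. The function names are illustrative, and the sample reuses two partition lines from the output above.

```python
import re


def partition_count(describe_output):
    """Return the PartitionCount value from the topic summary line."""
    match = re.search(r"PartitionCount:\s*(\d+)", describe_output)
    if match is None:
        raise ValueError("no PartitionCount field found")
    return int(match.group(1))


def out_of_sync_partitions(describe_output):
    """List partitions whose Isr set differs from their Replicas set."""
    pattern = r"Partition:\s*(\d+).*?Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]+)"
    bad = []
    for match in re.finditer(pattern, describe_output):
        partition, replicas, isr = match.groups()
        if set(replicas.split(",")) != set(isr.split(",")):
            bad.append(int(partition))
    return bad


sample = (
    "Topic:eqiad.maps.tiles_change\tPartitionCount:6\tReplicationFactor:3\tConfigs:\n"
    "\tTopic: eqiad.maps.tiles_change\tPartition: 0\tLeader: 1005\t"
    "Replicas: 1005,1001,1002\tIsr: 1005,1001,1002\n"
    "\tTopic: eqiad.maps.tiles_change\tPartition: 1\tLeader: 1001\t"
    "Replicas: 1001,1002,1003\tIsr: 1001,1002,1003\n"
)

count = partition_count(sample)
bad = out_of_sync_partitions(sample)
print(count, bad)
```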

Change 747196 had a related patch set uploaded (by Jgiannelos; author: Jgiannelos):

[operations/mediawiki-config@master] Deprecate unused maps event stream

https://gerrit.wikimedia.org/r/747196

Change 747196 merged by jenkins-bot:

[operations/mediawiki-config@master] Remove unused maps event stream

https://gerrit.wikimedia.org/r/747196

Mentioned in SAL (#wikimedia-operations) [2022-05-03T13:14:12Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:747196|Remove unused maps event stream (T293366)]] (duration: 01m 04s)