
Change-Prop consumer group must respect service name
Closed, Resolved · Public

Description

Currently, the consumer group name in change-prop is hardcoded in the source. To run the k8s deployment of change-prop in parallel with the scb deployment and test things, we need to make the consumer group prefix configurable. I propose using the service name instead of the hardcoded 'change-prop' string. See, for example, how the retry stream name is constructed; we need to do the same here.
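As a rough illustration of the proposal (not the actual change-prop code, which is a Node.js service), the group name could be derived from the configured service name rather than a literal. The function name `consumer_group_id`, the `rule_name` suffix, the `-` separator, and the example service names are all assumptions made for this sketch.

```python
def consumer_group_id(service_name: str, rule_name: str) -> str:
    """Build a consumer group id prefixed with the configured service name.

    Hypothetical helper: the real naming scheme in change-prop may differ.
    """
    if not service_name:
        raise ValueError("service_name must be non-empty")
    return f"{service_name}-{rule_name}"


# With the old hardcoded prefix, every deployment shares one group per rule.
legacy_group = consumer_group_id("change-prop", "example_rule")

# With a per-deployment service name, a parallel deployment gets its own group
# and therefore its own committed offsets.
parallel_group = consumer_group_id("change-prop-k8s", "example_rule")
```

Because the two deployments end up in different consumer groups, each receives every message independently, which is what makes side-by-side testing possible.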

Code-wise this is easy, but changing the consumer group names in production will be messy. When deployed with new consumer group names, change-prop will not find any committed offsets for its new group, so it will start from the latest offset, effectively skipping the backlog. So, when deploying the change, we need to roll it out on half of the nodes first and wait at least as long as the largest backlog. This will cause duplicate message processing, but we can rely on deduplicator-by-id to get rid of duplicates. Also, updates are idempotent, so some duplication is not a huge problem.
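The rollout concern above can be sketched with a toy model. This is a simplified simulation, not Kafka or change-prop code: `Topic` is a hypothetical single-partition log, the shared `seen` set stands in for deduplicator-by-id, and the group names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A toy single-partition log with per-group committed offsets."""
    messages: list = field(default_factory=list)
    committed: dict = field(default_factory=dict)  # group id -> next offset

    def start_offset(self, group_id: str) -> int:
        # A group without a committed offset starts at the end of the log,
        # which is how the backlog gets skipped.
        return self.committed.get(group_id, len(self.messages))


def consume(topic: Topic, group_id: str, seen_ids: set, handler) -> None:
    """Handle all pending messages for a group, dropping already-seen ids."""
    start = topic.start_offset(group_id)
    for msg in topic.messages[start:]:
        if msg["id"] in seen_ids:
            continue  # duplicate delivered via the other group
        seen_ids.add(msg["id"])
        handler(msg)
    topic.committed[group_id] = len(topic.messages)


OLD_GROUP = "change-prop-example_rule"      # hypothetical old group name
NEW_GROUP = "change-prop-k8s-example_rule"  # hypothetical new group name

topic = Topic(messages=[{"id": f"evt-{i}"} for i in range(5)])
topic.committed[OLD_GROUP] = 2  # old group has a backlog of three messages

processed = []
seen = set()

# Half of the nodes switch to the new group: nothing is committed for it,
# so it starts at the end of the log and sees none of the backlog.
consume(topic, NEW_GROUP, seen, processed.append)
skipped_by_new_group = len(processed) == 0

# The remaining nodes, still on the old group, drain the backlog.
consume(topic, OLD_GROUP, seen, processed.append)

# New messages now reach both groups; deduplication keeps one copy of each.
topic.messages += [{"id": "evt-5"}, {"id": "evt-6"}]
consume(topic, NEW_GROUP, seen, processed.append)
consume(topic, OLD_GROUP, seen, processed.append)

processed_ids = [m["id"] for m in processed]
```

In this model the backlog is handled only by the nodes that remain on the old group, which is why the second half of the rollout has to wait until that backlog has been cleared.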

Ping me on IRC if you have questions. Please do the coding part, we can do the deployment together.

Event Timeline

Pchelolo created this task. Feb 5 2020, 6:24 PM
Restricted Application added a subscriber: Aklapper. Feb 5 2020, 6:24 PM

Per discussion on ops-services-sync, this will not really be needed for a k8s switch, but it's still a good idea to do it.

Mentioned in SAL (#wikimedia-operations) [2020-02-18T20:39:55Z] <ppchelko@deploy1001> Started deploy [changeprop/deploy@e2fe8ca]: respect service name in consumer group T244387

Mentioned in SAL (#wikimedia-operations) [2020-02-18T20:47:54Z] <ppchelko@deploy1001> Finished deploy [changeprop/deploy@e2fe8ca]: respect service name in consumer group T244387 (duration: 07m 59s)

Mentioned in SAL (#wikimedia-operations) [2020-02-24T21:42:28Z] <ppchelko@deploy1001> Started deploy [cpjobqueue/deploy@f87bdd9]: Take service name into account for consumer group name T244387

Mentioned in SAL (#wikimedia-operations) [2020-02-24T21:43:43Z] <ppchelko@deploy1001> Finished deploy [cpjobqueue/deploy@f87bdd9]: Take service name into account for consumer group name T244387 (duration: 01m 14s)

Pchelolo closed this task as Resolved. Feb 24 2020, 9:49 PM