
Separate retry and error topics between JobQueue and normal ChangeProp
Closed, Resolved (Public)

Description

Currently the retry and error topics for both JobQueue-ChangeProp and normal ChangeProp are prefixed with change-prop. For retry topics this is not a big deal, as they are separated by the actual topic name, but for the error topic all the errors from both CP instances end up in the same topic, which is bad.

I propose replacing the change-prop prefix with the service name, so the error/retry topics will be prefixed with changeprop and cpjobqueue respectively.
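To illustrate the proposal, here is a minimal sketch of deriving the topic names from the service name. The exact name layout (`<prefix>.retry.<topic>` and `<prefix>.error`), the helper names, and the `example.topic` topic are assumptions made for illustration, not the actual ChangeProp implementation (which is not written in Python).

```python
# Hypothetical helpers: the "<prefix>.retry.<topic>" and "<prefix>.error"
# layouts are assumed for illustration only.

def retry_topic(service_name: str, topic: str) -> str:
    """Build a retry topic name prefixed with the service name."""
    return f"{service_name}.retry.{topic}"


def error_topic(service_name: str) -> str:
    """Build the error topic name prefixed with the service name."""
    return f"{service_name}.error"


if __name__ == "__main__":
    # With a shared "change-prop" prefix both services would get the same
    # error topic; with per-service prefixes the topics are distinct.
    for service in ("changeprop", "cpjobqueue"):
        print(error_topic(service), retry_topic(service, "example.topic"))
```

With the service name as the prefix, each instance writes its errors to its own topic, so the two streams no longer mix.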

This will break the metrics and might lead to losing a couple of retries during the deployment, but since the traffic in the retry topics is very low it's not a big deal, and the metrics are easily fixable.

Event Timeline

Ideally the consumer group names should be prefixed with the service_name as well, just for consistency, but renaming the consumer groups would make us lose all the backlog we have in all the topics without processing it. I don't think consistency is important enough to do that. What do you think @mobrovac?

Another way to minimize the damage to normal ChangeProp while making nice and uniform use of the service_name everywhere is to rename the normal ChangeProp service from changeprop to change-prop - this will break logging and metrics, but those dashboards are easily fixable.

> Ideally the consumer group names should be prefixed with the service_name as well, just for consistency, but renaming the consumer groups would make us lose all the backlog we have in all the topics without processing it. I don't think consistency is important enough to do that. What do you think @mobrovac?

+1. Let's go with just the retry and error topics for now.

> Another way to minimize the damage to normal ChangeProp while making nice and uniform use of the service_name everywhere is to rename the normal ChangeProp service from changeprop to change-prop - this will break logging and metrics, but those dashboards are easily fixable.

Euh, actually ChangeProp's name in the production environment is changeprop, not change-prop. In that sense, it would be more correct to change the service name in its config.yaml to changeprop.

> Euh, actually ChangeProp's name in the production environment is changeprop, not change-prop. In that sense, it would be more correct to change the service name in its config.yaml to changeprop.

The service name in the config.yaml is already changeprop, and that prevents us from using the service name universally for retry/error topics and consumer group names: the retry topics are named with the change-prop prefix and, more importantly, the consumer group names have the change-prop prefix too. Changing the consumer group names in CP would make us lose some events. It's not a big deal for most of the topics, but it would be an issue for transclusions. Perhaps we can increase the transclusion concurrency, get rid of the backlog as much as possible, and then rename?

Yeah, we'd need to schedule that so that we switch the names while the backlog is 0.
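The "backlog is 0" condition can be sketched as a check on consumer lag, that is, the gap between each partition's end offset and the old consumer group's committed offset. This is only a sketch: the function names and the offset dictionaries are hypothetical, and in practice the offsets would come from the Kafka tooling or client in use.

```python
# Hypothetical sketch: offsets are passed in as plain dicts keyed by
# partition id; fetching them from Kafka is out of scope here.

def total_backlog(end_offsets: dict, committed_offsets: dict) -> int:
    """Sum the per-partition lag (end offset minus committed offset).

    A partition with no committed offset is treated as fully unprocessed
    (assumption: committed offset 0).
    """
    return sum(
        max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    )


def safe_to_rename(end_offsets: dict, committed_offsets: dict) -> bool:
    """Renaming the consumer group drops nothing only if the lag is zero."""
    return total_backlog(end_offsets, committed_offsets) == 0


if __name__ == "__main__":
    end = {0: 120, 1: 95}        # made-up offsets for illustration
    committed = {0: 120, 1: 90}  # partition 1 still has 5 unprocessed events
    print(total_backlog(end, committed), safe_to_rename(end, committed))
```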

Pchelolo edited projects, added Services (done); removed Services (doing).

The topics have been separated. They use the service name as prefix now.