Set up retry queues for change propagation
Closed, ResolvedPublic

Description

If rule processing fails, the change propagation service needs to be able to retry the rule execution. To do this reliably, it will post a message to a static retry topic. Since other consumers of Event-Platform could also benefit from retry topics, we want to set them up transparently for each topic created in Kafka. A retry message will contain the original failed rule, so a consumer of the retry message knows which rule to execute.

So, here's the list of things that should be done:

  • Set up a schema for a retry topic.
  • Make the event logging-service transparently create a retry topic for each topic created, and accept messages for it. It's questionable whether we want this, as retry topics will be useful only for Event-Platform, while the producer service might be useful for other applications where retry topics are not needed. So I think it's better to just set up the topics in the eventbus-topics.yaml config file. @mobrovac @Ottomata what do you think?
  • Set up retry consumers in change propagation. WIP PR
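
To make the plan above concrete, here is a minimal sketch of how a retry topic could be derived for each topic and how a failed event could be wrapped in a retry message. The `.retry` suffix, the field names (`rule`, `reason`, `failed_at`, `original_event`), and the example rule name are all assumptions for illustration; they are not taken from the merged schema.

```python
import json
import time

# Hypothetical naming convention; the task does not specify one.
RETRY_SUFFIX = ".retry"


def retry_topic_for(topic: str) -> str:
    """Derive the retry topic name for a regular topic."""
    return topic + RETRY_SUFFIX


def with_retry_topics(topics):
    """Return the topic list with one retry topic added after each topic."""
    result = []
    for topic in topics:
        result.append(topic)
        result.append(retry_topic_for(topic))
    return result


def build_retry_message(original_event: dict, rule_name: str, reason: str) -> dict:
    """Wrap a failed event so a retry consumer knows which rule to re-run."""
    return {
        "rule": rule_name,                 # assumed field: the rule that failed
        "reason": reason,                  # assumed field: why it failed
        "failed_at": int(time.time()),     # assumed field: failure timestamp
        "original_event": original_event,  # the untouched original event
    }


if __name__ == "__main__":
    # Illustrative event and rule name only.
    event = {"topic": "mediawiki.revision_create", "title": "Main_Page"}
    message = build_retry_message(event, "example_rule", "HTTP 503")
    print(retry_topic_for(event["topic"]))
    print(json.dumps(message, indent=2))
```

Keeping the original event untouched inside the wrapper means a retry consumer can hand it to the same rule code that processed it the first time.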

Event Timeline

Change 284742 had a related patch set uploaded (by Ppchelko):
Set up retry topic schema

https://gerrit.wikimedia.org/r/284742

Change 284742 merged by Mobrovac:
Set up retry topic schema

https://gerrit.wikimedia.org/r/284742

Change 284377 had a related patch set uploaded (by Mobrovac):
Update the schemas to include the user_block and retry schemas as well

https://gerrit.wikimedia.org/r/284377

We also need multi-page processing for things like page links. Previously, @Pchelolo and I discussed using a single changeprop-owned topic for both, with a framing format wrapping the original message.

The current patches seem to aim for a retry-only topic, which means that we'd need to create another associated topic for multi-page processing. Should we use separate topics, or generalize the framing format to allow combining retries with multi-page processing?

Also, should we even register changeprop-internal topics with the eventbus service? Doing this for all eventbus-owned topics might become a bit tiresome, and there shouldn't be any legitimate way to produce to those topics from anything but the changeprop service itself.
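
For reference, a generalized framing format of the kind asked about here could look roughly like the sketch below: one wrapper type with a `kind` field, carried on a single topic. The names `FrameKind`, `Frame`, and `handle` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum


class FrameKind(str, Enum):
    """What the wrapped message is for (hypothetical values)."""
    RETRY = "retry"
    CONTINUATION = "continuation"


@dataclass
class Frame:
    kind: FrameKind          # retry or multi-page continuation
    rule: str                # rule the frame belongs to
    original_message: dict   # the wrapped original message


def handle(frame: Frame) -> str:
    """Dispatch on the frame kind; returns a description of the action."""
    if frame.kind is FrameKind.RETRY:
        return f"re-run rule {frame.rule}"
    return f"continue rule {frame.rule} on next batch"


if __name__ == "__main__":
    print(handle(Frame(FrameKind.RETRY, "example_rule", {"title": "Main_Page"})))
    print(handle(Frame(FrameKind.CONTINUATION, "example_rule", {"title": "Main_Page"})))
```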

The current patches seem to aim for a retry-only topic, which means that we'd need to create another associated topic for multi-page processing. Should we use separate topics, or generalize the framing format to allow combining retries with multi-page processing?

Hm, we could change this to use a single topic for both use cases, but I don't think we should. The two use cases are conceptually very different: a retry indicates an error, while a continuation is normal processing. So it would be useful to monitor the retry topic and the continuation topic separately. Also, continuation might produce a tremendous number of events, which would interfere with retries and delay updates across the whole system. For ideas and discussion about continuation, I've filed T133221.
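
A rough sketch of the separate-topics approach, assuming hypothetical topic names and an in-memory list standing in for a Kafka producer: each kind of message goes to its own topic, so the two volumes can be counted and monitored independently.

```python
from collections import Counter

# Hypothetical topic names, for illustration only.
TOPIC_BY_KIND = {
    "retry": "changeprop.retry",
    "continuation": "changeprop.continuation",
}


class Router:
    """Routes messages to a per-kind topic and counts them per topic."""

    def __init__(self):
        self.produced = Counter()  # per-topic message counts for monitoring
        self.outbox = []           # stand-in for a real Kafka producer

    def emit(self, kind: str, message: dict) -> str:
        topic = TOPIC_BY_KIND[kind]  # raises KeyError for an unknown kind
        self.outbox.append((topic, message))
        self.produced[topic] += 1
        return topic


if __name__ == "__main__":
    router = Router()
    for page in ("A", "B", "C"):
        router.emit("continuation", {"title": page})
    router.emit("retry", {"title": "A"})
    print(dict(router.produced))
```

With this split, a spike in continuation events shows up only in the continuation counter, and retries are never queued behind it.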

Also, should we even register changeprop-internal topics with the eventbus service? Doing this for all eventbus-owned topics might become a bit tiresome, and there shouldn't be any legitimate way to produce to those topics from anything but the changeprop service itself.

We've been going back and forth with @mobrovac on this question, and finally decided that static retry topics within Event-Platform could be useful outside of change-propagation too. Change prop is only one of the Event-Platform consumers, and by setting up the retry topics like this, we encourage other consumers to use the same pattern for producing retries.

Change 284377 merged by Ottomata:
Update the schemas to include the user_block and retry schemas as well

https://gerrit.wikimedia.org/r/284377

The retry topic PR was merged, so this task can be resolved now.