
Puppetize event schema topic configuration
Closed, Declined (Public)

Description

For T160748, I've been trying to think of a nice way to keep topic, schema, and EventStream routes DRY and sane. Currently:

  • schemas and topic -> schema mappings are in the mediawiki/event-schemas repository
  • the EventStreams route -> topic mapping is in Puppet hiera
  • the EventStreams spec.yaml.j2 and config.yaml.j2 templates are in scap

Scap renders the spec.yaml and config.yaml files for the EventStreams service, using data that originally comes from Puppet hiera. I'd like to add schema URIs to the rendered spec.yaml, so that the swagger-ui docs can display the schema for each stream route.

I just talked with @Pchelolo a bit, and we think we'll soon need more topic configuration in Puppet, due to the upcoming JobQueue rewrite.

We think we should come up with a good and extensible topic config data structure that we can store in hiera, that would specify things like:

  • the schemas allowed to be produced to the topic (if a schema is required).
  • other Kafka topic details: number of partitions, retention, etc. (This could be hard to ensure with puppet)
  • a description of what the topic is for(?)
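
To make the list above concrete, here is a minimal sketch of what one entry in such a structure might look like, written as a Python dataclass rather than hiera YAML. Every field name (`schemas`, `partitions`, `retention_ms`, `description`) and the example topic and schema names are assumptions for illustration, not an agreed format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TopicConfig:
    """Hypothetical per-topic config entry; all field names are illustrative."""
    # Schemas allowed to be produced to the topic; empty means no schema is required.
    schemas: list = field(default_factory=list)
    # Other Kafka topic details; None means "not managed by this config".
    partitions: Optional[int] = None
    retention_ms: Optional[int] = None
    # Free-text description of what the topic is for.
    description: str = ""


def parse_topic_configs(raw: dict) -> dict:
    """Turn a mapping loaded from YAML into TopicConfig objects keyed by topic name."""
    return {topic: TopicConfig(**(settings or {})) for topic, settings in raw.items()}


# Made-up data shaped like what a hiera lookup could return.
raw_config = {
    "example.page-edit": {
        "schemas": ["example/page/edit/1"],
        "partitions": 1,
        "description": "Example topic for page edit events.",
    },
    # A topic with no settings at all falls back to the defaults.
    "example.unchecked": None,
}
topic_configs = parse_topic_configs(raw_config)
```

An unknown key in the raw mapping raises a `TypeError` from the dataclass constructor, which is one cheap way to catch typos in the config.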

I could then use this config to render EventStreams spec.yaml and config.yaml. I'd only have to provide the route name and the topics that should be included in that route. The rest would be looked up by getting the topic config for each topic.
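
The lookup described here could be sketched roughly as follows. The shapes of `routes` and `topic_configs`, the `schema_uri` key, and the output layout are all assumptions made for the example; the real spec.yaml.j2 template may need different fields.

```python
def build_route_specs(routes: dict, topic_configs: dict) -> dict:
    """For each route, look up the config of every topic included in it."""
    specs = {}
    for route, topics in routes.items():
        missing = [t for t in topics if t not in topic_configs]
        if missing:
            raise KeyError(f"route {route!r} references unknown topics: {missing}")
        specs[route] = {
            "topics": list(topics),
            # Schema URIs in topic order, without duplicates, for the docs.
            "schema_uris": list(
                dict.fromkeys(topic_configs[t]["schema_uri"] for t in topics)
            ),
        }
    return specs


# Hypothetical topic config: only the route name and its topics are given
# per route; everything else is looked up here.
topic_configs = {
    "example.page-edit": {"schema_uri": "example/page/edit/1"},
    "example.page-delete": {"schema_uri": "example/page/delete/1"},
}
routes = {"page-changes": ["example.page-edit", "example.page-delete"]}

route_specs = build_route_specs(routes, topic_configs)
```

The resulting `route_specs` mapping is the kind of value that could be handed to a template as its rendering context.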

@mobrovac, whatcha think?

Event Timeline

The original idea of having the event-schemas repository was so that others can re-use the config (Vagrant, 3rd party, etc). If we move the config to Puppet we lose the re-usability aspect of it, which I'm not keen on. On the other hand, given all of the places inside WMF prod where these are needed, it makes sense to distribute them via Puppet. I wonder if we could have Puppet or a cron script set up somewhere that would have the rights to automatically commit any changes to the rendered config to event-schemas. Thoughts?

In the future, though, I think schemas, topics, and their configurations and mappings should probably go into etcd and be pulled from it, but we are not there yet.

For Vagrant at least, I don't think we really need the topic config. We could make EventBus default to not enforcing the topic -> schema mapping. We could do this for 3rd parties too, but I can see the argument that in 3rd party prod settings (are there any?), they'd want the topic -> schema enforcement.
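
One way to picture that default, as a sketch only: `check_event_schema`, the `enforce` flag, and the mapping shape below are hypothetical names, not EventBus's actual API or configuration.

```python
def check_event_schema(topic, schema, topic_schemas, enforce=False):
    """Return True if an event with `schema` may be produced to `topic`."""
    if not enforce:
        # Development default: accept any schema on any topic.
        return True
    allowed = topic_schemas.get(topic)
    if allowed is None:
        # With enforcement on, a topic without a configured mapping is rejected.
        return False
    return schema in allowed


# Made-up topic -> schema mapping.
topic_schemas = {"example.page-edit": ["example/page/edit/1"]}

lenient = check_event_schema("example.other", "example/anything/1", topic_schemas)
strict_ok = check_event_schema(
    "example.page-edit", "example/page/edit/1", topic_schemas, enforce=True
)
strict_bad = check_event_schema(
    "example.other", "example/anything/1", topic_schemas, enforce=True
)
```

With this shape, Vagrant would run with `enforce` left at its default, while a production deployment would switch it on.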

> I wonder if we could have Puppet or a cron script set up somewhere that would have the rights to automatically commit any changes to the rendered config to event-schemas. Thoughts?

Can't say I like this, but it could work. We don't update topic configs often, and the ones we use in prod will not be the same ones that 3rd parties and vagrant should use. We might add more partitions to support more volume, but wouldn't want to do that in Vagrant.

We could maintain a dev/example topic-config.yaml in event-schemas, but have the production one be stored in puppet. Thoughts?

> The original idea of having the event-schemas repository was so that others can re-use the config

For schemas, yes, but IIRC we only put the topic config in this repository for lack of a better place.

> We could maintain a dev/example topic-config.yaml in event-schemas,

Actually, if we do this, maybe it would be better to put this dev/example topic config into eventlogging/eventbus instead of event-schemas? Does change-prop in vagrant need topic-config.yaml?

Nuria lowered the priority of this task from Medium to Low. (May 29 2017, 3:49 PM)

We'll be doing this differently in Modern Event Platform.