Page MenuHomePhabricator

Allow multiple schemas in a single EventBus topic.
Closed, DeclinedPublic5 Story Points

Description

Some internal topics in ChangeProp have a semantics of job queue - they need to hold persistent jobs of various types and be able to process them in order. The job is an event and for documentation purposes we want to have schemas for every event. In case several different types of jobs should be stored in a single topic, we need to set a schema for this topic to a superset of all job event schemas. This can either be achieved by defining a specific schema for each topic and using anyOf stanza or by supporting multiple allowed schemas for a single topic. I think the latter will be cleaner and will decrease the future amount of work needed to define a new internal job queue for ChangeProp

Please implement support for that in the Event-Platform service. The eventbus-topics.yaml config for such topics could look something like this:

change-prop.backlinks-rerender:
  schema_names:
    - change-prop/transclude-rerender
    - change-prop/continue

Details

Related Gerrit Patches:
mediawiki/event-schemas : masterRemoved change-Prop related topics and configs.

Event Timeline

Pchelolo created this task.Dec 6 2016, 10:02 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 6 2016, 10:02 PM
Ottomata set the point value for this task to 5.

I'm generally +1 on the idea, but this might complicate the consumer side of the story. If a topic may have multiple schemas, then the consumer needs to know which schema it is (and possibly, why is that schema chosen over the others). So at the very least we should:

  • add the field meta.schema which can be either populated by the proxy service or the producer
  • define that having multiple schemas per topic is allowed exclusively for internal topics, such as change-prop ones

I mentioned this to Petr in IRC yesterday, but let's discuss here.

I'm not opposed to implementing this, but if change-prop doesn't use the eventbus service, should I? This also makes me wonder if we should even include the change-prop related topics in the topic -> schema mapping config. @Pchelolo said they are mostly there for documentation purposes. I'm all for documentation, but perhaps this could just be documented in a README or wiki or code comments somewhere? I don't think these topics should be involved in eventbus service configuration if they are never used by eventbus service. And if change prop doesn't use eventbus service, then it can post multiple schemas to the same topic on its own.

Thoughts?

add the field meta.schema which can be either populated by the proxy service or the producer

meta.schema_uri is set by eventbus service, but yeah, if change-prop produces directly to Kafka, it would be good for it to set it as well.

Pchelolo added a comment.EditedDec 7 2016, 5:25 PM

This also makes me wonder if we should even include the change-prop related topics in the topic -> schema mapping config.

Since we have automatic topic creation enabled we can actually exclude them, but we've never tested whether CP works if the topic wasn't created before. It should, but who knows.. If it does work we can exclude those mapping and CP related schemas and indeed document the schemas in a README in CP repo.

The service doesn't automatically create topics, and we recently removed the ensure-kafka-topics-exist script from eventlogging. Didn't you test? I think you said you did in https://gerrit.wikimedia.org/r/#/c/321397/ :p

The service doesn't automatically create topics

Let me rephrase: The service doesn't do anything special to create topics, it just produces to the topic, and Kafka auto creates it if it doesn't exist yet.

@Ottomata Oh, right, right... I actually did check it but completely forgot about that :)

Ottomata added a comment.EditedDec 7 2016, 5:49 PM

:)

Should we decline this ticket and make a patch to remove the change prop topic configs?

@Ottomata Let's see what @mobrovac has to say here, but overall I would agree. I will then put the CP schemas to CP repo under some doc folder. We need them because we use them in tests.

I'm totally fine with keeping the schemas in the event-schemas repo. I think its nice to have them all in one place, and to go through the same quality controls. I'm proposing that we remove the change prop topic configuration.

It is confusing because eventbus-topics.yaml is an eventbus service specific config. It doesn't really belong in the event-schemas repo at all. We just put it there for simplicity: https://github.com/wikimedia/mediawiki-event-schemas/blob/master/config/eventbus-topics.yaml#L7-L9

I'm totally fine with keeping the schemas in the event-schemas repo. I think its nice to have them all in one place, and to go through the same quality controls. I'm proposing that we remove the change prop topic configuration.

+1. Let's remove them from the config, but keep them in the repo to have all the schemas in one place.

Change 325865 had a related patch set uploaded (by Ppchelko):
Removed change-Prop related topics and configs.

https://gerrit.wikimedia.org/r/325865

Change 325865 merged by Ottomata:
Removed change-Prop related topics and configs.

https://gerrit.wikimedia.org/r/325865

mobrovac closed this task as Declined.Dec 8 2016, 9:33 PM

Closing as invalid as per discussion.

Hm, well, we still need to talk about what we should do with the change-prop topic configs, since we had to revert that today.

I expect the topic of supporting multiple event schemas to also come up in the context of the public RecentChanges feed, where several different kinds of events are provided in a single event stream. See T149736 for that discussion.