Modern Event Platform: Stream Intake Service (EventGate): Implementation
Closed, Resolved · Public · 21 Story Points

Description

Ticket proliferation disambiguation!

This ticket will be used to track and task implementation work for the Stream Intake Service.

The RFC in T201963 is still ongoing (waiting for feedback), but since the RFC process takes an undefined amount of time to complete, we should not block on it and should move forward with implementation now.

Description

The Stream Intake Service will be used to intake events over HTTP from both internal and external clients. Those events will be validated and then produced to Kafka. The events API will be compatible (or close to compatible) with the existing eventlogging-service /v1/events API.

Technical Requirements

  • POST of a single event
  • POST of an array of events
  • Two response modes:
    • Fire and forget: HTTP response is given before event is validated and produced to Kafka
    • ACKed: HTTP response is given based on event validation status and Kafka produce success
  • JSONSchemas of events read from URIs, either local file:// or remote http://.
  • Schemas for a given URI should be cached
  • Event schema URIs extracted from events, e.g. the meta.schema_uri field (this should be configurable).
  • Destination topics extracted from events
  • Configurable topic transformation (e.g. datacenter prefixing)
  • Topic schema restriction: only certain schemas should be allowed in certain topics.
  • Events that fail for any reason should be produced to Kafka in an error topic with a specific event error schema
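
As a rough illustration of the routing requirements above, here is a minimal Python sketch; it is not the actual implementation. The config keys, the meta.topic field, the "eqiad." prefix, the error topic name, and the shape of the error event are all assumptions made up for this example. Rejecting topics that have no allow-list entry is also an assumption. Schema validation itself is left out; load_schema only shows per-URI caching.

```python
import json
from functools import lru_cache
from urllib.request import urlopen

# Hypothetical configuration: every key and value here is an assumption.
CONFIG = {
    "schema_uri_field": "meta.schema_uri",  # where to find the schema URI
    "topic_field": "meta.topic",            # where to find the destination topic
    "topic_prefix": "eqiad.",               # datacenter prefixing
    "error_topic": "eventgate.error",
    # Topic (before prefixing) -> schema URIs allowed in that topic.
    "allowed_schemas": {
        "mediawiki.revision-create": {"mediawiki/revision/create/3"},
    },
}


def get_field(event, dotted_path):
    """Follow a dotted path such as 'meta.schema_uri' into a nested dict."""
    value = event
    for key in dotted_path.split("."):
        value = value[key]  # raises KeyError if the field is absent
    return value


@lru_cache(maxsize=None)
def load_schema(uri):
    """Read a JSON schema from a file:// or http:// URI, once per URI."""
    with urlopen(uri) as response:
        return json.load(response)


def route(event, config=CONFIG):
    """Return (prefixed_topic, schema_uri) for an event, or raise ValueError."""
    try:
        schema_uri = get_field(event, config["schema_uri_field"])
        topic = get_field(event, config["topic_field"])
    except (KeyError, TypeError) as err:
        raise ValueError(f"missing routing field: {err}") from err
    allowed = config["allowed_schemas"].get(topic)
    if allowed is None or schema_uri not in allowed:
        raise ValueError(f"schema {schema_uri} is not allowed in topic {topic}")
    return config["topic_prefix"] + topic, schema_uri


def to_error_event(event, err, config=CONFIG):
    """Wrap a failed event so it can be produced to the error topic."""
    return config["error_topic"], {"error": str(err), "raw_event": json.dumps(event)}


def handle(event, config=CONFIG):
    """Return the (topic, payload) pair that would be handed to the producer."""
    try:
        topic, _schema_uri = route(event, config)
        return topic, event
    except ValueError as err:
        return to_error_event(event, err, config)
```

With this sketch, handle() returns the prefixed topic and the original event on success, and the error topic with a wrapped event on failure, so the caller produces to Kafka in exactly one place.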

Related Objects

Status    Assigned         Task
Open      Ottomata
Open      Ottomata
Resolved  Ottomata
Resolved  Ottomata
Resolved  sbassett
Resolved  Ottomata
Stalled   Ottomata
Declined  akosiaris
Open      Ottomata
Resolved  Ottomata
Resolved  Ottomata
Resolved  Ottomata
Resolved  Pchelolo
Resolved  Ottomata
Resolved  EvanProdromou
Resolved  EBernhardson
Resolved  Ottomata
Resolved  Ottomata
Resolved  Ottomata
Resolved  Ottomata
Resolved  Ottomata
Resolved  Ottomata
Resolved  Ottomata
Resolved  Ottomata
Resolved  Ottomata
Declined  None

Event Timeline

Ottomata triaged this task as Normal priority. · Oct 11 2018, 6:41 PM
Ottomata created this task.
Restricted Application added a subscriber: Aklapper. · Oct 11 2018, 6:41 PM

Some open questions I have (some of these don't need to be resolved now since we don't have a use case for them)

  • How to deal with Kafka message keys?
  • How to deal with partitioners?
  • Schemas in the event-schemas repo are currently suffixed with .yaml. We should probably leave the file extension off, so that relative, extensionless schema URIs can be used in the events. Both .json and .yaml files are readable by YAML parsers, so we don't need to know which it is. E.g. mediawiki/revision/create/3 should point to a real file.
  • Should topic transformation (e.g. datacenter prefixing) be configured by full topic name, or just by intended 'stream' name?
  • What to do with messages that fail Kafka produce? EventBus is currently configured to write them to local files. This has helped with some recovery in the past, but wouldn't be a great idea in a high volume system. Perhaps we can build in support for this, but make it possible to enable or disable it via configuration?
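
To make the extensionless schema URI question above concrete, here is a small Python sketch of one possible lookup. It assumes schemas live under a local base directory and that the service tries the bare name first, then .yaml, then .json; the function name, the search order, and the /srv/event-schemas path are all hypothetical.

```python
from pathlib import Path


def resolve_schema_path(schema_uri, base_dir,
                        extensions=("", ".yaml", ".json"),
                        is_file=Path.is_file):
    """Map an extensionless schema URI to the first matching file on disk.

    The search order in `extensions` is an assumption, not a decided policy.
    `is_file` is injectable so the lookup can be exercised without real files.
    """
    for ext in extensions:
        candidate = Path(base_dir) / (schema_uri + ext)
        if is_file(candidate):
            return candidate
    raise FileNotFoundError(f"no schema file found for {schema_uri} in {base_dir}")
```

An event carrying mediawiki/revision/create/3 would then resolve to whichever of 3, 3.yaml, or 3.json exists under the base directory, and the file can be handed to a YAML parser regardless of its extension.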

I have a prototype of an eventbus rewrite in node-service-template here: https://github.com/ottomata/eventbus

Should I bring this into gerrit so we can start collaborating / reviewing? If so, what should the name be? Should we just keep using the name eventbus and call this eventbus 2.0 or something?

Ottomata updated the task description. · Oct 11 2018, 7:34 PM

@Ottomata added ticket for prototype explicitly: https://phabricator.wikimedia.org/T206815 If it is a full rewrite I vote for a different name (stream intake service?)

Ottomata added a comment. · Edited · Oct 12 2018, 6:51 PM

I'd prefer to keep the name or make up something new like it. A new service will be API compatible with what we now call 'eventbus', and we will likely keep using the Mediawiki EventBus extension. There is no existing 'eventbus' repository now, and eventlogging itself doesn't mention 'eventbus' anywhere (only eventlogging-service). If we call this software eventbus, we can have several deployed instances in prod: eventbus main-eqiad, eventbus public-eqiad, etc.

We need a name! Have been brainstorming over on https://etherpad.wikimedia.org/p/event-platform. The current three top contenders:

I like EventHorizon, but it is already taken. YAR naming is hard!

I'm setting a deadline for myself to name this thing and to create a WMF repo this week. At the moment I'm leaning toward EventSiphon...

Simply EventIntake?

class EventIntake
EventIntake service
EventIntake instance

... Hm. Maybe it's not bad after all, and very obvious...

EventValve is ok too. Hm.

I put down some proposals which are more literary (JozefK, DerProcess, etc) but going with the obvious EventIntake or even EventValve is fine. Naming is hard :)

"EventGate" as in : "the place where events enter our galaxy"...

Event[Black]Hole as in the point after which clients have no idea what happens :) but I like Eventgate too.

EventGate is pretty good. It's fairly descriptive: some events are let in, others aren't. HMMM

I'm pretty seriously considering EventGate. (A downside is the connotation that words suffixed with -gate are scandals. But that's dumb so too bad :p ) Any objections?

Also, I'd like to make the canonical hosting of this repo in Github, rather than Gerrit. I want this thing to be more easily findable and contributable by others, not just WMFers. People know how to do PRs, they don't know how to make gerrit patches. The eventgate-deploy repository can still be in gerrit. Objections here? @akosiaris does Kubernetes have any opinion here?

I would argue that we want all of our code to be easily findable and contributable by others. I believe that fragmenting our code base so that some code is in one place and some in another is a very bad thing. If gerrit limits findability and contributability of our codebase, we need to address that rather than choosing to host some code elsewhere. I've formed this opinion over a long span of time working with third party extension developers, encouraging them to host their extensions in gerrit and submit patches there. There are significant benefits to doing so (CI, code updates, code search, l10n, etc.) despite the learning curve, and it becomes more difficult to continue to argue that if WMF fragments its own codebase.

Yeah, I in general agree with that, especially when code is WMF specific. However, I'm hoping that this service/library will fill a gap in the Kafka open source ecosystem. Users coming from that ecosystem don't use gerrit; users from ours (Mediawiki, etc.) do. If I was expecting Mediawiki devs to use and contribute to this, I would just put this in Gerrit. I don't really expect many MW-type folks to use this. I'm hoping that the folks like those that added all the +1s at https://github.com/confluentinc/schema-registry/issues/220 will find this useful. :)

Could we host in Github and mirror back to Gerrit?

Nuria added a comment. · Edited · Nov 29 2018, 11:43 PM

I tend to agree with Andrew here, this is a very generic piece of code and that github issue sure makes a compelling case about putting this repo in github. If I understand how things work even if we were to do that we would need a gerrit mirror to deploy via scap, correct?

my 2 cents: EventGate is awesome, GH is the best place for this piece of code.

I tend to agree with Andrew here, this is a very generic piece of code and that github issue sure makes a compelling case about putting this repo in github.

+1 from me too on @Ottomata's thoughts.

If I understand how things work even if we were to do that we would need a gerrit mirror to deploy via scap, correct?

Correct. Since this service is planned to be put directly on k8s, we need a gerrit repo that can be used to trigger the building pipeline that tests the service and creates the needed production images. Unfortunately, though, there is no way to mirror GH repos to gerrit (there have been attempts at doing that in the past, but they were not successful). That leaves us with two options:

  1. Manually push changes to gerrit when we want to deploy
  2. Have the repo in gerrit and a mirror on GH and then manually deal with PRs, issues, etc that people put on GH

I personally prefer option 1 because it gives us a very nice opportunity to interact with the Kafka community while keeping full control over pushing and deploying. On the other hand, if we go for option 2, I have a hard time imagining that saying to people opening PRs "create a gerrit account and resubmit your code there" would yield much collaboration.

Yes, makes sense in that context.

Nuria added a comment. · Nov 30 2018, 4:06 PM

Option 3 is reexamining why we cannot mirror from github to gerrit, right?

@mobrovac can we use Diffusion with Kubernetes? I think ORES is mirroring from Github to Diffusion.

I'm not familiar with how deploys with Kubernetes work, but I assume they work similarly to scap, with automated Docker building on top. We could host the deploy repo in gerrit with eventgate as a dependency that is pulled in, just like we do for other node deploys and dependencies.

mpopov added a subscriber: mpopov. · Dec 5 2018, 2:28 PM
Ottomata moved this task from Backlog to In Progress on the Event-Platform board. · Dec 5 2018, 10:04 PM
Ottomata added a comment. · Edited · Feb 20 2019, 3:29 PM

Here's what's left to do for this quarter's goal of porting over the Monolog based events to this new service:

Change 491819 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[mediawiki/vagrant@master] Hieraize npm::node_version to allow for installing later versions of NodeJS

https://gerrit.wikimedia.org/r/491819

Change 491820 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[mediawiki/vagrant@master] Puppetize eventgate role

https://gerrit.wikimedia.org/r/491820

Ottomata raised the priority of this task from Normal to High. · Feb 21 2019, 5:59 PM

Change 491819 merged by Ottomata:
[mediawiki/vagrant@master] Hieraize npm::node_version to allow for installing later versions of NodeJS

https://gerrit.wikimedia.org/r/491819

Change 491820 merged by Ottomata:
[mediawiki/vagrant@master] Puppetize eventgate service in eventbus role

https://gerrit.wikimedia.org/r/491820

Change 493443 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate-analytics Set kafka compression.codec: snappy and message.max.bytes: 4194304

https://gerrit.wikimedia.org/r/493443

Change 493444 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/deployment-charts@master] eventgate: set compression.codec: snappy and message.max.bytes: 4194304

https://gerrit.wikimedia.org/r/493444

Change 493443 abandoned by Ottomata:
eventgate-analytics Set kafka compression.codec: snappy and message.max.bytes: 4194304

https://gerrit.wikimedia.org/r/493443

Change 493444 merged by Ottomata:
[operations/deployment-charts@master] eventgate: set compression.codec: snappy and message.max.bytes: 4194304

https://gerrit.wikimedia.org/r/493444

eventgate-analytics is now handling ~6K api-request events per second. There will always be more work to do, but I'm going to consider this task done.

Ottomata changed the point value for this task from 0 to 21.
Ottomata moved this task from Next Up to Done on the Analytics-Kanban board.
Nuria added a comment. · Apr 23 2019, 8:58 PM

ta-ta-channnnnnnnn

Ottomata moved this task from In Progress to Done on the Event-Platform board. · May 9 2019, 7:32 PM
Nuria renamed this task from Modern Event Platform: Stream Intake Service: Implementation to Modern Event Platform: Stream Intake Service (EventGate): Implementation. · May 14 2019, 9:34 PM
Nuria closed this task as Resolved.