
Modern Event Platform
Open, Medium, Public, 0 Estimated Story Points

Description

This is a parent task for the work to be done for the Modern Event Platform Program.

EventLogging is home grown and was not designed for purposes other than low-volume analytics in MySQL databases. However, the ideas it was based on are solid, and the industry has independently converged on them as a standard, often called a Stream Data Platform. Over the last two years, we have been developing the EventBus sub-system with the aim of standardizing events, both for internal use in propagating changes to dependent artifacts and for exposing them to clients. While this has been a success, integrating these events with different systems requires a lot of custom, cumbersome glue code. Open source technologies for integrating and processing streams of events already exist.

Engineering teams should be able to quickly develop features that are easy to instrument and measure, and those features should be able to react to events from other systems.

To begin understanding the existing challenges with EventLogging, we have created the following document: https://docs.google.com/spreadsheets/d/1M1A4YEdlF0T79KgQO7g4_jpzNSe-XCn3lO0_TzhO6yQ/edit?ts=5ae7bc8a#gid=0. It lists all the steps involved in instrumenting and analyzing with EventLogging, indicates which ones are the most time-consuming and error-prone, identifies which teams participate, and describes the specific challenges in each step.

This program also overlaps with the Better Use of Data program. See also https://docs.google.com/spreadsheets/d/16cALJVeql2euSad3GgXJjDCOVYsBRC64ietw8oRzsbI/edit#gid=0.

For some historical context, see the slides from Event Infrastructure at WMF (2018).

Background reading

Components

Each of the components described below is a unit of technical output of this program: either a deployed service or tool, or documentation and policy guidelines.

Before the individual technical components are detailed, let's define a couple of terms.

  • Event - A strongly typed piece of data that conforms to a schema, usually representing something happening at a definite time, e.g. revision-create, user-button-click, page-load, etc.
  • Stream - A contiguous (often unending) collection of events (loosely) ordered by time.
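
As a rough illustration of these two terms, the sketch below models an event as a record carrying a schema URI, a timestamp, and a payload, and treats a stream as an iterable of such events that is only loosely ordered by time. The `Event` class, its field names, and the `late_events` helper are hypothetical and not part of any WMF codebase; the tolerance check simply shows what "loosely ordered" permits.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class Event:
    """A piece of data that conforms to a schema and describes something at a point in time."""
    schema: str              # URI of the schema the data conforms to (hypothetical convention)
    dt: datetime             # when the described thing happened
    data: Dict[str, Any] = field(default_factory=dict)


def late_events(stream: Iterable[Event], tolerance: timedelta) -> int:
    """Count events that are more than `tolerance` older than the newest event seen so far.

    A stream is only loosely ordered by time, so small reorderings are expected;
    this counts only the ones that exceed the allowed slack.
    """
    newest: Optional[datetime] = None
    late = 0
    for event in stream:
        if newest is not None and newest - event.dt > tolerance:
            late += 1
        if newest is None or event.dt > newest:
            newest = event.dt
    return late
```
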

Stream Intake Service

Intake of events from internal and external clients (browsers & apps). EventLogging and EventBus already do some of this, but are limited in scope and scale. This is EventGate.
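
A minimal sketch of what an intake service does with each incoming event: look up the event's schema, validate the event, and append valid events to the topic for the stream named in the event. It assumes events carry a `$schema` URI and a `meta.stream` field, uses an in-memory dict as a stand-in for Kafka topics, and checks only `required` fields and top-level `type`s; it is not EventGate's implementation, and a production service would use a full JSONSchema validator.

```python
from typing import Any, Dict, List, Tuple

# A small subset of JSONSchema type names mapped to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate(event: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return validation errors for `event` (an empty list means valid).

    Only `required` and top-level property `type`s are checked.
    """
    errors = []
    for name in schema.get("required", []):
        if name not in event:
            errors.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(spec.get("type"))
        if name not in event or expected is None:
            continue
        value = event[name]
        # bool is a subclass of int in Python, so reject it for numeric types explicitly.
        is_bool_as_number = isinstance(value, bool) and spec["type"] in ("integer", "number")
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"{name}: expected {spec['type']}")
    return errors


def intake(
    events: List[Dict[str, Any]],
    schemas: Dict[str, Dict[str, Any]],
    topics: Dict[str, List[Dict[str, Any]]],
) -> Tuple[int, int]:
    """Validate each event and append valid ones to the topic named by `meta.stream`.

    `topics` is an in-memory stand-in for Kafka. Returns (accepted, rejected).
    """
    accepted = rejected = 0
    for event in events:
        schema = schemas.get(event.get("$schema"))
        meta = event.get("meta")
        stream = meta.get("stream") if isinstance(meta, dict) else None
        if schema is None or stream is None or validate(event, schema):
            rejected += 1
            continue
        topics.setdefault(stream, []).append(event)
        accepted += 1
    return accepted, rejected
```
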

Event Schema Repositories

This consists of several git repositories, all pulled together and made easily accessible through a simple HTTP service / file browser. It may eventually also have a nice GUI.
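
Because the schemas live in several repositories, a client needs some way to turn a schema URI into a fetchable URL. The sketch below assumes schema URIs are repository-relative paths ending in a semantic version (e.g. `/test/event/1.0.0`) and that each repository is served under its own base URL; the base URLs, the `find_schema` search order, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Set
from urllib.parse import urljoin


def schema_url(base_url: str, schema_uri: str) -> str:
    """Resolve a repository-relative schema URI such as '/test/event/1.0.0' against a base URL."""
    if not base_url.endswith("/"):
        base_url += "/"
    # urljoin would drop the base path if the URI started with '/', so strip it first.
    return urljoin(base_url, schema_uri.lstrip("/"))


def find_schema(repos: Dict[str, Set[str]], schema_uri: str) -> Optional[str]:
    """Return the URL of `schema_uri` in the first repository (in dict order) that has it."""
    for base_url, uris in repos.items():
        if schema_uri in uris:
            return schema_url(base_url, schema_uri)
    return None


def latest_version(versions: List[str]) -> Optional[str]:
    """Pick the highest 'MAJOR.MINOR.PATCH' version, comparing numerically rather than as text."""
    if not versions:
        return None
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))
```

Comparing versions as integer tuples matters: as plain strings, `'1.10.0'` would sort before `'1.2.0'`.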

Event Schema Guidelines

Some guidelines already exist for analytics purposes, and some exist for mediawiki/event-schemas. We should unify these.

Stream Connectors for ingestion to and from various state stores

(MySQL, Redis, Druid, Cassandra, HDFS, etc.) This will likely be Kafka Connect, which we will need to adapt to work with JSONSchemas and our Event Schema Repository.
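
Adapting Kafka Connect to JSONSchemas mostly means translating each event schema into a Connect schema. The sketch below shows one plausible mapping from (already dereferenced) JSONSchema fragments to Kafka Connect `Schema.Type` names, represented as plain dicts rather than real Connect `Schema` objects; the choices of `INT64` for `integer`, `FLOAT64` for `number`, and `MAP` for objects that only declare `additionalProperties` are assumptions, not a settled design.

```python
from typing import Any, Dict

# JSONSchema primitive type -> Kafka Connect Schema.Type name (one plausible mapping).
PRIMITIVES = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOLEAN",
}


def to_connect_schema(js: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a dereferenced JSONSchema fragment into a Connect-like schema descriptor."""
    t = js.get("type")
    if t in PRIMITIVES:
        return {"type": PRIMITIVES[t]}
    if t == "array":
        return {"type": "ARRAY", "items": to_connect_schema(js["items"])}
    if t == "object":
        if "properties" in js:
            # Objects with declared properties become STRUCTs with one field per property.
            return {
                "type": "STRUCT",
                "fields": {name: to_connect_schema(sub) for name, sub in js["properties"].items()},
            }
        if isinstance(js.get("additionalProperties"), dict):
            # Objects with only additionalProperties behave like string-keyed maps.
            return {
                "type": "MAP",
                "keys": {"type": "STRING"},
                "values": to_connect_schema(js["additionalProperties"]),
            }
    raise ValueError(f"unsupported JSONSchema fragment: {js}")
```
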

Stream Configuration Service

Product needs more dynamic control over how client-side event producers are configured. This includes things like sampling rates, time-based windows for producing events, etc. (This component was originally conceived of as part of the Event Schema Repository component, but it is complex and architecturally different enough to warrant its own component here.)
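
To make the sampling-rate idea concrete, here is a hedged sketch of how a client might apply a stream's configuration before producing an event. The config keys (`sample_rate`, `window`) are hypothetical. Hashing a session identifier, rather than drawing a random number per event, keeps the decision stable for the whole session and means a session sampled at a low rate is also sampled at any higher rate.

```python
import hashlib
from datetime import datetime
from typing import Any, Dict


def in_sample(identifier: str, rate: float) -> bool:
    """Deterministically decide whether `identifier` falls in a sample of size `rate`."""
    if rate >= 1.0:
        return True
    if rate <= 0.0:
        return False
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


def should_produce(stream_config: Dict[str, Any], session_id: str, now: datetime) -> bool:
    """Apply a hypothetical stream config: an optional producing window, then sampling."""
    window = stream_config.get("window")
    if window and not (window["start"] <= now < window["end"]):
        return False
    return in_sample(session_id, stream_config.get("sample_rate", 1.0))
```
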

Stream Processing System with Dependency Tracking (conceptual design)

Engineers should have a standardized way to build, deploy, and maintain stream processing jobs, for both analytics and production purposes. A very common use of stream processing at WMF is change propagation, which, to be done well, requires a dependency tracking mechanism; this is a very long-term goal. We want to choose stream processing technologies that work toward this goal.
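
The core of dependency tracking for change propagation is a reverse dependency graph: when one artifact changes, everything that transitively depends on it needs updating. The sketch below does a breadth-first walk over a hypothetical in-memory graph (the node names are illustrative only); a production system would need to store and update this graph at scale.

```python
from collections import deque
from typing import Dict, List, Set


def affected(dependents: Dict[str, List[str]], changed: str) -> List[str]:
    """Return every artifact that transitively depends on `changed`, in breadth-first order.

    `dependents` maps an artifact to the artifacts that directly depend on it.
    The `seen` set makes cycles safe.
    """
    seen: Set[str] = {changed}
    order: List[str] = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order
```

For example, a template edit would fan out to the pages that transclude it, and from those pages to their cached renderings and search index entries.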

This component is the lowest priority of the Modern Event Platform, and as such will receive more thought and planning toward the end of the program.

See also:

Timeline

FY2017-2018
  • Q4: Interview product and technology stakeholders to collect desires, use cases, and requirements.

FY2018-2019
  • Q1: Survey and choose technologies and solutions with input from Services and Operations.
  • Q2: Begin implementation and deployment of some chosen techs.
  • Q3: Deployment of eventgate-analytics stream intake service - T206785
  • Q4: Deployment of eventgate-main stream intake service - T218346
  • Q4: Decommission Avro streams in favor of eventgate-analytics JSON-based ones - T188136
  • Q4: (new) CI support for event schemas repo - T206814

FY2019-2020
Stream Intake Service - T201068

Migrate MediaWiki EventBus events to eventgate-main and deprecate eventlogging-service-eventbus

  • Q1: Continue migrating events to eventgate-main - T211248
  • Q2: Decommission eventlogging-service-eventbus (done in Q1)

Event Schema Repositories - T201063
  • Q1: Schema repository hooks to generate dereferenced canonical version - T206812
  • Q2: Support $ref in JSONSchemas - T206824
  • Q2/Q3: Set up public HTTP endpoint for - T233630
  • Q2/Q3: Create a new 'primary' and 'secondary' schema repositories.
  • Q3: Deprecate 'mediawiki' schema repository. (Moved to Q1 2020-2021)
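
Generating a dereferenced canonical version of a schema means replacing each `$ref` with the schema it points to. The sketch below assumes `$ref` values name other schemas in the same repository by URI, optionally followed by a JSON Pointer fragment (e.g. `/fragment/common/1.0.0#/properties/dt`), and that the repository is available as a dict; purely local refs such as `#/definitions/x` and sibling keywords next to `$ref` are not handled. The function names are hypothetical.

```python
from typing import Any, Dict, FrozenSet


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Follow an RFC 6901 JSON Pointer such as '/properties/dt' within `doc`."""
    for token in pointer.split("/")[1:]:
        # Unescape in the order RFC 6901 requires: '~1' first, then '~0'.
        token = token.replace("~1", "/").replace("~0", "~")
        doc = doc[int(token)] if isinstance(doc, list) else doc[token]
    return doc


def dereference(node: Any, repo: Dict[str, Any], seen: FrozenSet[str] = frozenset()) -> Any:
    """Return a new structure with every {"$ref": ...} replaced by its dereferenced target.

    `repo` maps schema URIs to schemas. `seen` tracks the refs being expanded on
    the current path so that circular references raise instead of recursing forever.
    """
    if isinstance(node, dict):
        if "$ref" in node:
            ref = node["$ref"]
            if ref in seen:
                raise ValueError(f"circular $ref: {ref}")
            uri, _, pointer = ref.partition("#")
            target = resolve_pointer(repo[uri], pointer)
            return dereference(target, repo, seen | {ref})
        return {key: dereference(value, repo, seen) for key, value in node.items()}
    if isinstance(node, list):
        return [dereference(item, repo, seen) for item in node]
    return node
```
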

Stream Configuration Service - T205319
  • Q1: Start planning with Audiences - Design Document
  • Q2: Implementation prototype - T233634
  • Q3: Deployment and use by EventLogging and eventgate-analytics-external - T242122

Replace EventLogging Analytics

This is a long-term project, to be worked on in collaboration with Audiences engineers, that includes work on the Event Schema Repositories and Event Stream Configuration Service components.

  • Q1: Begin planning this work with Audiences - Design Document
  • Q2: Coding work on all of these pieces (e.g. a client-side library that uses Stream Config and POSTs to EventGate) - T228175
  • Q2-Q4: Deployment of Stream Config Service and some usages of eventgate-analytics-external
  • Q4: Begin migrating existing EventLogging streams to EventGate - T238230 and T238138

See also: T225237: Better Use of Data

Stream Connectors

NOTE: 2019-09: This work is stalled due to licensing issues with Confluent's HDFS Connector

  • Q1: Kafka Connect development work (Kubernetes? YARN? Standalone?) - T223626
  • Q2: Kafka Connect deployment
  • Q2-Q4: Replace usages of Camus HDFS with Kafka Connect HDFS - T223628

Stream Processing System & Dependency Tracking

NOTE: 2019-11: This work is stalled due to lack of owner for dependency tracking
Work for next year:

  • Collect basic requirements
  • Figure out whether a streaming platform plus a graph database can support the basic requirements at scale

FY2020-2021

(As of 2020-06 these are timeline guesses, not goals.)

  • Q1-Q3: Migrate all legacy EventLogging streams to EventGate (see also)
  • Q1: Deprecate 'mediawiki' schema repository
  • Q1: Centralize all event stream configuration in mediawiki-config
  • Q1: Automate Analytics Event Ingestion jobs using EventStreamConfig - T251609
  • Q1: Improve monitoring of Analytics Event Ingestion using canary events - T251609
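
One way to use canary events for monitoring, sketched under assumptions: if a producer emits at least one canary event into each stream every hour, then any hour with no canary event in the ingested data points to a gap in ingestion rather than simply low traffic. The `meta.domain == "canary"` marker and the hourly cadence are hypothetical choices for this sketch.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable, List


def missing_canary_hours(
    events: Iterable[Dict[str, Any]], start: datetime, end: datetime
) -> List[datetime]:
    """Return the hour buckets in [start, end) that contain no canary event."""
    seen = set()
    for event in events:
        meta = event.get("meta", {})
        if meta.get("domain") == "canary":
            # Truncate the event time to the start of its hour.
            seen.add(meta["dt"].replace(minute=0, second=0, microsecond=0))
    missing = []
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:
        if hour not in seen:
            missing.append(hour)
        hour += timedelta(hours=1)
    return missing
```
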

Use case collection

  • JADE for ORES
  • Fundraising banner impressions pipeline
  • WDQS state updates - T244590: [Epic] Rework the WDQS updater as an event driven application
  • Job Queue (implementation ongoing)
  • Frontend Cache (varnish) invalidation
  • Scalable EventLogging, with automatic visualization in tools such as Pivot
  • Realtime SQL queries and state store updates. Can be used to verify in real time that events contain the expected data and are valid
  • Trending pageviews & edits
  • Mobile App Events
  • Elasticsearch index updates incorporating new revisions & ORES scores
  • Automatic Prometheus metric transformation and collection
  • Dependency tracking transport and stream processing
  • Stream of reference/citation events: https://etherpad.wikimedia.org/p/RefEvents
  • Client side error logging rate limiting and de-duping via Stream Processing - T217142
  • Stream processing: Filtering edit text stream for specific keywords
  • Stream processing: diff stream
  • Stream processing: revision token stream, for ORES and for search.
  • Stream processing: realtime historical data endpoint T240387: MW REST API Historical Data Endpoint Needs
  • Stream processing: DDoS and other traffic anomaly detection:
    • Outlier detection
    • Adaptive rate limiting
  • Emitting structured metadata about page edits at save+parse time (links added, images added, Wikidata items added, templates used, etc.)
  • Monitoring and alerting on spikes in referrers (Isaac)
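
As an example of the stream processing use cases above, the following sketch de-duplicates and rate-limits client-side error events (compare T217142). It assumes each event has already been reduced to a timestamp in seconds and a fingerprint string (for instance a hash of the message and top stack frame), and that events arrive in time order; the window length and per-window cap are illustrative parameters, not agreed values.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def dedupe_and_limit(
    events: Iterable[Tuple[float, str]], window_seconds: float, max_per_window: int
) -> List[Tuple[float, str]]:
    """Keep time-ordered (timestamp, fingerprint) events that are neither duplicates nor over the cap.

    A fingerprint already emitted within the last `window_seconds` is dropped as a
    duplicate. At most `max_per_window` events are kept per fixed window, where each
    window starts at the first event after the previous window has expired.
    """
    last_emitted: Dict[str, float] = {}
    kept: List[Tuple[float, str]] = []
    window_start: Optional[float] = None
    kept_in_window = 0
    for ts, fingerprint in events:
        if window_start is None or ts - window_start >= window_seconds:
            window_start, kept_in_window = ts, 0
        prev = last_emitted.get(fingerprint)
        if prev is not None and ts - prev < window_seconds:
            continue  # duplicate of a recently emitted error
        if kept_in_window >= max_per_window:
            continue  # over the rate limit for this window
        kept.append((ts, fingerprint))
        last_emitted[fingerprint] = ts
        kept_in_window += 1
    return kept
```

Recording a fingerprint only when it is emitted means a persistently recurring error is still reported roughly once per window instead of being suppressed forever.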

(...add more as collected!)

System Diagram

Diagram Here: https://www.lucidchart.com/documents/view/ca3f0d6b-9b45-4524-aed7-299e38908d0f/0_0


Event Timeline

phuedx added a comment. Oct 1 2018, 1:04 PM
  • Largish events (due to stack traces) - not huge, but well over the current GET size limit.
  • Event volume is impossible to predict or control. Normally very low, but if something goes wrong, one or more events per pageview.

Thanks to whoever brought those points up.

If the answer to 1 is to use POST requests to submit _certain_ events via the /topics endpoint (taken from the diagram in the description), then it follows that we could batch-send several per-page events in one request.

AIUI this falls out of the

As an engineer, I want to batch produce many events at once so mobile apps can produce events after an offline period.

story in T201068: Modern Event Platform: Stream Intake Service.
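
A hedged sketch of the batching idea in this comment: group serialized events into JSON-array POST bodies that stay under a byte limit, so that several per-page events, or a backlog from an offline mobile app, can be sent in a few requests. The intake URL below is hypothetical (the `/v1/events` path is an assumption modeled on EventGate), and the requests are only built, not sent.

```python
import json
from typing import Any, Dict, List
from urllib.request import Request


def _make_request(url: str, batch: List[str]) -> Request:
    """Build a POST request whose body is a JSON array of already-serialized events."""
    body = ("[" + ",".join(batch) + "]").encode("utf-8")
    return Request(url, data=body, method="POST", headers={"Content-Type": "application/json"})


def batch_requests(url: str, events: List[Dict[str, Any]], max_bytes: int) -> List[Request]:
    """Group events into POST requests whose JSON-array bodies stay within max_bytes."""
    requests: List[Request] = []
    batch: List[str] = []
    size = 2  # the surrounding '[' and ']'
    for event in events:
        encoded = json.dumps(event, separators=(",", ":"))
        n = len(encoded.encode("utf-8"))
        extra = n + (1 if batch else 0)  # account for the ',' separator
        if batch and size + extra > max_bytes:
            requests.append(_make_request(url, batch))
            batch, size, extra = [], 2, n
        batch.append(encoded)
        size += extra
    if batch:
        requests.append(_make_request(url, batch))
    return requests
```

An event that is larger than `max_bytes` on its own still gets a request of its own, since a single JSON event cannot be split.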

Ottomata updated the task description. Oct 3 2018, 7:35 PM
Ottomata updated the task description. Dec 5 2018, 4:55 PM
Ottomata updated the task description. Dec 5 2018, 6:22 PM
Ottomata updated the task description.
Ottomata updated the task description.
Ottomata moved this task from Backlog to Parent Tasks on the Event-Platform board. Dec 5 2018, 10:06 PM
Ottomata updated the task description. Jan 17 2019, 9:38 PM
Ottomata updated the task description. Jan 22 2019, 7:09 PM
Ottomata updated the task description. May 17 2019, 3:25 PM
Ottomata updated the task description.
Ottomata updated the task description.
Ottomata updated the task description. May 17 2019, 3:55 PM
Ottomata updated the task description. May 23 2019, 3:01 PM
Ottomata updated the task description. Jun 19 2019, 4:40 PM
Ottomata updated the task description. Jun 28 2019, 4:59 PM
Ottomata updated the task description. Jun 28 2019, 5:30 PM
Ottomata updated the task description. Jul 1 2019, 6:31 PM
Ottomata updated the task description.
Ottomata updated the task description. Jul 24 2019, 3:24 PM
Ottomata updated the task description. Sep 25 2019, 1:31 PM
Ottomata updated the task description. Sep 25 2019, 2:38 PM
Ottomata updated the task description. Oct 21 2019, 2:09 PM
Ottomata updated the task description. Nov 12 2019, 9:46 PM
Ottomata updated the task description. Nov 13 2019, 4:40 PM
Ottomata updated the task description. Jan 2 2020, 6:55 PM
Ottomata updated the task description. Jan 6 2020, 2:41 PM
Ottomata removed subscribers: Tbayer, chelsyx.
Ottomata updated the task description. Jan 13 2020, 8:06 PM
Ottomata updated the task description. Feb 6 2020, 3:42 PM
Ottomata updated the task description. Feb 19 2020, 7:12 PM
Ottomata updated the task description. Feb 19 2020, 7:19 PM
Ottomata updated the task description. Feb 19 2020, 7:24 PM
Ottomata updated the task description. Feb 26 2020, 5:23 PM
Ottomata updated the task description. Mar 2 2020, 6:04 PM
Jhernandez removed a subscriber: Jhernandez. Apr 2 2020, 6:46 PM
Ottomata updated the task description. Apr 6 2020, 2:20 PM
Ottomata updated the task description. Apr 6 2020, 2:48 PM
Ottomata updated the task description. Jun 30 2020, 1:04 PM
Ottomata updated the task description.
Ottomata updated the task description. Jul 8 2020, 3:24 PM
Ottomata updated the task description. Jul 8 2020, 5:07 PM
Ottomata updated the task description.
Ottomata updated the task description. Jul 8 2020, 5:28 PM
Ottomata updated the task description. Jul 8 2020, 6:13 PM
Ottomata updated the task description. Jul 23 2020, 7:09 PM
Ottomata updated the task description. Sep 14 2020, 3:09 PM
mforns renamed this task from Modern Event Platform (TEC2) to Modern Event Platform. Sep 14 2020, 4:48 PM
Ottomata updated the task description. Sep 15 2020, 1:46 PM
Ottomata updated the task description. Sep 15 2020, 8:33 PM
Ottomata updated the task description.
Ottomata updated the task description. Sep 15 2020, 8:36 PM