
[WIP] RFC: Stream Configuration Service
Closed, Declined · Public

Description

In T205319: Modern Event Platform: Stream Configuration we need a way to configure internal and external producers of event data. Clients will get configuration that will change the way they instrument and submit events to the Stream Intake Service (EventGate).

Specifically, clients need to know:

  • sample rate at which to send events (we don't want all clients to always send analytics event data)
  • sample token, e.g. session id or page token, see: T205569: Define cross-schema event stitching approach
  • date ranges for instrumentation experiments
  • which schema is allowed in a given stream
  • etc.
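
To make the list above concrete, here is a minimal sketch of how a client might apply such a configuration. Everything in it is an assumption for illustration: the `StreamConfig` fields, the `should_send` and `in_sample` helpers, and the choice of SHA-256 hashing for sampling are hypothetical and are not taken from the design document.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical per-stream settings; field names are illustrative only."""
    stream: str
    schema: str                   # the schema allowed in this stream
    sample_rate: float            # fraction of tokens that send events, 0.0 to 1.0
    sample_token: str             # which identifier to sample on, e.g. "session_id"
    start: Optional[date] = None  # experiment window start (inclusive), None = open
    end: Optional[date] = None    # experiment window end (inclusive), None = open


def in_sample(token_value: str, rate: float) -> bool:
    """Hash the token to a 64-bit bucket and compare it against the rate."""
    digest = hashlib.sha256(token_value.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    # rate 0.0 never matches; rate 1.0 always matches (bucket < 2**64).
    return bucket < rate * 2**64


def should_send(config: StreamConfig, schema: str, token_value: str, today: date) -> bool:
    """Apply the schema, date range, and sampling checks in that order."""
    if schema != config.schema:
        return False
    if config.start is not None and today < config.start:
        return False
    if config.end is not None and today > config.end:
        return False
    return in_sample(token_value, config.sample_rate)
```

Because the decision depends only on the token value and the rate, all events carrying the same session id or page token are either sent or dropped together, which is one plausible way to keep sampled events stitchable across schemas.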

The configuration should be easily modifiable by Product engineers and possibly Product managers. This means changes must be deployable outside of SWAT and the MediaWiki Train. Changes should be visible to clients within a short period of time (hopefully minutes).

We are working on a design document for the more general product analytics use cases of Modern Event Platform. The design document will include implementation details of the Stream Configuration Service. I'd like to put the Stream Configuration Service through the RFC process to get input and ideas on how it should be built.

We'd like to avoid extra HTTP requests where possible. As such, the configuration should be delivered to JavaScript clients via MediaWiki, possibly via a ResourceLoader module. For non-MW clients, such as mobile apps, we will also need a remotely queryable URL from which to get the configuration. (TODO: does this already exist?)

Flexible configuration of remote clients might be something WMF wants to support more broadly. It would be nice if whatever implementation is chosen here would help with any remote client configuration, not just stream producers.

DRAFT DRAFT DRAFT

Implementation

ResourceLoader

ResourceLoader package modules will be used to deliver config to JavaScript clients. By using ResourceLoader where we can, we can avoid extra round trips for config. The config will be loaded by a packageFiles callback function from wherever it is stored or available.

HTTP Endpoint

There will be clients that aren't able to use ResourceLoader, such as mobile apps. These clients will need a way to request the current config. We will set up a public endpoint that will serve config.
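
A remote client would probably not want to hit this endpoint on every event. As a sketch under stated assumptions, the class below caches the fetched config for a fixed time-to-live and falls back to the last known config if a refresh fails. `CachedConfig`, the `fetch` callable, and the TTL value are all hypothetical; the task does not define the endpoint URL or any caching policy, so `fetch` stands in for whatever HTTP GET the client would perform.

```python
import time
from typing import Any, Callable, Dict, Optional


class CachedConfig:
    """Caches the result of a fetch callable for ttl_seconds (illustrative only)."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, Any]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, Any]:
        """Return the cached config, refreshing it once the TTL has elapsed."""
        now = self._clock()
        stale = self._value is None or now - self._fetched_at >= self._ttl
        if stale:
            try:
                self._value = self._fetch()
                self._fetched_at = now
            except Exception:
                # Keep serving the last known config if the refresh fails;
                # re-raise only when there is nothing cached to fall back on.
                if self._value is None:
                    raise
        return self._value
```

With a TTL of a few minutes, a config change would reach such a client within roughly that window, which matches the "hopefully minutes" goal stated in the description.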

Canonical storage ideas

mediawiki-config

Stream configs would be stored in the mediawiki-config repository. ResourceLoader would get configs from the mediawiki-config PHP code. We would set up an HTTP endpoint, likely somewhere in the MW API, that would serve the config to remote client apps.

Pros
  • mediawiki-config & ResourceLoader already exist and work
Cons
  • SWAT deployment schedule may not be flexible enough for Product teams.

git repositories

Store all config in one or more git repositories. Config would be served as static files via an HTTP server. Config modifications would be made via code review, and deployed using the usual deployment tools (scap, helm, whatever). ResourceLoader would request configs from the HTTP endpoint for MediaWiki client configuration. Remote apps would request config directly from the HTTP endpoint.

Pros
  • decentralized config
  • code review used for changes
Cons
  • not easy for Product managers to make changes
  • deployment via scap/helm is not straightforward
  • difficult to build a GUI
  • Stream config repository purpose overlaps with mediawiki-config

JsonConfig MediaWiki extension

We're moving schemas away from on-wiki hosting for good reasons. However, configuration is environment-specific, and does not need to be decentralized in the same way. We already use the JsonConfig MediaWiki extension for various purposes, but none are as critical as this use would be. ResourceLoader would request the current JsonConfig from meta.wikimedia.org. Remote apps could get config directly from meta.wikimedia.org.

Pros
  • already exists
  • easy for engineers and Product managers to make changes
  • has a GUI (MediaWiki)
  • uses existing API and CDN (via Varnish and RESTBase?) to cache and expire (is this true?)
  • JsonConfig improvements (YAML, JSONSchema validation of configs) would be beneficial for other use cases
Cons
  • JsonConfig code not well maintained, may need to be rewritten from scratch
  • JsonConfig does not (currently) support YAML or JSONSchema validation of config
  • Configuration of MediaWiki via a wiki is circular and weird
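
Whichever storage is chosen, some validation step before a config is published would likely be needed, since the list above notes that JsonConfig does not currently provide JSONSchema validation. The sketch below uses plain Python checks rather than JSONSchema and is not part of JsonConfig or any existing API; the `validate_stream_config` function and the `schema` and `sample_rate` keys are hypothetical.

```python
from typing import Any, Dict, List


def validate_stream_config(entry: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []

    schema = entry.get("schema")
    if not isinstance(schema, str) or not schema:
        errors.append("schema must be a non-empty string")

    rate = entry.get("sample_rate")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        errors.append("sample_rate must be a number")
    elif not 0.0 <= rate <= 1.0:
        errors.append("sample_rate must be between 0 and 1")

    return errors
```

Returning every problem at once, rather than raising on the first, is a design choice that would suit an on-wiki editor: the person saving the page sees the full list of fixes needed.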

mobile apps config?

TODO: how is this done now?

Event Timeline

Ottomata created this task. Jul 12 2019, 6:11 PM
Restricted Application added a subscriber: Aklapper. Jul 12 2019, 6:11 PM
Ottomata updated the task description.
Ottomata updated the task description. Jul 12 2019, 6:57 PM
Ottomata updated the task description. Jul 12 2019, 7:12 PM
Ottomata updated the task description.
Krinkle updated the task description. Jul 12 2019, 7:36 PM
Krinkle added a subscriber: Krinkle.

I've changed mw.config to "a ResourceLoader module". Principally the same from JS perspective, but reflecting current best practices.

Ottomata updated the task description. Jul 12 2019, 8:50 PM
Ottomata closed this task as Declined. Jul 16 2019, 1:56 PM

Thanks so much for the ResourceLoader idea, Timo! It already does pretty much everything we'd need. I've talked with @jlinehan, and he thinks that keeping these configs in mediawiki-config and using ResourceLoader to distribute them will be sufficient for Product. I think that architecture is simple enough to not have to go through an RFC, so I'm declining this. Our draft Design Document is here for now:

https://docs.google.com/document/d/1dpCo33RpZAbQG15nM_GcZ_zqA3sj0S4h_0CQ4Fsahkg/edit?ts=5d2d44c2#