In T205319: Modern Event Platform: Stream Configuration, we need a way to configure internal and external producers of event data. Clients will receive configuration that changes how they instrument and submit events to the Stream Intake Service (EventGate).
Specifically, clients need to know:
- sample rate at which to send events (we don't want all clients to always send analytics event data)
- sample token, e.g. session id or page token; see T205569: Define cross-schema event stitching approach
- date ranges for instrumentation experiments
- which schema is allowed in a given stream
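As a rough illustration of the settings listed above, a stream's configuration and the client-side decision to send an event might look like the following Python sketch. All names here (`StreamConfig`, `in_sample`, `should_send`, and the field names) are hypothetical and not part of any existing Modern Event Platform API. Hashing the sample token with SHA-256 is just one assumed way to make sampling deterministic per session or page.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical per-stream settings; field names are illustrative only."""
    stream: str
    schema: str                    # the schema allowed in this stream
    sample_rate: float             # fraction of tokens that send events, 0.0 to 1.0
    sample_token: str              # identifier to sample on, e.g. "session_id"
    start: Optional[date] = None   # first day of the experiment, inclusive
    end: Optional[date] = None     # last day of the experiment, inclusive


def in_sample(token_value: str, rate: float) -> bool:
    """Deterministically decide whether a token falls inside the sample."""
    digest = hashlib.sha256(token_value.encode("utf-8")).digest()
    # Read the first 8 bytes as an integer in [0, 2**64) and compare it with
    # the matching fraction of that range, so the same token always gets the
    # same answer for a given rate.
    return int.from_bytes(digest[:8], "big") < rate * 2**64


def should_send(config: StreamConfig, schema: str, token_value: str, today: date) -> bool:
    """Apply the schema, date-range, and sampling checks in that order."""
    if schema != config.schema:
        return False
    if config.start is not None and today < config.start:
        return False
    if config.end is not None and today > config.end:
        return False
    return in_sample(token_value, config.sample_rate)
```

Because the decision is a pure function of the token value, all events that share a session id or page token land on the same side of the sample, which is presumably what cross-schema stitching would rely on.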
The configuration should be easily modifiable by Product engineers and possibly Product managers. This means changes must be deployable outside of SWAT and the Mediawiki Train. Changes should be visible to clients within a short period of time (hopefully minutes).
We are working on a design document for the more general product analytics use cases of Modern Event Platform. The design document will include implementation details of the Stream Configuration Service. I'd like to put the Stream Configuration Service through the RFC process to get input and ideas on how it should be built.
Flexible configuration of remote clients might be something WMF wants to support more broadly. It would be nice if whatever implementation is chosen here would help with any remote client configuration, not just stream producers.
DRAFT DRAFT DRAFT
There will be clients that aren't able to use ResourceLoader, such as mobile apps. These clients will need a way to request the current config. We will set up a public endpoint that will serve config.
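A client that polls such an endpoint would likely want to cache the result for a short time, so that changes still show up within minutes without a request per event. The sketch below is one possible shape for that, not a settled design: `CachedConfig`, the 300-second default, and the injected `fetch` callable are all assumptions, and the actual HTTP request is deliberately left to the caller because the endpoint does not exist yet.

```python
import time
from typing import Any, Callable, Dict, Optional


class CachedConfig:
    """Hypothetical time-limited cache around a config fetch function."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, Any]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # e.g. a function that does the HTTP GET
        self._ttl = ttl_seconds      # how long a fetched config stays fresh
        self._clock = clock          # injectable so the cache can be tested
        self._value: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, Any]:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            try:
                self._value = self._fetch()
                self._fetched_at = now
            except (OSError, ValueError):
                # With nothing cached there is no fallback, so re-raise.
                if self._value is None:
                    raise
                # Otherwise keep serving the stale config; the next call
                # to get() will try the fetch again.
        return self._value
```

Serving the stale config on a failed refresh is a design choice made here for illustration: it keeps instrumentation running through a brief endpoint outage, at the cost of applying an outdated sample rate for a while.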
Canonical storage ideas
Stream configs would be stored in the mediawiki-config repository. ResourceLoader would get configs from the mediawiki-config PHP code. We would set up an HTTP endpoint, likely somewhere in the MW API, that would serve the config to remote client apps.
Pros:
- mediawiki-config & ResourceLoader already exist and work
Cons:
- SWAT deployment schedule may not be flexible enough for Product teams.
Store all config in one or more git repositories. Config would be served as static files via an HTTP server. Config modifications would be made via code review, and deployed using usual deployment tools (scap, helm, whatever). ResourceLoader would request configs from the HTTP endpoint for Mediawiki client configuration. Remote apps would request config directly from the HTTP endpoint.
Pros:
- decentralized config
- code review used for changes
Cons:
- not easy for Product managers to make changes
- deployment via scap/helm is not straightforward
- difficult to build a GUI
- Stream config repository purpose overlaps with mediawiki-config
JsonConfig Mediawiki extension
We're moving schemas away from on-wiki hosting for good reasons. However, configuration is environment specific, and does not need to be decentralized in the same way. We already use the JsonConfig Mediawiki extension for various purposes, but none are as critical as this use would be. ResourceLoader would request current JsonConfig from meta.wikimedia.org. Remote apps could get config directly from meta.wikimedia.org.
Pros:
- already exists
- easy for engineers and Product managers to make changes
- has a GUI (Mediawiki)
- Uses existing API and CDN (via varnish and RESTBase?) to cache and expire (is this true?)
- JsonConfig improvements (YAML, JSONSchema validation of configs) would be beneficial for other use cases
Cons:
- JsonConfig code not well maintained, may need to be rewritten from scratch
- JsonConfig does not (currently) support YAML or JSONSchema validation of config
- Configuration of Mediawiki via a wiki is circular and weird
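To make the validation point concrete, here is a minimal sketch of the kind of check that could run before an edited config is saved or deployed. It is not JSON Schema and not JsonConfig code: it uses hand-written checks from the Python standard library as a stand-in, and the function name `validate_stream_config` and the `schema` and `sample_rate` keys are assumed for illustration only.

```python
import json
from typing import List


def validate_stream_config(raw: str) -> List[str]:
    """Return a list of problems found in a JSON config; empty means it passed."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top level must be an object"]

    errors: List[str] = []

    # Hypothetical required key: the schema allowed in the stream.
    schema = config.get("schema")
    if not isinstance(schema, str) or not schema:
        errors.append("schema must be a non-empty string")

    # Hypothetical required key: a sample rate between 0 and 1.
    # bool is excluded explicitly because it is a subclass of int in Python.
    rate = config.get("sample_rate")
    if (
        isinstance(rate, bool)
        or not isinstance(rate, (int, float))
        or not 0 <= rate <= 1
    ):
        errors.append("sample_rate must be a number between 0 and 1")

    return errors
```

Returning a list of messages instead of raising on the first problem would let a GUI show every issue with an edit at once, which seems useful if Product managers are expected to change configs themselves.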
mobile apps config?
TODO: how is this done now?