In {T205319} we need a way to configure internal and external producers of event data. Clients will get configuration that will change the way they instrument and submit events to the [[ https://phabricator.wikimedia.org/T201068 | Stream Intake Service ]] ([[ https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate | EventGate ]]).
Specifically, clients need to know:
- sample rate at which to send events (we don't want all clients to always send analytics event data)
- sample token, e.g. session id or page token, see: {T205569}
- date ranges for instrumentation experiments
- which schema is allowed in a given stream
- etc.
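To make the list above concrete, here is a minimal sketch of how a client might apply such a configuration. Every field name (`schema`, `sample_rate`, `sample_token`, `start_date`, `end_date`), the stream and schema names, and the hash-based sampling approach are assumptions for illustration only; none of them are decided by this RFC.

```python
import hashlib
from datetime import date

# Hypothetical config for a single stream; all field names are assumptions.
STREAM_CONFIG = {
    "stream": "example.button_click",
    "schema": "analytics/button/click",
    "sample_rate": 0.1,            # send events for roughly 10% of tokens
    "sample_token": "session_id",  # which identifier the client samples on
    "start_date": "2019-01-01",
    "end_date": "2019-03-31",
}


def token_fraction(token: str) -> float:
    """Map a token to a stable value in [0, 1) by hashing it."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def should_send(config: dict, schema: str, token: str, today: date) -> bool:
    """Return True if an event for this schema and token should be submitted."""
    if schema != config["schema"]:
        return False  # schema is not allowed in this stream
    start = date.fromisoformat(config["start_date"])
    end = date.fromisoformat(config["end_date"])
    if not (start <= today <= end):
        return False  # outside the experiment's date range
    # The same token always hashes to the same value, so a session or page
    # is either consistently in the sample or consistently out of it.
    return token_fraction(token) < config["sample_rate"]
```

Hashing the token, rather than drawing a random number per event, keeps the decision stable for the lifetime of the token, which is the reason for configuring a sample token at all.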
The configuration should be easily modifiable by Product engineers and possibly Product managers. Changes should become visible to clients within a short period of time (ideally minutes).
We are working on a design document for the more general product analytics use cases of [[ https://phabricator.wikimedia.org/T185233 | Modern Event Platform ]]. The design document will include implementation details of the Stream Configuration Service. I'd like to put the Stream Configuration Service through the RFC process to get input and ideas on how it should be built.
We'd like to avoid extra HTTP requests where possible. As such, the configuration should be delivered to JavaScript clients via MediaWiki, possibly via [[ https://www.mediawiki.org/wiki/Manual:Interface/JavaScript#mw.config | mw.config ]]. For non-MediaWiki clients, such as mobile apps, we will also need a remotely queryable URL from which to get the configuration. (TODO: does this already exist?)
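For clients that have to poll a remote URL, one way to limit extra requests while still picking up changes within minutes is a small time-to-live cache. The sketch below is illustrative only: `CachedConfig` is a hypothetical helper, and `fetch` stands in for whatever HTTP call a client would make, since no endpoint exists yet.

```python
import time
from typing import Callable, Optional


class CachedConfig:
    """Cache a fetched config and refetch only after ttl_seconds have passed."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # placeholder for the real HTTP request
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache can be tested
        self._value: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        """Return the cached config, refreshing it if it is missing or stale."""
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

With a TTL of a few minutes, a client makes at most one config request per TTL window, and a config change reaches it no later than one TTL after any intermediate cache has expired.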
Flexible configuration of remote clients might be something the WMF wants to support more broadly. It would be nice if whatever implementation is chosen here would help with any remote client configuration, not just stream producers.
//DRAFT DRAFT DRAFT//
= Implementation ideas
== git repositories
Store all config in one or more git repositories. Config would be served as static files via an HTTP server. Config modifications would be made via code review and deployed using the usual deployment tools (scap, helm, whatever). Varnish would cache configs with a small TTL. This is similar to how we designed the Schema Registry service (T201643).
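As a rough illustration of this option, a build or CI step could check each per-stream file and merge them into a single document to be served statically. The one-file-per-stream layout, the `REQUIRED_KEYS` set, and the field names are all assumptions made for this sketch.

```python
import json
from pathlib import Path

# Assumed minimal fields for each stream's config file.
REQUIRED_KEYS = {"schema", "sample_rate"}


def build_config(config_dir: Path) -> dict:
    """Merge every *.json file in config_dir into one mapping keyed by file stem."""
    merged = {}
    for path in sorted(config_dir.glob("*.json")):
        entry = json.loads(path.read_text(encoding="utf-8"))
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"{path.name}: missing keys {sorted(missing)}")
        if not 0.0 <= entry["sample_rate"] <= 1.0:
            raise ValueError(f"{path.name}: sample_rate must be between 0 and 1")
        merged[path.stem] = entry
    return merged
```

Running a check like this during code review would catch malformed config before it is deployed, which partly offsets the slower change process of this approach.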
==== Pros
- simple architecture
- decentralized config
- code review used for changes
==== Cons
- not easy for Product managers to make changes
- deployment via scap/helm is not straightforward
- difficult to build a GUI
== conftool
[[ https://wikitech.wikimedia.org/wiki/Conftool | Conftool ]] is used for [[ https://wikitech.wikimedia.org/wiki/MediaWiki_and_EtcdConfig | dynamic MediaWiki config ]]. It might make sense to reuse it here, enabling the EventLogging extension or other client-side code to query the MediaWiki API for this config.
==== Pros
- already exists
- easy for engineers to make changes
- JSONSchema validation for config supported
==== Cons
- might not be the right fit for remote clients
- difficult for Product managers to make changes
- difficult to build a GUI (is this true?)
- currently only used by SRE(?), could this be extended for more general use?
== JsonConfig MediaWiki extension
We're moving schemas away from on-wiki hosting for good reasons. However, configuration is environment-specific and does not need to be decentralized in the same way. We already use the [[ https://www.mediawiki.org/wiki/Extension:JsonConfig | JsonConfig MediaWiki extension ]] for various purposes, but none are as critical as this use would be.
Exposing configuration to client code could be generally useful for things other than streams. We could improve and use JsonConfig for this purpose, or potentially write a new similar extension with YAML support.
==== Pros
- already exists
- easy for engineers and Product managers to make changes
- has a GUI (MediaWiki)
- uses existing API and CDN (via Varnish and RESTBase?) to cache and expire (is this true?)
- JsonConfig improvements would be beneficial for other use cases
==== Cons
- JsonConfig code is not well maintained and may need to be rewritten from scratch
- JsonConfig does not (currently) support YAML or JSONSchema validation of config