In {T205319} we need a way to configure internal and external producers of event data. Clients will get configuration that will change the way they instrument and submit events to the [[ https://phabricator.wikimedia.org/T201068 | Stream Intake Service ]] ([[ https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate | EventGate ]]).
Specifically, clients need to know:
- sample rate at which to send events (we don't want all clients to always send analytics event data)
- sample token, e.g. session id or page token, see: {T205569}
- date ranges for instrumentation experiments
- which schema is allowed in a given stream
- etc.
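To make the list above concrete, here is a minimal sketch of what one stream's config might look like and how a client could use it to decide whether to send an event. Everything in it is an assumption for discussion: the `StreamConfig` field names, the hash-based bucketing, and the `should_send` helper are hypothetical and not an agreed design.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical per-stream settings; all field names are illustrative."""
    stream: str                    # stream name
    schema: str                    # the schema allowed in this stream
    sample_rate: float             # fraction of tokens that send events, 0.0 to 1.0
    sample_token: str              # identifier to sample on, e.g. "session_id"
    start: Optional[date] = None   # first day of the experiment (inclusive)
    end: Optional[date] = None     # last day of the experiment (inclusive)


def in_sample(config: StreamConfig, token_value: str) -> bool:
    """Map the token deterministically to [0, 1) and compare with the rate."""
    key = f"{config.stream}:{token_value}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # 4 bytes / 2**32 is exactly representable as a float and always < 1.0.
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < config.sample_rate


def should_send(config: StreamConfig, schema: str, token_value: str, today: date) -> bool:
    """Apply the schema check, the date range, and then the sampling decision."""
    if schema != config.schema:
        return False
    if config.start is not None and today < config.start:
        return False
    if config.end is not None and today > config.end:
        return False
    return in_sample(config, token_value)
```

Hashing the sample token (rather than calling a random number generator per event) means the same session or page view is consistently in or out of the sample, which is presumably what sampling by token is meant to achieve.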
The configuration should be easily modifiable by Product engineers and possibly Product managers. This means it must be possible to change it outside of SWAT and the MediaWiki Train. Changes should be visible to clients within a short period of time (hopefully minutes).
We are working on a [[ https://docs.google.com/document/d/1dpCo33RpZAbQG15nM_GcZ_zqA3sj0S4h_0CQ4Fsahkg/edit# | design document ]] for the more general product analytics use cases of [[ https://phabricator.wikimedia.org/T185233 | Modern Event Platform ]]. The design document will include implementation details of the Stream Configuration Service. I'd like to put the Stream Configuration Service through the RFC process to get input and ideas on how it should be built.
We'd like to avoid extra HTTP requests where possible. As such, the configuration should be delivered to JavaScript clients via MediaWiki, possibly via a [[ https://www.mediawiki.org/wiki/ResourceLoader/Package_modules#Data_and_config_bundling | ResourceLoader ]] module. For non-MediaWiki clients, such as mobile apps, we will also need a remotely queryable URL from which to get the configuration. (TODO: does this already exist?)
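For remote clients, one way to keep requests to a minimum while still picking up changes within minutes is a small client-side cache with a time-to-live. The following is only a sketch under assumptions: the `CachedConfig` class, the 300 second default, and the choice to keep serving a stale copy when a refresh fails are all hypothetical.

```python
import json
import time
from typing import Any, Callable, Dict, Optional


class CachedConfig:
    """Caches a JSON config and refetches it once the TTL has elapsed."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # returns the config as a JSON string
        self._ttl = ttl_seconds      # how long a fetched copy is considered fresh
        self._clock = clock          # injectable so the cache can be tested
        self._value: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, Any]:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            try:
                self._value = json.loads(self._fetch())
                self._fetched_at = now
            except (OSError, ValueError):
                # A failed refresh keeps the stale copy; fail only if
                # nothing has ever been fetched.
                if self._value is None:
                    raise
        return self._value
```

In an app, `fetch` would wrap an HTTP GET against whatever config URL ends up existing; it is left as a parameter here because that URL is not yet defined.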
Flexible configuration of remote clients might be something the WMF wants to support more broadly. It would be nice if whatever implementation is chosen here would help with any remote client configuration, not just stream producers.
//DRAFT DRAFT DRAFT//
= Implementation ideas
== ResourceLoader
ResourceLoader package modules will be used to deliver config to JavaScript clients. By using ResourceLoader where we can, we can avoid extra round trips for config. The config will be loaded by a packageFiles callback function from wherever it is stored or available.
== HTTP Endpoint
There will be clients that aren't able to use ResourceLoader, such as mobile apps. These clients will need a way to request the current config. We will set up a public endpoint that will serve config.
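As a rough illustration of such an endpoint, the sketch below serves stream configs as JSON with a short cache lifetime so that a CDN could absorb most requests. It is an assumption-laden toy, not a proposal for the real service: the `stream` query parameter, the in-memory `STREAM_CONFIGS` dict, the example stream name, and the 300 second `max-age` are all made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical data; in practice this would be read from the canonical store.
STREAM_CONFIGS: Dict[str, dict] = {
    "example.stream": {"schema": "example/schema", "sample_rate": 0.1},
}
MAX_AGE_SECONDS = 300  # assumed cache lifetime, "hopefully minutes"


def build_response(path: str, configs: Dict[str, dict]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on the given request path."""
    wanted = parse_qs(urlparse(path).query).get("stream")
    if wanted is None:
        selected = configs  # no filter given: return every stream
    else:
        missing = [name for name in wanted if name not in configs]
        if missing:
            body = json.dumps({"error": "unknown stream", "streams": missing})
            return 404, {"Content-Type": "application/json"}, body.encode("utf-8")
        selected = {name: configs[name] for name in wanted}
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": f"public, max-age={MAX_AGE_SECONDS}",
    }
    return 200, headers, json.dumps(selected, sort_keys=True).encode("utf-8")


class ConfigHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, headers, body = build_response(self.path, STREAM_CONFIGS)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the toy server locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ConfigHandler).serve_forever()
```

Keeping the response logic in a plain function (`build_response`) separate from the handler makes it easy to reuse the same logic if the endpoint ends up living in the MW API instead of a standalone service.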
== Canonical storage ideas
=== mediawiki-config
Stream configs would be stored in the mediawiki-config repository. ResourceLoader would get configs from mediawiki-config PHP code. We would set up an HTTP endpoint, likely somewhere in the MW API, that would know how to serve the config remotely to client apps.
==== Pros
- mediawiki-config & ResourceLoader already exist and work
==== Cons
- SWAT deployment schedule may not be flexible enough for Product teams.
=== git repositories
Store all config in one or more git repositories. Config would be served as static files via an HTTP server. Config modifications would be made via code review, and deployed using usual deployment tools (scap, helm, whatever). Varnish would cache configs with a small TTL. ResourceLoader would request configs from the HTTP endpoint for Mediawiki client configuration. Remote apps would request config directly from the HTTP endpoint.
==== Pros
- decentralized config
- code review used for config changes
==== Cons
- not easy for Product managers to make changes
- deployment via scap/helm is not straightforward
- difficult to build a GUI (is this true?)
- Stream config repository purpose overlaps with mediawiki-config
=== conftool
[[ https://wikitech.wikimedia.org/wiki/Conftool | Conftool ]] is used for [[ https://wikitech.wikimedia.org/wiki/MediaWiki_and_EtcdConfig | dynamic Mediawiki config ]]. It might make sense to re-use this here, enabling the EventLogging extension or other client side code to query the Mediawiki API for this config. `mw.config` could be loaded with values from conftool. A Mediawiki API endpoint would allow remote clients to request the config.
==== Pros
- already exists
- easy for engineers to make changes
- JSONSchema validation of config supported
==== Cons
- might not be the right fit for remote clients
- difficult for Product managers to make changes
- difficult to build a GUI (is this true?)
- currently only used by SRE(?), could this be extended for more general use?
== JsonConfig Mediawiki extension
We're moving schemas away from on-wiki hosting for good reasons. However, configuration is environment specific, and does not need to be decentralized in the same way. We already use the [[ https://www.mediawiki.org/wiki/Extension:JsonConfig | JsonConfig Mediawiki extension ]] for various purposes, but none are as critical as this use would be. ResourceLoader would request current JsonConfig from meta.wikimedia.org. Remote apps could get config directly from meta.wikimedia.org.
==== Pros
- already exists
- easy for engineers and Product managers to make changes
- has a GUI (Mediawiki)
- Uses existing API and CDN (via varnish and RESTBase?) to cache and expire (is this true?)
- JsonConfig improvements (YAML, JSONSchema validation of configs) would be beneficial for other use cases
==== Cons
- JsonConfig code not well maintained, may need to be rewritten from scratch
- JsonConfig does not (currently) support YAML or JSONSchema validation of config
- Configuration of Mediawiki via a wiki is circular and weird
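Since validation of configs comes up in the pros and cons above, here is a sketch of the kind of check that would be useful before an edited config is published, wherever it ends up being stored. It uses hand-written checks as a stand-in for real JSONSchema validation, and the key names (`schema`, `sample_rate`, `sample_token`, `start`, `end`) are assumptions rather than an agreed config format.

```python
from typing import Any, Dict, List

# Hypothetical set of keys a stream config may contain.
ALLOWED_KEYS = {"schema", "sample_rate", "sample_token", "start", "end"}


def validate_stream_config(name: str, config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []

    schema = config.get("schema")
    if not isinstance(schema, str) or not schema:
        errors.append(f"{name}: 'schema' must be a non-empty string")

    rate = config.get("sample_rate")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(rate, (int, float)) and not isinstance(rate, bool)
    if not is_number or not 0.0 <= rate <= 1.0:
        errors.append(f"{name}: 'sample_rate' must be a number between 0 and 1")

    unknown = sorted(set(config) - ALLOWED_KEYS)
    if unknown:
        errors.append(f"{name}: unknown keys {unknown}")

    return errors
```

Returning all problems at once, rather than raising on the first one, is intended to make a failed save easier to act on for someone editing the config through a GUI.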
== mobile apps config?
TODO: how is this done now?