(NOTE: This task description was taken and modified from the [[ https://docs.google.com/document/d/1dpCo33RpZAbQG15nM_GcZ_zqA3sj0S4h_0CQ4Fsahkg/edit# | Event Platform Product Usage design document ]].)
Wikimedia’s MediaWiki deployments already have a config distribution solution: mediawiki-config and ResourceLoader. Changes to mediawiki-config can be pushed at least twice daily via the usual SWAT schedule.
Stream config will be added as a ResourceLoader module via package modules. This will expose mediawiki-config settings as JSON. For MediaWiki JavaScript clients, the stream config will be loaded along with any other registered ResourceLoader modules. However, ResourceLoader can only return full module JavaScript snippets for use in MediaWiki JavaScript. For remote clients like mobile apps, we will develop a new MW API endpoint that will serve the stream config as JSON directly. Stream config will then be available via URIs like:
GET /api.php?action=stream_config
GET /api.php?action=stream_config&streams=analytics.virtual_pageview|analytics.link_click
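As a rough illustration of how a remote client might build these requests, here is a minimal sketch. The helper name `buildStreamConfigUrl` and the base URL are hypothetical. The `action=stream_config` endpoint and its `streams` parameter are taken from this proposal, not from an existing API.

```lang=javascript
// Hypothetical helper: builds a URL for the proposed stream_config endpoint.
// baseUrl is an assumed wiki API location, e.g. 'https://example.org/w/api.php'.
function buildStreamConfigUrl(baseUrl, streams = []) {
    const params = new URLSearchParams({ action: 'stream_config' });
    if (streams.length > 0) {
        // Multiple stream names are joined with '|', as in the example URIs.
        // URLSearchParams percent-encodes the separator as %7C.
        params.set('streams', streams.join('|'));
    }
    return `${baseUrl}?${params.toString()}`;
}
```

Calling it with no stream names requests the full config. Passing a list restricts the response to those streams, assuming the endpoint is implemented as described here.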
We already have an ‘EventServiceStreamConfig’ defined for the EventBus extension. This config specifies the name of the ‘Event Service’ to which the EventBus extension should send a particular event stream. The EventServiceName is used to look up a URL endpoint in (Production|Labs)Services.php. The EventLogging extension JavaScript should also use it to look up the URL to which it should send events. eventgate-analytics will also be configured to get the EventServiceStreamConfig from the stream_config API endpoint. This will allow it to dynamically configure the stream -> schema mapping it uses to restrict events of certain types to specific streams.
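The stream -> schema restriction could be sketched roughly as follows. This is not eventgate's actual implementation. The function names are hypothetical, and the rule that a `stream` value wrapped in slashes is a regex pattern is an assumption based on the `mediawiki.job` entry in the example config in this task.

```lang=javascript
// Stream config entries in the shape of this task's example config
// (only the fields needed for the lookup are shown).
const streamConfigs = [
    { stream: 'analytics.virtual_pageview', schema_title: 'mediawiki/page/virtual-view' },
    { stream: '/^mediawiki\\.job\\..+/', schema_title: 'mediawiki/job' }
];

// Assumption: a 'stream' value wrapped in slashes is a regex pattern;
// anything else must match the stream name exactly.
// Returns the first matching entry, or undefined if none matches.
function findStreamConfig(configs, streamName) {
    for (const config of configs) {
        const key = config.stream;
        const isPattern = key.length > 2 && key.startsWith('/') && key.endsWith('/');
        if (isPattern) {
            if (new RegExp(key.slice(1, -1)).test(streamName)) {
                return config;
            }
        } else if (key === streamName) {
            return config;
        }
    }
    return undefined;
}

// An event is allowed in a stream only if the stream is configured and the
// event's schema title equals the configured schema_title.
function isEventAllowed(configs, streamName, schemaTitle) {
    const config = findStreamConfig(configs, streamName);
    return config !== undefined && config.schema_title === schemaTitle;
}
```

With this shape, every job stream shares one config entry, and an event sent to an unconfigured stream or carrying the wrong schema title is rejected.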
By using mediawiki-config and ResourceLoader we get a number of benefits:
- Automated cache invalidation (via last modified timestamps)
- Config changes go through usual code review and CI process
- No new service or config repositories to maintain
- Ability to configure different settings for different wikis
Example config:
```lang=php
$wgEventStreams = [
    // virtual_pageview events
    [
        'stream' => 'analytics.virtual_pageview',
        // Must validate with a schema that has this schema title
        'schema_title' => 'mediawiki/page/virtual-view',
        // Client side should only produce this event 50% of the time
        'sample_rate' => 0.5,
        // If sample_token is not set, then just sample randomly. Otherwise,
        // use the value of the named field (here session_id) as the 'token' for sampling.
        // See:
        // https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/503f5922c9ff0b6fc2c5279c95f0101126ad99de/modules/ext.eventLogging/core.js#L179-L200
        // (populationSize == 1/sample_rate)
        // We need this to support 'cross-schema stitching':
        // https://phabricator.wikimedia.org/T205569
        'sample_token' => 'session_id',
        // Used to get EventServiceUrl from ProductionServices.php
        'EventServiceName' => 'eventgate-analytics-public'
    ],
    // Link click experiment; should only run for exactly 1 month.
    [
        'stream' => 'analytics.link-click',
        'schema_title' => 'link-click',
        // Client producers should only send events if the current
        // dt is in this range
        'time_range' => [
            'begin' => '2019-10-01T00:00:00Z',
            'end' => '2019-10-31T23:59:59Z'
        ],
        // Used to get EventServiceUrl from ProductionServices.php
        'EventServiceName' => 'eventgate-analytics-public'
    ],
    // All mediawiki job streams use the same schema; use a pattern to match their config
    [
        'stream' => '/^mediawiki\.job\..+/',
        'schema_title' => 'mediawiki/job',
        'EventServiceName' => 'eventgate-main'
    ]
];
```
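To make the `sample_rate`, `sample_token`, and `time_range` settings more concrete, here is a minimal client-side sketch. The function names are hypothetical. The hash is a simple FNV-1a-style stand-in and is not the sampling method EventLogging uses in the `core.js` code linked from the config comments. Falling back to random sampling when the token field is missing from the event is also an assumption.

```lang=javascript
// FNV-1a-style 32-bit hash over UTF-16 code units, mapped to [0, 1).
// Assumption: any stable hash would serve; this is only for illustration.
function tokenToUnitInterval(token) {
    let hash = 0x811c9dc5;
    for (let i = 0; i < token.length; i++) {
        hash ^= token.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash / 0x100000000;
}

// True if the event falls within the configured sample.
// With sample_token set, the decision is deterministic per token value, so
// different streams sampling on the same field at the same rate select the
// same sessions.
function inSample(config, event, random = Math.random) {
    if (config.sample_rate === undefined) {
        return true;
    }
    const rate = Number(config.sample_rate);
    const field = config.sample_token;
    const value = field !== undefined && event[field] !== undefined
        ? tokenToUnitInterval(String(event[field]))
        : random();
    return value < rate;
}

// True if 'now' lies within the configured time_range (inclusive),
// or if no time_range is configured.
function inTimeRange(config, now = new Date()) {
    const range = config.time_range;
    if (range === undefined) {
        return true;
    }
    const t = now.getTime();
    return t >= Date.parse(range.begin) && t <= Date.parse(range.end);
}

// A client would produce the event only if both checks pass.
function shouldProduce(config, event, now = new Date()) {
    return inTimeRange(config, now) && inSample(config, event);
}
```

Hashing the token instead of drawing a random number is what keeps a given session consistently in or out of the sample, which is the property the cross-schema stitching use case needs.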
(NOTE: a new REST route handler is in the works...should we somehow use it for this?)