When first designing EventStreamConfig and its MW config var wgEventStreams, we decided to make wgEventStreams a numerically indexed array rather than an associative one, even though the EventStreamConfig API returns string-keyed objects.
The main motivation for doing this was to have deterministic results when querying for stream configs that use 'regex stream names'. This feature allows declaring a pattern of stream names that all share the same config. It was needed to support the existing job queue use case, where all the mediawiki.job.* stream names use the same schema. We wanted to use the numerically indexed array to guarantee that if a stream name matching a query is encountered early in the array, it takes precedence over a later one. In this way, some mediawiki.job.specific_stream could be declared earlier in the array and, if queried for, would be used rather than the config of a later entry that matches it via the mediawiki.job.* pattern.
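The precedence rule described above can be sketched as a first-match-wins lookup over an ordered list. This is a minimal Python model, not the extension's PHP code: the setting names ("stream", "schema_title"), the schema titles, the /.../ convention for marking a regex stream name, and the find_stream_config helper are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for wgEventStreams as a numerically indexed (ordered) list.
# Entry order matters: earlier entries take precedence over later ones.
EVENT_STREAMS = [
    {"stream": "mediawiki.job.specific_stream", "schema_title": "mediawiki/job/specific"},
    {"stream": "/^mediawiki\\.job\\..+/", "schema_title": "mediawiki/job"},
]


def is_regex_name(name):
    """Assumed convention: a regex stream name is wrapped in slashes."""
    return len(name) > 2 and name.startswith("/") and name.endswith("/")


def find_stream_config(stream_name, streams=EVENT_STREAMS):
    """Return the first entry whose name equals or regex-matches stream_name."""
    for config in streams:
        declared = config["stream"]
        if is_regex_name(declared):
            if re.search(declared[1:-1], stream_name):
                return config
        elif declared == stream_name:
            return config
    return None


specific = find_stream_config("mediawiki.job.specific_stream")
generic = find_stream_config("mediawiki.job.some_other_job")
```

Here `specific` resolves to the explicitly declared entry because it appears first, while `generic` falls through to the pattern entry. Note that the list only records the pattern itself, so nothing in it enumerates which concrete mediawiki.job.* streams exist.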
This decision to use a numerically indexed array now has two pretty strong downsides:
- We cannot discover streams that use a regex stream name. We have no way of using stream config to determine what streams exist that match a pattern, and thus cannot use that information to ingest them into Hadoop.
- We cannot use the StaticSiteConfig array merging feature to set per-wiki overridden settings, as numerical arrays are not recursively merged and comparison for any merging is done by key. In this case, a key of e.g. 0 in the 'testwiki' override settings will not match with key 0 in the 'default' settings.
This task is about solving the second problem.
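To illustrate why the array shape matters for per-wiki overrides, here is a simplified Python model of key-based merging. It is not the actual merge implementation: merge_settings is a hypothetical helper that merges string-keyed mappings recursively and replaces anything else (including lists) wholesale, and the stream names and the "sample_rate" setting are made up for the example.

```python
def merge_settings(default, override):
    """Recursively merge dicts by key; any non-dict value (including a list) is replaced."""
    if isinstance(default, dict) and isinstance(override, dict):
        merged = dict(default)
        for key, value in override.items():
            merged[key] = merge_settings(default[key], value) if key in default else value
        return merged
    return override


# Numerically indexed form: the override cannot target one stream's settings.
default_list = [
    {"stream": "stream_a", "schema_title": "a/schema", "sample_rate": 1.0},
    {"stream": "stream_b", "schema_title": "b/schema", "sample_rate": 1.0},
]
testwiki_list = [{"stream": "stream_a", "sample_rate": 0.5}]
merged_list = merge_settings(default_list, testwiki_list)

# String-keyed form: the override matches the default entry by stream name.
default_map = {
    "stream_a": {"schema_title": "a/schema", "sample_rate": 1.0},
    "stream_b": {"schema_title": "b/schema", "sample_rate": 1.0},
}
testwiki_map = {"stream_a": {"sample_rate": 0.5}}
merged_map = merge_settings(default_map, testwiki_map)
```

In this model the list form drops stream_b and stream_a's schema_title entirely, whereas the string-keyed form changes only stream_a's sample_rate and leaves every other default in place.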
While we are at it, we should also try to make it easy to declare streams in beta (in InitialiseSettings-labs.php) without touching the defaults declared in InitialiseSettings.php.
See also: https://docs.google.com/document/d/1dpCo33RpZAbQG15nM_GcZ_zqA3sj0S4h_0CQ4Fsahkg/edit?ts=5d2d44c2#