[MEP] Determine how stream configuration is authored and deployed
Open, Low, Public

Description

Discussions of stream configuration have repeatedly run into the same obstacle: the configuration exists only as vague bags of settings nestled somewhere in InitialiseSettings.php. That makes it hard to settle questions such as:

  • How to divide responsibilities between stream configuration and schema
  • How to document stream configuration
  • How to specify a structure for the stream configuration
  • How to build a process around deploying and modifying stream configuration

In contrast, our schema system, while still evolving, has the definite benefit of being tangible for its users, and easily browsed at schema.wikimedia.org.

While talking about how to start leveling up the stream configuration, @Ottomata and I discussed the possibility of writing a stream configuration schema that the EventStreamConfig extension could at least use for validation. A second question is whether stream configurations might themselves be maintained the same way schemas are: as a repository of YAML files that evolve over time and leave a paper trail. Unlike schemas, however, they would have few(er) backwards-compatibility concerns, and the two could interact. Perhaps the stream configurations could even live in the same repository as the schemas, just under a different directory.
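To make the validation idea concrete, here is a minimal sketch in Python. It is an illustration only, not how EventStreamConfig works: the field names in `STREAM_CONFIG_SCHEMA` and the `validate_stream_config` helper are hypothetical, and a real schema would more likely be expressed as JSONSchema.

```python
# Hypothetical, hand-rolled schema for a single stream config entry.
# Each field maps to (expected type, whether it is required).
# The field names are assumptions chosen for illustration.
STREAM_CONFIG_SCHEMA = {
    "schema_title": (str, True),
    "destination_event_service": (str, False),
    "sample": (dict, False),
}


def validate_stream_config(name, config):
    """Return a list of problems found in one stream config; empty means valid."""
    errors = []
    for field, (expected_type, required) in STREAM_CONFIG_SCHEMA.items():
        if field not in config:
            if required:
                errors.append(f"{name}: missing required field '{field}'")
        elif not isinstance(config[field], expected_type):
            errors.append(
                f"{name}: field '{field}' should be of type {expected_type.__name__}"
            )
    # Reject fields the schema does not know about, so typos are caught early.
    for field in config:
        if field not in STREAM_CONFIG_SCHEMA:
            errors.append(f"{name}: unknown field '{field}'")
    return errors
```

A check like this could run as a commit hook or CI step in whichever repository ends up holding the stream configs, so that a malformed entry is rejected before it is deployed.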

The problem then becomes how this YAML (or whatever), in its own repository, makes its way into MediaWiki config. We might develop that process in a variety of ways, and that is what we can discuss on this ticket.

Event Timeline

I like this idea quite a bit. My first pass at how I'd set it up is:

  1. Create a new repo for stream configs and add it as a git submodule to operations/mediawiki-config. (Note that there is precedent for this; see the existing fonts and portals submodules.)
  2. In operations/mediawiki-config, create a new file, wmf-config/StreamConfigs.php, that reads the stream config YAML from the submodule and initializes $wgEventStreams with it. Require StreamConfigs.php in InitialiseSettings.php.
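The loading step in item 2 could look roughly like the sketch below. It is written in Python rather than PHP, and it reads JSON files rather than YAML only so that it needs no third-party parser. The one-file-per-group layout and the `load_stream_configs` name are assumptions, not anything decided on this task.

```python
import json
from pathlib import Path


def load_stream_configs(config_dir):
    """Combine every *.json file in config_dir into one {stream_name: config} dict.

    Assumes each file holds a mapping of stream names to their settings.
    """
    streams = {}
    # Sort so the result does not depend on filesystem ordering.
    for path in sorted(Path(config_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            for name, config in json.load(f).items():
                if name in streams:
                    raise ValueError(
                        f"stream '{name}' is defined twice (second time in {path.name})"
                    )
                streams[name] = config
    return streams
```

The returned mapping plays the role of the value that StreamConfigs.php would assign to $wgEventStreams. Failing loudly on a duplicate stream name is a design choice of this sketch.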

reads the stream config YAML from the submodule and initializes $wgEventStreams

How to handle per-wiki configuration? Should we just duplicate the InitialiseSettings structure into the YAML?

Hmm, yeah, matching the IS.php structure is probably the easiest/best way of handling it.

Create a new repo for stream configs and add it as a git submodule to operations/mediawiki-config

Instead of a new repo, could we just make a directory or file in operations/mediawiki-config with the YAML config and the PHP code to read it? Does it need to be its own repo?

No, I don't think it's necessary to keep stream configs in a separate, dedicated repo. The main benefit of doing so that I can see would be that it would keep a clean git commit history separate from the very busy operations/mediawiki-config history. But that's more of a nice-to-have. Re-reading the task description just now, it seems to assume a standalone repo for stream config YAML, so I was probably just running with that assumption.

I'm not opposed to a separate repo, was just wondering. I have the feeling that fewer repos are better here, but I can agree that even just opening InitialiseSettings.php in an editor is pretty crazy (it is 26000 lines!).

Having a separate repo might make it easier for us to adapt to however operations/mediawiki-config changes over the coming years, as well as make it easier to add hooks, etc., and expose the repo for public browsing as in schema.wikimedia.org. For me at least, fewer repos is better mostly from a usability perspective of not needing to clone/keep track of more repositories in order to make a change, but here you've got to clone something either way (a standalone repo or operations/mediawiki-config). Having a small clean repo that only does one thing would probably make it easier for us to build an interface on top of it if we ever go that way, and more approachable for users either way. If "deployment" consists of pulling the submodule update into mediawiki-config, that seems kind of neat as well. I'd vote separate repo.

expose the repo for public browsing as in schema.wikimedia.org

Would this be better done by using the EventStreamConfig API instead? Maybe not, as it will answer with whatever is configured for whichever wiki you query the action=streamconfigs API on.

jlinehan renamed this task from MEP: Should stream configurations be written in YAML? to [Metrics Platform] Specify stream configuration syntax relevant to Metrics Platform. (Mar 3 2021, 6:51 PM)
jlinehan renamed this task from [Metrics Platform] Specify stream configuration syntax relevant to Metrics Platform to [MEP] Determine how stream configuration is authored and deployed. (Mar 3 2021, 6:53 PM)

Another reason for moving this at least out of InitialiseSettings.php: right now, to use different settings for different wikis, you have to copy/paste the entire stream config block to that wiki. If you just wanted to change e.g. the sample rate on ptwiki, you couldn't just add a different sample rate setting for ptwiki; you'd have to copy the whole stream config entry from default to ptwiki, and then change the sample rate there. This duplicates the config settings, and if the default one changes for some reason, we'd also have to update any per-wiki overrides.

If stream config is moved outside of a statically declared PHP array, we could generate the full wgEventStreams array programmatically and keep it DRY.
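As a rough illustration of what "programmatically" could mean, the Python sketch below builds one wiki's stream configs by recursively merging per-wiki overrides onto the defaults, so an override only has to mention the settings that differ. The function names, the data layout, and the example stream are all hypothetical. This is not how InitialiseSettings.php resolves settings today.

```python
import copy


def deep_merge(base, override):
    """Return a copy of base with override applied recursively to nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def build_event_streams(defaults, overrides_by_wiki, wiki):
    """Build the full stream config mapping for one wiki.

    Streams that appear only in the overrides are ignored in this sketch.
    """
    wiki_overrides = overrides_by_wiki.get(wiki, {})
    return {
        name: deep_merge(config, wiki_overrides.get(name, {}))
        for name, config in defaults.items()
    }


# Hypothetical example: ptwiki changes only the sample rate.
defaults = {
    "my.stream": {
        "schema_title": "analytics/my/stream",
        "sample": {"rate": 0.1, "unit": "session"},
    }
}
overrides_by_wiki = {"ptwiki": {"my.stream": {"sample": {"rate": 0.5}}}}
ptwiki_streams = build_event_streams(defaults, overrides_by_wiki, "ptwiki")
```

With this approach a later change to the default entry is picked up by ptwiki automatically, apart from the one overridden value.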

wgEventStreams is getting unruly!