[Goal] M1: Metrics Platform: Control Plane: Analytics instrumentation stream management UI
Open, Needs Triage, Public

Description

Wireframes: https://miro.com/app/board/uXjVMfwX4PI=/

Notes

  1. All of this work can be done in the EventStreamConfig MediaWiki extension (hereafter ESC) because it's closely related to the creation of streams
  2. Because ESC is already deployed:
    • We have a responsibility to ensure that the code that we merge is of a high quality and well-tested as it will ride the train
    • Likewise, the features that we merge should only be available to authorised users

TODO

Tasks

Event Timeline

phuedx updated the task description.

For reference, we had a meeting and sync today and discussed some Control Plane implementation details and questions. Notes doc here:

I'd say there are 3 main questions about Control Plane AKA 'dynamic stream config' that should be addressed in a design doc.

  1. How are conflicts between dynamic (database) config and hardcoded config handled and resolved?
  2. How do global (metawiki) configs fit with per wiki configs? Currently, this is handled by WMF's mediawiki-config system, and per wiki settings are requested at that wiki's domain API. If we use a MediaWiki database table to store configs: on which wiki? all? only meta? If only meta, how do clients request this per wiki config?
  3. Stream dataset governance. The 'control plane' aims to add some stream dataset governance (ownership, other metadata, etc.) to stream config. Great! But this should be done for not just Metrics Platform. datahub.wikimedia.org is intended to be a centralized data catalog of all datasets at WMF. For other datasets, governance metadata will belong here. Likely stream dataset metadata belongs here too.

hi @Ottomata thanks for your intervention - it actually helped clarify a lot of fuzziness in my mind about how to build this thing.

For starters, here is a prelim design doc (still WIP and I will do my best to keep it updated as progress unfolds) that I hope addresses the main questions you posed.
I'll attempt to provide adequate inline responses here too:

  1. How are conflicts between dynamic (database) config and hardcoded config handled and resolved?

The dynamic config is meant to be the canonical config for Metrics Platform (MP) streams only. We would need to do some refactoring of the EventStreamConfig extension (namely StreamConfigs class) to figure out how to incorporate database reads into the building of the $streamConfigs array (would love your advice on this actually - I took a stab at outlining 2 approaches in the design doc).

In theory there wouldn't be conflicts (except for the 3 MP streams that are currently in prod - noted in doc) between hardcoded config that defines legacy EventLogging and Modern Event Platform (MEP) streams, and the dynamic config that informs MP streams. Because MP is additive, there wouldn't be a need to migrate per se since both implementations would be active side-by-side for a period. In some future state where MP is a success and adoption is widespread, once data parity is established for a given instrument, the hardcoded config for a legacy and/or MEP stream could just be removed and the corresponding MP stream will continue collecting events.

  2. How do global (metawiki) configs fit with per wiki configs? Currently, this is handled by WMF's mediawiki-config system, and per wiki settings are requested at that wiki's domain API. If we use a MediaWiki database table to store configs: on which wiki? all? only meta? If only meta, how do clients request this per wiki config?

In the MP context, after some discussion with @Ladsgroup, there will be a separate table on metawiki that tracks per wiki overrides; metawiki is proposed as the source of truth for MP streams for all wikis. Clients would be doing reads on every pageview to get the active stream config for an MP stream from the metawiki DB, the query results of which would be cached by APCu. Each prod host would cache the results with a TTL of 1 hour, after which the cache entry is invalidated and the live config is fetched again. Until I actually start writing patches, I'm still a little unclear about how this all will work - I have more meetings set up next week to iron out the details.
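To make the caching behaviour concrete, here is a minimal Python sketch of a per-host cache with a one-hour TTL. It is an illustration only: `TTLCache`, `fetch_from_db` and the cache key are hypothetical names, the database read is a stub, and the real implementation would presumably be PHP on top of APCu.

```python
import time


class TTLCache:
    """In-process cache whose entries expire after a fixed TTL (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = fetch()
        self._entries[key] = (now + self._ttl, value)
        return value


# Stub standing in for the metawiki database read (assumption, not a real query).
db_reads = []


def fetch_from_db():
    db_reads.append("read")
    return {"sample_rate": 0.01}


fake_now = [0.0]
cache = TTLCache(ttl_seconds=3600, clock=lambda: fake_now[0])
key = "stream-config:example.stream"

cache.get_or_fetch(key, fetch_from_db)  # miss: reads the DB
cache.get_or_fetch(key, fetch_from_db)  # hit: served from the cache
fake_now[0] = 3601.0  # just over an hour later
cache.get_or_fetch(key, fetch_from_db)  # expired: reads the DB again
```

With this shape, a config change made on metawiki can take up to one TTL to reach a given host, which is the trade-off the 1 hour figure implies.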

  3. Stream dataset governance. The 'control plane' aims to add some stream dataset governance (ownership, other metadata, etc.) to stream config. Great! But this should be done for not just Metrics Platform. datahub.wikimedia.org is intended to be a centralized data catalog of all datasets at WMF. For other datasets, governance metadata will belong here. Likely stream dataset metadata belongs here too.

That is an important point and I 100% agree - if we move forward with building the MP Control Plane on MetaWiki, presumably it's feasible to feed the active stream config metadata from metawiki to datahub for MP streams? Could we create an API endpoint that datahub could read for MP stream dataset metadata?
As for datasets generated outside of Metrics Platform, I hope it's not a cop out to say they lie outside the scope of this project?
I will defer to the relevant product owners of this space (cc @DAbad) to inform what the resolution should be around those questions.

Thanks @cjming, this looks really great. I've only briefly read, and will read more deeply and comment next week.

Just a few quick thoughts:

The dynamic config is meant to be the canonical config for Metrics Platform (MP) streams only

I appreciate the intention of making sure that this dynamic config is isolated from the hardcoded config by scoping it to this particular use case. However, stream configuration is a lower level concept/abstraction than Metrics Platform, and I think it would be wise to not add anything Metrics Platform specific to EventStreamConfig. EventStreamConfig is a dependency of MP (and EventLogging, and EventBus, and many other things). MP should not be a dependency of EventStreamConfig.

But, this is a good thing! I think we can do most of what you propose by just making sure we get the resolution between dynamic and hardcoded configs just right. We're going to have to handle dynamic config merging anyway (to support the per-wiki overrides), so we might as well implement this merging in such a way that hardcoded configs always take precedence (with some other possible safety checks in there too).
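As a rough illustration of that precedence rule, here is a minimal Python sketch. The function name, stream names and settings are hypothetical, and it assumes a shallow, per-stream merge; the real merge in EventStreamConfig would be PHP and may need deeper merging plus the extra safety checks mentioned above.

```python
from copy import deepcopy


def merge_stream_configs(hardcoded, dynamic):
    """Merge dynamic (database) stream configs underneath hardcoded ones.

    Both arguments map stream name -> settings dict. Where both define the
    same setting for a stream, the hardcoded value wins.
    """
    merged = deepcopy(dynamic)
    for stream, settings in hardcoded.items():
        # Hardcoded settings are applied last, so they overwrite dynamic ones.
        merged.setdefault(stream, {}).update(deepcopy(settings))
    return merged


hardcoded = {
    "example.stream_a": {"schema_title": "example/a", "sample_rate": 0.01},
}
dynamic = {
    "example.stream_a": {"sample_rate": 0.5, "owner": "someone"},
    "example.stream_b": {"schema_title": "example/b"},
}

merged = merge_stream_configs(hardcoded, dynamic)
# example.stream_a keeps the hardcoded sample_rate but gains the dynamic "owner";
# example.stream_b exists only in the dynamic config and passes through unchanged.
```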

We'll be adding an (optional) dynamic stream config feature to the EventStreamConfig extension.

Clients would be doing reads on every pageview to get the active stream config for an MP stream from the metawiki DB

Makes sense. One quick q about this one: is this okay from a browser HTTP request standpoint? Currently the configs are shipped to browser clients via ResourceLoader, which I'm pretty sure doesn't work cross domain. You're going to have to request the configs directly from the metawiki API.

While we're at it, we should consult with the API Platform (cc @BPirkle ?) folks to check if we should be digging ourselves deeper into the MW Action API. Maybe they'll say go for it, but they might have something better for us that we should at least consider. Not sure.

BTW, if you haven't seen, some relevant old convos and context:

Specifically:

With the exception of topic -> schema mapping, and possibly ownership, product folks want to be able to easily change the schema usage configuration dynamically, without having to do a SWAT or MW train deploy. They don't necessarily want these things to be editable in a GUI, but they do want their engineers/analysts to be able to change these settings at will. For example, if a sampling rate is changed, say from 1/1000 to 1/100, clients should start sampling differently.

The decentralization we get from using git for schema storage will help a lot with development use cases. However, it might not be as necessary to have a decentralized storage for this type of configuration. I could see a centralized configuration storage database/service where these things are modified. I could also see all of this configuration living in git. Either would be fine. In either case there will need to be a read-only GUI that allows product managers to know what e.g. sampling settings are at any given time.

@Krinkle yesterday you had some thoughts about this service. I know you were worried about forcing clients to phone home, but I think that could be optional. Not every schema-usage will need to have its clients dynamically configured. $wgEventLoggingPhoneHome = false :) Also, perhaps MW (or whatever) could do the phoning-home to get configuration when rendering the page load response, instead of having the client send a separate request via JavaScript later?
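To illustrate the sampling-rate example from the quoted discussion, here is a small Python sketch of how a client could react to a rate change. The hash-based bucketing and the name `in_sample` are assumptions made for illustration; this is not a description of how EventLogging or Metrics Platform clients actually sample.

```python
import hashlib


def in_sample(unit_id: str, rate: float) -> bool:
    """Deterministically decide whether unit_id falls inside the sampled fraction.

    The id is hashed to a number in [0, 1); the unit is sampled when that
    number is below the configured rate.
    """
    digest = hashlib.sha256(unit_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


tokens = [f"session-{i}" for i in range(10_000)]
sampled_before = [t for t in tokens if in_sample(t, 1 / 1000)]
sampled_after = [t for t in tokens if in_sample(t, 1 / 100)]
```

Because the bucket for a given id is fixed, raising the rate only adds units to the sample: everything sampled at 1/1000 is still sampled at 1/100.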

It looks like we were asking ourselves some of these very questions 5 years ago!

Oh, something else to address for this one:

How do global (metawiki) configs fit with per wiki configs?

Currently, beta wikis can also be configured via mediawiki-config. This means that streams declared in production are available in beta, but not necessarily vice-versa. How should the instrumentation developers work with beta once stream configs are in a metawiki db table? I suppose those will be different systems, so they'll have to handle manually declaring their streams via the new UI, but in meta.beta? Or, will beta wikis also phone home to production metawiki? (probably not a good idea).

Stream dataset governance. [...]

Great! But this should be done for not just Metrics Platform. [...]

That is an important point and I 100% agree - if we move forward with building the MP Control Plane on MetaWiki, presumably it's feasible to feed the active stream config metadata from metawiki to datahub for MP streams? Could we create an API endpoint that datahub could read for MP stream dataset metadata?

Yes, we plan on ingesting event stream dataset info from EventStreamConfig to datahub. T318863: [Event Platform] Event Platform and DataHub Integration.

But we don't want to use EventStreamConfig for dataset governance. Looking at your proposed schema, since we are using MW users/auth, it makes sense to have an actor field. I wouldn't go much beyond that though. Stuff like team ownership, data stewards, etc. belongs in the data catalog / datahub.

I appreciate the intention of making sure that this dynamic config is isolated from the hardcoded config by scoping it to this particular use case. However, stream configuration is a lower level concept/abstraction than Metrics Platform, and I think it would be wise to not add anything Metrics Platform specific to EventStreamConfig. EventStreamConfig is a dependency of MP (and EventLogging, and EventBus, and many other things). MP should not be a dependency of EventStreamConfig.
...
We'll be adding an (optional) dynamic stream config feature to the EventStreamConfig extension.

Agree and duly noted - then adding an optional dynamic stream config feature to ESC is the path forward here - and the StreamConfigs service class would provide the merged data? Would an extension hook into this process somehow and provide the info needed (i.e. wiki db name and queries) for getting the database/cache results which would then be merged by ESC?

When you say "we'll be adding...", is this a task that Event Platform team is undertaking? I'm happy to try to move this forward

And for the MP-specific features like the Special Pages UIs, it sounds like you'd rather not have this stuff live in ESC but say in a separate new extension that hooks into the ESC stream configs service?

is this okay from a browser HTTP request standpoint? Currently the configs are shipped to browser clients via ResourceLoader, which I'm pretty sure doesn't work cross domain. You're going to have to request the configs directly from the metawiki API.

I'm confused about this -- if the merged data is delivered by the ESC Stream Configs service (including per wiki overrides) as noted above, doesn't that suffice?

While we're at it, we should consult with the API Platform (cc @BPirkle ?) folks to check if we should be digging ourselves deeper into the MW Action API. Maybe they'll say go for it, but they might have something better for us that we should at least consider. Not sure.

I'll inquire and follow up

Agree and duly noted - then adding an optional dynamic stream config feature to ESC is the path forward here - and the StreamConfigs service class would provide the merged data? Would an extension hook into this process somehow and provide the info needed (i.e. wiki db name and queries) for getting the database/cache results which would then be merged by ESC?

Yes, it seems like the code and API need to be updated to consider remote wiki configuration, eh? I'm not sure how to reconcile this with the mediawiki-config per wiki config stuff though. This will need a lot more thought. We may need to move the hardcoded configs out of mediawiki-config altogether?

When you say "we'll be adding...", is this a task that Event Platform team is undertaking? I'm happy to try to move this forward

Sorry, no, I was just summarizing the work that "we" EventStreamConfig developers (including you :) ) would be doing.

And for the MP-specific features like the Special Pages UIs, it sounds like you'd rather not have this stuff live in ESC but say in a separate new extension that hooks into the ESC stream configs service?

Hm, what is MP specific that is not just editing stream configuration settings?

I'm confused about this -- if the merged data is delivered by the ESC Stream Configs service (including per wiki overrides) as noted above, doesn't that suffice?

EventLogging uses EventStreamConfig PHP API to get relevant stream configs (in memory loaded via mediawiki-config), and then ships them to the browser client in a JSON file via ResourceLoader on page load. There is no extra HTTP request to the MW API for these.

if the merged data is delivered by the ESC Stream Configs service

So, it sounds like we will be making the EventStreamConfig extension loaded on a specific wiki ALWAYS reach out to the EventStreamConfig MW HTTP Action API endpoint on metawiki?

Agree and duly noted - then adding an optional dynamic stream config feature to ESC is the path forward here - and the StreamConfigs service class would provide the merged data? Would an extension hook into this process somehow and provide the info needed (i.e. wiki db name and queries) for getting the database/cache results which would then be merged by ESC?

I don't know the details of this case but it's quite easy and doable to read from another wiki's databases in production. Most of wikidata's database reads are not from wikidata.org but from other wikis.

it sounds like we will be making the EventStreamConfig extension loaded on a specific wiki ALWAYS reach out to the EventStreamConfig MW HTTP Action API endpoint on metawiki?

this is where I feel like I'm not grokking 100% -- aiui ESC extension on a specific wiki will always need to check cache/db (metawiki) directly for dynamic config, it doesn't need to hit that api endpoint -- ESC just needs to merge whatever config it gets from the config vars with the db/cache queries per whatever rules we define.

it's quite easy and doable to read from another wiki's databases in production

Ah, okay then! That does make things easier. Didn't realize this was allowed. So, we'll make EventStreamConfig tables global to all wikis.

@Ladsgroup in that case, would it be better/possible to not associate the tables with any (meta) wiki at all? Is it possible to have a non-wiki/global database that all wikis use? There is no logical need to have metawiki involved here.

Yeah, you can put it in the x1 cluster. It's accessible from mediawiki but central and not bound to a wiki. We already have multiple cases of that. For example, url shortener lives in mediawiki but the short url mappings are stored in x1 (e.g. w.wiki/5 mapping to its target). The coding for it is a bit awkward though, but that's on my roadmap of improving the rdbms library (T330590: External LBs should not be exposed to developers)

Cool. In that case, I suppose the MW UI that edits this data should only be enabled on metawiki, but reads can happen from anywhere? Does that sound right @cjming?

Cool. In that case, I suppose the MW UI that edits this data should only be enabled on metawiki, but reads can happen from anywhere? Does that sound right @cjming?

If EventStreamConfig extension is deployed everywhere, sure. The way url shortener extension used to do it was to have a "read-only" mode in configuration which was false in metawiki and true everywhere else.
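A minimal Python sketch of that "read-only" pattern, for illustration only: the class and flag names are hypothetical, a plain dict stands in for the shared (e.g. x1) table, and this is not the UrlShortener code.

```python
class ReadOnlyError(Exception):
    """Raised when a write is attempted on a wiki configured as read-only."""


class DynamicConfigStore:
    """Hypothetical store; a dict stands in for the shared database table."""

    def __init__(self, shared_table, read_only):
        self._table = shared_table
        self._read_only = read_only

    def get(self, stream):
        # Reads are allowed from every wiki.
        return self._table.get(stream)

    def set(self, stream, settings):
        # Writes are refused everywhere except the wiki with read_only=False.
        if self._read_only:
            raise ReadOnlyError("stream configs can only be edited on the central wiki")
        self._table[stream] = settings


shared_table = {}
metawiki = DynamicConfigStore(shared_table, read_only=False)
enwiki = DynamicConfigStore(shared_table, read_only=True)

metawiki.set("example.stream", {"sample_rate": 0.1})
print(enwiki.get("example.stream"))  # the value written via metawiki is readable here
```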

If EventStreamConfig extension is deployed everywhere, sure

Ya, it is. It's used to serve config to EventLogging via ResourceLoader on page loads. Great!

While we're at it, we should consult with the API Platform (cc @BPirkle ?) folks to check if we should be digging ourselves deeper into the MW Action API. Maybe they'll say go for it, but they might have something better for us that we should at least consider. Not sure.

I'm missing a ton of context, but there's nothing wrong with using the Action API if it meets your needs. Alternatively, the MediaWiki REST API aka Core REST API aka rest.php is also extensible. Although from later discussion it sounds like cross-wiki db access may be even better for your situation.

Happy to answer any specific questions if needed.

the MW UI that edits this data should only be enabled on metawiki, but reads can happen from anywhere? Does that sound right @cjming?

@Ottomata yes! this is the proposed solution

Here are the steps that I'm intuiting need to happen pending some Qs (and based on the convo we just had):

In order to preserve decoupling between ESC + MP, creation of dynamic stream configs would still necessitate deployments but editing/updating (and expiry?) of them could be done thru GUI:

  • Create a new MP extension (to house the db schema, Special Pages, CRUD ops)

OR

  • Put all this functionality inside EventLogging which currently already has some MP-specific features.

However IF dynamic stream config creation is a requirement:

  • Introduce dynamic stream config feature in ESC:
    • Provide generic way to query (wiki dbname, the query itself, global cache key) for dynamic stream configs
    • Merge data to be made available (hard-coded config takes precedence) in StreamConfigs service class
    • db schema, gui, etc would live inside ESC

[…]

  • Create a new MP extension

OR

  • Put all this functionality inside EventLogging

IF dynamic stream config creation is a requirement:

  • […] db schema, gui, etc would live inside ESC

I have no opinion on whether to develop the central schema and GUI in the existing EventStreamConfig extension or a new MetricsPlatform extension (though the former seems easier).

However, I would recommend against placing it inside EventLogging. I think it is valuable for EventLogging to remain no-op by default with zero-setup or dependencies. The extension is a common dependency for popular MediaWiki features, in order to satisfy the EventLogging interface even on third-party wikis where it would exist without the effectively WMF-specific services of EventBus and EventStreamConfig.

Developing the control plane in EventLogging would likely mean choosing against keeping your code simple and enabled by default for local development, as this would be mutually exclusive with writing it such that it is disabled by default and with no hard dependency on EventStreamConfig.

Not opposing anything @Krinkle says, but @Krinkle IIUC EventLogging does already have soft/optional dependencies on both EventStreamConfig and 'wikimedia/metrics-platform' (vendor?). I'm guessing your reason for recommending not putting the table and GUI in EventLogging is because...it's a DB table, not just code and config? Or, something else?

(FWIW, here is a comment I made on EventLogging vs Metrics Platform library dependencies.)

IIUC EventLogging does already have soft/optional dependencies […]

That's correct. It is not a fixed cost, however: doing more of the same would make the new code being developed more complex than it otherwise would be.

I'm guessing your reason for recommending not putting the table and GUI in EventLogging is because...it's a DB table, not just code and config?

Not exactly. The way we tend to develop extensions that are "global" or "central" in nature (GlobalCssJs, Wikibase, UrlShortener, CentralNotice, GlobalUsage, GlobalPreferences, etc) is to make them easy to work with in CI and locally for staff and other contributors, by using the singular local wiki as the central wiki for itself by default. I suggested this pattern to @cjming as well. However, when placed in EventLogging, which is widely installed as no-op, it would result in the creation of a database table on each wiki, and register a SpecialPage on the wiki that has no relevance to it, unless we code it such that it is all with optional dependencies, and behind layers of feature flags that make it all conditionally enabled. That seems expensive, risky, and likely to cause problems from time to time both in the form of bugs, and in the form of increasing on-boarding difficulty for new staff.

When placed in EventStreamConfig, it seems this pattern would work well. I can't think of a reason for needing to develop it there in a way other than to simply expose and provision it by default when EventStreamConfig is installed locally, even if left empty/unused.

Got it thanks.

EventStreamConfig has some of the same difficulties, but not as many, since it is only used by WMF. Doing it in ESC would be fine, but ESC should stay agnostic to the clients that use it, so if we do it we'll have to do it in a more "dynamic stream config" feature way, rather than the "metrics platform config" way, which is more complicated for MP developers to deal with.

So if we don't do it in ESC, we either need a new extension...oo, or what about the WikimediaEvents extension?

So I'm getting the message that EventStreamConfig and EventLogging extensions are not good candidates for this new feature for the various reasons noted above.

I believe the original intent behind this goal was to try to bypass things like security and performance etc reviews and try to expedite a proof of concept iteratively. And based on the steps that are required to get a new extension deployed, I'm inclined not to build another events-related extension but find an existing events-related extension that's deployed everywhere to contain this stuff with the caveat that the db + guis would live on MetaWiki alone.

@Krinkle is WikimediaEvents a reasonable option for the MP Control Plane?

@Ottomata Is ESC still an option? I get that it'll be more complex, need to be agnostic, etc. I'm leaning towards this because I think that was Sam's original plan but I don't know who makes the final decision here and how to arrive at a consensus for where to build this thing.

Is ESC still an option?

Sure, I think ESC is fine as long as we implement it agnostically like you say. The beta config stuff will be complicated and change the way things work now, but that's fine as long as it is accounted for and documented.

Are we still waiting to hear from @DAbad about Emil's original requirements for dynamic stream creation?

thanks @Ottomata - happy to have your green light -- I'll do my best to wrap up the design doc including beta config and confer with you there on that.

As for dynamic stream creation, when @DAbad and I recently spoke about it, I got the go ahead to move forward without it -- but if this is going to live in ESC now anyway, is it ok to make a provision for it? Or should we still plan on excluding dynamic stream creation for now?

My preference: if we don't need dynamic stream creation, I'd prefer not to add it.

but if this is going to live in ESC now anyway

If we don't need dynamic stream creation, then it shouldn't live in ESC, right?

alrighty, I believe this is the current consensus:

No dynamic stream creation for now -- MP streams will still need to be created/deployed like how all stream configs are currently.
Consequently the MP CP should not be contained in ESC until/unless dynamic stream creation is back on the table.

The management of these dynamic MP stream configs can still be handled by the MP CP which can live in the WikimediaEvents extension (db, special pages on MetaWiki only).

Per discussion with @Krinkle and as I understand it, there needs to be a way to deliver the dynamic stream configs alongside the stream configs that are available via ESC's API. When it was proposed for MP CP to live in ESC, we could just merge the two stream config arrays there. If we don't want to disrupt how current PHP and API calls are made to fetch stream configs, how do we override the data from wmf-config/ext-EventStreamConfig.php that ResourceLoader gets via the ESC.StreamConfigs service class?
In other words, where does the db/cache query take place to override MP stream configs if they are still to be deployed as hard-coded config?

Attribution to @Krinkle for the following options to address:

  1. Add a hook in EventLogging's PHP function that produces 'data.json' that WME/MP can use to register a callback to modify the stream configs array before it gets returned
  2. Add a hook in ESC's service class which is called by EventLogging that can be used by WME/MP to modify the stream configs array

I'm not sure if there are other alternatives to consider - of the two above, I defer to Timo's view that it'll be easier/safer to do the hook in ESC.
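For what it's worth, option 2 could look roughly like the following Python sketch. Everything here is hypothetical (the `StreamConfigs` stand-in, `register_callback`, the stubbed override lookup); the real thing would be a MediaWiki hook in PHP, and the rule that callbacks may only touch already-declared streams is an assumption that matches "no dynamic stream creation".

```python
from copy import deepcopy


class StreamConfigs:
    """Hypothetical stand-in for the ESC service class."""

    def __init__(self, hardcoded):
        self._hardcoded = hardcoded
        self._callbacks = []

    def register_callback(self, callback):
        """Register callback(configs), which may adjust the configs dict in place."""
        self._callbacks.append(callback)

    def get(self):
        # Work on a copy so callbacks can never mutate the hardcoded source.
        configs = deepcopy(self._hardcoded)
        for callback in self._callbacks:
            callback(configs)
        return configs


def apply_mp_overrides(configs):
    # Stub for the db/cache lookup that WME/MP would do (assumption).
    overrides = {
        "example.mp_stream": {"sample_rate": 0.5},
        "example.undeclared": {"sample_rate": 1.0},
    }
    for stream, settings in overrides.items():
        if stream in configs:  # only declared streams can be overridden
            configs[stream].update(settings)


service = StreamConfigs(
    {"example.mp_stream": {"schema_title": "example/mp", "sample_rate": 0.01}}
)
service.register_callback(apply_mp_overrides)
configs = service.get()
```

Existing callers of the service would keep calling the same method and simply receive the adjusted array.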

Why do the stream configs need to be modified / merged at all? MP CP will be a totally standalone configuration service, used for configuring the behavior of MP clients. You can serve up the MP specific configuration via ResourceLoader in WikimediaEvents, and then in JavaScript client side, use mw.eventLog.streamConfigs + your custom MP CP?

(Actually...given what Timo said above, perhaps all of the MP specific code in EL really belongs in WikimediaEvents anyway? Metrics Platform is a Wikimedia specific producer of events? Anyway, ignore this comment...:D )

Create a new service in WME to provide MP dynamic stream configs - this seems straightforward - db/cache query happens here.

What I'm not clear about in this new approach is how this MP stream configs service is accessed by any given wiki in the current contexts where MP clients are available (php, js, java/android - swift/ios TK). Back when ESC was an option, it seemed clear to me how to do this - piggyback on the ESC.StreamConfigs service class which is already called from EventBus, EventLogging.
Hence the notion of a hook in ESC.

But with a new service, I would include it in EventLogging and EventBus and merge this with the ESC stream configs there?

Hm, right, I often forget about the non MW+EventLogging clients. For MW browsers, I was imagining:

  • WikimediaEvents MP code loads MP config
  • EventLogging loads stream config
  • WikimediaEvents JavaScript MP code uses EventLogging JS API to get stream config, and uses that with MP config to modify MW JS instrumentation behavior.

For mobile apps, you'll need an MP config HTTP API? So:

  • WikimediaEvents MP code loads MP config
  • EventLogging loads stream config
  • WikimediaEvents PHP MP code uses EventLogging PHP API to get stream config, and uses that with MP config to host an MP config API that mobile apps can request from?

WikimediaEvents already depends on EventLogging, so it seems sufficient to use it to get the stream configs you need. But, you could also consider adding a direct dependency from WikimediaEvents on EventStreamConfig, and using ResourceLoader stuff to ship stream configs to WikimediaEvents MP code just like EventLogging does.
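A rough Python sketch of the "two configs, no merge" flow described in the lists above. The function and variable names are hypothetical, and plain dicts stand in for what EventLogging and the MP config service would each provide.

```python
def settings_for(stream_name, stream_configs, mp_config):
    """Look up a stream in both sources without merging them.

    Returns None when the stream is not declared in stream config, since
    nothing should be produced to an undeclared stream.
    """
    stream = stream_configs.get(stream_name)
    if stream is None:
        return None
    return {"stream": stream, "mp": mp_config.get(stream_name, {})}


# Stand-ins for the two independently loaded configs (assumptions).
stream_configs = {"example.mp_stream": {"schema_title": "example/mp"}}
mp_config = {"example.mp_stream": {"sample_rate": 0.1}}

declared = settings_for("example.mp_stream", stream_configs, mp_config)
undeclared = settings_for("example.other", stream_configs, mp_config)
```

Keeping the two lookups side by side means stream config stays exactly what ESC serves today, and the MP-specific settings remain the MP code's own concern.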