
Develop test environment solution for MEP analytics events
Closed, Resolved (Public)

Description

Background

As a data analyst, I don't want events generated during instrumentation development & testing to be mixed in with the events generated by actual users running production clients, because they will skew the metrics I compute from client-side analytics data.

The way Modern Event Platform and Event Platform Clients work, there's currently nothing preventing a dev/debug build of a client (e.g. MW Vagrant) from sending events to the same streams (and thus the same tables in the database) as clients in production.

Most likely bad ideas

  • Adding an is_debug boolean field to a common schema and then requiring analysts to include WHERE NOT is_debug in every query
    • No, just no
  • Setting up a separate EventGate instance for receiving events produced during testing and populating a "test" version of the database
    • Clients would need to override the destination URL of each stream, which misses the point of having the stream config specify the destination instead of hardcoding it in the client
    • Creates too much overhead
    • Requires too much maintenance

Proposal

Assume EventGate doesn't need to see the stream configuration (otherwise, see the Caveat section below). This is a reasonable assumption because the schema name and version are both sent in the event payload in the $schema field. EventGate should just look at that, validate the event data against the schema repository, and, if validation passes, insert the event into the table specified by meta.stream, which is present in the same event payload. Under this assumption, $schema alone determines how the event is validated, and meta.stream alone determines where it ends up after validation. For its events to be kept separate from production events, a client running in a test/dev/debug environment simply needs to prefix meta.stream in its payload with "beta_" before sending the event to the destination URL.
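A minimal sketch of the EventGate side of this proposal, assuming EventGate never consults stream config. SCHEMA_REPO, route_event, the schema URI, and the in-memory "tables" are all hypothetical, and the "validation" is a toy required-fields check standing in for real JSON Schema validation. It only illustrates that $schema decides validation while meta.stream alone decides the destination table.

```python
from typing import Any

# Hypothetical schema repository keyed by the $schema URI (illustrative only).
# Real validation would resolve $schema against the schema repository and run
# full JSON Schema validation; this toy version only checks required fields.
SCHEMA_REPO: dict[str, dict[str, Any]] = {
    "/analytics/edit/1.0.0": {"required": ["$schema", "meta", "action"]},
}


def route_event(event: dict[str, Any], tables: dict[str, list[dict[str, Any]]]) -> str:
    """Validate an event by its $schema, then store it in the table named by meta.stream."""
    schema = SCHEMA_REPO.get(event.get("$schema", ""))
    if schema is None:
        raise ValueError(f"unknown schema: {event.get('$schema')!r}")
    missing = [field for field in schema["required"] if field not in event]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    stream = event["meta"].get("stream")
    if not stream:
        raise ValueError("meta.stream is required")
    # No stream config lookup: the destination table comes straight from the
    # payload, so "edit" and "beta_edit" events land in different tables.
    tables.setdefault(stream, []).append(event)
    return stream
```

Two events that are identical except for meta.stream ("edit" vs. "beta_edit") pass the same validation but end up in two separate tables, which is the separation the proposal relies on.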

Benefits

  • All events generated during testing (and validated against schemas) end up in beta_* tables.
  • All of the instrumentation stays the same. Events are logged to production-version stream names (e.g. EPC.log("edit", data)), and EPC.log has internal logic that checks for some flag and prepends beta_ to the stream name when running in a dev/test environment.
  • These events don't require long-term retention; all beta_* tables can simply be deleted every week so that beta versions of inactive streams don't pile up.
  • Analysts can work with non-beta_* tables for metrics/reports.
  • Analysts, Engineers, and QA folks only need to check beta_* tables to see if the events they generated during development/testing made it into the database without problems.
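The client-side logic described in these benefits could look roughly like the sketch below. EventPlatformClient, is_dev_environment, resolve_stream, and beta_tables_to_drop are hypothetical names, not the actual EPC API; the client collects events in a list instead of POSTing them. Instrumentation always passes the production stream name, and only the client decides whether to prepend beta_.

```python
from typing import Any

BETA_PREFIX = "beta_"


class EventPlatformClient:
    """Hypothetical stand-in for EPC that models only stream-name handling."""

    def __init__(self, is_dev_environment: bool) -> None:
        # Hypothetical flag: a real client would derive this from its build or site config.
        self.is_dev_environment = is_dev_environment
        # Events are collected here instead of being POSTed to the destination URL.
        self.sent: list[dict[str, Any]] = []

    def resolve_stream(self, stream: str) -> str:
        """Prepend beta_ in dev/test environments; never double-prefix."""
        if self.is_dev_environment and not stream.startswith(BETA_PREFIX):
            return BETA_PREFIX + stream
        return stream

    def log(self, stream: str, data: dict[str, Any]) -> dict[str, Any]:
        """Instrumentation always passes the production stream name, e.g. log("edit", data)."""
        event = dict(data)
        event["meta"] = {**data.get("meta", {}), "stream": self.resolve_stream(stream)}
        self.sent.append(event)
        return event


def beta_tables_to_drop(table_names: list[str]) -> list[str]:
    """Pick the beta_* tables for the weekly purge; production tables are never selected."""
    return sorted(name for name in table_names if name.startswith(BETA_PREFIX))
```

Because the prefix is applied in one place, the same instrumentation code (log("edit", data)) produces "edit" events in production and "beta_edit" events in dev/test, and the weekly cleanup can safely select tables by prefix alone.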

Caveat

If EventGate does consult the stream config when checking received events, then every stream we want to test needs a "beta_" copy in the config.

  • Cons:
    • the stream config becomes up to twice as long, and partly redundant
    • have to manually add "beta_" copies of streams you wish to test, then remember to remove the ones you feel confident about
    • a fancy, challenging alternative to the manual approach would be to have a version of the stream config auto-generated with "beta_"-prepended stream names and then a target stream config would be stitched together from these two source stream configs
  • Pros:
    • beta_* streams can have different sampling rates like 100% for every stream since events produced to that stream are only from dev/testing and we don't want any sampling applied to those. In fact, under our ruleset the "beta_" shadow can omit the sampling rate (since 1 is assumed by default)
    • Only include beta_ shadows of streams whose instrumentation is being worked on. During dev/testing, the client won't log events for streams that don't have beta_ versions.
    • Event CC'ing still works: e.g. events sent to beta_edit stream are copied to beta_edit.growth stream
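The "auto-generated beta_ copy, stitched into the target config" alternative could be sketched as follows. The config shape (a "destination" and a "sampling" entry per stream), the placeholder URL, and the function names with_beta_shadows and loggable_in_dev are assumptions for illustration, not the real stream config format. Shadows are generated only for streams under test, CC'd streams like "edit.growth" are shadowed along with their base stream, and the sampling entry is dropped so the default rate of 1 applies.

```python
import copy
from typing import Any

# Hypothetical config shape: stream name -> settings dict.
StreamConfig = dict[str, dict[str, Any]]


def with_beta_shadows(config: StreamConfig, under_test: set[str]) -> StreamConfig:
    """Return the production config plus beta_ copies of the streams under test."""
    merged = copy.deepcopy(config)
    for name, settings in config.items():
        # CC'd streams such as "edit.growth" are shadowed along with their base "edit".
        base = name.split(".", 1)[0]
        if base not in under_test:
            continue
        shadow = copy.deepcopy(settings)
        # Dropping sampling leaves the default rate of 1, so every test event is kept.
        shadow.pop("sampling", None)
        merged["beta_" + name] = shadow
    return merged


def loggable_in_dev(config: StreamConfig, stream: str) -> bool:
    """In dev/testing, a stream is logged only if it has a beta_ shadow."""
    return "beta_" + stream in config
```

The production entries are copied unchanged, so the stitched config can be served everywhere; removing a stream from under_test removes its shadows without anyone editing the production config by hand.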

Other ideas for how to handle testing with the new MEP components are welcome.

Event Timeline

Adding @Ottomata and @Nuria for feedback and thoughts as we start thinking about the best way to do this. Past experience with the beta analytics cluster that you'd like to (or not like to) see repeated would be helpful. How is testing done for EventBus events? Do you have a vision or concept for how this should work?

Just to remind everybody that this was something that product teams cited as being a bit confusing and something they had trouble with, both apps and web teams. Whatever we decide on, we'll help document it and make sure that it's a workflow that people use. Yeah!

Feel free to rope in anybody you think would have a good opinion here.

Tagging @pmiazga because he's had good perspective on this during CR, and @Mayakp.wiki who is working on QA workflows.

LGoto triaged this task as Low priority. (Dec 20 2019, 6:24 PM)
Milimetric raised the priority of this task from Low to High. (Apr 13 2020, 3:56 PM)
Milimetric moved this task from Incoming to Event Platform on the Analytics board.

Change 627910 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[mediawiki/vagrant@master] Use eventgate-wikimedia-dev by default

https://gerrit.wikimedia.org/r/627910

Change 627910 merged by jenkins-bot:
[mediawiki/vagrant@master] Use eventgate-wikimedia-dev by default

https://gerrit.wikimedia.org/r/627910

Ottomata claimed this task.

Closing this, reopen if necessary.

I think this is done. Dev environments can use eventgate-wikimedia-dev (this is usable in the EventLogging extension as well as in mw-vagrant), and users can now view events in beta at https://stream-beta.wmflabs.org/v2/ui/#/