Develop test environment solution for MEP analytics events
Closed, Resolved · Public

Description

Background

As a data analyst, I don't want events generated during instrumentation development and testing to be mixed with events generated by actual users running production clients, because that would skew the metrics I compute from client-side analytics data.

The way Modern Event Platform and Event Platform Clients work, nothing currently prevents a dev/debug build of a client (e.g. MW Vagrant) from sending events to the same streams (and thus the same tables in the database) as clients in production.

Most likely bad ideas

Adding an is_debug boolean field to a common schema and then requiring analysts to include WHERE NOT is_debug in every query
- No, just no
Setting up a separate EventGate instance for receiving events produced during testing and populating a "test" version of the database
- Clients would need to override the destination URL of each stream, which misses the point of having the stream config specify the destination instead of hardcoding it in the client
- Creates too much overhead
- Requires too much maintenance

Proposal

This proposal assumes that EventGate doesn't need to see the stream configuration. (See the Caveat section below otherwise.) This is a reasonable assumption because the schema name and version are both sent in the event payload in the $schema field. EventGate should just look at that, validate the event data against the schema repository, and, if everything is good, insert the event into the table specified by meta.stream, which is present in the same event payload. Under this assumption, $schema alone determines how the event is validated, and meta.stream alone determines where it ends up after being validated. For its events to be kept separate from production events, a client running in a test/dev/debug environment simply needs to prefix meta.stream in its payload with "beta_" before sending the event to the destination URL.
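To make the assumption concrete, here is a minimal sketch of the routing described above. It is not EventGate's actual code: the schema URI, the toy "required fields" check standing in for real JSON Schema validation, and the function name destination_table are all hypothetical.

```python
# Toy stand-in for the schema repository: schema URI -> required field names.
# (Assumption: real validation would use JSON Schema, not a field check.)
SCHEMA_REPOSITORY = {
    "/analytics/edit/1.0.0": {"$schema", "meta", "action"},
}


def destination_table(event: dict) -> str:
    """Validate using only $schema, then route using only meta.stream."""
    required = SCHEMA_REPOSITORY.get(event.get("$schema"))
    if required is None:
        raise ValueError("unknown schema")
    missing = required - set(event)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # No stream config lookup: the payload itself names the destination.
    return event["meta"]["stream"]


prod_event = {
    "$schema": "/analytics/edit/1.0.0",
    "meta": {"stream": "edit"},
    "action": "save",
}
# Same schema, same data; only meta.stream differs.
beta_event = {**prod_event, "meta": {"stream": "beta_edit"}}
```

Both events pass the same validation, but only the second one lands in a beta_ table.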

Benefits

All events generated during testing (and validated against schemas) end up in beta_* tables.
All of the instrumentation stays the same. Events are logged to production-version names of streams (e.g. EPC.log("edit", data)), and EPC.log has internal logic which checks for some flag and prepends beta_ to the stream name if running in a dev/test environment.
These events don't require long-term retention; all beta_* tables can just be deleted once a week to prevent overpopulation due to beta versions of inactive streams.
Analysts can work with non-beta_* tables for metrics/reports.
Analysts, Engineers, and QA folks only need to check beta_* tables to see if the events they generated during development/testing made it into the database without problems.
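A rough sketch of the client-side logic these benefits rely on, written in Python for brevity rather than in any actual client language. The function names, the is_dev_environment flag, and the example schema URI are hypothetical, not the real EPC API.

```python
BETA_PREFIX = "beta_"


def resolve_stream_name(stream: str, is_dev_environment: bool) -> str:
    """Prepend beta_ in dev/test environments; leave production names alone."""
    if is_dev_environment and not stream.startswith(BETA_PREFIX):
        return BETA_PREFIX + stream
    return stream


def log(stream: str, data: dict, schema: str, is_dev_environment: bool = False) -> dict:
    """Build the payload that a hypothetical EPC.log would send."""
    event = dict(data)  # copy so the caller's data is not mutated
    event["$schema"] = schema
    event["meta"] = {"stream": resolve_stream_name(stream, is_dev_environment)}
    return event


def tables_to_purge(table_names: list) -> list:
    """Pick the tables that a weekly cleanup job could drop."""
    return [name for name in table_names if name.startswith(BETA_PREFIX)]
```

Instrumentation code would keep calling log("edit", data, ...) unchanged; only the environment flag decides whether the event is addressed to edit or beta_edit.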

Caveat

If EventGate does check received events against the stream config, then every stream that we want to test needs a "beta_" copy of itself in the config.

Cons:
- stream config becomes up to twice as long and in some ways redundant
- have to manually add "beta_" copies of streams you wish to test, then remember to remove the ones you feel confident about
- a fancy, challenging alternative to the manual approach would be to have a version of the stream config auto-generated with "beta_"-prepended stream names and then a target stream config would be stitched together from these two source stream configs
Pros:
- beta_* streams can have their own sampling rates, e.g. 100% for every stream, since events produced to those streams come only from dev/testing and we don't want any sampling applied to them. In fact, under our ruleset the "beta_" shadow can omit the sampling rate (since 1 is assumed by default)
- Only include beta_ shadows of streams for the instrumentation that is being worked on. Client won't log events for streams during dev/testing that don't have beta_ versions.
- Event CC'ing still works: e.g. events sent to beta_edit stream are copied to beta_edit.growth stream
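The auto-generated variant mentioned under Cons could look roughly like the sketch below. The config shape (a dict keyed by stream name), the "schema" and "sampling_rate" keys, and the function name are assumptions made for illustration; the real stream config format may differ.

```python
import copy


def with_beta_shadows(stream_config: dict, streams_under_test: list) -> dict:
    """Return the stream config stitched together with beta_ shadows.

    Only streams listed in streams_under_test (plus their dotted children,
    e.g. "edit.growth" for "edit") get a shadow.
    """
    merged = copy.deepcopy(stream_config)
    for name in streams_under_test:
        for stream, settings in stream_config.items():
            if stream == name or stream.startswith(name + "."):
                shadow = copy.deepcopy(settings)
                # Omit the sampling rate so the default of 1 applies.
                shadow.pop("sampling_rate", None)
                merged["beta_" + stream] = shadow
    return merged


config = {
    "edit": {"schema": "/analytics/edit/1.0.0", "sampling_rate": 0.1},
    "edit.growth": {"schema": "/analytics/edit/1.0.0", "sampling_rate": 0.5},
    "search": {"schema": "/analytics/search/1.0.0", "sampling_rate": 0.01},
}
target_config = with_beta_shadows(config, ["edit"])
```

Here target_config gains beta_edit and beta_edit.growth (both unsampled) while search gets no shadow, so a dev client would not log search events at all.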

Other ideas for how to handle testing with the new MEP components are welcome.

Details

Subject	Repo	Branch	Lines +/-
Use eventgate-wikimedia-dev by default	mediawiki/vagrant	master	+91 -64


Related Objects

Status	Assigned	Task
Declined	None	T259734 BUOD-KR1-Q4+: Certify that analytics schema and instruments have been upgraded to use the MEP system (clearing the legacy system for sunsetting)
Declined	Ottomata	T259157 BUOD-KR1-Q3: Require that all new schema/instruments are created with the MEP system
Resolved	Ottomata	T238837 Develop test environment solution for MEP analytics events
Resolved	mforns	T253069 Set up an instance of EventStreams in beta that will allow for consuming any stream
Resolved	Ottomata	T187102 Vagrant's /var/log/daemon.log filling up with kafka errors

Event Timeline

jlinehan created this task. · Nov 21 2019, 4:24 PM

Restricted Application added a subscriber: Aklapper. · Nov 21 2019, 4:24 PM

mpopov updated the task description. · Nov 21 2019, 5:10 PM

mpopov updated the task description. · Nov 21 2019, 5:17 PM

mpopov updated the task description. · Nov 21 2019, 5:21 PM

mpopov updated the task description. · Nov 21 2019, 5:41 PM

jlinehan moved this task from Inbox to Sign-off on the Better Use Of Data board. · Nov 21 2019, 6:37 PM

Adding @Ottomata and @Nuria for feedback and thoughts as we start thinking about the best way to do this. Past experience with the beta analytics cluster that you'd like to (or not like to) see repeated would be helpful. How is testing done for EventBus events? Do you have a vision or concept for how this should work?

Just to remind everybody that this was something that product teams cited as being a bit confusing and something they had trouble with, both apps and web teams. Whatever we decide on, we'll help document it and make sure that it's a workflow that people use. Yeah!

Feel free to rope in anybody you think would have a good opinion here.

jlinehan added a parent task: T228175: Event Platform Client Libraries. · Nov 26 2019, 4:35 PM

LGoto moved this task from Needs triage to Tracking on the Product-Infrastructure-Team-Backlog-Deprecated board. · Nov 27 2019, 4:40 PM

jlinehan moved this task from Sign-off to Done! on the Better Use Of Data board. · Dec 5 2019, 3:52 PM

jlinehan mentioned this in T238544: MEP Client MediaWiki JS (MVP). · Dec 5 2019, 4:05 PM

Tagging @pmiazga because he's had good perspective on this during CR, and @Mayakp.wiki who is working on QA workflows.

jlinehan added a subscriber: Mayakp.wiki. · Dec 13 2019, 6:47 PM

LGoto triaged this task as Low priority. · Dec 20 2019, 6:24 PM

Ottomata added a project: Event-Platform. · Apr 7 2020, 5:40 PM

Restricted Application added a project: Analytics. · Apr 7 2020, 5:40 PM

Milimetric raised the priority of this task from Low to High. · Apr 13 2020, 3:56 PM

Milimetric moved this task from Incoming to Event Platform on the Analytics board.

Ottomata moved this task from Backlog to Estimated/ Discussed on the Event-Platform board. · Apr 14 2020, 1:22 PM

mpopov removed mpopov as the assignee of this task. · Apr 17 2020, 5:58 PM

jlinehan added a parent task: T259157: BUOD-KR1-Q3: Require that all new schema/instruments are created with the MEP system. · Jul 29 2020, 5:02 PM

jlinehan added a project: Product-Data-Infrastructure. · Aug 3 2020, 2:32 PM

jlinehan moved this task from Inbox to Task Backlog on the Product-Data-Infrastructure board. · Aug 5 2020, 5:12 PM

jlinehan removed a parent task: T228175: Event Platform Client Libraries. · Aug 5 2020, 5:37 PM

Ottomata added a subtask: T253069: Set up an instance of EventStreams in beta that will allow for consuming any stream. · Aug 17 2020, 3:34 PM

Ottomata added a subtask: T187102: Vagrant's /var/log/daemon.log filling up with kafka errors. · Aug 17 2020, 3:43 PM

Ottomata mentioned this in T187102: Vagrant's /var/log/daemon.log filling up with kafka errors.

Ottomata moved this task from Estimated/ Discussed to In Progress Before Value Streams Kickoff (August 15th) on the Event-Platform board. · Aug 25 2020, 4:06 PM

Ottomata moved this task from In Progress Before Value Streams Kickoff (August 15th) to Estimated/ Discussed on the Event-Platform board.

Change 627910 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[mediawiki/vagrant@master] Use eventgate-wikimedia-dev by default

https://gerrit.wikimedia.org/r/627910

gerritbot added a project: Patch-For-Review. · Sep 16 2020, 8:41 PM

Change 627910 merged by jenkins-bot:
[mediawiki/vagrant@master] Use eventgate-wikimedia-dev by default

https://gerrit.wikimedia.org/r/627910

Maintenance_bot removed a project: Patch-For-Review. · Sep 24 2020, 6:10 PM

sdkim moved this task from Done! to Inbox on the Better Use Of Data board. · Oct 29 2020, 4:39 PM

Ottomata closed subtask T187102: Vagrant's /var/log/daemon.log filling up with kafka errors as Resolved. · Jan 11 2021, 4:37 PM

Closing this, reopen if necessary.

I think this is done. Dev environments can use eventgate-wikimedia-dev (this is usable in the EventLogging extension as well as in mw-vagrant), and users can now view events in beta at https://stream-beta.wmflabs.org/v2/ui/#/

Ottomata closed subtask T253069: Set up an instance of EventStreams in beta that will allow for consuming any stream as Resolved. · Jan 25 2021, 3:03 PM
