## Background
As a data analyst, I don't want events generated during instrumentation development & testing to be mixed in with events generated by actual users running production clients, because that would skew the metrics I compute from client-side analytics data.
The way Modern Event Platform and Event Platform Clients currently work, nothing prevents a dev/debug build of a client (e.g. MW Vagrant) from sending events to the same streams (and thus the same tables in the database) as clients in production.
## Most likely bad ideas
- Adding a `is_debug` boolean field to a common schema and then requiring analysts to include `WHERE NOT is_debug` in every query
- No, just no
- Setting up a separate EventGate instance for receiving events produced during testing and populating a "test" version of the database
- Clients would need to override the destination URL of each stream, which misses the point of having the stream config specify the destination instead of hardcoding it in the client
- Creates too much overhead
- Requires too much maintenance
## Proposal
This proposal assumes EventGate doesn't need to see the stream configuration. (See the //Caveat// section below otherwise.) This is a reasonable assumption because the schema name //and// version are both sent in the event payload, in the `$schema` field. EventGate only needs to look at that field, validate the event data against the schema repository, and, if validation passes, insert the event into the table specified by `meta.stream` in the same payload. Under this assumption, `$schema` determines //**whether**// the event is valid and `meta.stream` determines //**where**// it ends up after validation. A client running in a test/dev/debug environment simply prefixes `meta.stream` in its payload with "beta_" before sending the event, and those events stay separate from production events.
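A minimal sketch of the routing described above, assuming this simplified flow; the function names and validator hook here are illustrative stand-ins, not EventGate's actual API:

```javascript
// Sketch of the EventGate-side flow under this proposal's assumption:
// validate the event against the schema named in $schema, then route it
// to the table named by meta.stream. validateAgainstSchemaRepo is a
// hypothetical stand-in for real schema-repository validation.
function routeEvent(event, validateAgainstSchemaRepo) {
  const schemaUri = event.$schema; // e.g. "/analytics/edit/1.0.0"
  if (!validateAgainstSchemaRepo(schemaUri, event)) {
    throw new Error(`Event failed validation against ${schemaUri}`);
  }
  // The destination is taken directly from the payload itself, so a
  // "beta_"-prefixed stream lands in a separate beta_* table with no
  // special handling on the server side.
  return event.meta.stream;
}
```

Note that nothing in this flow consults a stream config: the prefix travels inside the payload, so the same EventGate instance serves both production and beta events.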
### Benefits
- All events generated during testing (and validated against schemas) end up in `beta_*` tables.
- All of the instrumentation stays the same. Events are logged to production names of streams (e.g. `EPC.log("edit", data)`), and `EPC.log` has internal logic that checks a flag and prepends `beta_` to the stream name if running in a dev/test environment.
- These events don't require long-term retention; all `beta_*` tables can simply be deleted once a week to prevent buildup from beta versions of inactive streams.
- Analysts can work with non-`beta_*` tables for metrics/reports.
- Analysts, Engineers, and QA folks only need to check `beta_*` tables to see if the events they generated during development/testing made it into the database without problems.
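The `EPC.log` behavior described above could be sketched as follows. This is a hypothetical client wrapper, not the real Event Platform Client internals; `isDevEnvironment` and `send` are assumed parameters:

```javascript
// Hypothetical sketch of an Event Platform Client whose log() method
// transparently redirects events to beta_* streams in dev/test builds.
// Instrumentation always uses production stream names; the prefix is
// applied in exactly one place.
function makeClient(isDevEnvironment, send) {
  return {
    log(streamName, data) {
      const stream = isDevEnvironment ? `beta_${streamName}` : streamName;
      send({ ...data, meta: { ...(data.meta || {}), stream } });
    },
  };
}
```

Under this sketch, `EPC.log("edit", data)` in a dev build produces an event with `meta.stream` set to `beta_edit`, while the same call in production sets it to `edit`, so instrumentation code never changes between environments.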
### Caveat
If EventGate consults the stream config to validate received events against, then every stream we want to test must have a "beta_" copy in the config.
- **Cons**:
- the stream config becomes //up to// twice as long and in some ways redundant
- you have to manually add "beta_" copies of the streams you wish to test, then remember to remove them once you're confident in the instrumentation
- a fancier (and more challenging) alternative to the manual approach: auto-generate a version of the stream config with "beta_"-prepended stream names, then stitch the target stream config together from the two source configs
- **Pros**:
- `beta_*` streams can have different sampling rates, e.g. 100% for every stream, since events produced to those streams come only from dev/testing and we don't want any sampling applied to them. In fact, under our ruleset the "beta_" shadow can omit the sampling rate entirely (since a rate of 1 is assumed by default)
- Only `beta_` shadows of streams whose instrumentation is actively being worked on need to be included. The client won't log events during dev/testing for streams that don't have `beta_` versions.
- Event CC'ing still works: e.g. events sent to `beta_edit` stream are copied to `beta_edit.growth` stream
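The auto-generated shadow config mentioned in the cons could be sketched like this. The config shape here is a simplified assumption, not the real stream config schema:

```javascript
// Sketch: derive "beta_" shadows for the streams under test and stitch
// them into the original stream config. Per the ruleset above, each
// shadow omits the sampling setting so the default rate of 1 (100%)
// applies to dev/test events.
function withBetaShadows(streamConfig, streamsUnderTest) {
  const stitched = { ...streamConfig };
  for (const name of streamsUnderTest) {
    const { sampling, ...rest } = streamConfig[name];
    stitched[`beta_${name}`] = rest; // drop sampling: default of 1 applies
  }
  return stitched;
}
```

Passing only the streams under test keeps the stitched config from doubling in size: untested streams get no shadow, so (as noted above) clients won't log their events during dev/testing.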
-----
Other ideas for how to handle testing with the new MEP components are welcome.