## Background
As a data analyst, I don't want events generated during instrumentation development & testing to be mixed in with events generated by actual users running production clients, because that would skew the metrics I compute from client-side analytics data.
The way Modern Event Platform and Event Platform Clients currently work, nothing prevents a dev/debug build of a client (e.g. MW Vagrant) from sending events to the same streams (and thus the same tables in the database) as clients in production.
## Most likely bad ideas
- Adding a `is_debug` boolean field to a common schema and then requiring analysts to include `WHERE NOT is_debug` in every query
- No, just no
- Setting up a separate EventGate instance for receiving events produced during testing and populating a "test" version of the database
- Clients would need to override the destination URL of each stream, which misses the point of having the stream config specify the destination instead of hardcoding it in the client
- Creates too much overhead
- Requires too much maintenance
## Proposal
This proposal assumes EventGate doesn't need to see the stream configuration. (See the //Caveat// section below otherwise.) This is a reasonable assumption because the schema name //and// version are both sent in the event payload, in the `$schema` field. EventGate only needs to look at that field, validate the event data against the schema repository, and, if validation passes, insert the event into the table specified by `meta.stream` in the same payload. Under this assumption, `$schema` determines //**whether**// the event is valid and `meta.stream` determines //**where**// it ends up after validation. A client running in a test/dev/debug environment simply prefixes `meta.stream` in its payload with "beta_" before sending the event, and those events stay separate from production events.
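A minimal sketch of the routing described above, assuming this simplified flow; the function names and validator hook here are illustrative stand-ins, not EventGate's actual API:

```javascript
// Sketch of the EventGate-side flow under this proposal's assumption:
// validate the event against the schema named in $schema, then route it
// to the table named by meta.stream. validateAgainstSchemaRepo is a
// hypothetical stand-in for real schema-repository validation.
function routeEvent(event, validateAgainstSchemaRepo) {
  const schemaUri = event.$schema; // e.g. "/analytics/edit/1.0.0"
  if (!validateAgainstSchemaRepo(schemaUri, event)) {
    throw new Error(`Event failed validation against ${schemaUri}`);
  }
  // The destination is taken directly from the payload itself, so a
  // "beta_"-prefixed stream lands in a separate beta_* table with no
  // special handling on the server side.
  return event.meta.stream;
}
```

Note that nothing in this flow consults a stream config: the prefix travels inside the payload, so the same EventGate instance serves both production and beta events.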
### Benefits
- All events generated during testing (and validated against schemas) end up in `beta_*` tables.
- All of the instrumentation stays the same. Events are logged to production names of streams (e.g. `EPC.log("edit", data)`), and `EPC.log` has internal logic that checks a flag and prepends `beta_` to the stream name if running in a dev/test environment.
- These events don't require long-term retention; all `beta_*` tables can simply be deleted once a week to prevent buildup from beta versions of inactive streams.
- Analysts can work with non-`beta_*` tables for metrics/reports.
- Analysts, Engineers, and QA folks only need to check `beta_*` tables to see if the events they generated during development/testing made it into the database without problems.
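The `EPC.log` behavior described above could be sketched as follows. This is a hypothetical client wrapper, not the real Event Platform Client internals; `isDevEnvironment` and `send` are assumed parameters:

```javascript
// Hypothetical sketch of an Event Platform Client whose log() method
// transparently redirects events to beta_* streams in dev/test builds.
// Instrumentation always uses production stream names; the prefix is
// applied in exactly one place.
function makeClient(isDevEnvironment, send) {
  return {
    log(streamName, data) {
      const stream = isDevEnvironment ? `beta_${streamName}` : streamName;
      send({ ...data, meta: { ...(data.meta || {}), stream } });
    },
  };
}
```

Under this sketch, `EPC.log("edit", data)` in a dev build produces an event with `meta.stream` set to `beta_edit`, while the same call in production sets it to `edit`, so instrumentation code never changes between environments.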
### Caveat
If EventGate consults the stream config to validate received events against, then every stream we want to test must have a "beta_" copy in the config.
- **Cons**:
- the stream config becomes //up to// twice as long and in some ways redundant
- you have to manually add "beta_" copies of the streams you wish to test, then remember to remove them once you're confident in the instrumentation
- a fancier (and more challenging) alternative to the manual approach: auto-generate a version of the stream config with "beta_"-prepended stream names, then stitch the target stream config together from the two source configs
- **Pros**:
- `beta_*` streams can have different sampling rates, e.g. 100% for every stream, since events produced to those streams come only from dev/testing and we don't want any sampling applied to them. In fact, under our ruleset the "beta_" shadow can omit the sampling rate entirely (since a rate of 1 is assumed by default)
- Only `beta_` shadows of streams whose instrumentation is actively being worked on need to be included. The client won't log events during dev/testing for streams that don't have `beta_` versions.
- Event CC'ing still works: e.g. events sent to `beta_edit` stream are copied to `beta_edit.growth` stream
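The auto-generated shadow config mentioned in the cons could be sketched like this. The config shape here is a simplified assumption, not the real stream config schema:

```javascript
// Sketch: derive "beta_" shadows for the streams under test and stitch
// them into the original stream config. Per the ruleset above, each
// shadow omits the sampling setting so the default rate of 1 (100%)
// applies to dev/test events.
function withBetaShadows(streamConfig, streamsUnderTest) {
  const stitched = { ...streamConfig };
  for (const name of streamsUnderTest) {
    const { sampling, ...rest } = streamConfig[name];
    stitched[`beta_${name}`] = rest; // drop sampling: default of 1 applies
  }
  return stitched;
}
```

Passing only the streams under test keeps the stitched config from doubling in size: untested streams get no shadow, so (as noted above) clients won't log their events during dev/testing.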
-----
Other ideas for how to handle testing with the new MEP components are welcome.