Description
Based on discussion with DE(https://docs.google.com/document/d/1S2Ij3FikNGdN8ZwZKcwpAfNmGQoNtah9AwNilxqhC3k/edit), it has been decided to move forward with static stream config option as a preference.
However, there are unknowns in terms of performance issues that need to be investigated to ensure that the workflow will scale. This work is necessary irrespective of static vs dynamic approach, though they would have slightly different risk profiles.
For example, how will filtering will impact performance (especially for dashboarding) Parquet / Iceberg should be good at predicate pushdown? Does Iceberg have indexes?
Acceptance Criteria
- Documented set of usage scenarios covering scaling of storage, requests etc
- Set of usage scenarios, stack ranked by likelihood
- Posit performance implications for each
- Propose mitigation for most likely scenarios