Discussion points for grooming:
- How do we want to load datasets from Airflow to Cassandra?
- Solution from Analytics team
- Do we want to recommend different loading methods based on dataset size?
- Can we offer general Airflow components to do the loading? Can we abstract it, for example by passing a pandas DataFrame to a common load function?
- How do we handle/store access credentials in Airflow/ETL solutions?
In prod we likely won't rely on cqlsh to load data.
We'll need a scalable solution to load IMA data. Some implementation details will depend on how we access Cassandra from k8s. Discussion is ongoing at https://phabricator.wikimedia.org/T280042.
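
To make the "common load function" idea from the discussion points more concrete, here is a minimal sketch, not a proposed implementation. It assumes the session object behaves like a DataStax Python driver (`cassandra-driver`) `Session`, of which it uses only `prepare` and `execute`. The names `SessionLike`, `build_insert_cql` and `load_rows` are hypothetical, and how the session gets created (hosts, credentials, access from k8s) is deliberately left out since that is still under discussion.

```python
from typing import Any, Iterable, Mapping, Protocol, Sequence


class SessionLike(Protocol):
    """Hypothetical interface: the two Session methods this sketch relies on."""

    def prepare(self, query: str) -> Any: ...

    def execute(self, statement: Any, parameters: Sequence[Any]) -> Any: ...


def build_insert_cql(keyspace: str, table: str, columns: Sequence[str]) -> str:
    """Build a parameterised INSERT statement for the given columns.

    Keyspace, table and column names are interpolated as-is, so they must
    come from trusted configuration, never from user input.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {keyspace}.{table} ({column_list}) VALUES ({placeholders})"


def load_rows(
    session: SessionLike,
    keyspace: str,
    table: str,
    columns: Sequence[str],
    rows: Iterable[Mapping[str, Any]],
) -> int:
    """Insert rows one at a time with a prepared statement; return the row count."""
    prepared = session.prepare(build_insert_cql(keyspace, table, columns))
    count = 0
    for row in rows:
        # Bind values in the same order as the columns in the statement.
        session.execute(prepared, [row[column] for column in columns])
        count += 1
    return count
```

With a pandas DataFrame `df`, a caller could pass `list(df.columns)` as `columns` and `df.to_dict("records")` as `rows`. Row-by-row synchronous inserts are only reasonable for small datasets. For larger ones the same interface could sit on top of the driver's `cassandra.concurrent.execute_concurrent_with_args` or a bulk loader, which ties into the question of recommending different methods by dataset size.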