Discussion points for grooming:
- How do we want to load datasets from Airflow to Cassandra?
  - Existing [[ https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/HiveToCassandra.scala | solution ]] from the Analytics team (`HiveToCassandra.scala`).
- Do we want to recommend different loading methods based on dataset size?
- Can we offer general Airflow components to do the loading? Can we abstract it, for example by passing a pandas DataFrame to a common load function?
- How do we handle and store access credentials in Airflow/ETL solutions?
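
To make the "common load function" question concrete, here is a minimal sketch of what such an interface could look like. Everything in it is an assumption for discussion: the names `load_rows`, `load_dataframe`, and `SessionLike` are hypothetical, the session is assumed to behave like a cassandra-driver `Session` (an `execute(query, parameters)` method with `%s` placeholders), and the row-by-row insert is only there to show the shape of the API, not a scalable loading strategy.

```python
from typing import Any, Iterable, Mapping, Protocol, Sequence


class SessionLike(Protocol):
    """Hypothetical interface: anything exposing execute(query, parameters),
    for example a cassandra-driver Session."""

    def execute(self, query: str, parameters: Sequence[Any]) -> Any: ...


def load_rows(session: SessionLike, table: str,
              rows: Iterable[Mapping[str, Any]]) -> int:
    """Insert each row into `table`; return the number of rows written.

    Assumes `table` and the column names come from trusted code, since they
    are interpolated into the statement. Values are passed as parameters.
    """
    query = None
    columns = []
    written = 0
    for row in rows:
        if query is None:
            # Build the INSERT once, from the columns of the first row.
            columns = list(row)
            placeholders = ", ".join(["%s"] * len(columns))
            query = (f"INSERT INTO {table} ({', '.join(columns)}) "
                     f"VALUES ({placeholders})")
        session.execute(query, [row[c] for c in columns])
        written += 1
    return written


def load_dataframe(session: SessionLike, table: str, df: Any) -> int:
    """Load a pandas DataFrame by converting it to a list of row dicts."""
    return load_rows(session, table, df.to_dict(orient="records"))
```

An Airflow task could then call `load_dataframe(session, "keyspace.table", df)` without knowing how rows are written, which would let us swap in a bulk or Spark-based loader for large datasets behind the same function.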
--------------------------------
In prod we likely won't rely on `cqlsh` to load data.
We'll need a scalable solution to load IMA data. Some implementation details will depend on how we access Cassandra from k8s. Discussion is ongoing at https://phabricator.wikimedia.org/T280042.