As a data scientist, I need wmfdata to access MariaDB replicas when it is used in a notebook executed on the cluster so that I can schedule the notebook as a data pipeline through Airflow.
In the Product Analytics ETL modernization sync-up on 26 June 2023 (notes), we identified two things the current wmfdata-python MariaDB module does when connecting:
- It checks POSIX group membership to determine which .cnf file to read the username and password from.
- It runs the analytics-mysql executable and parses its output to determine which host and port to use.
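
For reference, the two behaviors above can be sketched roughly as follows. This is an illustrative approximation, not the actual wmfdata code: the group names, the .cnf paths, the helper names, and the assumption that analytics-mysql prints its target as `host:port` are all hypothetical placeholders. It is only meant to show why both steps depend on what is available on the local host.

```python
import grp
import os
import subprocess

# Hypothetical group-to-file mapping; the real group names and .cnf paths
# used by wmfdata are not reproduced here.
CNF_BY_GROUP = [
    ("example-privileged-group", "/path/to/privileged-client.cnf"),
    ("example-standard-group", "/path/to/standard-client.cnf"),
]


def current_user_groups():
    """Return the names of the POSIX groups of the current process."""
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The gid has no entry in the local group database.
            pass
    return names


def select_cnf(groups, cnf_by_group=CNF_BY_GROUP):
    """Pick the first .cnf file whose group the user belongs to."""
    for group, path in cnf_by_group:
        if group in groups:
            return path
    raise PermissionError("user is in none of the expected groups")


def parse_target(output):
    """Split 'host:port' output (an assumed format) into (host, port)."""
    host, _, port = output.strip().rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"unexpected target output: {output!r}")
    return host, int(port)


def resolve_target(argv):
    """Run a lookup command (e.g. analytics-mysql with the arguments the
    caller supplies) and parse the host and port from its stdout."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return parse_target(result.stdout)
```

In this sketch, `current_user_groups()` and `resolve_target()` are the parts that rely on the local group database and a locally installed executable, while `select_cnf()` and `parse_target()` are pure functions that would work anywhere.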
To make it usable on the cluster: