
Enable Wmfdata-Python to access MariaDB replicas from the cluster
Open, Low, Public

Description

As a data scientist, I need wmfdata to access MariaDB replicas when it is used in a notebook executed on the cluster so that I can schedule the notebook as a data pipeline through Airflow.

In the Product Analytics ETL modernization sync-up on 26 June 2023 (notes), we identified two behaviors of the current wmfdata-python MariaDB module that tie it to the analytics clients:

  • It checks POSIX group membership to determine which cnf file to read the username & password from
  • It runs the analytics-mysql executable and parses its output to determine which host & port to connect to
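
For readers unfamiliar with the module, the two behaviors above can be sketched roughly as follows. This is an illustration, not the actual wmfdata-python code: the group names, cnf paths, and function names are hypothetical, and the `host:port` shape of the analytics-mysql output is an assumption.

```python
import grp
import os

# Hypothetical mapping from POSIX group to credentials file. The real group
# names and paths used by wmfdata-python are not stated in this task.
CNF_BY_GROUP = {
    "example-privileged-group": "/path/to/privileged-client.cnf",
    "example-research-group": "/path/to/research-client.cnf",
}


def current_user_groups():
    """Return the names of the POSIX groups the current process belongs to."""
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid with no entry in the group database; skip it.
            continue
    return names


def choose_cnf(groups, cnf_by_group=CNF_BY_GROUP):
    """Pick the cnf of the first listed group that the user belongs to."""
    for group, path in cnf_by_group.items():
        if group in groups:
            return path
    raise PermissionError("user is in none of the expected groups")


def parse_target(output):
    """Parse 'host:port' text (the assumed shape of the executable's output)."""
    host, _, port = output.strip().rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"unexpected target format: {output!r}")
    return host, int(port)
```

Both steps depend on the local environment (the user's groups and an executable on the PATH), which is why they do not carry over to a job running on the cluster under a system user.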

To make it usable on the cluster:

  • We need a way to specify which cnf to use (e.g. if we store the MySQL password on HDFS and need to read it as the analytics-product system user): T340469
  • We need a way to retrieve the host & port info without relying on the analytics-mysql executable: T340472
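
One possible shape for the two requirements above, as a minimal sketch: the caller supplies the cnf contents and the host & port explicitly, so nothing is inferred from POSIX groups or a local executable. `ConnectionSettings`, `credentials_from_cnf`, and `build_settings` are hypothetical names and not part of wmfdata-python. How the cnf text is fetched (for example from HDFS) is deliberately left to the caller, and the cnf is assumed to keep `user` and `password` in a `[client]` section.

```python
import configparser
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionSettings:
    host: str
    port: int
    user: str
    password: str


def credentials_from_cnf(cnf_text, section="client"):
    """Read user and password from one section of MySQL-style cnf text."""
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section(section):
        raise ValueError(f"cnf has no [{section}] section")
    try:
        return parser[section]["user"], parser[section]["password"]
    except KeyError as err:
        raise ValueError(f"cnf is missing {err} in [{section}]") from None


def build_settings(cnf_text, host, port):
    """Combine explicitly supplied host and port with cnf credentials."""
    user, password = credentials_from_cnf(cnf_text)
    return ConnectionSettings(host=host, port=int(port), user=user, password=password)
```

Taking the cnf as text rather than as a path keeps the sketch independent of where the file lives. A notebook scheduled through Airflow could read it from whatever storage is chosen in T340469 and pass in the host & port from whatever lookup comes out of T340472.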

Event Timeline

mpopov added a project: Movement-Insights.
mpopov moved this task from Incoming to Watching on the Movement-Insights board.
mpopov removed a project: Movement-Insights.
mpopov added a subscriber: nshahquinn-wmf.

@nshahquinn-wmf: Did you want to keep tabs on this on the Movement Insights board?

nshahquinn-wmf renamed this task from "Enable wmfdata-py to access MariaDB replicas on the cluster" to "Enable Wmfdata-Python to access MariaDB replicas from the cluster". Jan 24 2025, 11:07 PM
nshahquinn-wmf added a project: Epic.