
Let user specify cnf to use when connecting to MariaDB
Open, Needs Triage, Public

Description

From https://github.com/wikimedia/wmfdata-python/blob/2ae3b559898f40d84493d475c0e2a83969b65985/wmfdata/mariadb.py:

    # Check which group the user is in, and use the appropriate credentials file
    user = getpass.getuser()
    if user in grp.getgrnam("analytics-privatedata-users").gr_mem:
        option_file = "/etc/mysql/conf.d/analytics-research-client.cnf"
    elif user in grp.getgrnam("researchers").gr_mem:
        option_file = "/etc/mysql/conf.d/research-client.cnf"
    # For users in analytics-users, for example
    else:
        raise PermissionError(
            "Your account does not have permission to access the Analytics "
            "MariaDB cluster."
        )

We need system users such as analytics-product to be able to run notebooks that use wmfdata.mariadb to retrieve data. That requires a way to point wmfdata.mariadb at a different option file (cnf), for example one stored on HDFS.

Event Timeline

In https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/blob/6743db0e987a4567352eec4277e5a7f4092de423/notebooks/Access%20MariaDB%20From%20Cluster.ipynb, Xabriel demonstrated how to retrieve the password from HDFS.

Rather than specifying a cnf, wmfdata.mariadb could be made to connect with a username and password pair (which mariadb.connect() allows in lieu of an options file). Until T214469 is resolved, we'll have to continue using the "research" user and retrieve its password separately. We may want to wrap the subprocess code into a helper function in wmfdata.utils that accepts a path to the password text file in HDFS.
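A minimal sketch of such a helper follows. It is not the code from the linked notebook: `read_hdfs_text` and `connect_with_password` are hypothetical names, and it assumes the `hdfs` CLI is on the PATH, the calling user can read the file, and the file contains only the password. The `run` parameter exists only so the helper can be exercised without a cluster.

```python
import subprocess


def read_hdfs_text(path, run=subprocess.run):
    """Return the contents of a small text file on HDFS, without surrounding whitespace.

    Assumes the `hdfs` command-line client is available to the current user.
    """
    result = run(
        ["hdfs", "dfs", "-cat", path],
        check=True,           # raise CalledProcessError if the command fails
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
    )
    return result.stdout.strip()


def connect_with_password(host, database, password_path, user="research"):
    """Hypothetical wrapper: connect with a username and password pair
    instead of an option file."""
    # Imported lazily so read_hdfs_text works without the connector installed.
    import mariadb

    return mariadb.connect(
        host=host,
        database=database,
        user=user,
        password=read_hdfs_text(password_path),
    )
```

Keeping the HDFS read separate from the connection call means the same helper could serve any system user once a password file has been made available to it.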

For pipelines that need to run as system users such as analytics, analytics-search, and analytics-product, we can make password files available. (analytics and analytics-search already have them; see https://github.com/wikimedia/operations-puppet/blob/production/modules/profile/manifests/analytics/cluster/secrets.pp)