
Configure Datahub Authentication and Authorization for the MVP
Closed, Resolved (Public)

Description

Ensure that the authentication and authorization configuration matches what we need for the MVP.

Currently we use the Wikitech LDAP database as the authentication database.
We attempted to restrict the user list to accounts that are members of either the 'nda' or 'wmf' group in T301462: Configure LDAP authentication for the DataHub frontend, but this hasn't yet worked properly.

This means that there are currently 38,549 user accounts permitted to log into DataHub. What's more, anyone can create one, since all it requires is a Wikitech account.

We need a way to link the LDAP accounts with the existing nda and wmf LDAP groups, so that we can apply different policies to staff members and those who have signed the NDA.

This ingestion plugin: https://datahubproject.io/docs/metadata-ingestion/source_docs/ldap/ is supposed to import LDAP users and groups. However, it is currently lacking in functionality and does not apply group memberships correctly.

We need a working solution for the MVP phase whereby we can be certain about what rights are applied to Wikitech account holders who are members of neither the wmf nor the nda group.

Event Timeline

BTullis renamed this task from Datahub Auth Spike to Configure Datahub Authentication and Authorization for the MVP. (Apr 20 2022, 2:08 PM)
BTullis claimed this task.
BTullis triaged this task as High priority.
BTullis updated the task description.
BTullis moved this task from Next Up to In Progress on the Data-Engineering-Kanban board.

Initially I used a recipe similar to the following to ingest the LDAP user entries for members of our wmf and nda groups.

source:
  type: "ldap"
  config:
    ldap_server: ldaps://ldap-ro.eqiad.wikimedia.org
    base_dn: "dc=wikimedia,dc=org"
    ldap_user: ""
    ldap_password: ""
    filter: '(&(objectClass=inetorgPerson)(|(memberOf=cn=wmf,ou=groups,dc=wikimedia,dc=org)(memberOf=cn=nda,ou=groups,dc=wikimedia,dc=org)))'
    drop_missing_first_last_name: False

sink:
  type: 'datahub-rest'
  config:
    server: 'https://datahub-gms.discovery.wmnet:30443'

There were a lot of groups that we wouldn't necessarily want imported, so for now I have limited the group ingestion to just those two groups, using the following recipe.

source:
  type: "ldap"
  config:
    ldap_server: ldaps://ldap-ro.eqiad.wikimedia.org
    base_dn: "ou=groups,dc=wikimedia,dc=org"
    ldap_user: ""
    ldap_password: ""
    filter: '(&(objectClass=groupOfNames)(|(cn=wmf)(cn=nda)))'
    drop_missing_first_last_name: False

sink:
  type: 'datahub-rest'
  config:
    server: 'https://datahub-gms.discovery.wmnet:30443'
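The two filter strings in the recipes above follow a regular pattern, so they could be generated from a list of group names instead of being edited by hand. The following is only a minimal sketch: the function names and the GROUPS_OU constant are hypothetical helpers, not part of DataHub or of any LDAP library, and the name check is a simple guard rather than full RFC 4515 escaping.

```python
# Hypothetical helpers (not part of DataHub): build the LDAP search filters
# used in the user and group ingestion recipes from a list of group names.

GROUPS_OU = "ou=groups,dc=wikimedia,dc=org"

# Characters that carry special meaning inside an LDAP filter value.
_SPECIAL = set("*()\\\0")


def _check(name):
    """Reject group names that would need escaping; this sketch does not escape."""
    if not name or any(ch in _SPECIAL for ch in name):
        raise ValueError(f"unsupported group name: {name!r}")
    return name


def user_filter(groups):
    """Filter matching people who are members of at least one of the groups."""
    clauses = "".join(f"(memberOf=cn={_check(g)},{GROUPS_OU})" for g in groups)
    return f"(&(objectClass=inetorgPerson)(|{clauses}))"


def group_filter(groups):
    """Filter matching only the named group entries."""
    clauses = "".join(f"(cn={_check(g)})" for g in groups)
    return f"(&(objectClass=groupOfNames)(|{clauses}))"


if __name__ == "__main__":
    print(user_filter(["wmf", "nda"]))
    print(group_filter(["wmf", "nda"]))
```

Calling both functions with ["wmf", "nda"] reproduces the filter values used in the two recipes, so adding a further group later would only mean extending the list.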

I've manually added the users to their correct wmf and nda groups.
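Since the plugin does not apply memberships itself, the mapping that was applied by hand can be derived from the group entries. The sketch below assumes each group entry has already been fetched into a dict with a 'cn' value and a list of 'member' DNs of the form uid=<name>,ou=people,...; that input shape, the example account names and the function names are assumptions for illustration. It only computes the mapping and does not write anything to DataHub.

```python
# Hypothetical sketch: derive user -> group memberships from LDAP group entries.
# Assumes entries look like {"cn": "wmf", "member": ["uid=alice,ou=people,..."]}.
from collections import defaultdict


def rdn_value(dn):
    """Return the value of the first RDN, e.g. 'alice' for 'uid=alice,ou=people,...'.

    Note: a plain split does not handle escaped commas inside a DN.
    """
    first_rdn = dn.split(",", 1)[0]
    return first_rdn.split("=", 1)[1]


def memberships(group_entries):
    """Map each corpuser URN to the sorted list of corpGroup URNs it belongs to."""
    result = defaultdict(set)
    for entry in group_entries:
        group_urn = f"urn:li:corpGroup:{entry['cn']}"
        for member_dn in entry["member"]:
            user_urn = f"urn:li:corpuser:{rdn_value(member_dn)}"
            result[user_urn].add(group_urn)
    return {user: sorted(groups) for user, groups in result.items()}


if __name__ == "__main__":
    # Made-up example accounts, not real users.
    example = [
        {"cn": "wmf", "member": ["uid=alice,ou=people,dc=wikimedia,dc=org"]},
        {"cn": "nda", "member": ["uid=alice,ou=people,dc=wikimedia,dc=org",
                                 "uid=bob,ou=people,dc=wikimedia,dc=org"]},
    ]
    for user, groups in memberships(example).items():
        print(user, groups)
```

A mapping like this could be reviewed against the manual assignments before anything is automated.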


I've also added a datahubadmin group and added all of the members of the Data-Engineering team to it.

I think the following policies are the bare minimum that we need for the MVP (see https://datahub.wikimedia.org/policies):


  • All logged in users can see all entities.
  • All WMF staff and NDA signatories can edit all metadata annotation fields of all datasets.
  • All WMF staff and NDA signatories can view profiling and usage information for all datasets.
  • Members of the DataHub administrators group (currently the DE team) can access and configure all platform features.
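The four rules above can be modelled as a small function, which makes it explicit what a Wikitech account in neither the wmf nor the nda group ends up with. This is only an illustrative model of the rules as listed: the privilege labels are made-up names rather than DataHub's own privilege identifiers, and the function does not talk to DataHub.

```python
# Illustrative model of the MVP policies; labels are hypothetical, not DataHub's.
VIEW_ENTITIES = "view_entities"
EDIT_METADATA = "edit_dataset_metadata"
VIEW_PROFILING_USAGE = "view_profiling_and_usage"
PLATFORM_ADMIN = "platform_admin"

TRUSTED_GROUPS = {"wmf", "nda"}
ADMIN_GROUP = "datahubadmin"


def privileges(groups):
    """Return the set of privileges for a logged-in user in the given groups."""
    groups = set(groups)
    granted = {VIEW_ENTITIES}  # every logged-in user can see all entities
    if groups & TRUSTED_GROUPS:  # WMF staff and NDA signatories
        granted |= {EDIT_METADATA, VIEW_PROFILING_USAGE}
    if ADMIN_GROUP in groups:  # DataHub administrators
        granted.add(PLATFORM_ADMIN)
    return granted


if __name__ == "__main__":
    print(sorted(privileges([])))
    print(sorted(privileges(["nda"])))
    print(sorted(privileges(["wmf", "datahubadmin"])))
```

Under this model a plain Wikitech account holder receives only the view privilege, which is the case the task description asks to be certain about.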

Marking as done, although I will gather feedback on whether logged-in users see what they expect.

If we wish to further restrict what non-staff can see, we could disable the "All Users - View Entity Page" policy, but I'm leaving it enabled for now.