Datahub user records are not being created after login
Closed, Resolved (Public)

Description

We are aware of an issue with datahub whereby newly logged-in users are not added to the database as user records.

This means that it is impossible to assign elevated rights to the individual, such as adding them to the datahubadmins group.

For example, we know that the users stevemunene and jebe have logged in, but there is no urn for them.

MariaDB [datahub]> select urn,aspect from metadata_aspect_v2 where urn like '%stevemunene%';
Empty set (0.000 sec)

MariaDB [datahub]> select urn,aspect from metadata_aspect_v2 where urn like '%jebe%';
Empty set (0.042 sec)

Compare that result to the result for btullis, who is already a member of datahubadmins:

MariaDB [datahub]> select urn,aspect from metadata_aspect_v2 where urn like '%btullis%';
+-------------------------+-----------------------+
| urn                     | aspect                |
+-------------------------+-----------------------+
| urn:li:corpuser:Btullis | corpUserEditableInfo  |
| urn:li:corpuser:btullis | corpUserInfo          |
| urn:li:corpuser:btullis | corpUserKey           |
| urn:li:corpuser:btullis | groupMembership       |
| urn:li:corpuser:btullis | groupMembership       |
| urn:li:corpuser:btullis | groupMembership       |
| urn:li:corpuser:btullis | groupMembership       |
| urn:li:corpuser:btullis | nativeGroupMembership |
| urn:li:corpuser:btullis | nativeGroupMembership |
+-------------------------+-----------------------+
9 rows in set (0.039 sec)
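
The comparison above amounts to asking whether any row exists for a user's corpuser URN. Here is a minimal Python sketch of that check, assuming the rows are the (urn, aspect) pairs returned by the queries above. The helper names `corpuser_urn` and `aspects_for_user` are hypothetical, not part of DataHub. Matching is case-insensitive because the btullis output contains both the `Btullis` and `btullis` spellings.

```python
from collections import defaultdict

URN_PREFIX = "urn:li:corpuser:"


def corpuser_urn(username):
    """Build a corpuser URN in the format seen in the query output."""
    return URN_PREFIX + username


def aspects_for_user(rows, username):
    """Group aspect names by exact URN for rows matching the username.

    `rows` is an iterable of (urn, aspect) tuples. The URN comparison
    ignores case, so differently cased URNs are reported separately.
    """
    wanted = corpuser_urn(username).lower()
    found = defaultdict(list)
    for urn, aspect in rows:
        if urn.lower() == wanted:
            found[urn].append(aspect)
    return dict(found)


# Rows copied from the btullis query output above.
ROWS = [
    ("urn:li:corpuser:Btullis", "corpUserEditableInfo"),
    ("urn:li:corpuser:btullis", "corpUserInfo"),
    ("urn:li:corpuser:btullis", "corpUserKey"),
    ("urn:li:corpuser:btullis", "groupMembership"),
    ("urn:li:corpuser:btullis", "groupMembership"),
    ("urn:li:corpuser:btullis", "groupMembership"),
    ("urn:li:corpuser:btullis", "groupMembership"),
    ("urn:li:corpuser:btullis", "nativeGroupMembership"),
    ("urn:li:corpuser:btullis", "nativeGroupMembership"),
]

if __name__ == "__main__":
    for user in ("btullis", "stevemunene"):
        print(user, aspects_for_user(ROWS, user) or "no user record")
```

An empty result for a user who has logged in is the symptom described in this task.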

When stevemunene logs in, we can see two specific errors in the log files.

10:30:15 [application-akka.actor.default-dispatcher-82433] ERROR application - The submitted callback is of type: class javax.security.auth.callback.NameCallback : javax.security.auth.callback.NameCallback@619b7182
10:30:15 [application-akka.actor.default-dispatcher-82433] ERROR application - The submitted callback is of type: class javax.security.auth.callback.PasswordCallback : javax.security.auth.callback.PasswordCallback@72e4e2a1
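
To check whether these errors accompany every login, the frontend log can be filtered for the callback class names. This is a small Python sketch that assumes the log format shown above; the function name `callback_types` is hypothetical.

```python
import re

# Captures the callback class from the "submitted callback" error lines.
CALLBACK_RE = re.compile(
    r"ERROR application - The submitted callback is of type: "
    r"class (?P<cls>[\w.$]+)"
)


def callback_types(log_lines):
    """Return the callback class names reported in the log lines, in order."""
    types = []
    for line in log_lines:
        match = CALLBACK_RE.search(line)
        if match:
            types.append(match.group("cls"))
    return types


# The two log lines quoted above.
SAMPLE = [
    "10:30:15 [application-akka.actor.default-dispatcher-82433] ERROR "
    "application - The submitted callback is of type: class "
    "javax.security.auth.callback.NameCallback : "
    "javax.security.auth.callback.NameCallback@619b7182",
    "10:30:15 [application-akka.actor.default-dispatcher-82433] ERROR "
    "application - The submitted callback is of type: class "
    "javax.security.auth.callback.PasswordCallback : "
    "javax.security.auth.callback.PasswordCallback@72e4e2a1",
]

if __name__ == "__main__":
    for name in callback_types(SAMPLE):
        print(name)
```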

I'm not yet sure what this means.

It's possible that this behaviour started after the upgrade of datahub to version 0.9.0, but we're still correlating these dates.

Event Timeline

Expediting this into the current sprint, since it is currently blocking newer staff members from using datahub.

@Stevemunene found the following reference, which contains errors similar to what we observe: https://www.linen.dev/s/datahubspace/t/2235443/hi-team-i-m-trying-to-integrate-authentication-for-frontend-

It is possible that adding authzIdentity="{USERNAME}" to the configuration (or perhaps java.naming.security.authentication="simple") would help in our situation too, but it would be good to know why this has broken when it was previously working.

I have asked DataHub themselves about this and I am currently awaiting a reply.

Change 883939 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/deployment-charts@master] Enable oidc env vars for datahub

https://gerrit.wikimedia.org/r/883939

Mentioned in SAL (#wikimedia-analytics) [2023-01-27T11:03:36Z] <steve_munene> datahub: apply on main for T327884

Change 883939 merged by jenkins-bot:

[operations/deployment-charts@master] Enable oidc env vars for datahub

https://gerrit.wikimedia.org/r/883939

Mentioned in SAL (#wikimedia-analytics) [2023-01-27T11:41:22Z] <steve_munene> datahub helmfile apply on main for T327884

We implemented a change to enable OIDC just-in-time provisioning and group extraction, as per the following suggestion from the DataHub Slack channel:

The user login -> corpUser generation happens when these properties are set to true:
auth.oidc.jitProvisioningEnabled
auth.oidc.extractGroupsEnabled

Our env vars look like this now:

Environment:
  SERVICE_IDENTIFIER:                  datahub-frontend-main
  JAVA_OPTS:                           -Xms512m -Xmx512m -Dhttp.port=9002 -Dconfig.file=/datahub/datahub-frontend/conf/application.conf -Djava.security.auth.login.config=/datahub/datahub-frontend/conf/auth/jaas-ldap.conf -Dlogback.configurationFile=/datahub/datahub-frontend/conf/logback.xml -Dlogback.debug=false -Dpidfile.path=/dev/null
                                       
  AUTH_NATIVE_ENABLED:                 false
  DATAHUB_ENCRYPTION_KEY:              <set to the key 'datahub_encryption_key' in secret 'datahub-frontend-main-secret-config'>     Optional: false
  ELASTICSEARCH_PASSWORD:              <set to the key 'elasticsearch_password' in secret 'datahub-frontend-main-secret-config'>     Optional: false
  MYSQL_PASSWORD:                      <set to the key 'mysql_password' in secret 'datahub-frontend-main-secret-config'>             Optional: false
  TOKEN_SERVICE_SIGNING_KEY:           <set to the key 'token_service_signing_key' in secret 'datahub-frontend-main-secret-config'>  Optional: false
  DATAHUB_GMS_HOST:                    datahub-gms-main-tls-service.datahub.svc.cluster.local
  DATAHUB_GMS_PORT:                    8501
  DATAHUB_SECRET:                      <set to the key 'datahub_encryption_key' in secret 'datahub-frontend-main-secret-config'>  Optional: false
  DATAHUB_APP_VERSION:                 0.9.0
  DATAHUB_PLAY_MEM_BUFFER_SIZE:        100m
  DATAHUB_ANALYTICS_ENABLED:           false
  KAFKA_BOOTSTRAP_SERVER:              kafka-test1006.eqiad.wmnet:9092
  ELASTIC_CLIENT_HOST:                 datahubsearch.svc.eqiad.wmnet
  ELASTIC_CLIENT_PORT:                 9200
  DATAHUB_TRACKING_TOPIC:              DataHubUsageEvent_v1
  DATAHUB_GMS_USE_SSL:                 true
  AUTH_OIDC_JIT_PROVISIONING_ENABLED:  true
  AUTH_OIDC_EXTRACT_GROUPS_ENABLED:    true

However, we are still experiencing the same challenge as before with the newer users.
Still exploring other possible causes and solutions.

I did some more reading on JAAS user extraction, specifically the authzIdentity and java.naming.security.authentication="simple" options, and it is likely that both config options are required. However, this still does not explain why the previously working JAAS LDAP configuration broke.

When authzIdentity is supplied and the user has been successfully authenticated, an additional UserPrincipal is created using the authorization identity and is associated with the current Subject. (A Subject represents a grouping of related information for a single entity, such as a person. Such information includes the Subject's identities as well as its security-related attributes.)
After successful authentication, a user Principal can be associated with a particular Subject to augment that Subject with an additional identity. Authorization decisions can then be based upon the Principals that are associated with a Subject.
I am looking to explore this.
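
To make the proposal concrete, here is a hedged Python sketch that renders a JAAS login entry containing the two options discussed above. Only `authzIdentity="{USERNAME}"` and `java.naming.security.authentication="simple"` come from this thread. The entry name `WHZ-Authentication`, the `sufficient` control flag, and the `userProvider` value are assumptions used for illustration and may not match our actual jaas-ldap.conf. `render_jaas_entry` is a hypothetical helper.

```python
def render_jaas_entry(entry_name, module_class, control_flag, options):
    """Render one JAAS login configuration entry as text.

    `options` maps option names to values; the final option (or the
    module line, if there are no options) is terminated with ';'.
    """
    lines = [entry_name + " {"]
    items = list(options.items())
    module_line = "  " + module_class + " " + control_flag
    lines.append(module_line if items else module_line + ";")
    for index, (key, value) in enumerate(items):
        terminator = ";" if index == len(items) - 1 else ""
        lines.append('  {}="{}"{}'.format(key, value, terminator))
    lines.append("};")
    return "\n".join(lines)


OPTIONS = {
    # Placeholder value; stands in for whatever the existing config uses.
    "userProvider": "ldaps://ldap.example.org/ou=people,dc=example,dc=org",
    # The two options discussed in this task.
    "authzIdentity": "{USERNAME}",
    "java.naming.security.authentication": "simple",
}

if __name__ == "__main__":
    print(
        render_jaas_entry(
            "WHZ-Authentication",  # assumed entry name
            "com.sun.security.auth.module.LdapLoginModule",
            "sufficient",  # assumed control flag
            OPTIONS,
        )
    )
```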

I have manually added the record for @JEbe-WMF using the following technique.

  • I used a manual ingestion recipe since it is not possible to create users from the UI.
  • I created a special LDAP recipe that only selects one user.
btullis@stat1008:~/src/datahub/ingestion$ cat ldap-jebe.yaml 
source:
  type: "ldap"
  config:
    ldap_server: ldaps://ldap-ro.eqiad.wikimedia.org
    base_dn: "dc=wikimedia,dc=org"
    ldap_user: ""
    ldap_password: ""
    filter: 'uid=jebe'
    drop_missing_first_last_name: False
    user_attrs_map:
      firstName: cn

sink:
  #type: 'console'
  type: 'datahub-rest'
  config:
    server: 'https://datahub-gms.discovery.wmnet:30443'

I had to add the user_attrs_map entry because the LDAP ingestion source wants a firstName value and does not know which attribute to use to source it.

I ran the ingestion like this:

btullis@stat1008:~/src/datahub/ingestion$ datahub ingest -c ldap-jebe.yaml 
[2023-01-31 12:36:02,029] INFO     {datahub.cli.ingest_cli:182} - DataHub CLI version: 0.9.0
[2023-01-31 12:36:02,130] INFO     {datahub.ingestion.run.pipeline:175} - Sink configured successfully. DataHubRestEmitter: configured to talk to https://datahub-gms.discovery.wmnet:30443
[2023-01-31 12:36:02,244] INFO     {datahub.ingestion.run.pipeline:200} - Source configured successfully.
[2023-01-31 12:36:02,245] INFO     {datahub.cli.ingest_cli:129} - Starting metadata ingestion
[2023-01-31 12:36:02,402] INFO     {datahub.cli.ingest_cli:150} - Finished metadata ingestion
Cli report:
{'cli_entry_location': '/home/btullis/.conda/envs/2022-07-15T13.59.31_btullis/lib/python3.7/site-packages/datahub/__init__.py',
 'cli_version': '0.9.0',
 'mem_info': '110.61 MB',
 'os_details': 'Linux-5.10.0-0.deb10.19-amd64-x86_64-with-debian-10.13',
 'py_exec_path': '/home/btullis/.conda/envs/2022-07-15T13.59.31_btullis/bin/python',
 'py_version': '3.7.6 (default, Jan  8 2020, 19:59:22) \n[GCC 7.3.0]'}
Source (ldap) report:
{'dropped_dns': [],
 'event_ids': ['uid=jebe,ou=people,dc=wikimedia,dc=org'],
 'events_produced': '1',
 'events_produced_per_sec': '2',
 'failures': {},
 'running_time': '0.36 seconds',
 'start_time': '2023-01-31 12:36:02.227122 (now).',
 'warnings': {'<general>': ['Defaulting to uid as it was found in attrs and not set in user_attrs_map in recipe']}}
Sink (datahub-rest) report:
{'current_time': '2023-01-31 12:36:02.590703 (now).',
 'failures': [],
 'gms_version': 'null',
 'pending_requests': '0',
 'records_written_per_second': '2',
 'start_time': '2023-01-31 12:36:02.101759 (now).',
 'total_duration_in_seconds': '0.49',
 'total_records_written': '1',
 'warnings': []}

 Pipeline finished with at least 2 warnings ; produced 1 events in 0.36 seconds.

I was then able to add Jennifer to the datahubadmins group from the UI as expected.

This doesn't fix the issue, but it is a workaround.
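
If the workaround is needed for more users, the single-user recipe can be generated per uid instead of being edited by hand. The following Python sketch builds the same recipe as above as a dictionary. `build_single_user_recipe` and `run_recipe` are hypothetical helper names, and running the recipe in-process assumes that the DataHub Python package with the LDAP plugin is installed. The uid check is an added precaution against malformed LDAP filters and is not part of the original recipe.

```python
import re


def build_single_user_recipe(uid, gms_server):
    """Return an LDAP ingestion recipe (as a dict) that selects one user."""
    # Reject anything that could alter the LDAP filter expression.
    if not re.fullmatch(r"[A-Za-z0-9._-]+", uid):
        raise ValueError("unexpected characters in uid: {!r}".format(uid))
    return {
        "source": {
            "type": "ldap",
            "config": {
                "ldap_server": "ldaps://ldap-ro.eqiad.wikimedia.org",
                "base_dn": "dc=wikimedia,dc=org",
                "ldap_user": "",
                "ldap_password": "",
                "filter": "uid=" + uid,
                "drop_missing_first_last_name": False,
                # The LDAP source wants a firstName; take it from cn.
                "user_attrs_map": {"firstName": "cn"},
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


def run_recipe(recipe):
    """Run the recipe in-process (needs the datahub package installed)."""
    # Imported lazily so the recipe builder works without datahub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


if __name__ == "__main__":
    recipe = build_single_user_recipe(
        "jebe", "https://datahub-gms.discovery.wmnet:30443"
    )
    print(recipe["source"]["config"]["filter"])
```

Calling `run_recipe(recipe)` would be the in-process counterpart of the `datahub ingest -c` command shown above.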

@Stevemunene - Do you want to try adding these two parameters to the JAAS configuration then? We can try it out on staging by using an SSH tunnel and a local edit to /etc/hosts on your workstation, similar to this: T327799#8564520

Yes, I am adding authzIdentity, set to the same value as the username. I am leaving out the authentication option for now and will update based on the results.

Change 885360 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/deployment-charts@master] Add authzIdentity to jaas config

https://gerrit.wikimedia.org/r/885360

Change 885360 merged by jenkins-bot:

[operations/deployment-charts@master] Add authzIdentity to jaas config

https://gerrit.wikimedia.org/r/885360

Change 885786 had a related patch set uploaded (by Stevemunene; author: Stevemunene):

[operations/deployment-charts@master] Add authzIdentity to jaas config chart increment

https://gerrit.wikimedia.org/r/885786

Change 885786 merged by jenkins-bot:

[operations/deployment-charts@master] Add authzIdentity to jaas config chart increment

https://gerrit.wikimedia.org/r/885786

Hi @Stevemunene did this get deployed in the end?
If so, I guess it didn't work because I still can't find a database record for your user.

image.png (318×949 px, 27 KB)

https://datahub.wikimedia.org/search?page=1&query=steve&unionType=0

What's our next move, do you think? Will we have to go back to DataHub and ask for more suggestions?

Hi,
This was deployed and tested on the staging environment, but the records were still not created and the behaviour was generally the same. I have been exploring how and why we get the query results that we do, as per this issue; authzIdentity is also mentioned here. I am asking around on the DataHub Slack for suggestions.

This is still not resolved by the recent upgrade to DataHub 0.10.4, but we can now press ahead with the plan to switch DataHub authentication to OIDC in T305874.

This is now looking good. After making progress with T305874: Switch DataHub authentication to OIDC and being able to test in staging, it would appear that just-in-time provisioning of user accounts with their CN is working as expected.

image.png (592×815 px, 60 KB)

We just need to be able to roll out that change to production in order to be able to resolve this ticket.

Stevemunene moved this task from In Progress to Done on the Data-Platform-SRE board.

After the switch to OIDC, we can confirm that new users can log in and that their records are created.