
Clarify permissions of /wmf/data/discovery/ datasets
Open, High, Public

Description

As an HDFS user, I want datasets to have meaningful permissions so that access to PII data is better controlled.

In T270629 it was decided to tighten the permissions of the datasets by removing read permissions for others.

Search applications use the analytics-search-users group, and most of them should run as the analytics-search user.

Datasets and current permissions are as follows:

drwxr-x---   - analytics-search analytics-search-users          0 2021-01-07 01:01 /wmf/data/discovery/cirrus_namespace_index_map
drwxrwxr-x   - analytics-search analytics-search-users          0 2021-01-07 00:41 /wmf/data/discovery/fulltext_head_queries
drwxrwxr-x   - ebernhardson     analytics-search-users          0 2020-06-16 00:30 /wmf/data/discovery/glent
drwxrwxr-x   - analytics-search analytics-search-users          0 2019-10-21 01:39 /wmf/data/discovery/mjolnir
drwxrwxr-x   - analytics-search analytics-search-users          0 2020-11-10 16:08 /wmf/data/discovery/ores
drwxrwxr-x   - analytics-search analytics-search-users          0 2016-02-02 20:12 /wmf/data/discovery/popularity_score
drwxr-xr-x   - analytics-search analytics-search-users          0 2020-01-15 17:54 /wmf/data/discovery/popularity_score_esbulk
drwxrwxr-x   - analytics-search analytics-search-users          0 2020-07-15 23:50 /wmf/data/discovery/popularity_score_v2
drwxrwxr-x   - ebernhardson     analytics-search-users          0 2019-12-21 00:52 /wmf/data/discovery/query_clicks
drwxr-xr-x   - analytics-search analytics-search-users          0 2020-07-20 21:10 /wmf/data/discovery/reports
drwxrwxr-x   - ebernhardson     analytics-search-users          0 2020-05-05 23:10 /wmf/data/discovery/search_satisfaction
drwxr-xr-x   - analytics-search analytics-search-users          0 2021-01-04 20:19 /wmf/data/discovery/transfer_to_es
drwxr-xr-x   - analytics-search analytics-search-users          0 2020-07-14 12:37 /wmf/data/discovery/wdqs
drwxrwxr-x   - analytics-search analytics-search-users          0 2020-07-14 12:33 /wmf/data/discovery/wikidata

dataset                       PII   obsolete
cirrus_namespace_index_map    no    no
fulltext_head_queries         yes   no
glent                         yes   no
mjolnir                       yes   no
ores                          no    no
popularity_score              no    yes
popularity_score_esbulk       no    yes
popularity_score_v2           no    no
query_clicks                  yes   no
reports                       yes   no?
search_satisfaction           yes   no
transfer_to_es                no    no
wdqs                          no    yes
wikidata                      no    no
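
To spot which PII datasets are still readable by others, the listing and the PII column above can be cross-checked. A minimal Python sketch, assuming the listing is available as text in the format shown above; the helper names (`parse_listing`, `readable_by_others`) are hypothetical, and only three rows are copied for brevity:

```python
# Three rows copied verbatim from the listing above.
LISTING = """\
drwxr-x---   - analytics-search analytics-search-users          0 2021-01-07 01:01 /wmf/data/discovery/cirrus_namespace_index_map
drwxrwxr-x   - analytics-search analytics-search-users          0 2021-01-07 00:41 /wmf/data/discovery/fulltext_head_queries
drwxrwxr-x   - ebernhardson     analytics-search-users          0 2020-06-16 00:30 /wmf/data/discovery/glent
"""

# PII flags for the same three datasets, taken from the table above.
PII = {
    "cirrus_namespace_index_map": False,
    "fulltext_head_queries": True,
    "glent": True,
}


def parse_listing(text):
    """Split each listing line into mode, owner, group and path."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or unexpected lines
        mode, _repl, owner, group, _size, _date, _time, path = fields
        entries.append({"mode": mode, "owner": owner, "group": group, "path": path})
    return entries


def readable_by_others(mode):
    """In 'drwxrwxr-x', index 7 is the read bit of the 'others' triplet."""
    return mode[7] == "r"


entries = parse_listing(LISTING)
pii_world_readable = [
    e["path"]
    for e in entries
    if readable_by_others(e["mode"]) and PII[e["path"].rsplit("/", 1)[-1]]
]

for path in pii_world_readable:
    print(path)
```

Extending `LISTING` and `PII` to all fourteen rows would give the full set of directories that need attention.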

Giving access only to analytics-search-users would lock out all other users, even members of analytics-privatedata-users, although there does not seem to be any reason why a member of analytics-privatedata-users should not read search datasets. Perhaps we should chgrp all search datasets that include PII to analytics-privatedata-users, and leave the others under analytics-search-users with o+rx?
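
One possible reading of this proposal can be sketched as a small Python script that only prints the `hdfs dfs` commands it would run, so the plan can be reviewed before anything is changed. This is an assumption-laden illustration, not an agreed procedure: the function name `plan_commands` is hypothetical, removing access for others on PII datasets (`o-rwx`) is inferred from T270629, and only the top-level directories are touched (adding `-R` for recursion may be needed in practice).

```python
BASE = "/wmf/data/discovery"


def plan_commands(pii_by_dataset, base=BASE):
    """Return the hdfs commands for one reading of the proposal.

    PII datasets: group becomes analytics-privatedata-users, others lose access.
    Non-PII datasets: group is left alone, others get read and execute.
    """
    commands = []
    for name, has_pii in sorted(pii_by_dataset.items()):
        path = f"{base}/{name}"
        if has_pii:
            commands.append(f"hdfs dfs -chgrp analytics-privatedata-users {path}")
            commands.append(f"hdfs dfs -chmod o-rwx {path}")
        else:
            commands.append(f"hdfs dfs -chmod o+rx {path}")
    return commands


# A few rows from the PII table in the description.
sample = {"glent": True, "ores": False, "query_clicks": True}

for command in plan_commands(sample):
    print(command)
```

Printing rather than executing keeps the sketch safe to run anywhere; whether obsolete datasets should be handled differently (for example deleted instead) is left open, as in the description.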

AC:

  • Permissions of all /wmf/data/discovery/ datasets are clarified

Event Timeline

dcausse updated the task description.

https://gerrit.wikimedia.org/r/c/analytics/refinery/+/654833/3/bin/refinery-deploy-to-hdfs was created by analytics to force the umask when deploying refinery to hdfs to allow oozie to read workflow files :)

CBogen triaged this task as High priority. Jan 11 2021, 4:19 PM
CBogen moved this task from needs triage to Ops / SRE on the Discovery-Search board.