Deprecate the 'researchers' posix group
Closed, Resolved · Public · 8 Estimated Story Points

Description

The researchers POSIX group is no longer needed; people who still need access should be placed in either analytics-users (for simple access) or analytics-privatedata-users (for Hadoop/data/etc. access).

The only thing the group grants at the moment, other than stat100x ssh access, is read permission for /etc/mysql/conf.d/researchers-client.cnf on every stat100x host, which is used to query the wiki replicas. The same permissions are granted to analytics-privatedata-users, and we can argue that accessing the wiki replicas counts as private-data access. The name of the group is also misleading: no member of the Research team (or collaborator) needs it, for example :)

These are the current users:

researchers:
  gid: 714
  description: Access statistics hosts and also provides
               access to research mysql credentials.
               If a user is added to this group it should not
               need to be in analytics-users or analytics-privatedata-users.
               In case of doubt, please ask to the Analytics team.
               More info https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups
  members: [catrope, dduvall, mattflaschen, cooltey, marktraceur,
            jhernandez, daisy, etonkovidova, legoktm, risler,
            sbisson, matmarex, nikerabbit, dstrine, jdittrich, debt,
            mlitn, sharvaniharan, kharlan]

I am going to send an email to these users so that we can clean up the accounts that are no longer needed :)
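As a rough illustration of the cleanup described above, the members of the deprecated group can be split into those already covered by one of the replacement groups and those who need a follow-up. This is only a sketch: the function name and the group memberships in the example data are hypothetical and are not taken from the real puppet admin data.

```python
from typing import Dict, List, Set, Tuple


def split_by_other_membership(
    groups: Dict[str, List[str]],
    deprecated: str,
    replacements: List[str],
) -> Tuple[List[str], List[str]]:
    """Split members of `deprecated` into (already_covered, needs_follow_up).

    A member is "already covered" when they also appear in at least one
    of the replacement groups.
    """
    in_replacements: Set[str] = set()
    for name in replacements:
        in_replacements.update(groups.get(name, []))

    members = groups.get(deprecated, [])
    covered = sorted(m for m in members if m in in_replacements)
    follow_up = sorted(m for m in members if m not in in_replacements)
    return covered, follow_up


# Hypothetical, heavily shortened example data (NOT the real group contents).
example_groups = {
    "researchers": ["sbisson", "nikerabbit", "dstrine"],
    "analytics-users": [],
    "analytics-privatedata-users": ["sbisson"],
}

covered, follow_up = split_by_other_membership(
    example_groups,
    "researchers",
    ["analytics-users", "analytics-privatedata-users"],
)
print("already covered:", covered)
print("needs follow-up:", follow_up)
```

Users in the first list can simply be dropped from the deprecated group, while users in the second list need to be asked which access (if any) they still require.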

Event Timeline

Change 643674 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: add comment about the 'researchers' group

https://gerrit.wikimedia.org/r/643674

I don't really know which one (if any) I should be in. IIRC I was added there for eventlogging access, which I still occasionally use.

Hi @Nikerabbit! Thanks for answering. Eventlogging's data is no longer pushed to DBs: we deprecated that feature, and everything is on Hadoop/Hive now. Hive provides a SQL-like CLI to access Eventlogging's tables, so it is similar to a mysql client. If you are interested I can follow up with you to show how things work, but I'll need to move your account to analytics-privatedata-users first :)

Change 643674 merged by Elukey:
[operations/puppet@production] admin: add comment about the 'researchers' group

https://gerrit.wikimedia.org/r/643674

I think I was added to analytics-privatedata-users recently to work on an Oozie job so I should be fine.

Change 643943 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: remove sbisson from 'researchers'

https://gerrit.wikimedia.org/r/643943

Definitely, thanks a lot for the feedback!

Change 643943 merged by Elukey:
[operations/puppet@production] admin: remove sbisson from 'researchers'

https://gerrit.wikimedia.org/r/643943

Change 644181 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: remove user nikerabbit from 'researchers'

https://gerrit.wikimedia.org/r/644181

Change 644181 merged by Elukey:
[operations/puppet@production] admin: remove user nikerabbit from 'researchers'

https://gerrit.wikimedia.org/r/644181

Pinging people in here too: @Catrope @dduvall @cooltey @MarkTraceur @Jhernandez @dchen @Etonkovidova @Legoktm @matmarex @DStrine @debt @Sharvaniharan

Email sent:

Hi!
If you are receiving this email, it means that your shell username is listed in operations/sre puppet among the members of the 'researchers' POSIX group. We (the Analytics team) are trying to deprecate it, since it has been superseded by two other groups:

  • analytics-users: grants access to the stat100x hosts but no PII data access (no mysql access to wiki replicas, no Hadoop, etc.).
  • analytics-privatedata-users: grants access to the stat100x hosts, together with mysql access to wiki replicas and Hadoop (the latter requires an extra Kerberos account).

The main use of the researchers group has been, as far as I know, read access to /etc/mysql/conf.d/research-client.cnf, which contains credentials for the Analytics MariaDB wiki replicas. The same access (different file, but same account in it) can be obtained via 'analytics-privatedata-users', so I'd ask you to comment in the task on whether that access is still needed (and if so, whether anything blocks you from moving to analytics-privatedata-users) or not (in which case I'll remove your user from the group).
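The choice between the two groups described in the email can be summarized as a small decision helper. This is a minimal sketch under stated assumptions: the function names and boolean parameters are hypothetical, and the logic only mirrors what the email says (private data access implies analytics-privatedata-users, plain stat100x access implies analytics-users, and Hadoop additionally requires a Kerberos account).

```python
from typing import Optional


def suggest_group(needs_stat_ssh: bool, needs_private_data: bool) -> Optional[str]:
    """Suggest a replacement for 'researchers', or None if no group is needed.

    needs_private_data covers mysql access to the wiki replicas and Hadoop.
    """
    if needs_private_data:
        return "analytics-privatedata-users"
    if needs_stat_ssh:
        return "analytics-users"
    return None


def needs_kerberos_account(needs_hadoop: bool) -> bool:
    """Per the email, Hadoop access requires an extra Kerberos account."""
    return needs_hadoop


# Example: someone who only wants to log in to stat100x hosts.
print(suggest_group(needs_stat_ssh=True, needs_private_data=False))
```

A user for whom the helper returns None does not need either group and can simply be removed from 'researchers'.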

I'm not sure what this is, but I'm pretty sure I don't use it. Thanks for the ping.

Change 645275 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: remove access for user dstrine

https://gerrit.wikimedia.org/r/645275

Thanks @DStrine! Since this is the only group you are in, I filed a change to remove your user from the ones allowed to ssh to the stat100x hosts (https://gerrit.wikimedia.org/r/645275). Let me know if I misunderstood and you do need to access the stat100x hosts.

Follow-up question: do you use Turnilo/Superset/Logstash/etc.?

I've used it in the past for Hadoop/Hive queries, I believe, but it has been some time since I've needed it. I'd prefer to be removed, and if/when I need it I'll ask your team for the appropriate role 👍

Thanks for the feedback! I see that you are in the WMF LDAP group; do you use Turnilo/Superset/etc.?

@elukey Yes, I used to use them more frequently; now I only use them occasionally.

Perfect, this info is needed to update your records in the SRE puppet configs. Just as an FYI, some Superset dashboards might require you to be in analytics-privatedata-users, so follow up with us in the future if you have problems!

I don't need any analytics access anymore.

Change 648112 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: remove user legoktm from 'researchers'

https://gerrit.wikimedia.org/r/648112

Change 648112 merged by Elukey:
[operations/puppet@production] admin: remove user legoktm from 'researchers'

https://gerrit.wikimedia.org/r/648112

Just sent another email as a heads-up. I set Monday the 21st as the deadline; if I don't hear back from people by then, I'll proceed with removing their usernames from the researchers group. It will still be possible to get added to analytics-privatedata-users if needed :)

Change 645275 merged by Elukey:
[operations/puppet@production] admin: remove access for user dstrine

https://gerrit.wikimedia.org/r/645275

I still need access, please move me to 'analytics-privatedata-users'.

@elukey I still need access for the Add-Link project. (Sorry for missing the deadline!)

No problem at all, I haven't removed the group yet! Just to understand your use case: how do you use the analytics tools? Do you ssh to stat100x? If so, is it only to check eventlogging mysql data? I am asking because the eventlogging mysql database was deprecated/removed a long time ago, and we now offer superset.wikimedia.org's SQL Lab as a replacement (see https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#SQL_Lab for some info). If you don't need to ssh to stat100x hosts for other use cases (like hive/spark/etc.), we offer the possibility of being in the analytics-privatedata-users group to access PII data only from our UIs (so without the need for an ssh key etc.). Let me know :)

Yes I access via SSH, and I need access to hive/spark/etc for a specific use case. For the Add-Link project we have a script (run-pipeline.sh) that accesses those services to generate datasets (SQLite and MySQL table files), and I along with @Tgr and @MGerlach need to be able to run that script.

Thanks a lot for explaining! We set up Kerberos auth months ago (see https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide), so if you need to use spark/hive/hadoop etc. you'll also need to request a Kerberos user/principal.

Change 654258 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: move user matmarex from 'researchers' to 'analytics-privatedata-users'

https://gerrit.wikimedia.org/r/654258

Change 654259 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: remove ssh access to jherndandez

https://gerrit.wikimedia.org/r/654259

Change 654258 merged by Elukey:
[operations/puppet@production] admin: move user matmarex from 'researchers' to 'analytics-privatedata-users'

https://gerrit.wikimedia.org/r/654258

Change 654259 merged by Elukey:
[operations/puppet@production] admin: remove ssh access to jherndandez

https://gerrit.wikimedia.org/r/654259

Change 654277 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: remove members of 'reseachers' already in other posix groups

https://gerrit.wikimedia.org/r/654277

Change 654277 merged by Elukey:
[operations/puppet@production] admin: remove members of 'reseachers' already in other posix groups

https://gerrit.wikimedia.org/r/654277

Change 654871 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: move user kharlan from 'researchers' to 'analytics-privatedata-users'

https://gerrit.wikimedia.org/r/654871

Change 654873 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: set user mattflaschen in ldap_only

https://gerrit.wikimedia.org/r/654873

Change 654874 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: set user 'daisy' in ldap_only

https://gerrit.wikimedia.org/r/654874

Change 654877 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: set user 'etonkovidova' to ldap_only

https://gerrit.wikimedia.org/r/654877

Change 654880 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: set user 'risler' to ldap_only

https://gerrit.wikimedia.org/r/654880

Change 654882 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: absent user 'jdittrich'

https://gerrit.wikimedia.org/r/654882

Change 654884 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] admin: set user 'debt' to ldap_only

https://gerrit.wikimedia.org/r/654884

Change 654871 merged by Elukey:
[operations/puppet@production] admin: move user kharlan from 'researchers' to 'analytics-privatedata-users'

https://gerrit.wikimedia.org/r/654871

Change 654873 merged by Elukey:
[operations/puppet@production] admin: set user mattflaschen in ldap_only

https://gerrit.wikimedia.org/r/654873

Change 654874 merged by Elukey:
[operations/puppet@production] admin: set user 'daisy' in ldap_only

https://gerrit.wikimedia.org/r/654874

Change 654877 merged by Elukey:
[operations/puppet@production] admin: set user 'etonkovidova' to ldap_only

https://gerrit.wikimedia.org/r/654877

Change 654880 merged by Elukey:
[operations/puppet@production] admin: set user 'risler' to ldap_only

https://gerrit.wikimedia.org/r/654880

Change 654882 merged by Elukey:
[operations/puppet@production] admin: absent user 'jdittrich'

https://gerrit.wikimedia.org/r/654882

Change 654884 merged by Elukey:
[operations/puppet@production] admin: set user 'debt' to ldap_only

https://gerrit.wikimedia.org/r/654884

Change 654892 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Remove 'reseachers' and 'gpu-testers' posix group from Analytics cfgs

https://gerrit.wikimedia.org/r/654892

Change 654892 merged by Elukey:
[operations/puppet@production] Remove 'reseachers' and 'gpu-testers' posix group from Analytics cfgs

https://gerrit.wikimedia.org/r/654892

elukey triaged this task as Medium priority. Jan 7 2021, 4:34 PM
elukey set the point value for this task to 8.
elukey moved this task from In Progress to Done on the Analytics-Kanban board.
elukey moved this task from Q3 2020/2021 to Done on the Analytics-Clusters board.