
Requesting access to analytics-privatadata-users group for Carly Bogen
Closed, Resolved, Public

Description

Requestor provided information and prerequisites

This section is to be completed by the individual requesting access.

  • Wikitech username: CBogen
  • Preferred shell username: cbogen
  • Email address: cbogen@wikimedia.org
  • Ssh public key (must be dedicated key for wmf production): AAAAB3NzaC1yc2EAAAADAQABAAABgQDIrJI1LDfugc+7hn48AYXsry/K5BzZDCLotCX/lTPN5gRQgornLWN8sHT549H/+FQ1jzYszOkc1hpsUDIuJt5Rnfwg2xdBk+gwDamPPR1MkpONgOBV50LtHVQd0tIG+a68QHvcgPARjNM+8deEi4eA633qymqzyCIN/LzSj9xEAJ5DXrlK7EV5+IISvCPBVR5tLGK6E+NDFfWoNFUYjaKStYoJRM7VDma6nIUky38+x1jLL+YSfjeaii4OquAs6Grz0XVLAkiwRZXscgYUfL8+LiBK34RuKQl+e5RlYd+y+bPl1k6h0ZZfvSz9DIy0vUXr/7yQqYHU8DTEV4pP5fcTF2u7MGFace4qLf0LgqpDq53sA6ALgcWTAm5xa7tiiEdyAfyccjZ9//VxSsuKQyypFAbxpYJd6zfwvn5UptNVRVoqNObYSHbnJrk0uI7p47HhuuTgnA3fkdqPOsdRnmc+cgIgaSz5hpf95eM19A9t2Ve4u026NWW2JGvS2crX8VE= cbogen@wikimedia.org
  • Requested group membership: analytics-privatedata-users
  • Reason for access: to analyze search data for SDAW metrics and MediaSearch feature development
  • Name of approving party (hiring manager for WMF staff): Amanda Bittaker
  • Requestor -- Please Acknowledge that you have read and signed the L3 Wikimedia Server Access Responsibilities document: Yes
  • Requestor -- Please coordinate obtaining a comment of approval on this task from the approving party: Done

SRE Clinic Duty Confirmation Checklist for Access Requests

This checklist should be used on all access requests, including expansions of existing access, to ensure that all steps are covered. Please double-check that each step has been completed before checking it off.

This section is to be confirmed and completed by a member of the SRE team.

  • User has signed the L3 Acknowledgement of Wikimedia Server Access Responsibilities Document.
  • User has a valid NDA on file with WMF legal. (This can be checked by Operations via the NDA tracking sheet & is included in all WMF Staff/Contractor hiring.)
  • User has provided the following: wikitech username, preferred shell username, email address, and full reasoning for access (including what commands and/or tasks they expect to perform).
  • User has provided a public SSH key. This SSH key pair should only be used for WMF cluster access and not shared with any other service (this includes not sharing it with WMCS access; no shared keys).
  • Access request (or expansion) has sign-off from the WMF sponsor/manager (sponsor for volunteers, manager for WMF staff).
  • Non-sudo requests: a 3-business-day wait must pass with no objections noted on the task.
  • Patchset for access request.

For additional details regarding access request requirements, please see https://wikitech.wikimedia.org/wiki/Requesting_shell_access

Original task:
Hi, I'd like to request membership in the analytics-privatedata-users access group. I need to access search log data for my work on the Structured Data Across Wikimedia project and as the Search team Program Manager.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Hi @Abit - I'm told I should have my manager comment on the ticket that the request is appropriate and I need access to this data (which I do for analyzing search data for SDAW metrics and MediaSearch feature development). Let me know if you have any questions, thanks!

I hereby comment that @CBogen's request is appropriate and she needs access to this data to analyze search data for SDAW metrics and MediaSearch feature development; please give her access ;)

Joe triaged this task as Medium priority.
Joe added subscribers: Nuria, Joe.

Hi @CBogen, the procedure to get shell access is outlined in https://wikitech.wikimedia.org/wiki/Production_access. Specifically, you will need to fill in the information as outlined in https://wikitech.wikimedia.org/wiki/Production_access#Filing_the_request.

I'm adding the skeleton of the access request template in the task above, I'd kindly ask you to fill in the required fields.

Also, please review https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups and https://wikitech.wikimedia.org/wiki/Analytics/Data_access#User_responsibilities - although I guess from your request you already figured this out.

Also pinging @Nuria for analytics approval.

Joe renamed this task from Request for SSH access to analytics-privatadata-users group to Requesting access to analytics-privatadata-users group for Carly Bogen. (Jul 21 2020, 8:13 AM)
Joe updated the task description.
Joe moved this task from Untriaged to Awaiting User Input on the SRE-Access-Requests board.

@CBogen Have you pinged the data analysts about working with you to gather this data? It exists in quite a raw form, and querying it requires familiarity with Hadoop.

@Nuria, @EBernhardson from the Search team is gathering the data; see T257361.

@CBogen I imagine @EBernhardson will be putting that data in Superset (pinging him here so he can let us know), in which case you just need permissions to see the dashboards.

This data isn't going to be available in Superset; Carly is looking for a report on the top 10k queries to a wiki. We have a script that generates this report and places a .csv.gz file into HDFS, but she needs a way to access that .csv.gz file.

@EBernhardson let's talk about a better process for this. If all that is required is access to a file, it can be placed in a known location on the stats machines (one example; there are other choices), and permissions to see more sensitive data should not be needed.

Let's please add @CBogen to wmf ldap so she has access to superset as well.

Joe changed the task status from Open to Stalled. (Jul 21 2020, 3:50 PM)
Joe removed Joe as the assignee of this task.

> Let's please add @CBogen to wmf ldap so she has access to superset as well.

Carly is already in the wmf LDAP group, so AIUI she already has access to Superset.

Given that the task needs a wider discussion about the best process to adopt, I'm going to set it to Stalled and remove myself as the assignee. @Nuria, whenever the details are ironed out, feel free to change the status to Open, and the SRE on clinic duty will take care of it.

@CBogen can you confirm that you have access to https://superset.wikimedia.org? (Log in with the same username and password you use for Wikitech.)

@Nuria yes, I already have access to Superset, thanks. My understanding is that the data Erik is gathering for me is considered sensitive and can't be placed in an alternate location, which is why I need this access.

> this data that Erik is gathering for me is considered sensitive and can't be placed in an alternate location

Not quite correct: the permissions you are requesting would, for example, allow you to write to the datastore, but from your request you only need to read a file that cannot be made public, which is quite a different thing.

Search queries in general are considered PII, so a report containing a list of 10k queries still counts as PII. We can probably put the reports somewhere, but I would like it to be a place where files can be placed automagically and on demand. Right now the reports have no particular schedule: someone requests one, we trigger the job, and when it finishes it sends an email containing the HDFS path of the report.

> Search queries in general are considered PII, so a report containing a list of 10k queries essentially still counts as PII.

This is correct, and Superset has access to read PII data.

Per the comment above, this access is not needed; closing.

nettrom_WMF subscribed.

I'm reopening this task as we've now got visualizations/dashboards in Superset that @CBogen needs access to, and these require the group membership requested in this task.

@CBogen: Hi, this needs approval from the following people. Once both approvals are on the task, I'll add you to analytics-privatedata-users:

  • Your manager
  • @Ottomata for access to analytics-privatedata-users

Hi @Abit - can you please provide approval for me on this task as my manager? I need this in order to see the visual editor Media Search analytics. Thanks!

Hi @CBogen, do you need direct access to data in Hadoop and Hive, or will you just be using Superset to access that data via Presto / Druid? We've since updated some docs to hopefully make this more clear:

https://wikitech.wikimedia.org/wiki/Analytics/Data_access#What_access_should_I_request?

In either case, I approve. We just need to know which access to grant ya.

Pretty sure I just need Superset access to that data via Presto / Druid, but @nettrom_WMF can you confirm? Thanks!

Yes, I can confirm that it's Presto/Druid access in Superset that's needed, as described here.

OK, so just wmf LDAP and analytics-privatedata-users POSIX group membership are needed. Thank you.

> In T258413#6836429, @MoritzMuehlenhoff wrote:
> @CBogen : Hi, this needs approval from the following people. Once those are done on task, I'll add you to analytics-privatedata-users:
>   • Your manager
>   • @Ottomata for access to analytics-privatedata-users
> Hi @Abit - can you please provide approval for me on this task as my manager? I need this in order to see the visual editor Media Search analytics. Thanks!

I approve.

Change 665068 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Add cbogen to analytics-privatedata-users

https://gerrit.wikimedia.org/r/665068

Change 665068 merged by Muehlenhoff:
[operations/puppet@production] Add cbogen to analytics-privatedata-users

https://gerrit.wikimedia.org/r/665068

MoritzMuehlenhoff claimed this task.

@CBogen: I've added you to the group; you should be able to access Superset now. I'm closing the task; please reopen it if you run into any issues.

Unfortunately I still can't access the two dashboards I need:

https://superset.wikimedia.org/superset/dashboard/222/
MediaSearch sessions

The error I get is:

presto error: Permission denied: user=cbogen, access=EXECUTE, inode="/wmf/data/event":analytics:analytics-privatedata-users:drwxr-x---
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:351)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:238)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:189)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:541)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1705)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1723)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:642)
    at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getListingInt(FSDirStatAndListingOp.java:55)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:3660)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:1147)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:671)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:507)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1034)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2854)
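The mode string in the error above (`drwxr-x---` on `/wmf/data/event`, owner `analytics`, group `analytics-privatedata-users`) accounts for the failure. HDFS uses a POSIX-like model: the owner bits apply if you own the path, the group bits apply if you are in its group, and otherwise the "other" bits apply; traversing a directory needs the execute bit. The sketch below is a simplified illustration under those assumptions: `can_access` is a hypothetical helper that ignores HDFS ACLs and the superuser, and the group sets passed to it are illustrative, not read from any real system.

```python
from typing import Set

# Position of each action within an rwx triplet.
ACTION_INDEX = {"READ": 0, "WRITE": 1, "EXECUTE": 2}


def can_access(mode: str, owner: str, group: str,
               user: str, user_groups: Set[str], action: str) -> bool:
    """Hypothetical, simplified POSIX-style permission check.

    Ignores HDFS ACLs and the superuser. `mode` is a 10-character
    string such as "drwxr-x---".
    """
    perms = mode[1:]  # drop the file-type character ('d' = directory)
    if user == owner:
        triplet = perms[0:3]  # owner bits
    elif group in user_groups:
        triplet = perms[3:6]  # group bits
    else:
        triplet = perms[6:9]  # "other" bits
    return triplet[ACTION_INDEX[action]] != "-"


# Mode, owner and group are taken from the error; the user's groups are illustrative.
MODE, OWNER, GROUP = "drwxr-x---", "analytics", "analytics-privatedata-users"

print(can_access(MODE, OWNER, GROUP, "cbogen", set(), "EXECUTE"))    # False
print(can_access(MODE, OWNER, GROUP, "cbogen", {GROUP}, "EXECUTE"))  # True
```

Without membership in `analytics-privatedata-users`, the check falls through to the `---` bits and EXECUTE is denied; with membership, the `r-x` group bits allow traversal (but not writes), which is consistent with the follow-up patch resolving the error.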

Change 665125 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add cbogen to analytics-privatedata-users

https://gerrit.wikimedia.org/r/665125

Change 665125 merged by Ottomata:
[operations/puppet@production] Add cbogen to analytics-privatedata-users

https://gerrit.wikimedia.org/r/665125

The previous patch didn't add you to analytics-privatedata-users; https://gerrit.wikimedia.org/r/c/operations/puppet/+/665125 does, and it is merged and applied. Can you try again?

All good now, thank you!