Requesting access to analytics-privatedata-users for jamesur
Closed, Resolved · Public

Description

shell username: jamesur
Full name: James Alexander

Given the increasing need for LCA private data pulls and my role in analyzing them on our side, it has become evident that it's time to ask for access to the stats cluster. I've talked to Toby already and he's OK with it; my manager (Philippe) is CC'd for approval if needed. I already have restricted (terbium/fluorine/bast) access.

Event Timeline

Jalexander raised the priority of this task from to Needs Triage.
Jalexander updated the task description.
Restricted Application added a subscriber: Aklapper.
gerritbot subscribed.

Change 191218 had a related patch set uploaded (by Dzahn):
add jamesur to analytics-privatedata-users

https://gerrit.wikimedia.org/r/191218

Patch-For-Review

James,

What is it you want access to? The stats boxes are different from the analytics cluster. If you want access to private webrequest logs on the Hadoop cluster, then analytics-privatedata-access is what you need. If not, then you probably want statistics-privatedata-users. stat1002 hosts several sampled and filtered webrequest data files that can be analyzed without having to query the full unsampled webrequest logs.

Aye, sorry for the bad wording on my part; the Hadoop cluster is in fact what I'm looking for. Unfortunately, when my use case comes up it's for legal process reasons and so tends to require the full unsampled logs.

If you want access to private webrequest logs on the Hadoop cluster, then analytics-privatedata-access

so the gerrit change above is correct then? "analytics-privatedata-users", right?

Great, thanks, looks good. I also think we need to wait the requisite 3 days. This can be merged on Friday.

Is this still valid? Will @Jalexander continue to handle legal-related requests given the Community Engagement restructure?

yes -- this access is needed for a current issue that we need to address.

Is this still valid? Will @Jalexander continue to handle legal-related requests given the Community Engagement restructure?

While pieces of it are obviously still being decided, we do know that at least for the next couple of months the answer is yes (the current workflow between myself, Philippe and Legal won't change). There is also a very good chance that the CA portion of the workflow (which includes working with Legal on subpoenas and other requests like this, since they often also involve on-wiki data and our areas of expertise/knowledge) will carry over more permanently as well.

Also what Toby said :)

Change 191218 merged by Ottomata:
add jamesur to analytics-privatedata-users

https://gerrit.wikimedia.org/r/191218

K you should be good to go. To check:

ssh stat1002.eqiad.wmnet
hive --database wmf
show tables;
describe webrequest;

:)

increasing need for LCA private data pulls

Is this need/process documented somewhere?

increasing need for LCA private data pulls

Is this need/process documented somewhere?

Not specifically, because it is a general need arising from the process for subpoenas, search warrants and other legal process ( https://meta.wikimedia.org/wiki/Legal/Legal_Policies#Subpoenas ). It is required because whenever we have subpoenas or other legal requirements, we need to be able to pull all the data we have regarding a page, IP, etc. This matters most because we need to know what we have as soon as possible to help us fight releasing any of it (and, if we do need to release some of it, to narrow the scope of the release as much as possible). Knowing this also makes it significantly easier to, for example, tell them that there is no information available that meets their request.

K you should be good to go. To check:

ssh stat1002.eqiad.wmnet
hive --database wmf
show tables;
describe webrequest;

:)

Verified, thank you!