
Add amire80 to analytics-privatedata-users group
Closed, Resolved, Public

Description

Hi,

According to @Nuria's comment at T122479, I need access to stat1002.eqiad.wmnet in order to run scheduled queries.

If it's relevant, my username on terbium and Gerrit is amire80.

Thanks.

(As with T122479, I wasn't sure which project tags to add here, so please fix them accordingly. Thanks.)

Event Timeline

Amire80 created this task.Dec 28 2015, 5:55 PM
Amire80 raised the priority of this task from to Normal.
Amire80 updated the task description.
Amire80 added subscribers: Krenair, KartikMistry, Nuria and 3 others.
Krenair set Security to Access Request.Dec 28 2015, 5:57 PM
Nuria added a comment.Dec 28 2015, 6:02 PM

Some examples of the types of queries you need to run would be great (they aren't needed to file for access, just as a reference).

Most of these will be for the wikishared database. When I run them on terbium, they take a few minutes.

One of them runs against the database of each Wikipedia, checking which articles with the contenttranslation revision tag were deleted the day before. On terbium this takes about five minutes for all the languages.

You can see the shell scripts here: https://phabricator.wikimedia.org/diffusion/ECTX/browse/master/scripts/daily-stats/

There are 3 things that you need to get access:

  • a signed NDA, which you should already have, otherwise you wouldn't have terbium access,
  • an OK, as a comment here, from your direct supervisor
  • an OK from the service owners, which would probably be @Nuria (for server access) and me (for database access).

After that, access will be reviewed and, usually, accepted.

Nuria added a comment.Dec 28 2015, 7:29 PM

I think Amire80 might already have access to 1002; I seem to remember him having run queries there before. If he does not, it is OK to grant it on our end.

@Nuria, as far as I can see, unless I am misinterpreting the permissions, he does not have access to that specific machine yet, but he does have access to others (terbium and fluorine).

Amire80 added a subscriber: Arrbee.Dec 28 2015, 7:37 PM

Adding @Arrbee (Runa) as supervisor.

I only remember running stuff on terbium.

Change 261217 had a related patch set uploaded (by Jcrespo):
Add amire80 to statistics-users for quering mysql analytics-slave

https://gerrit.wikimedia.org/r/261217

Aside from that patch, assuming it is granted I *may* have to provide extra mysql grants to the stats user.

Hello, the request is approved from my side. Thanks.

The access request is now in review; a minimum of 3 days is required for security review. That would usually mean a decision by January 5th, but I apologize on behalf of the team for any delay, as many ops are traveling or on vacation these days.

Dzahn added a subscriber: Dzahn.Jan 4 2016, 6:56 PM

Added +1 for https://gerrit.wikimedia.org/r/#/c/261217/. I can merge this (tomorrow, then); I'm around.

Change 261217 merged by Dzahn:
Add amire80 to statistics-users for quering mysql analytics-slave

https://gerrit.wikimedia.org/r/261217

Dzahn added a comment.Jan 5 2016, 6:27 PM

Hi all,

User @Amire80 has now been added to the group "statistics-users", since I merged the pending patch that does that.

But the server stat1002 does not have that user:

id: amire80: no such user

So it seems a different group, one that grants access to stat1002, is needed. Do you know which?

P.S. I'm trying to figure out which one makes sense. Looking at https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups now (thanks Robh for the link).

Dzahn added a subscriber: Ottomata.Jan 5 2016, 6:40 PM

amire80 has been added to the group statistics-users on stat1003

[stat1003:~] $ id amire80
uid=2076(amire80) gid=500(wikidev) groups=500(wikidev),726(statistics-users)
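The manual `id` check above (and the "no such user" result on stat1002) can be wrapped in a small shell function. This is a minimal sketch, not an existing tool: `user_in_group` is a hypothetical name, and it relies only on `id -nG`, which prints a user's group names and exits non-zero for unknown users.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a user exists on this host and
# belongs to a given POSIX group (e.g. statistics-users).
# Exit status: 0 = member, 1 = not a member, 2 = no such user.
user_in_group() {
    local user="$1" group="$2"
    local groups
    # `id -nG` prints the group names; it fails for unknown users,
    # as with "id: amire80: no such user" on stat1002.
    if ! groups=$(id -nG "$user" 2>/dev/null); then
        echo "no such user: $user"
        return 2
    fi
    # Split the space-separated list (unquoted on purpose) and look
    # for an exact, whole-line match of the group name.
    if printf '%s\n' $groups | grep -qx -- "$group"; then
        echo "$user is in $group"
        return 0
    fi
    echo "$user is NOT in $group"
    return 1
}
```

Given the `id` output above, `user_in_group amire80 statistics-users` on stat1003 would be expected to report membership, while on stat1002 at that time it would have returned 2.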

and the docs say:

statistics-users

Access to stat1003 for number crunching and connecting to the SQL research slaves.

Is "connecting to SQL research slaves" what you want? If so, this ticket is resolved; the title should just say stat1003 instead of stat1002, and you'd use that host.

Or is it really stat1002, with one of the other groups described on https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups?

@jcrespo I guess it depends on the MySQL grants you said you might have to add anyway?

@Ottomata Was statistics-users and stat1003 correct for this use case?

It works for me in stat1003 (and thanks to @akosiaris for some extra help with ssh configuration).

@Nuria, is that the right host?

Dzahn closed subtask Restricted Task as Resolved.Jan 7 2016, 8:26 PM
Nuria added a comment.Jan 11 2016, 5:40 PM

I think either 1003 or 1002 works for accessing the analytics slaves, so this ticket can be closed if things are working for Amir.

Dzahn closed this task as Resolved.Jan 11 2016, 11:25 PM
Dzahn claimed this task.

Thanks for the confirmation. Resolving.

Amire80 reopened this task as Open.May 4 2016, 8:02 PM
Amire80 added a subscriber: ellery.

@ellery says that I will definitely need stat1002 to run the queries I need. Currently I don't seem to have access to it.

Please clarify how you will run these queries. If MySQL, then you only need to be in the 'researchers' group, which will get you access to stat1003 and the research user password.

If you need access to Hive/Hadoop and private webrequest data, then you'll need to be in the analytics-privatedata-users group. This will get you access to stat1002 and stat1004.
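Since the discussion moves toward talking in terms of groups rather than hosts, the two options above can be summarised as a lookup. This is a hypothetical helper reflecting only what is stated in this comment (group names and hosts as of May 2016); it is not an official tool, and the mapping may have changed since.

```shell
#!/usr/bin/env bash
# Hypothetical lookup: which access group a given kind of work needs,
# per the explanation above (May 2016). Not an official mapping.
access_group_for() {
    case "$1" in
        mysql)
            # MySQL queries against the research slaves
            echo "researchers: stat1003 + research user password" ;;
        hive|hadoop|webrequest)
            # Hive/Hadoop and private webrequest data
            echo "analytics-privatedata-users: stat1002, stat1004" ;;
        *)
            echo "unknown need: $1 (expected mysql or hive)" >&2
            return 1 ;;
    esac
}
```

For the Hive query described below, `access_group_for hive` points to analytics-privatedata-users, which matches the group this task was eventually renamed to.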

Yes, can we please start talking in terms of groups instead of hosts?

ellery added a comment.May 4 2016, 8:23 PM

@Amire80 needs to run Hive queries to count the number of times users navigate across Wikipedia language projects.

Something like this:

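SELECT
        hour,
        day,
        month,
        year,
        -- dots escaped; hyphen allowed so codes like zh-min-nan are captured whole
        REGEXP_EXTRACT(parse_url(referer, 'HOST'), '([a-z-]*)(\\.m)?\\.wikipedia\\.org', 1) AS prev,
        normalized_host.project AS curr,
        COUNT(*) AS n_requests
    FROM
        wmf.webrequest
    WHERE
        -- select a relevant timespan to query over
        year = 2016
        AND month IN (5)
        AND day IN (1)
        AND hour IN (1, 2)
        -- only consider wikipedia article requests from users
        AND webrequest_source = 'text'
        AND is_pageview
        AND agent_type = 'user'
        AND normalized_host.project_class = 'wikipedia'
        -- only consider wikipedia article referers (this is an approximation)
        AND parse_url(referer, 'HOST') RLIKE 'wikipedia\\.org'
        AND parse_url(referer, 'PATH') RLIKE '^/wiki/'
    -- count navigations per hour and (referer language, current language) pair;
    -- Hive needs the full expression here, not the alias
    GROUP BY
        hour, day, month, year,
        REGEXP_EXTRACT(parse_url(referer, 'HOST'), '([a-z-]*)(\\.m)?\\.wikipedia\\.org', 1),
        normalized_host.project;
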
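Once a query like the one above has produced tab-separated output (for example via the Hive CLI, `hive -f <file>`), the cross-language pairs can be tallied. A sketch, assuming the column order of the SELECT above (hour, day, month, year, prev, curr) with an optional seventh count column; `cross_language_counts` and `cross_language.hql` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: summarise cross-language navigation from the query's
# tab-separated output. Assumed columns: hour, day, month, year,
# prev, curr[, count]. Reads stdin, prints "prev<TAB>curr<TAB>total".
cross_language_counts() {
    awk -F'\t' '
        # Skip rows where the referer language was not extracted
        # (empty, or printed by Hive as NULL) or where the reader
        # stayed on the same language edition.
        $5 != "" && $5 != "NULL" && $6 != "" && $5 != $6 {
            # Weight by an aggregated count column if present, else 1 per row.
            w = (NF >= 7) ? $7 : 1
            total[$5 "\t" $6] += w
        }
        END { for (k in total) print k "\t" total[k] }
    ' | sort
}
```

Usage: `hive -f cross_language.hql | cross_language_counts`. Handling both shapes of input means it works whether the query returns one row per request or pre-aggregated counts.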
Dzahn removed Dzahn as the assignee of this task.May 4 2016, 9:26 PM
Dzahn added a comment.May 4 2016, 9:29 PM

It will be handled; I'm just giving it back to the pool, because access requests are handled by a weekly duty rotation and this was a couple of months ago.

MoritzMuehlenhoff renamed this task from access for amire80 to stat1002.eqiad.wmnet to Add amire80 to analytics-privatedata-users group.May 6 2016, 8:22 AM
MoritzMuehlenhoff claimed this task.

@Arrbee : Runa, you're Amir's manager, right? Please confirm this access request adding Amir to the analytics-privatedata-users group (see Andrew Otto's explanation above for the impact)

Change 287179 had a related patch set uploaded (by Muehlenhoff):
Add amire80 to analytics-privatedata-users group

https://gerrit.wikimedia.org/r/287179

Arrbee added a comment.May 6 2016, 8:41 AM

Hi @MoritzMuehlenhoff, I would like to confirm that this request is approved for @Amire80. Thanks.

Thanks, I'll merge this on Monday.

Change 287179 merged by Muehlenhoff:
Add amire80 to analytics-privatedata-users group

https://gerrit.wikimedia.org/r/287179

MoritzMuehlenhoff closed this task as Resolved.May 9 2016, 7:19 AM

@Amire80 : I've merged the patch, let me know if you run into any problems when logging in.