Requesting access to analytics-privatedata-users for musikanimal
Closed, Resolved · Public

Description

Username: musikanimal
Full name: Leon Ziemba
Public key:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5jAvhIngD3svnIyBaHkhZTPEJc80jM363NfWUaFNcdi7n/VudTa3t8vL9jb1OZBUWnL/gfIW4VeLU4rKsfQkcpw6BpL9Qmr50Ewex9eU2pN3/tu1JN9OGNoJry8q81ZaxpH2wJD0JmCC4nlL84Ie7YjZQdcDpeDp4NL/eqEN30DilejVc34cMFpxcH2UYtJnoHGgSPBNsRvftrSniENKlWBrNF+Gjeg+awidUnlpTfGA0q8AGa5Fo69GkHxAzUymgNgeCY6w2H/HqgFcKT53YWgkViBZC0vi3Y0X0EDxnTgYbbKmSij7JU7Z4qJzzd+Tscd/xcO20hPsAYXcW/nF5 musikanimal@wikimedia.org

I am an engineer for Community Tech. As I understand it, being part of the analytics-privatedata-users access group allows me to connect to the Analytics team's MariaDB slaves. This will be very helpful for work I'm doing right now. My colleagues have been helping me run test queries on enwiki.cu_changes (see T156318), so I was eventually going to ask for prod DB access, but if I have access to identical, unsanitized slaves then I won't need it :) We will be doing numerous similar projects in 2017 as part of the community health initiative to counter harassment.

Next, as (mostly) a volunteer effort, I want to identify bots that inflate the pageview stats returned by the RESTBase /metrics/pageviews/top endpoint. I have a system set up on Topviews where users can report false positives, so that I can auto-exclude such pages from the tool. Much of the time this is easy (just compare mobile versus desktop), but other times it's hard to say. Being able to dig deeper and see whether there are unreasonable numbers of requests coming from a single IP, or a finite set of IPs, etc., will lend some clarity. Obviously I won't be sharing any private data, but the hope is that I can offer more reliable data by filtering out known false positives. In doing this I'll hopefully also be able to help improve bot detection in general for the Pageviews API, passing on my findings to the Analytics team. I admittedly am not very familiar with the database schema, but I suppose getting access is the first step :) I am under the impression that help from some Analytics team members is at my disposal in pursuing this effort, so if I am unsure about something I won't hesitate to reach out to them first.
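
For illustration only, the two checks mentioned above (mobile versus desktop, and requests concentrated on a single IP) could be sketched roughly as follows. This is not the Topviews implementation: the function names, the input shapes, and the thresholds (`min_views`, `max_share`) are all assumptions made for the example, and real traffic would likely need more careful handling.

```python
def platform_share(desktop_views, mobile_views):
    """Return the fraction of views that came from desktop (0.0 to 1.0)."""
    total = desktop_views + mobile_views
    return desktop_views / total if total else 0.0


def looks_automated(desktop_views, mobile_views, min_views=1000, max_share=0.95):
    """Flag a page whose traffic is almost entirely from one platform.

    min_views and max_share are hypothetical thresholds, not values
    used by any Wikimedia tool.
    """
    total = desktop_views + mobile_views
    if total < min_views:
        # Too little traffic to draw any conclusion from the split.
        return False
    share = platform_share(desktop_views, mobile_views)
    return share >= max_share or share <= 1 - max_share


def top_ip_share(requests_by_ip):
    """Return the fraction of requests made by the single busiest IP.

    requests_by_ip is assumed to be a dict mapping an IP string to a
    request count; how such counts would be obtained is out of scope here.
    """
    total = sum(requests_by_ip.values())
    return max(requests_by_ip.values()) / total if total else 0.0
```

A page with, say, 9,900 desktop views and 100 mobile views would be flagged by `looks_automated`, while an even split would not; a high `top_ip_share` would be a second signal worth checking before excluding a page.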

Event Timeline

Restricted Application added a subscriber: Aklapper.

+1 from me. Getting @MusikAnimal access to both Hadoop and the DB replicas will help with many projects in Community-Tech.

+1 @MusikAnimal should have this access. But I think you need a +1 from Danny too. cc @DannyH

Actually he needs @kaldari to sign off as his manager.

Sorry, should've looked that up

@MusikAnimal : Please generate a new SSH key for production access (see https://wikitech.wikimedia.org/wiki/Production_shell_access) and paste the public key here.

@MoritzMuehlenhoff Done! Should have clarified in the description, this is a new key pair and is not used anywhere else:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC5jAvhIngD3svnIyBaHkhZTPEJc80jM363NfWUaFNcdi7n/VudTa3t8vL9jb1OZBUWnL/gfIW4VeLU4rKsfQkcpw6BpL9Qmr50Ewex9eU2pN3/tu1JN9OGNoJry8q81ZaxpH2wJD0JmCC4nlL84Ie7YjZQdcDpeDp4NL/eqEN30DilejVc34cMFpxcH2UYtJnoHGgSPBNsRvftrSniENKlWBrNF+Gjeg+awidUnlpTfGA0q8AGa5Fo69GkHxAzUymgNgeCY6w2H/HqgFcKT53YWgkViBZC0vi3Y0X0EDxnTgYbbKmSij7JU7Z4qJzzd+Tscd/xcO20hPsAYXcW/nF5 musikanimal@wikimedia.org

Change 336193 had a related patch set uploaded (by Muehlenhoff):
Add musikanimal to analytics-privatadata-users

https://gerrit.wikimedia.org/r/336193

Change 336193 merged by Muehlenhoff:
Add musikanimal to analytics-privatadata-users

https://gerrit.wikimedia.org/r/336193

MoritzMuehlenhoff claimed this task.

@MusikAnimal I've enabled your access; you should now be able to log into stat1002.eqiad.wmnet. https://wikitech.wikimedia.org/wiki/Production_shell_access has the docs to set up your SSH config for that. Also, please use separate SSH agents for your labs and production SSH keys: https://wikitech.wikimedia.org/wiki/Managing_multiple_SSH_agents
Please reopen this task or ping me on IRC if you run into any problems.