Introduction to Hive class {flea} [13 pts]
Closed, Resolved · Public

Description

A few people are in the process of requesting access to the analytics cluster so they can query data through Hive. We thought it would be a good idea to run a hands-on session for this group.

Prerequisites:

  1. Request access for stat1002 (https://wikitech.wikimedia.org/wiki/Requesting_shell_access)

Most of you have already done this, and are in the pipeline to get access. Feel free to reach out if there's any trouble in this process.

  1. Set up your ssh config (https://wikitech.wikimedia.org/wiki/SSH_access)

Add or update your ~/.ssh/config file. It should look something like this: http://pastebin.com/Mb0vCkd1. Set each User value to your labs or prod username, as appropriate for that host.
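
For reference, a config of this kind might look roughly like the output of the sketch below. This is not the content of the pastebin above. It assumes production hosts are reached through a bastion; `bastion.example.org` and the function name `print_sample_ssh_config` are placeholders for illustration only. The key path reuses the `id_rsa_prod` name from the ssh-add step.

```shell
# Print a sample ~/.ssh/config stanza for review; nothing is written to disk.
# Assumptions (not from this task): access goes through a bastion host, and
# bastion.example.org is a placeholder for its real name.
print_sample_ssh_config() {
    user="$1"   # your production shell username
    cat <<EOF
Host bastion.example.org
    User ${user}
    IdentityFile ~/.ssh/id_rsa_prod

Host *.eqiad.wmnet
    User ${user}
    IdentityFile ~/.ssh/id_rsa_prod
    ProxyCommand ssh -a -W %h:%p bastion.example.org
EOF
}
```

Running `print_sample_ssh_config your-username` prints the stanza so you can compare it with your own config before editing anything.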

  1. Add your keys to the ssh-agent. In a terminal, run something like:

ssh-add ~/.ssh/id_rsa
ssh-add ~/.ssh/id_rsa_prod
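
If ssh-add complains that it cannot reach an agent, or one of the key files does not exist, a small wrapper can make this step more forgiving. The following is only a sketch; the function names `ensure_agent` and `add_existing_keys` are made up for illustration, and the key paths are the ones used above.

```shell
# Start an ssh-agent only if none is reachable from this shell.
ensure_agent() {
    status=0
    # ssh-add -l exits with status 2 when it cannot contact an agent.
    ssh-add -l >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 2 ]; then
        eval "$(ssh-agent -s)" >/dev/null
    fi
}

# Add each key file that exists; report the ones that do not.
add_existing_keys() {
    for key in "$@"; do
        if [ -f "$key" ]; then
            ssh-add "$key"
        else
            echo "skipping missing key: $key" >&2
        fi
    done
}
```

A typical call would be `ensure_agent && add_existing_keys ~/.ssh/id_rsa ~/.ssh/id_rsa_prod`, followed by `ssh-add -l` to list the identities the agent now holds.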

  1. If your access has been granted and your ssh config is correct, you should be able to log in to stat1002 from the terminal like this:

ssh stat1002.eqiad.wmnet

On the first connection, ssh will prompt you to confirm the server's RSA fingerprint; once you answer yes, it logs you in to the server.

You can quit the session by typing exit.
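
After that first manual login, a quick non-interactive check can confirm that keys and config still work. This is a minimal sketch, assuming the host key is already in your known_hosts file; `check_login` is a hypothetical helper name.

```shell
# Try a non-interactive login and report the result.
check_login() {
    host="$1"
    # BatchMode makes ssh fail instead of prompting, so this never hangs on a
    # passphrase or fingerprint question. ConnectTimeout bounds the wait.
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true; then
        echo "login to $host OK"
    else
        echo "login to $host failed; try 'ssh -v $host' for details" >&2
        return 1
    fi
}
```

For example, `check_login stat1002.eqiad.wmnet` prints a one-line result; on failure, the verbose output of `ssh -v` usually shows which key or config entry was tried.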

  1. Brush up on SQL basics.

Ping me or anyone else on #wikimedia-analytics if you run into trouble with any of these steps.

Once this is done, you are all set to query data. I would like to host this session next week and explain:

  • How Hive works
  • How to query pageview data and anything else you may be interested in
  • Privacy concerns around the data
  • How to monitor your queries' progress, and troubleshoot common errors
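
To give a flavour of what querying pageview data from stat1002 could look like, here is a rough sketch. It assumes a Hive table named `wmf.pageview_hourly` with `project`, `page_title`, and `view_count` columns, partitioned by `year`, `month`, and `day`; none of these names come from this task, so check the actual schema first. `build_pageview_query` is a hypothetical helper that only prints the query text.

```shell
# Print a HiveQL query for the ten most viewed pages of one project on one day.
# Table and column names are assumptions; verify them against the real schema.
build_pageview_query() {
    project="$1"; year="$2"; month="$3"; day="$4"
    cat <<EOF
SELECT page_title, SUM(view_count) AS views
FROM wmf.pageview_hourly
WHERE project = '${project}'
  AND year = ${year} AND month = ${month} AND day = ${day}
GROUP BY page_title
ORDER BY views DESC
LIMIT 10;
EOF
}
```

On the server, the query could then be run with `hive -e "$(build_pageview_query en.wikipedia 2015 9 1)"`. Filtering on the partition columns keeps the job from scanning more data than the single day it needs.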

Event Timeline

madhuvishy claimed this task.
madhuvishy raised the priority of this task from to Normal.
madhuvishy updated the task description.
madhuvishy added a subscriber: kevinator.
Restricted Application added a subscriber: Aklapper. Sep 23 2015, 10:47 PM

Scheduled for 29th September, 11 am PST.

ggellerman renamed this task from Introduction to Hive class {flea} to Introduction to Hive class {flea} [13pts]. Sep 30 2015, 3:38 PM
ggellerman set Security to None.
Milimetric renamed this task from Introduction to Hive class {flea} [13pts] to Introduction to Hive class {flea} [13 pts]. Sep 30 2015, 3:39 PM
kevinator closed this task as Resolved. Oct 2 2015, 3:35 PM