
[EPIC] Learn about our databases and how to use them
Closed, Resolved · Public

Description

We have three databases/tables of interest:

  • Event logging (what you will use for Task 1)
    • Once SSH'd into stat1002 or stat1003, run mysql -h analytics-store.eqiad.wmnet to open the MySQL command-line interface
    • In R (on stat1002), install our internal "wmf" package and use wmf::build_query() to execute queries and get that data into R
    • Install wmf via devtools::install_git('https://gerrit.wikimedia.org/r/wikimedia/discovery/wmf')
  • Webrequests (accessed via Hive)
  • Cirrus searches (also accessed via Hive)
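For anyone who would rather pull event logging data into Python than into R, here is a minimal sketch of the mysql route from the first bullet. Only the host name comes from this task. The log database, the 14-character timestamp format (YYYYMMDDhhmmss), the table name, and the helper functions are assumptions for illustration, and the sketch presumes your MySQL client credentials are already configured on stat1002/stat1003.

```python
import csv
import io
import subprocess
from datetime import date, timedelta

# Host taken from the task description; everything else here is an assumption.
ANALYTICS_HOST = "analytics-store.eqiad.wmnet"


def build_daily_query(table: str, day: date, limit: int = 100) -> str:
    """Build a SELECT over one day of a (hypothetical) EventLogging table.

    Assumes the table lives in the `log` database and stores `timestamp`
    as a 14-character MediaWiki-style string (YYYYMMDDhhmmss).
    """
    start = day.strftime("%Y%m%d") + "000000"
    end = (day + timedelta(days=1)).strftime("%Y%m%d") + "000000"
    return (
        f"SELECT * FROM log.{table} "
        f"WHERE timestamp >= '{start}' AND timestamp < '{end}' "
        f"LIMIT {limit};"
    )


def parse_batch_output(tsv: str) -> list[dict[str, str]]:
    """Parse `mysql --batch` output: tab-separated, header row first."""
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def run_query(sql: str) -> list[dict[str, str]]:
    """Run SQL through the mysql CLI (only works on the stat hosts)."""
    result = subprocess.run(
        ["mysql", "-h", ANALYTICS_HOST, "--batch", "-e", sql],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_batch_output(result.stdout)
```

For example, run_query(build_daily_query("SomeSchema_12345678", date(2016, 8, 16))) would fetch up to 100 rows logged on that day; SomeSchema_12345678 is a hypothetical placeholder, not a real table name. Bounding the timestamp with a half-open range keeps each query to exactly one day.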

You need to know what they are, how they are structured, how to operate within them (e.g., aggregations and UDFs inside Hive), and how to get data out of them for analysis in R, Python, or whatever you prefer, because roughly 99% of your job will require you to use this data. So thorough knowledge is very important. For more info on these databases, see: https://meta.wikimedia.org/wiki/Discovery/Analytics#Databases_and_Datasets
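As a taste of working inside Hive, the sketch below aggregates one day of webrequests by host and HTTP status and runs the query through the hive command-line client from Python. The wmf.webrequest table, its partition columns (webrequest_source, year, month, day), its uri_host and http_status columns, and the helper names are assumptions here; check them against the schema documentation linked above before relying on this. LOWER is one of Hive's built-in functions; custom UDFs have to be registered separately (ADD JAR plus CREATE TEMPORARY FUNCTION).

```python
import subprocess
from datetime import date


def build_status_counts_query(day: date, source: str = "text") -> str:
    """Count one day's webrequests per host and HTTP status.

    Filtering on the (assumed) partition columns lets Hive read only that
    day's data instead of scanning the whole table.
    """
    return (
        "SELECT LOWER(uri_host) AS host, http_status, COUNT(*) AS requests "
        "FROM wmf.webrequest "
        f"WHERE webrequest_source = '{source}' "
        f"AND year = {day.year} AND month = {day.month} AND day = {day.day} "
        # Hive requires grouping by the expression itself, not its alias.
        "GROUP BY LOWER(uri_host), http_status "
        "ORDER BY requests DESC "
        "LIMIT 20"
    )


def run_hive(hql: str) -> list[list[str]]:
    """Run HiveQL via the hive CLI and split its tab-separated stdout."""
    result = subprocess.run(
        ["hive", "--hiveconf", "hive.cli.print.header=true", "-e", hql],
        capture_output=True,
        text=True,
        check=True,
    )
    # First row is the header because of hive.cli.print.header=true.
    return [line.split("\t") for line in result.stdout.splitlines() if line]
```

The partition filter is the important design choice: without it, Hive would scan every partition of the table, which on request logs is far more data than a single day's aggregation needs.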

This task will be split into three sub-tasks, each for learning one of those databases/tables:

  • T143137 is for learning event logging data.
  • T143762 is for learning web requests data and working with Hive/Hadoop.
  • T147216 is for learning cirrus search requests data.

Event Timeline

mpopov created this task. Aug 16 2016, 5:42 PM
Restricted Application added a subscriber: Aklapper. Aug 16 2016, 5:42 PM
mpopov updated the task description. Aug 16 2016, 6:07 PM
mpopov renamed this task from Learn about our databases and how to use them to [EPIC] Learn about our databases and how to use them. Aug 16 2016, 6:16 PM
mpopov updated the task description.
mpopov changed the point value for this task from 6 to 18.
mpopov updated the task description. Aug 16 2016, 6:23 PM
mpopov removed the point value for this task.
mpopov updated the task description. Aug 16 2016, 6:48 PM
debt triaged this task as Normal priority. Aug 16 2016, 8:15 PM
mpopov updated the task description. Aug 24 2016, 1:16 AM
debt added a project: Discovery.
debt added a subscriber: debt.

yay!

debt closed this task as Resolved. Nov 15 2016, 11:39 PM

...and, done! :)