
Add user_properties mysql table data to hadoop cluster
Closed, Declined, Public

Description

From IRC:

12:02 PM <joal> addshore: Creating a task would be the first step, and then if you feel like it you could add an SQL bit in this function of this file: https://github.com/wikimedia/analytics-refinery/blob/master/python/refinery/sqoop.py#L151
12:04 PM <joal> addshore: It'd be great if you did it, as you're not the first one to ask for it
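
To illustrate the kind of "SQL bit" joal mentions, here is a minimal sketch of a free-form import query for the MediaWiki `user_properties` table (columns `up_user`, `up_property`, `up_value`). It is hypothetical: the constant `USER_PROPERTIES_QUERY` and the helper `build_query` are illustrative names, not the actual structure of refinery's sqoop.py. The one firm requirement it reflects is that a sqoop `--query` import must contain the `$CONDITIONS` token in its `WHERE` clause so sqoop can split the work across mappers.

```python
# Hypothetical sketch: NOT the real layout of refinery's sqoop.py.
# It only shows what a free-form sqoop query for the MediaWiki
# user_properties table (up_user, up_property, up_value) could look like.

# up_property and up_value are binary columns in MediaWiki, so they are
# converted to text here (assumption: utf8mb4 is the desired encoding).
USER_PROPERTIES_QUERY = """
    select up_user,
           convert(up_property using utf8mb4) as up_property,
           convert(up_value using utf8mb4) as up_value
      from user_properties
     where $CONDITIONS
"""


def build_query(template: str) -> str:
    """Collapse all whitespace so the query fits on a single command line."""
    return " ".join(template.split())


if __name__ == "__main__":
    print(build_query(USER_PROPERTIES_QUERY))
```

Collapsing the whitespace keeps the query readable in source while still passing it to sqoop as one argument.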

Event Timeline

user_properties is not a good candidate for a one-off sqoop import, because it is constantly updated.
We would benefit from a real-time approach, but that is not going to happen in the near future.

mforns triaged this task as Medium priority. Jan 17 2019, 5:53 PM
mforns moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
Milimetric subscribed.

We did end up instrumenting user preferences, and this data now flows in real time as events. We initially received too much of it, so the instrumentation now includes a short allow list of properties. If you're interested in specific properties, update that list.
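
As a rough illustration of how a short allow list limits event volume, the sketch below keeps only preference-change events whose property is on the list. Everything here is an assumption: the event shape (`{"property": ..., "value": ...}`), the names `ALLOWED_PROPERTIES` and `filter_allowed`, and the listed property names are hypothetical and do not describe the real instrumentation's schema.

```python
# Hypothetical sketch of an allow-list filter for preference-change events.
# The event shape and the property names below are assumptions, not the
# real instrumentation's schema or its actual allow list.
from typing import Iterable, Iterator

ALLOWED_PROPERTIES = frozenset({"skin", "language"})  # hypothetical allow list


def filter_allowed(
    events: Iterable[dict], allowed: frozenset = ALLOWED_PROPERTIES
) -> Iterator[dict]:
    """Yield only the events whose 'property' field is on the allow list."""
    for event in events:
        if event.get("property") in allowed:
            yield event


if __name__ == "__main__":
    sample = [
        {"property": "skin", "value": "vector"},
        {"property": "gender", "value": "unknown"},
    ]
    print(list(filter_allowed(sample)))
```

Adding a property to the set is the code-level equivalent of "update that list": events for that property then pass through.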