
Access to HUE for Mayakpwiki
Closed, Resolved · Public · 1 Estimated Story Point


Wikitech username: Mayakpwiki
preferred shell username: Mayakpwiki
developer access username / Instance shell account name in preferences: Mayakpwiki
Full name: Maya Kampurath

REQUEST: I would like to get access to HUE so that I can explore and query our Data Lake. I am a contractor working as a Data Quality Analyst in the Product Analytics team, and Kate Zimmerman is my manager.

I have signed the NDA with Legal and have also been added to the NDA group.

Reference task where my initial access was set up: T227633: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003, and notebook1004] and groups for Mayakpwiki

Event Timeline

Restricted Application added a subscriber: Aklapper. Jul 26 2019, 8:51 PM
Restricted Application added a project: Operations. Jul 26 2019, 9:53 PM
Peachey88 updated the task description. Jul 26 2019, 9:54 PM
Nuria added a subscriber: Nuria. Jul 26 2019, 10:23 PM

The NDA group will give you access to Hue. The best place to do your work is probably Jupyter notebooks, as they are intended as a repository of queries and work to share with others.

@Nuria: It was worked out on IRC that they probably need their Hue account created, since they already have NDA LDAP access, see:

fdans added a subscriber: fdans. Jul 29 2019, 3:28 PM

Just a comment on Hue: it might not be the best tool for querying the Data Lake. We (as in the Analytics team) prefer using either Hive/Beeline directly or Jupyter notebooks.

Hue's UI is so bad.

fdans triaged this task as High priority. Jul 29 2019, 3:29 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.
fdans moved this task from Operational Excellence to Ops Week on the Analytics board.
RobH added a subscriber: RobH. Jul 30 2019, 4:53 PM

This seems to be something that the Analytics team needs to handle directly, rather than ops clinic duty, as the directions for HUE require someone who is already an Admin on it to grant other access.

(If this isn't the case, and it should be handled by clinic duty, please state such!)

@fdans: Yes, I will be using Jupyter notebooks for the most part, but I would like to get HUE access for simple queries, such as validating the metric values on Turnilo/Superset dashboards. Also, I find the HUE UI good for viewing sample data in a table. It would be beneficial to do these small checks via HUE.
Thanks, and please let me know where we stand on the access.

Nuria added a comment. Edited Jul 31 2019, 5:16 PM

Hue has no ability to connect to Druid (which stores the data that powers both Superset and Turnilo); it can only connect to the Hive datastore.

To see sample data in a table, this is all the code needed in a Jupyter notebook to connect to Hive:

from pyspark.sql.types import ArrayType, StringType
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

# Reuse the running Spark context if there is one, otherwise start a new one
sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

# "blah" is a placeholder table name; False turns off truncation of long values
df = spark.sql("select * from blah limit 10")
df.show(10, False)

Please give Jupyter a try, and let me see on my end what is needed for access.
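
For repeated spot checks, the sampling step in the comment above could be wrapped in a small helper. This is only a minimal sketch under stated assumptions: the names `build_sample_query` and `show_sample` are hypothetical and do not come from this task, the `spark` argument is assumed to be a session created as shown above, and the table is assumed to be a plain `[database.]table` name.

```python
import re

# Hypothetical helper names for illustration; nothing here is part of the original task.
# Accepts "table" or "database.table" made of letters, digits, and underscores only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_sample_query(table, limit=10):
    """Return a SELECT statement that fetches at most `limit` rows from `table`."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"not a plain [database.]table name: {table!r}")
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {table} LIMIT {limit}"


def show_sample(spark, table, limit=10):
    """Print up to `limit` rows of `table` without truncating column values.

    `spark` is assumed to be an existing SparkSession with Hive access.
    """
    spark.sql(build_sample_query(table, limit)).show(limit, False)
```

In a notebook this would be called as `show_sample(spark, "blah")`, with `blah` standing in for a real table name. Rejecting anything that is not a plain identifier keeps the string-built query from picking up stray SQL.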

Thanks @Nuria for the query and suggestion. I will use Jupyter and Beeline in the meantime. Please let me know whenever my HUE access is granted.
Thanks for your help with this.

Following up on this request: I have been able to use Jupyter notebooks for some of my work. However, I would still like to get access to HUE for running small, simple queries on Hive tables. Thanks!

nshahquinn-wmf added a comment. Edited Aug 19 2019, 3:48 PM

Let me just support Maya's request here. I work primarily in JupyterLab, but I still use Hue frequently for various things:

  • Running quick queries or exploring the Data Lake (since Hue has a nice graphical table explorer, autocompletion, and a query history)
  • Checking Oozie workflows and jobs

From a security standpoint, there is no difference since Maya already has full data access via Jupyter/SSH.

mforns moved this task from Ops Week to Incoming on the Analytics board. Aug 19 2019, 5:50 PM
Nuria assigned this task to JAllemandou. Aug 20 2019, 1:21 PM

Action has been taken that should have granted access to shell username Mayakpwiki. Can you test, please? :)

Checked the connection and ran queries against mediawiki history. Access is working as expected. Thanks @JAllemandou and @Nuria for your help!

JAllemandou closed this task as Resolved. Aug 20 2019, 7:38 PM
JAllemandou set the point value for this task to 1.