Access to HUE for Mayakpwiki
Closed, Resolved · Public · 1 Estimated Story Points

Assigned To

Authored By

	Mayakp.wiki
	Jul 26 2019, 8:51 PM

Description

Wikitech username: Mayakpwiki
preferred shell username: Mayakpwiki
developer access username / Instance shell account name in preferences: Mayakpwiki
Full name: Maya Kampurath

REQUEST : I would like to get access to HUE to be able to explore and query our Data Lake. I am a contractor working as a Data Quality Analyst in the Product Analytics team and Kate Zimmerman is my manager.

I have signed the NDA with Legal and have also been added to the NDA group.

Reference task where my initial access was set up: T227633: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003, and notebook1004] and groups for Mayakpwiki

Related Objects

Mentioned Here: T227633: Requesting access to machines [stat1004, stat1007, stat1006, notebook1003, and notebook1004] and groups for Mayakpwiki

Event Timeline

Mayakp.wiki created this task. Jul 26 2019, 8:51 PM

Restricted Application added a subscriber: Aklapper. Jul 26 2019, 8:51 PM

Mayakp.wiki added a subscriber: SRE-Access-Requests. Jul 26 2019, 8:52 PM

Peachey88 added projects: SRE-Access-Requests, Analytics. Jul 26 2019, 9:53 PM

Peachey88 removed subscribers: SRE-Access-Requests, Analytics.

Restricted Application added a project: SRE. Jul 26 2019, 9:53 PM

Peachey88 updated the task description. Jul 26 2019, 9:54 PM

@Mayakp.wiki the NDA group will give you access to Hue. The best place to do your work is probably Jupyter notebooks, as they are intended as a repository of queries and work to share with others.

@Nuria: It was worked out on IRC that they probably need their Hue account created, since they already have NDA LDAP access, see: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Access#HTTP_Access

Approved

@Mayakp.wiki just a comment on hue. It might not be the best tool for querying the data lake. We (as in the analytics team) prefer using either hive/beeline directly or jupyter notebooks.

Hue's UI is so bad.

fdans triaged this task as High priority. Jul 29 2019, 3:29 PM

fdans moved this task from Incoming to Operational Excellence on the Analytics board.

fdans moved this task from Operational Excellence to Ops Week on the Analytics board.

This seems to be something that the Analytics team needs to handle directly, rather than ops clinic duty, as the directions for HUE require someone who is already an admin on it to grant access to others.

(If this isn't the case, and it should be handled by clinic duty, please state such!)

RobH moved this task from Untriaged to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board. Jul 30 2019, 4:53 PM

RobH moved this task from Manager/NDA Approval/Confirmation to In Discussion on the SRE-Access-Requests board.

RobH moved this task from Backlog to Acknowledged on the SRE board.

@fdans: yes, I will be using Jupyter notebooks for the most part, but I would like to get HUE access for simple queries, such as validating the metric values shown on Turnilo/Superset dashboards. Also, I feel the Hue UI is good for viewing sample data in a table. It would be beneficial to do these small checks via HUE.
Thanks, and please let me know where we stand on the access.

@Mayakp.wiki Hue cannot connect to Druid (the datastore that powers both Superset and Turnilo); it can only connect to the Hive datastore.

To see sample data from a table, this is all the code needed in a Jupyter notebook to connect to Hive:

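from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

# Reuse the notebook's Spark context if one already exists.
sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

# "blah" is a placeholder; substitute a real Hive table name.
df = spark.sql("select * from blah limit 20")
# Show up to 20 rows without truncating column values.
df.show(20, False)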

@Mayakp.wiki please give Jupyter a try, and I will check on my end what is needed for access.
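
For repeated spot checks of the kind discussed above, the sample query could be wrapped in a small helper. This is only a sketch, not something posted in this task: `sample_query` and `show_sample` are hypothetical names, and the table names in the comments are placeholders. The helper validates the table name before building the SQL string, and `show_sample` expects an existing `SparkSession` like the `spark` object created in the notebook snippet.

```python
import re

# Accept only plain identifiers, optionally qualified as "database.table".
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def sample_query(table, limit=20):
    """Build a SQL statement returning at most `limit` rows of `table`."""
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError("unexpected table name: {!r}".format(table))
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return "SELECT * FROM {} LIMIT {}".format(table, limit)


def show_sample(spark, table, limit=20):
    """Print sample rows using an existing SparkSession (hypothetical helper)."""
    # The second argument to show() disables truncation of long values.
    spark.sql(sample_query(table, limit)).show(limit, False)
```

In a notebook this would be called as `show_sample(spark, "some_db.some_table", 10)`, where the table name is a placeholder for a real Hive table.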

Thanks @Nuria for the query and suggestion. I will use Jupyter and Beeline in the meantime. Please let me know whenever my HUE access is granted. https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Access#HTTP_Access
Thanks for your help with this.

colewhite removed projects: SRE, SRE-Access-Requests. Aug 5 2019, 10:20 PM

colewhite added a project: SRE.

Following up on this request : I have been able to use Jupyter notebooks for some of my work. However, I would still like to get access to HUE for running small, simple queries on hive tables. Thanks!

nshahquinn-wmf subscribed. Aug 19 2019, 3:38 PM

Let me just support Maya's request here. I work primarily in JupyterLab, but I still use Hue frequently for various things:

Running quick queries or exploring the Data Lake (since Hue has a nice graphical table explorer, autocompletion, and a query history)
Checking Oozie workflows and jobs

From a security standpoint, there is no difference since Maya already has full data access via Jupyter/SSH.

mforns moved this task from Ops Week to Incoming on the Analytics board. Aug 19 2019, 5:50 PM

Nuria assigned this task to JAllemandou. Aug 20 2019, 1:21 PM

Assigning to @joal who has ops duty this week
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Access#Admin_Instructions_to_sync_a_Hue_account

Action has been taken that should have granted access to shell username Mayakpwiki.
@Mayakp.wiki can you test please? :)

Checked connection and ran queries against mediawiki history. Access is working as expected. Thanks @JAllemandou and @Nuria for your help !

JAllemandou closed this task as Resolved. Aug 20 2019, 7:38 PM

JAllemandou set the point value for this task to 1.
