
Hive database for analytics-wmde user
Closed, Resolved · Public

Description

@Ottomata Could you please create a Hive database for the user analytics-wmde, which currently exists on stat1005 and is also planned to be created on stat1004 (T180902)?

Event Timeline

Restricted Application removed a project: Patch-For-Review. · Nov 19 2017, 12:54 PM
GoranSMilovanovic renamed this task from Hive table for analytics-wmde user to Hive database for analytics-wmde user. · Nov 19 2017, 12:54 PM
GoranSMilovanovic removed GoranSMilovanovic as the assignee of this task.
Addshore moved this task from Unsorted 💣 to Next on the User-Addshore board. · Nov 20 2017, 9:34 AM

I guess this should somehow be within our puppet stuff?

GoranSMilovanovic added a comment. · Edited · Nov 20 2017, 10:20 AM

Do we have any examples, anywhere, of what goes into a manifest for a Hadoop/Hive/Sqoop user? There must be some.

Most probably somewhere in Analytics/Refinery... hmmm... let's see: the Analytics/Systems/Data Lake/Administration/Edit/Pipeline page says "Sqoop job runs in 1003 (although that might change, check puppet) and thus far it logs to: /var/log/refinery/sqoop-mediawiki.log", but it gives no link to puppet, and it would probably only give an idea of how to puppetize the future Sqoop job on stat1004.

Also I am not sure whether browsing puppet/modules/reportupdater/ helps, but from what I've seen: probably not.

Finally, I've found this example (careful: it's part of a third-party Puppet module to deploy Hadoop), which may help in figuring out the constraints that a user must satisfy in order to use Hadoop; but I am not sure whether the same or a similar approach would apply to Hive and Sqoop as well.

@Ottomata: please, how does one puppetize a Hadoop/Hive+Sqoop user, if you know of an example somewhere? Note: the user will be orchestrating his Hive/Sqoop jobs from within R on stat1004 and stat1005.

Hm, so creating a Hive database is easy; in fact, you can do it!
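
For illustration only, here is a minimal sketch of what "you can do it" might look like when scripted. It builds a `hive -e` invocation around a standard HiveQL `CREATE DATABASE IF NOT EXISTS` statement. The database name `wmde` and the helper `create_database_command` are hypothetical (nothing in this task names the database), and the name check is a conservative assumption rather than Hive's exact identifier rules.

```python
import re
import shlex
from typing import List


def create_database_command(db_name: str) -> List[str]:
    """Build a `hive -e` command that creates db_name if it does not exist."""
    # Assumption: only allow letters, digits and underscores, so the name can
    # be placed in the statement unquoted (a hyphenated name such as the user
    # name "analytics-wmde" is rejected by this check).
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", db_name):
        raise ValueError(f"unsupported database name: {db_name!r}")
    statement = f"CREATE DATABASE IF NOT EXISTS {db_name};"
    return ["hive", "-e", statement]


if __name__ == "__main__":
    # "wmde" is a hypothetical database name used only for this example.
    # Print the shell-quoted command instead of running it.
    print(shlex.join(create_database_command("wmde")))
```

The sketch only prints the command; running it would still require an account that already has Hadoop access, which is the hard part discussed in this task.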

But, a new system user that has Hadoop access is not easy for complicated reasons. See also:

https://phabricator.wikimedia.org/T174110
https://phabricator.wikimedia.org/T174465

This is halfway in progress, but is mostly complicated because it changes the way ops manages user accounts. It requires a bunch of communication and buy-in from ops folks. It can be done, but it is low priority for me at the moment. :(

@Ottomata Please take your time if you are about to claim this task at all.

I could have recalled the Apache Sqoop/stat1005 related problem earlier in the production of the WDCM system (in fact, I have initiated that discussion).
The responsibility for the WDCM being late in production is thus mine. I am also aware of T174110 and T174465.

One last small favor I ask of you now: please provide an estimate of when you think it would be possible to have the analytics-wmde users on stat1004 and stat1005 with the access rights as requested (on stat1004: MySQL, Sqoop, Hive; on stat1005: Hive, beyond what it already has), if you can provide such an estimate at this point. Thank you.

Hm, estimate! I think we could make this a goal next quarter at the latest.

Addshore moved this task from Next to Unsorted 💣 on the User-Addshore board. · Dec 11 2017, 4:06 PM
GoranSMilovanovic closed this task as Resolved. · Dec 14 2017, 11:59 PM
GoranSMilovanovic claimed this task.

Closing the task as (conditionally) resolved given that we already have T171258 and its branches for everything related to WDCM puppetization under the analytics-wmde user account.

Addshore reopened this task as Open. · Dec 18 2017, 3:09 PM

That's not really a reason to close this task.
This task is a sub task of T171258.

Either we need this task to be done (it should stay open) or we don't need it (and we should close it as declined, not resolved).

Addshore removed GoranSMilovanovic as the assignee of this task. · Dec 18 2017, 3:10 PM

@Addshore Then leave it open. We will get back to this as soon as the labs WDCM component gets puppetized. Thanks.

GoranSMilovanovic closed this task as Resolved. · Sep 2 2020, 11:00 PM
GoranSMilovanovic claimed this task.

@Addshore Following the introduction of Kerberos authentication, all Hive and Spark scripts needed for analytics in this case are run by analytics-privatedata. So there is no need for any Hive database for analytics-wmde user, I guess.