Hive database for analytics-wmde user
@Ottomata Could you please create a Hive database for the user analytics-wmde, which currently exists on stat1005 and is also planned to be created on stat1004 (T180902)?

I guess this should somehow be within our puppet stuff?

GoranSMilovanovic added a comment. Edited Nov 20 2017, 10:20 AM

Do we have any examples, anywhere, on what goes into a manifest for a Hadoop/Hive/Sqoop user? Must be the case.

Most probably somewhere in Analytics/Refinery... hmmm... let's see: the Analytics/Systems/Data Lake/Administration/Edit/Pipeline page says "Sqoop job runs in 1003 (although that might change, check puppet) and thus far it logs to: /var/log/refinery/sqoop-mediawiki.log", but it gives no link to puppet, and this would probably only give an idea of how to puppetize the future Sqoop job on stat1004.

Also I am not sure whether browsing puppet/modules/reportupdater/ helps, but from what I've seen: probably not.

Finally, I've found this example (careful: it's part of a third-party Puppet module to deploy Hadoop), which may help in figuring out the constraints that a user must satisfy in order to use Hadoop; but I am not sure whether the same or a similar approach would apply to Hive and Sqoop as well.

@Ottomata: please, how does one puppetize a Hadoop/Hive+Sqoop user? Do you know of an example somewhere? Note: the user will be orchestrating its Hive/Sqoop jobs from within R on stat1004 and stat1005.

Hm, so creating a hive database is easy, in fact, you can do it!
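For reference, a minimal sketch of what that could look like when scripted from Python, assuming the `hive` CLI is on the PATH and the calling user is allowed to create databases. The database name `wmde` and both helper functions are hypothetical and only for illustration; nothing here is taken from the actual cluster setup.

```python
import re
import subprocess
from typing import List


def build_create_db_command(db_name: str) -> List[str]:
    """Build the hive CLI call that creates db_name if it does not exist yet."""
    # Accept only simple identifiers, since the name is pasted into the statement.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", db_name):
        raise ValueError("unsupported database name: %r" % db_name)
    # `hive -e` runs the quoted HiveQL statement and exits.
    return ["hive", "-e", "CREATE DATABASE IF NOT EXISTS %s;" % db_name]


def create_database(db_name: str) -> None:
    """Run the statement; raises CalledProcessError if hive exits non-zero."""
    subprocess.run(build_create_db_command(db_name), check=True)


if __name__ == "__main__":
    # Hypothetical database name; print the command instead of running it.
    print(" ".join(build_create_db_command("wmde")))
```

The same statement can be typed directly into an interactive `hive` session; the wrapper is only useful when the step has to be part of a scripted job.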

But creating a new system user that has Hadoop access is not easy, for complicated reasons. See also:

This is halfway in progress, but it is complicated, mostly because it changes the way ops manages user accounts. It requires a bunch of communication and buy-in from ops folks. It can be done, but it is low priority for me at the moment. :(

@Ottomata Please take your time if you are about to claim this task at all.

I could have recalled the Apache Sqoop/stat1005 related problem earlier in the production of the WDCM system (in fact, I have initiated that discussion).
The responsibility for the WDCM being late in production is thus mine. I am also aware of T174110 and T174465.

One last small favor I ask from you now: please provide an estimate of when you think it would be possible to have the analytics-wmde users on stat1004 and stat1005 with the access rights as requested (on stat1004: MySQL, Sqoop, Hive; on stat1005: Hive, beyond what it already has) - if you can provide such an estimate at this point. Thank you.

Hm, estimate! I think we could make this a goal next quarter at the latest.

Closing the task as (conditionally) resolved given that we already have T171258 and its branches for everything related to WDCM puppetization under the analytics-wmde user account.

That's not really a reason to close this task.
This task is a subtask of T171258.

Either we need this task to be done (and it should stay open) or we don't need it (and we should close it as declined, not resolved).

@Addshore Then leave it open. We will get back to this as soon as the labs WDCM component gets puppetized. Thanks.

@Addshore Following the introduction of Kerberos authentication, all Hive and Spark scripts needed for analytics in this case are run by analytics-privatedata. So I guess there is no longer any need for a Hive database for the analytics-wmde user.