Do we have any examples, anywhere, of what goes into a manifest for a Hadoop/Hive/Sqoop user? There must be one.
Most probably under Analytics/Refinery somewhere. The Analytics/Systems/Data Lake/Administration/Edit/Pipeline page says "Sqoop job runs in 1003 (although that might change, check puppet) and thus far it logs to: /var/log/refinery/sqoop-mediawiki.log", but it does not link to puppet, and it would probably only give an idea of how to puppetize the future Sqoop job on stat1004.
Also, I am not sure whether browsing puppet/modules/reportupdater/ helps; from what I've seen, probably not.
Finally, I've found this example (careful: it is part of a third-party Puppet module to deploy Hadoop), which may help in figuring out the constraints that a user must satisfy in order to use Hadoop; but I am not sure whether the same or a similar approach would apply to Hive and Sqoop as well.
@Ottomata: please, how does one puppetize a Hadoop/Hive+Sqoop user? Do you know of an example somewhere? Note: the user will be orchestrating his Hive/Sqoop jobs from within R on stat1004 and stat1005.
Hm, so creating a Hive database is easy; in fact, you can do it!
But a new system user with Hadoop access is not easy, for complicated reasons. See also:
This is halfway in progress, but it is mostly complicated because it changes the way ops manages user accounts. It requires a bunch of communication and buy-in from ops folks. It can be done, but it is low priority for me at the moment. :(
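For the Hive database part, which is the easy half, a minimal sketch of the statement one would run from the Hive CLI or beeline might look like this. The database name `wmde_example` is purely hypothetical, and this assumes the account running it already has Hive access on the stat machine; it does not address the system-user question.

```sql
-- Hypothetical database name, for illustration only.
-- IF NOT EXISTS makes the statement safe to re-run.
CREATE DATABASE IF NOT EXISTS wmde_example;

-- Confirm that the database is now listed.
SHOW DATABASES;
```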
@Ottomata Please take your time if you are about to claim this task at all.
I could have recalled the Apache Sqoop/stat1005 related problem earlier in the production of the WDCM system (in fact, I have initiated that discussion).
The responsibility for the WDCM being late in production is thus mine. I am also aware of T174110 and T174465.
One last small favor I ask of you now: please provide an estimate of when you think it would be possible to have the analytics-wmde users on stat1004 and stat1005 with the access rights as requested (on stat1004: MySQL, Sqoop, Hive; on stat1005: Hive, beyond what it already has), if you can provide such an estimate at this point. Thank you.