
Run critical Analytics Hadoop jobs and make sure that they work with the new auth settings.
Closed, Resolved | Public | 13 Estimated Story Points

Description

Run critical Analytics Hadoop jobs and make sure that they work with the new auth settings.

Details

Project                 Branch      Lines +/-   Subject
operations/puppet       production  +5 -0
operations/puppet       production  +8 -0
operations/puppet       production  +1 -1
operations/puppet       production  +21 -13
operations/puppet       production  +224 -165
operations/puppet       production  +2 -0
operations/puppet       production  +39 -31
operations/puppet/cdh   master      +122 -0
operations/puppet       production  +1 -1
operations/puppet       production  +1 -1
operations/puppet       production  +2 -0
operations/puppet       production  +4 -1
operations/puppet       production  +3 -3
operations/puppet       production  +3 -1
operations/puppet       production  +1 -1
operations/puppet       production  +1 -1
analytics/refinery      master      +243 -0
operations/puppet       production  +2 -2
operations/puppet       production  +2 -2
operations/puppet       production  +50 -0
operations/puppet       production  +10 -4
operations/puppet       production  +238 -7

Event Timeline

Nuria triaged this task as High priority. Dec 18 2018, 9:33 PM
Nuria created this task.

Change 489243 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::coordinator: add basic camus support

https://gerrit.wikimedia.org/r/489243

Change 489243 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::coordinator: add basic camus support

https://gerrit.wikimedia.org/r/489243

Change 490004 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::coordinator: add admin settings

https://gerrit.wikimedia.org/r/490004

Change 490004 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::coordinator: add admin settings

https://gerrit.wikimedia.org/r/490004

Change 490067 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::coord: add kafkatee instance

https://gerrit.wikimedia.org/r/490067

Change 490067 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::coord: add kafkatee instance

https://gerrit.wikimedia.org/r/490067

Change 491777 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] camus: make webrequest_text config more similar to prod

https://gerrit.wikimedia.org/r/491777

Change 491777 merged by Elukey:
[operations/puppet@production] camus: make webrequest_text config more similar to prod

https://gerrit.wikimedia.org/r/491777

Change 491779 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Rename kafka webrequest test topic

https://gerrit.wikimedia.org/r/491779

Change 491779 merged by Elukey:
[operations/puppet@production] Rename kafka webrequest test topic

https://gerrit.wikimedia.org/r/491779

hdfs@analytics1030:/mnt/hdfs/wmf/data/raw/webrequest$ ls
webrequest_test_text

So if I understood correctly, it is now only a matter of using the right bundle.xml/properties, and we should be able to refine this data too!

Change 491791 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] Add oozie webrequest test bundle

https://gerrit.wikimedia.org/r/491791

Change 491944 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::analytics::refinery::job::test::camus: fix topic whitelist

https://gerrit.wikimedia.org/r/491944

Change 491944 merged by Elukey:
[operations/puppet@production] profile::analytics::refinery::job::test::camus: fix topic whitelist

https://gerrit.wikimedia.org/r/491944

Change 491949 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::analytics::refinery::job::test::camus: fix checked topic

https://gerrit.wikimedia.org/r/491949

Change 491949 merged by Elukey:
[operations/puppet@production] profile::analytics::refinery::job::test::camus: fix checked topic

https://gerrit.wikimedia.org/r/491949

Change 491969 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::coordinator: ensure hive-site.xml in HDFS

https://gerrit.wikimedia.org/r/491969

Change 491969 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::coordinator: ensure hive-site.xml in HDFS

https://gerrit.wikimedia.org/r/491969

Change 491973 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Move ensure hive-site.xml from (test) hadoo coord to ui

https://gerrit.wikimedia.org/r/491973

Change 491973 merged by Elukey:
[operations/puppet@production] Move ensure hive-site.xml from (test) hadoo coord to ui

https://gerrit.wikimedia.org/r/491973

Just saw this on the Confluent mailing list. Parking it here for future reference:

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/confluent-platform/twFcx3H689U/x9g3WANKBAAJ

We don't yet use Kafka Connect, but if/when we do and we have a Kerberized Hadoop Cluster, we'll need to be aware of that. :/

elukey moved this task from In Progress to Paused on the Analytics-Kanban board. Mar 28 2019, 9:16 AM
elukey moved this task from Paused to In Progress on the Analytics-Kanban board. Apr 12 2019, 9:04 AM
elukey moved this task from Backlog to In Progress on the User-Elukey board. Apr 16 2019, 11:02 AM
elukey moved this task from In Progress to Paused on the Analytics-Kanban board. Jun 4 2019, 4:03 PM
elukey moved this task from In Progress to Stalled on the User-Elukey board. Jun 6 2019, 4:21 PM
elukey moved this task from Paused to In Progress on the Analytics-Kanban board. Jun 20 2019, 2:11 PM

Change 518097 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet/cdh@master] Add cdh::systemd_timer

https://gerrit.wikimedia.org/r/518097

Change 518220 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hue: add a parameter to selectively enable oozie security

https://gerrit.wikimedia.org/r/518220

Change 518220 merged by Elukey:
[operations/puppet@production] profile::hue: add a parameter to selectively enable oozie security

https://gerrit.wikimedia.org/r/518220

Change 518469 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluste::hadoop::master|stanby: allow https port

https://gerrit.wikimedia.org/r/518469

Change 518469 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluste::hadoop::master|stanby: allow https port

https://gerrit.wikimedia.org/r/518469

Change 518646 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] hadoop: set 'hdfs' as admin user for the Hadoop test cluster

https://gerrit.wikimedia.org/r/518646

Change 518646 merged by Elukey:
[operations/puppet@production] hadoop: set 'hdfs' as admin user for the Hadoop test cluster

https://gerrit.wikimedia.org/r/518646

Change 518648 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] hadoop: format dfs.cluster.administrators correctly

https://gerrit.wikimedia.org/r/518648

Change 518648 merged by Elukey:
[operations/puppet@production] hadoop: format dfs.cluster.administrators correctly

https://gerrit.wikimedia.org/r/518648

Nuria closed subtask Restricted Task as Resolved. Jun 24 2019, 9:11 PM

Change 518097 abandoned by Elukey:
Add cdh::systemd_timer

https://gerrit.wikimedia.org/r/518097

Change 518915 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Introduce the kerberos module

https://gerrit.wikimedia.org/r/518915

Change 518915 merged by Elukey:
[operations/puppet@production] Introduce the kerberos module

https://gerrit.wikimedia.org/r/518915

Change 518954 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Replace profile::analytics::systemd_timer with kerberos::systemd_timer

https://gerrit.wikimedia.org/r/518954

Change 518958 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] camus: add support for kerberos

https://gerrit.wikimedia.org/r/518958

Change 519057 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Set oozie as proxy for the Hadoop testing cluster

https://gerrit.wikimedia.org/r/519057

Change 519057 merged by Elukey:
[operations/puppet@production] Set oozie as proxy for the Hadoop testing cluster

https://gerrit.wikimedia.org/r/519057

Change 518954 merged by Elukey:
[operations/puppet@production] Replace profile::analytics::systemd_timer with kerberos::systemd_timer

https://gerrit.wikimedia.org/r/518954

Change 518958 merged by Elukey:
[operations/puppet@production] camus: add support for kerberos

https://gerrit.wikimedia.org/r/518958

Change 519263 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hadoop::common: set r+o to the trustore file

https://gerrit.wikimedia.org/r/519263

Change 519263 merged by Elukey:
[operations/puppet@production] profile::hadoop::common: set r+o to the trustore file

https://gerrit.wikimedia.org/r/519263

Summary of things done:

  1. hue works now with oozie
  2. oozie is kerberized
  3. camus works fine with kerberos
  4. webrequest_load runs successfully after replacing hive actions with hive2 actions

Point 4) example:

diff --git a/oozie/util/hive/partition/add/workflow.xml b/oozie/util/hive/partition/add/workflow.xml
index ca07199..4e6d892 100644
--- a/oozie/util/hive/partition/add/workflow.xml
+++ b/oozie/util/hive/partition/add/workflow.xml
@@ -42,13 +42,26 @@
             <description>HDFS path(s) naming the input dataset.</description>
         </property>
     </parameters>
+      <credentials>
+         <credential name='my-hive-creds' type='hive2'>
+            <property>
+               <name>hive2.server.principal</name>
+               <value>hive/analytics1030.eqiad.wmnet@WIKIMEDIA</value>
+            </property>
+            <property>
+               <name>hive2.jdbc.url</name>
+               <value>jdbc:hive2://analytics1030.eqiad.wmnet:10000/default</value>
+            </property>
+         </credential>
+      </credentials>
+

     <start to="add_partition"/>

-    <action name="add_partition">
-        <hive xmlns="uri:oozie:hive-action:0.2">
+    <action name="add_partition" cred="my-hive-creds">
+        <hive2 xmlns="uri:oozie:hive2-action:0.1">
             <job-tracker>${job_tracker}</job-tracker>
             <name-node>${name_node}</name-node>
             <job-xml>${hive_site_xml}</job-xml>
             <configuration>
                 <!--make sure oozie:launcher runs in a low priority queue -->
@@ -65,13 +78,13 @@
                     <value>${queue_name}</value>
                 </property>
             </configuration>
-
+            <jdbc-url>jdbc:hive2://analytics1030.eqiad.wmnet:10000/default</jdbc-url>
             <script>${hive_script}</script>
             <param>database=${replaceAll(table, "\\..*", "")}</param>
             <param>table=${replaceAll(table, "^.*\\.", "")}</param>
             <param>location=${location}</param>
             <param>partition_spec=${partition_spec}</param>
-        </hive>
+        </hive2>
         <ok to="end"/>
         <error to="kill"/>
     </action>
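
The same conversion will presumably be needed for the other workflows that still use hive actions. A minimal Python sketch of how one could spot them, assuming the pattern in the diff above (a hive2 action carrying a cred attribute that points to a declared credential, plus a jdbc-url element). The function names and the problem messages are hypothetical and not part of refinery or Oozie:

```python
import xml.etree.ElementTree as ET

# Namespace prefixes as they appear in the diff above (version suffix ignored).
LEGACY_HIVE_NS_PREFIX = "uri:oozie:hive-action:"
HIVE2_NS_PREFIX = "uri:oozie:hive2-action:"


def split_tag(tag):
    """Split an ElementTree tag '{namespace}local' into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def audit_workflow(xml_text):
    """Return a list of (action_name, problem) tuples for one workflow.xml."""
    root = ET.fromstring(xml_text)
    # Names of all <credential> entries declared anywhere in the workflow.
    declared = {
        el.get("name")
        for el in root.iter()
        if split_tag(el.tag)[1] == "credential"
    }
    problems = []
    for action in root.iter():
        if split_tag(action.tag)[1] != "action":
            continue
        name = action.get("name")
        for child in action:
            namespace, local = split_tag(child.tag)
            if local == "hive" and namespace.startswith(LEGACY_HIVE_NS_PREFIX):
                problems.append((name, "still uses the legacy hive action"))
            elif local == "hive2" and namespace.startswith(HIVE2_NS_PREFIX):
                cred = action.get("cred")
                if cred is None:
                    problems.append((name, "hive2 action without a cred attribute"))
                else:
                    # The cred attribute may hold a comma-separated list.
                    for cred_name in (c.strip() for c in cred.split(",")):
                        if cred_name not in declared:
                            problems.append(
                                (name, "cred '%s' is not declared" % cred_name)
                            )
                if not any(split_tag(c.tag)[1] == "jdbc-url" for c in child):
                    problems.append((name, "hive2 action without a jdbc-url"))
    return problems
```

Calling audit_workflow() on the contents of each workflow.xml under oozie/ returns an empty list for files that already follow the pattern above. It only inspects the XML structure and does not verify that the principal or the JDBC URL are correct.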

Change 519355 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cdh::oozie: add hive2/hcat credentials classes

https://gerrit.wikimedia.org/r/519355

Change 519355 merged by Elukey:
[operations/puppet@production] cdh::oozie: add hive2/hcat credentials classes

https://gerrit.wikimedia.org/r/519355

For the scope of this task, I would call it done. Several follow-ups are still needed, but overall most of the critical analytics tools are working!

elukey set the point value for this task to 13. Jun 27 2019, 9:01 AM
elukey moved this task from In Progress to Done on the Analytics-Kanban board.

Change 519368 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_test_cluster::hadoop::ui: configure hive for hue

https://gerrit.wikimedia.org/r/519368

Change 519368 merged by Elukey:
[operations/puppet@production] role::analytics_test_cluster::hadoop::ui: configure hive for hue

https://gerrit.wikimedia.org/r/519368

Nuria added a comment. Jun 27 2019, 4:47 PM

Nice, major milestone.

Nuria closed this task as Resolved. Jun 27 2019, 4:47 PM