Enable 'analytics_cluster' role on Labs instance
Closed, Resolved (Public)

Description

I tried to enable the analytics role (Hadoop + Hive) on a freshly created Labs instance, following this guide. The first vagrant up worked well and MediaWiki was accessible via curl. But when I tried to provision Hadoop, I got this error:

$ vagrant roles enable analytics
$ vagrant provision --debug
...
DEBUG ssh: stderr: Error: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::cdh::hadoop at /vagrant/puppet/modules/role/manifests/hadoop.pp:45 on node mediawiki-vagrant.dev
Wrapped exception:
Could not find declared class ::cdh::hadoop
....

Because the puppet/modules/cdh directory was empty, I installed the CDH module manually by cloning the CDH repo. But now I'm getting this error message:

DEBUG ssh: stderr: Error: Invalid parameter db_root_password at /vagrant/puppet/modules/role/manifests/hive.pp:17 on node mediawiki-vagrant.dev
...

I'm assuming that some parameters need to be adjusted because of the manual installation. How do I fix this? And is it supposed to be necessary to install the CDH module manually?

Event Timeline

mschwarzer@mlp:/srv/mediawiki-vagrant$ vagrant --version
Vagrant 1.7.4

puppet/modules/cdh is a submodule, so that part sounds like your initial clone of the mediawiki/vagrant.git repo did not initialize the submodules (there are 4). git clone --recursive should do this, but it can be done manually with git submodule update --init --recursive.

The second error sounds like the version of the CDH Puppet module that you pulled in differs from the version pinned by the submodule. Removing your manual clone of CDH and running git submodule update --init --recursive will hopefully get you back to properly working Puppet code.
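To see which submodules were left uninitialized before re-running the update, the output of git submodule status can be inspected: a line prefixed with - has not been initialized, and a + prefix means the checked-out commit differs from the pinned one. Below is a minimal Python sketch, assuming the output has already been captured as text. The sample is made up for illustration (the hashes and every path except puppet/modules/cdh are hypothetical), and the parser assumes paths without spaces.

```python
def classify_submodules(status_output):
    """Group submodule paths by the one-character prefix of `git submodule status`."""
    labels = {
        "-": "uninitialized",
        "+": "differs_from_pin",
        "U": "merge_conflict",
        " ": "ok",
    }
    result = {label: [] for label in labels.values()}
    for line in status_output.splitlines():
        if not line.strip():
            continue
        # First character is the status flag; the rest is "<sha> <path> [(describe)]".
        prefix, fields = line[0], line[1:].split()
        if prefix not in labels or len(fields) < 2:
            continue  # not a status line we understand
        result[labels[prefix]].append(fields[1])
    return result


# Hypothetical sample output, for illustration only.
sample = (
    "-1111111111111111111111111111111111111111 puppet/modules/cdh\n"
    "+2222222222222222222222222222222222222222 puppet/modules/example (heads/master)\n"
    " 3333333333333333333333333333333333333333 puppet/modules/other (v1.0)\n"
)

print(classify_submodules(sample))
```

In a real checkout the text would come from running git submodule status in /srv/mediawiki-vagrant; anything reported as uninitialized should be filled in by git submodule update --init --recursive.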

I created a fresh instance (mlp.math) but ran into a different problem:

==> default: Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: the destination path for sharelib is: /user/oozie/share/lib/lib_20161206113733
==> default: Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
==> default: Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: log4j:WARN Please initialize the log4j system properly.
==> default: Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
==> default: Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: /usr/lib/oozie/bin/oozie-setup.sh: line 166: 18557 Killed                  ${JAVA_BIN} ${OOZIE_OPTS} -cp ${OOZIECPPATH} org.apache.oozie.tools.OozieSharelibCLI "${@}"

This reminds me of the terrible problems I had in the past with Vagrant, Java and the OOM killer.

A second run was slightly better:

==> default: Error: /Stage[main]/Cdh::Hadoop::Mount/Mount[hdfs-fuse]: Could not evaluate: Execution of '/bin/mount /mnt/hdfs' returned 1: modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.13.0-100-generic/modules.dep.bin'
==> default: fuse: device not found, try 'modprobe fuse' first
==> default: INFO /data/jenkins/workspace/generic-package-ubuntu64-14-04/CDH5.5.2-Packaging-Hadoop-2016-01-25_16-03-19/hadoop-2.6.0+cdh5.5.2+992-1.cdh5.5.2.p0.10~trusty/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/hdfs
==> default: INFO /data/jenkins/workspace/generic-package-ubuntu64-14-04/CDH5.5.2-Packaging-Hadoop-2016-01-25_16-03-19/hadoop-2.6.0+cdh5.5.2+992-1.cdh5.5.2.p0.10~trusty/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option allow_other
==> default: INFO /data/jenkins/workspace/generic-package-ubuntu64-14-04/CDH5.5.2-Packaging-Hadoop-2016-01-25_16-03-19/hadoop-2.6.0+cdh5.5.2+992-1.cdh5.5.2.p0.10~trusty/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option dev
==> default: INFO /data/jenkins/workspace/generic-package-ubuntu64-14-04/CDH5.5.2-Packaging-Hadoop-2016-01-25_16-03-19/hadoop-2.6.0+cdh5.5.2+992-1.cdh5.5.2.p0.10~trusty/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option suid
==> default:
==> default: Notice: /Stage[main]/Cdh::Hive::Metastore/Service[hive-metastore]/ensure: ensure changed 'stopped' to 'running'
==> default: Info: /Stage[main]/Cdh::Hive::Metastore/Service[hive-metastore]: Unscheduling refresh on Service[hive-metastore]
==> default: Notice: /Stage[main]/Mediawiki/Mediawiki::User[admin_user_in_steward_suppress_on_wiki]/Mediawiki::Maintenance[mediawiki_user_Admin_wiki_steward,suppress]/Exec[mediawiki_user_Admin_wiki_steward,suppress]/returns: executed successfully
==> default: Notice: Finished catalog run in 151.05 seconds
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

I'll add the reload option and try again.

Oh no, that was stupid. Now the machine does not start at all. I vaguely remember that I once had a problem with a broken fuse entry in the /etc/fstab file.

I added the Analytics project. Maybe someone else is using the analytics Vagrant role and has experienced similar problems.

After destroying and recreating the instance, at least the hdfs was created.

vagrant@mediawiki-vagrant:/mnt/hdfs$ hadoop dfs -ls /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 3 items
drwxrwxrwt   - hdfs hdfs            0 2016-12-06 12:46 /tmp
drwxrwxr-x   - hdfs hadoop          0 2016-12-06 12:44 /user
drwxr-xr-x   - hdfs hdfs            0 2016-12-06 12:38 /var

Please find attached the stdout and stderr of the vagrant up command.

@mschwarzer as long as we do not reboot the instance, you can now continue with the things you intended to do.
I suggest that we ignore the fuse error

==> default: Error: /Stage[main]/Cdh::Hadoop::Mount/Mount[hdfs-fuse]: Could not evaluate: Execution of '/bin/mount /mnt/hdfs' returned 1: modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.13.0-100-generic/modules.dep.bin'
==> default: fuse: device not found, try 'modprobe fuse' first

for now.

just to be safe, I uncommented the following line in /etc/fstab

hadoop-fuse-dfs#dfs://mediawiki-vagrant.dev:8020       /mnt/hdfs       fuse    allow_other,usetrash,ro 0       0

Maybe the problem is that the hostname is incorrect.

Not rebooting is not really a suitable solution when using the VM for development, since I also need to enable other roles or change port-forwarding.

For instance, I tried to enable port forwarding for Hadoop and I got this error:

$ vagrant forward-port 50700 50700
$ vagrant reload
...
 INFO subprocess: Starting process: ["/usr/bin/sudo", "/usr/bin/env", "lxc-attach", "--name", "mediawiki-vagrant_default_1481027183350_3788", "--namespaces", "NETWORK|MOUNT", "--", "/sbin/ip", "-4", "addr", "show", "scope", "global", "eth0"]
 INFO subprocess: Command not in installer, restoring original environment...
DEBUG subprocess: Selecting on IO
 INFO subprocess: Starting process: ["/usr/bin/sudo", "/usr/bin/env", "lxc-info", "--name", "mediawiki-vagrant_default_1481027183350_3788"]
 INFO subprocess: Command not in installer, restoring original environment...
DEBUG subprocess: Selecting on IO
DEBUG subprocess: Waiting for process to exit. Remaining to timeout: 31999
DEBUG subprocess: Exit status: 0
DEBUG subprocess: stdout: Name:           mediawiki-vagrant_default_1481027183350_3788
DEBUG subprocess: stdout: State:          RUNNING
 INFO retryable: Retryable exception raised: #<Vagrant::LXC::Errors::ExecuteError: There was an error executing lxc-attach

For more information on the failure, enable detailed logging by setting
...
/opt/vagrant/embedded/lib/ruby/2.0.0/timeout.rb:66:in `timeout'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.4/plugins/communicators/ssh/communicator.rb:42:in `wait_for_ready'
/opt/vagrant/embedded/gems/gems/vagrant-1.7.4/lib/vagrant/action/builtin/wait_for_communicator.rb:16:in `block in call'
 INFO interface: error: There was an error executing lxc-attach

For more information on the failure, enable detailed logging by setting
the environment variable VAGRANT_LOG to DEBUG.

I think it might be a good idea to play with
https://github.com/wikimedia/mediawiki-vagrant/blob/master/puppet/modules/role/manifests/hadoop.pp#L7

Maybe putting localhost or mlp.wikitech.org helps.

Moreover, the only additional port you need to forward is the Flink web UI port.
HDFS is managed here:
https://github.com/wikimedia/mediawiki-vagrant/blob/master/puppet/modules/role/settings/hadoop.yaml

However, I still don't really understand that particular error message.
If you export VAGRANT_LOG=DEBUG you'll see that

 INFO subprocess: Starting process: ["/usr/bin/sudo", "/usr/bin/env", "lxc-attach", "--name", "mediawiki-vagrant_default_1481027183350_3788", "--", "/bin/true"]
 INFO subprocess: Command not in installer, restoring original environment...
DEBUG subprocess: Selecting on IO
DEBUG subprocess: Waiting for process to exit. Remaining to timeout: 32000
DEBUG subprocess: Exit status: 0
 INFO runner: Preparing hooks for middleware sequence...
 INFO runner: 3 hooks defined.
 INFO runner: Running action: machine_action_ssh_ip #<Vagrant::Action::Warden:0x00000003641420>
 INFO warden: Calling IN action: #<Proc:0x000000038ebea0@/opt/vagrant/embedded/gems/gems/vagrant-1.7.4/lib/vagrant/action/warden.rb:94 (lambda)>
 INFO warden: Calling IN action: #<Vagrant::LXC::Action::FetchIpWithLxcAttach:0x00000003641358>
 INFO subprocess: Starting process: ["/usr/bin/sudo", "/usr/bin/env", "lxc-attach", "-h"]
 INFO subprocess: Command not in installer, restoring original environment...
DEBUG subprocess: Selecting on IO
DEBUG subprocess: stderr: Usage: lxc-attach --name=NAME [-- COMMAND]

Execute the specified COMMAND - enter the container NAME

Options :
  -n, --name=NAME   NAME of the container
  -e, --elevated-privileges=PRIVILEGES
                    Use elevated privileges instead of those of the
                    container. If you don't specify privileges to be
                    elevated as OR'd list: CAP, CGROUP and LSM (capabilities,
                    cgroup and restrictions, respectively) then all of them
                    will be elevated.
                    WARNING: This may leak privileges into the container.
                    Use with care.
  -a, --arch=ARCH   Use ARCH for program instead of container's own
                    architecture.
  -s, --namespaces=FLAGS
                    Don't attach to all the namespaces of the container
                    but just to the following OR'd list of flags:
                    MOUNT, PID, UTSNAME, IPC, USER or NETWORK.
                    WARNING: Using -s implies -e with all privileges
                    elevated, it may therefore leak privileges into the
                    container. Use with care.
  -R, --remount-sys-proc
                    Remount /sys and /proc if not attaching to the
                    mount namespace when using -s in order to properly
                    reflect the correct namespace context. See the
                    lxc-attach(1) manual page for details.
      --clear-env   Clear all environment variables before attaching.
                    The attached shell/program will start with only
                    container=lxc set.
      --keep-env    Keep all current environment variables. This
                    is the current default behaviour, but is likely to
                    change in the future.
  -v, --set-var     Set an additional variable that is seen by the
                    attached program in the container. May be specified
                    multiple times.
      --keep-var    Keep an additional environment variable. Only
                    applicable if --clear-env is specified. May be used
                    multiple times.

Common options :
  -o, --logfile=FILE               Output log to FILE instead of stderr
  -l, --logpriority=LEVEL          Set log priority to LEVEL
  -q, --quiet                      Don't produce any output
  -P, --lxcpath=PATH               Use specified container path
  -?, --help                       Give this help list
      --usage                      Give a short usage message
      --version                    Print the version number

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

Due to the error with --provision the Hadoop ports weren't set up correctly:

$ vagrant forward-port -l
Local port => VM's port
-----------------------

Therefore I tried to configure them manually.

Vagrant seems to call the lxc-attach help function:

INFO subprocess: Starting process: ["/usr/bin/sudo", "/usr/bin/env", "lxc-attach", "-h"]

I don't see any use in this.

I have no idea what you could try next. Probably run vagrant destroy and adjust the config. Are you trying the analytics role or just the hadoop role? I wonder if nobody else is using that role.

Same problem when only enabling the hadoop role :/

@mschwarzer Maybe you can find someone on IRC who is using the analytics Vagrant role.

I don't know what is up at all with lxc-attach, but if fuse is causing you problems, try commenting it out and re-provisioning:

https://github.com/wikimedia/mediawiki-vagrant/blob/master/puppet/modules/role/manifests/hadoop.pp#L89-L98

Without HDFS mounted, Oozie fails because it cannot access HDFS:

DEBUG ssh: stdout: Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: /usr/lib/oozie/bin/oozie-setup.sh: line 166: 22955 Killed                  ${JAVA_BIN} ${OOZIE_OPTS} -cp ${OOZIECPPATH} org.apache.oozie.tools.OozieSharelibCLI "${@}"

 INFO interface: info: Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: /usr/lib/oozie/bin/oozie-setup.sh: line 166: 22955 Killed                  ${JAVA_BIN} ${OOZIE_OPTS} -cp ${OOZIECPPATH} org.apache.oozie.tools.OozieSharelibCLI "${@}"
 INFO interface: info: ==> default: Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
==> default: Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: /usr/lib/oozie/bin/oozie-setup.sh: line 166: 22955 Killed                  ${JAVA_BIN} ${OOZIE_OPTS} -cp ${OOZIECPPATH} org.apache.oozie.tools.OozieSharelibCLI "${@}"
==> default: Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
==> default: Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: /usr/lib/oozie/bin/oozie-setup.sh: line 166: 22955 Killed                  ${JAVA_BIN} ${OOZIE_OPTS} -cp ${OOZIECPPATH} org.apache.oozie.tools.OozieSharelibCLI "${@}"
DEBUG ssh: stderr: Error: /usr/bin/oozie-setup sharelib create -fs hdfs://mediawiki-vagrant.dev -locallib /usr/lib/oozie/oozie-sharelib-yarn returned 137 instead of one of [0]

 INFO interface: info: Error: /usr/bin/oozie-setup sharelib create -fs hdfs://mediawiki-vagrant.dev -locallib /usr/lib/oozie/oozie-sharelib-yarn returned 137 instead of one of [0]
 INFO interface: info: ==> default: Error: /usr/bin/oozie-setup sharelib create -fs hdfs://mediawiki-vagrant.dev -locallib /usr/lib/oozie/oozie-sharelib-yarn returned 137 instead of one of [0]
==> default: Error: /usr/bin/oozie-setup sharelib create -fs hdfs://mediawiki-vagrant.dev -locallib /usr/lib/oozie/oozie-sharelib-yarn returned 137 instead of one of [0]
DEBUG ssh: stderr: Error: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: change from notrun to 0 failed: /usr/bin/oozie-setup sharelib create -fs hdfs://mediawiki-vagrant.dev -locallib /usr/lib/oozie/oozie-sharelib-yarn returned 137 instead of one of [0]

 INFO interface: info: Error: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: change from notrun to 0 failed: /usr/bin/oozie-setup sharelib create -fs hdfs://mediawiki-vagrant.dev -locallib /usr/lib/oozie/oozie-sharelib-yarn returned 137 instead of one of [0]
 INFO interface: info: ==> default: Error: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: change from notrun to 0 failed: /usr/bin/oozie-setup sharelib create -fs hdfs://mediawiki-vagrant.dev -locallib /usr/lib/oozie/oozie-sharelib-yarn returned 137 instead of one of [0]
==> default: Error: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: change from notrun to 0 failed: /usr/bin/oozie-setup sharelib create -fs hdfs://mediawiki-vagrant.dev -locallib /usr/lib/oozie/oozie-sharelib-yarn returned 137 instead of one of [0]

Naw, that can't be right. Nothing accesses the /mnt/hdfs fuse mount in Puppet. It is only created for user convenience, so you can cd and ls around like normal.
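A hint in the log above points the same way: oozie-setup returned 137, and by shell convention a status above 128 means the process was terminated by a signal (the status minus 128). 137 therefore corresponds to signal 9, SIGKILL, which matches the "Killed" line and is the signal the kernel's OOM killer sends. That would be consistent with a memory problem rather than a missing mount, although the log alone does not prove it. A minimal Python sketch of the decoding (the function name is only for illustration, and a program can in principle also choose to exit with a code above 128 on its own):

```python
import signal


def describe_exit_status(status):
    """Explain a shell-style exit status.

    Shells report 128 + N when a child was terminated by signal N.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with code {status}"


print(describe_exit_status(137))
```

If the OOM killer was involved, the kernel log inside the VM (for example via dmesg) normally contains a matching "Out of memory" entry.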

So, I was thinking about this. I haven't used the analytics role in Vagrant in a long time, and we even decided not to maintain a non-Vagrant Hadoop cluster in Labs because it never worked correctly there over the long term. However, in the short term, it works fine if we use the operations/puppet classes to deploy Hadoop on multiple nodes.

@mschwarzer, what Labs project are you working in? If you like, I can try to set up a quick (non-Vagrant) Hadoop cluster for you. (It'll sorta follow these instructions, although those are surely very out of date.)

@Ottomata that would be excellent. We are working with the math cluster.

The math cluster's quota is almost full. We'll need room for at least 3 more instances, probably about 16G of additional RAM in total across the 3. Looks like there's only about 4G free in that project right now. Can y'all kill a few instances, or can you ask someone to up the quota?

Ok!

hadoop000 is up and running NameNode and ResourceManager. hadoop00[23] are running DataNodes and NodeManagers. You should be able to log into any of those 3 and use hdfs and hadoop and yarn commands.

Try it out!

@mschwarzer Did you already manage to set up Flink? I recommend documenting your setup here: https://wikitech.wikimedia.org/wiki/Flink

@Physikerwelt @Ottomata I was able to run Flink jobs on YARN (see https://wikitech.wikimedia.org/wiki/Flink ). However, I could not enable Oozie / Hive using these instructions:

Notice: /Stage[main]/Cdh::Oozie::Server/File[/etc/oozie/conf.math-hadoop/oozie-log4j.properties]/ensure: defined content as '{md5}2a408b66c1cd38d0626b767252b7730e'
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_DATA=/var/lib/oozie
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_CATALINA_HOME=/usr/lib/bigtop-tomcat
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_TMPDIR=/var/lib/oozie
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_PID=/var/run/oozie/oozie.pid
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_BASE=/var/lib/oozie/tomcat-deployment
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_HTTPS_PORT=11443
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_HTTPS_KEYSTORE_PASS=password
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.port=${OOZIE_HTTPS_PORT}"
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.keystore.pass=${OOZIE_HTTPS_KEYSTORE_PASS}"
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_CONFIG=/etc/oozie/conf
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_LOG=/var/log/oozie
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_DATA=/var/lib/oozie
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_CATALINA_HOME=/usr/lib/bigtop-tomcat
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_TMPDIR=/var/lib/oozie
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_PID=/var/run/oozie/oozie.pid
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_BASE=/var/lib/oozie/tomcat-deployment
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_HTTPS_PORT=11443
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_HTTPS_KEYSTORE_PASS=password
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.port=${OOZIE_HTTPS_PORT}"
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.keystore.pass=${OOZIE_HTTPS_KEYSTORE_PASS}"
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_CONFIG=/etc/oozie/conf
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns:   setting OOZIE_LOG=/var/log/oozie
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: 
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: Validate DB Connection
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: 
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: Error: Could not connect to the database: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: 
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: 
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: Stack trace for the error was (for debug purposes):
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: --------------------------------------
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: java.lang.Exception: Could not connect to the database: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: 
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: 	at org.apache.oozie.tools.OozieDBCLI.validateConnection(OozieDBCLI.java:905)
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: 	at org.apache.oozie.tools.OozieDBCLI.createDB(OozieDBCLI.java:185)
....
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: --------------------------------------
Error: /usr/lib/oozie/bin/ooziedb.sh create -run returned 1 instead of one of [0]
Error: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_mysql_create_schema]/returns: change from notrun to 0 failed: /usr/lib/oozie/bin/ooziedb.sh create -run returned 1 instead of one of [0]
Notice: /Stage[main]/Cdh::Oozie::Server/Service[oozie]: Dependency Exec[oozie_mysql_create_schema] has failures: true
Warning: /Stage[main]/Cdh::Oozie::Server/Service[oozie]: Skipping because of failed dependencies
...
Notice: /Stage[main]/Cdh::Oozie::Server/Exec[oozie_sharelib_install]/returns: executed successfully
Notice: /Stage[main]/Role::Analytics_cluster::Oozie::Server/Cron[oozie-clean-logs]: Dependency Exec[oozie_mysql_create_schema] has failures: true
Warning: /Stage[main]/Role::Analytics_cluster::Oozie::Server/Cron[oozie-clean-logs]: Skipping because of failed dependencies

(Full error log: https://nopaste.me/view/093816d2 )
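The "Communications link failure" together with "The driver has not received any packets from the server" usually means that the JDBC driver could not reach a MySQL server at all, for example because none is running or listening at the configured address. A quick way to check that independently of Oozie is a plain TCP connection attempt. This is a minimal Python sketch; the host below is a placeholder (the log does not show which address the Oozie JDBC URL points at), and 3306 is the MySQL default port, which may not be what is configured here.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder target: replace with the host from the Oozie JDBC URL.
    print(port_is_open("localhost", 3306))
```

A False result only shows that nothing accepted the connection; a True result does not yet mean the Oozie database and user exist.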

Oh! I didn't realize you wanted Hive and Oozie. Those instructions are def old.

The Hive Metastore and database need to be set up as well. So does the Oozie database. And we need to tell oozie where to find this stuff.

I've set this project hiera:

cdh::hive::metastore_host: hadoop000.math.eqiad.wmflabs
cdh::oozie::oozie_host: hadoop000.math.eqiad.wmflabs

And added puppet roles

analytics_cluster::oozie::server::database
analytics_cluster::hive::metastore::database
analytics_cluster::hive::metastore

And

analytics_cluster::oozie::client

to client nodes.

In addition to the ones you already added.

Note that there will be some permission errors running a hive client as your user from hadoop000. I'd recommend using either hadoop002 or hadoop003 as your primary client node from which you launch hadoop & hive & oozie jobs.

Nuria renamed this task from Cannot enable 'analytics' role on Labs instance to Enable 'analytics' role on Labs instance. Dec 19 2016, 4:39 PM
Nuria renamed this task from Enable 'analytics' role on Labs instance to Enable 'analytics_cluster' role on Labs instance.

Thanks for setting it up.

Yes, we would like to test the whole integration of our Citolytics project ( T143197 ) into the Oozie workflow, i.e. processing a Wikipedia XML dump via Flink/YARN and writing the results to Elastic/CirrusSearch.

In order to get Elastic/CirrusSearch running, what puppet roles do I need to activate?

That I do not know. You'll have to ask folks on the Discovery team. Perhaps @Gehel or @EBernhardson can help?

For the Elasticsearch side, we use the role "elasticsearch::cirrus", which simply includes the role "elasticsearch::common". If you are just experimenting, you can probably play with "elasticsearch::cirrus". To go further, it makes sense to create a dedicated role, which would also be based on "elasticsearch::common".

Thanks! Can you recommend any Oozie starting point? Is there already a workflow that uses Wikipedia XML dumps? Or one that writes to ES?

Discovery folks will have to comment on Oozie + ES. You could poke around in here: https://gerrit.wikimedia.org/r/#/admin/projects/wikimedia/discovery/analytics

There isn't a workflow for XML dumps. We don't do any regular analysis in Hadoop of XML dumps. @JAllemandou and @Halfak have had some success doing this, but not with Oozie.

https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Oozie has some slightly outdated but mostly valid examples. You can also have a look at the analytics/refinery Oozie jobs.

@mschwarzer I think we need the wikitext of the latest page revision, or is there anything else citolytics extracts from the dump?