We are currently running a Cloudera Hadoop distribution for the Analytics cluster, specifically CDH 5.10. This distribution has served us well, but it has shown some shortcomings:
- Limited community support for reporting bugs when needed (and getting issues fixed upstream).
- Absence of Debian source packages (limiting our ability to apply patches promptly, mostly for security CVEs).
Cloudera released CDH 6 a few days ago, a Hadoop 3.0-based distribution containing a lot of software upgrades (among others, Hive 2.1). Since we are running Hadoop 2.6.0 now, the jump to a new major version would require a lot of work and testing, likely spread over multiple quarters.
This could be a good time to consider whether we want to keep going with CDH or switch to another distribution, such as:
- Apache Bigtop
Some more details about each distribution:
The last 2.x series release seems to be 2.6.5, here is the documentation about installing it manually. The repository seems to deny directory listing, so it is difficult to explore, but as far as I can see support only goes up to Debian 7 (for comparison, Debian Stretch is 9, so that is very old).
The last release is 3.1.0 and seems to support Debian Stretch.
It is very nice that Apache Ambari and Ranger are integrated with the distribution.
Version 1.4 supports Debian Stretch, and the upcoming 1.5 also supports Buster (but it jumps to Hadoop 3).
The Deb sources are available at https://github.com/apache/bigtop/tree/master/bigtop-packages/src/deb
In https://issues.apache.org/jira/browse/BIGTOP-3074 they removed (hopefully only temporarily) the Oozie build support, since it does not work with Hive 2.X. Upstream seems to be working on it (see https://issues.apache.org/jira/browse/BIGTOP-3099).
The release notes are very interesting to read. From the packages list it is clear, though, that Hadoop 3.0 is installed. From the requirements notes it also seems that Debian is not officially supported (at least in version 6.0.0), only Ubuntu Xenial.
From this post it seems that Cloudera will not support Debian for CDH6.
Moreover, Cloudera does not offer Debian source packages for 5.X (the current distribution that we are running), which makes it difficult to patch things on the fly if needed (for example, for a critical CVE that doesn't have a new Cloudera package ready). This would mean rebuilding the Ubuntu Xenial deb packages for Stretch each time a release happens.
Note: it seems that Cloudera and Hortonworks will merge into one company very soon.
Note2: CDH 6.3 seems to support Java 11, which is the default on Debian Buster.
Mentioning it to have a complete reference, even if it is likely not a candidate for production (the project is new and of course not as battle-tested as Hadoop). More info at https://hops.readthedocs.io/en/latest/index.html. They have redesigned a lot of critical aspects of Hadoop, removing things like Zookeeper, Journal nodes, etc., and replacing them with more fault-tolerant and flexible solutions. They call themselves "Hadoop for humans", definitely a project to keep an eye on!