We are currently running a Cloudera Hadoop distribution, specifically CDH 5.10, on the Analytics cluster. This distribution has served us well, but it has shown some shortcomings:
* Limited community support for reporting bugs and getting issues fixed upstream.
* Absence of Debian source packages (limiting our ability to apply patches promptly, mostly for security CVEs).
A few days ago Cloudera released [[ https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_6_release_notes.html | CDH 6 ]], a Hadoop 3.0-based distribution with many software upgrades (Hive 2.1 among others). Since we are currently running Hadoop 2.6.0, the jump to a new major version would require a lot of work and testing, likely spread over multiple quarters.
This could be a good time to consider whether we want to stay with CDH or switch to another distribution, such as:
* Hortonworks
* Apache Bigtop
Some more details about each distribution:
**Hortonworks**
The latest release in the 2.x series seems to be 2.6.5; see the [[ https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/config-remote-repositories.html | documentation ]] about installing it manually. The repository seems to deny directory listing, so it is difficult to explore, but as far as I can see Debian support stops at Debian 7 (Wheezy), which is very old compared to Debian 9 (Stretch).
The latest release is 3.1.0 and seems to [[ https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.0.0/bk_ambari-installation/content/hdp_30_repositories.html | support ]] Debian Stretch.
It is very nice that Apache Ambari and Apache Ranger are integrated with the distribution.
**Apache Bigtop**
I don't see any mention of Debian Stretch in the [[http://apache.panu.it/bigtop/bigtop-1.2.1/repos/ | list of repos]], only Jessie. The upcoming release (1.3) should support Debian Stretch!
The Debian package sources are available at https://github.com/apache/bigtop/tree/master/bigtop-packages/src/deb
**CDH 6**
The [[https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_release_notes.html#cdh600_release_notes | release notes]] are very interesting to read. The [[https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_60_packaging.html#cdh_packaging_600 | package list ]] makes it clear that Hadoop 3.0 is installed. The [[https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_os_requirements.html#cdh_cm_supported_os | requirements notes ]], however, suggest that Debian is not officially supported (at least in version 6.0.0); only Ubuntu Xenial is.
According to [[http://community.cloudera.com/t5/CDH-Manual-Installation/CDH-6-supported-OS/m-p/80697#M1787?eid=1&aid=1| this post]], Cloudera will apparently not support Debian for CDH 6. Moreover, Cloudera does not offer Debian source packages for 5.x (the distribution we are currently running), which makes it difficult to patch things on the fly when needed (for example, for a critical CVE that does not yet have a new Cloudera package).
Last but not least, it seems that Cloudera and Hortonworks will soon merge into a single company.