We are still using Hadoop version 2.10.2 and Hive version 2.3.6, both of which are now considered end-of-life (EOL).
There are currently two active release lines of Hadoop (https://hadoop.apache.org/releases.html): 3.3 and 3.4.
- 3.3.6 - https://hadoop.apache.org/release/3.3.6.html
- 3.4.0 - https://hadoop.apache.org/release/3.4.0.html
Our Debian packages are currently built with Apache Bigtop (https://bigtop.apache.org/), but we have had to maintain a custom fork of that project in order to keep releasing from its 1.5 branch, which builds for Debian bullseye.
The most recent release of Apache Bigtop (https://bigtop.apache.org/download.html#releases) is 3.3.0.
The packages included in Bigtop 3.3 include:
- Hadoop 3.3.6
- Hive 3.1.3
We need to plan how and when to upgrade our two Hadoop clusters.
- Will we continue to use Bigtop packages, or should we look into some form of containerised approach?
- How will we ensure that the data we have on the Hadoop datanodes is safe during the upgrade? Will we need to back some of it up beforehand?
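
As a starting point for the data-safety question, here is a minimal Python sketch of the kind of checks we could record before touching either cluster. It only builds the `hdfs` command lines and does not run anything. The snapshot path and snapshot name in the example are hypothetical placeholders, not real paths on our clusters. Note that HDFS snapshots live on the same cluster, so they guard against accidental deletion but are not a substitute for an off-cluster backup.

```python
import shlex
from typing import List, Sequence


def pre_upgrade_commands(
    snapshot_paths: Sequence[str], snapshot_name: str
) -> List[List[str]]:
    """Return hdfs CLI invocations that capture cluster state before an upgrade.

    Nothing is executed here; the caller decides how and where to run them.
    """
    commands = [
        # Summarise capacity and the live/dead datanode lists.
        ["hdfs", "dfsadmin", "-report"],
        # Report missing, corrupt, or under-replicated blocks.
        ["hdfs", "fsck", "/"],
    ]
    for path in snapshot_paths:
        # A directory must be made snapshottable before a snapshot is taken.
        commands.append(["hdfs", "dfsadmin", "-allowSnapshot", path])
        commands.append(["hdfs", "dfs", "-createSnapshot", path, snapshot_name])
    return commands


if __name__ == "__main__":
    # "/example/critical" and "pre-upgrade" are placeholders (assumptions).
    for argv in pre_upgrade_commands(["/example/critical"], "pre-upgrade"):
        print(shlex.join(argv))
```

Keeping the output of the report and fsck commands from before the upgrade would give us a baseline to compare against afterwards. Which directories are worth snapshotting or backing up elsewhere is still an open question.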