Problem Statement
We originally built hadoop packages for bullseye in the following ticket: T310643: Build Bigtop 1.5 Hadoop packages for Bullseye
In doing so, we found that the Bigtop build scripts create identically named packages for each distribution.
For example, the deb file for hive-hcatalog was named hive-hcatalog_2.3.6-3_all.deb whether it was built for buster or bullseye.
This identical filename causes a conflict in reprepro, which is how we host our packages for local distribution on apt.wikimedia.org.
To work around this problem, we modified bigtop.bom to add a -deb11 suffix to the package version string.
Sadly, we have just discovered that this workaround causes subtle problems with the generated packages, which may be difficult to detect. The original ticket description below explains how the first issue was detected.
Suggested fix
The suggested fix is to build a set of packages for bullseye using an unmodified bigtop.bom, then use the dpkg-repack tool to add the suffix to the built packages.
This will be scripted to handle all of the packages that we need.
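As a starting point for that script, here is a minimal sketch of the renaming step. It rests on several assumptions: it uses dpkg-deb (raw extract and rebuild) rather than dpkg-repack, the helper names suffixed_deb_name and resuffix_deb are hypothetical, and the suffix is appended directly to the existing Version field. It does not deal with file ownership, which a real script would need to handle when run as a non-root user (for example by running under fakeroot).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: insert a suffix after the version field of a
# <name>_<version>_<arch>.deb file name and print the result.
suffixed_deb_name() {
    local file suffix base name rest version arch
    file=$(basename "$1")
    suffix=$2
    base=${file%.deb}
    name=${base%%_*}       # everything before the first underscore
    rest=${base#*_}        # "<version>_<arch>"
    version=${rest%_*}
    arch=${rest##*_}
    printf '%s_%s%s_%s.deb\n' "$name" "$version" "$suffix" "$arch"
}

# Hypothetical helper: unpack a .deb, append the suffix to its Version
# field, and rebuild it under the new file name in the output directory.
# Assumes the suffix contains no characters special to sed ("/" or "&").
resuffix_deb() {
    local deb=$1 suffix=$2 outdir=$3 workdir
    workdir=$(mktemp -d)
    dpkg-deb -R "$deb" "$workdir/pkg"
    sed -i "s/^Version: .*/&${suffix}/" "$workdir/pkg/DEBIAN/control"
    dpkg-deb -b "$workdir/pkg" "$outdir/$(suffixed_deb_name "$deb" "$suffix")"
    rm -rf "$workdir"
}

# Only the name computation is exercised here; it needs no dpkg tooling.
suffixed_deb_name hive-hcatalog_2.3.6-3_all.deb -deb11
# prints: hive-hcatalog_2.3.6-3-deb11_all.deb
```

Because the packages are built from an unmodified bigtop.bom, the suffix never reaches the Bigtop install scripts; it is only applied to the finished deb files.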
Original ticket description follows
We have upgraded one of our Hadoop workers to bullseye, but have discovered a problem with one of the packages.
The affected package is hive-hcatalog, which is missing a vital set of symlinks in the /usr/lib/hive-hcatalog/share/hcatalog/ directory.
That directory on a buster host contains this:
btullis@an-test-worker1002:~$ ls -l /usr/lib/hive-hcatalog/share/hcatalog/
total 516
-rw-r--r-- 1 root root 264740 Jan 4 2022 hive-hcatalog-core-2.3.6.jar
lrwxrwxrwx 1 root root 28 Jan 4 2022 hive-hcatalog-core.jar -> hive-hcatalog-core-2.3.6.jar
-rw-r--r-- 1 root root 53963 Jan 4 2022 hive-hcatalog-pig-adapter-2.3.6.jar
lrwxrwxrwx 1 root root 35 Jan 4 2022 hive-hcatalog-pig-adapter.jar -> hive-hcatalog-pig-adapter-2.3.6.jar
-rw-r--r-- 1 root root 73711 Jan 4 2022 hive-hcatalog-server-extensions-2.3.6.jar
lrwxrwxrwx 1 root root 41 Jan 4 2022 hive-hcatalog-server-extensions.jar -> hive-hcatalog-server-extensions-2.3.6.jar
-rw-r--r-- 1 root root 128401 Jan 4 2022 hive-hcatalog-streaming-2.3.6.jar
lrwxrwxrwx 1 root root 33 Jan 4 2022 hive-hcatalog-streaming.jar -> hive-hcatalog-streaming-2.3.6.jar
On the bullseye host, those unversioned symlinks are missing:
btullis@an-test-worker1001:~$ ls -l /usr/lib/hive-hcatalog/share/hcatalog/
total 520
-rw-r--r-- 1 root root 264798 Aug 12 2022 hive-hcatalog-core-2.3.6.jar
-rw-r--r-- 1 root root 54023 Aug 12 2022 hive-hcatalog-pig-adapter-2.3.6.jar
-rw-r--r-- 1 root root 73772 Aug 12 2022 hive-hcatalog-server-extensions-2.3.6.jar
-rw-r--r-- 1 root root 128459 Aug 12 2022 hive-hcatalog-streaming-2.3.6.jar
The install_hive.sh script contains a section that was supposed to create those symlinks at the time of the package creation:
for DIR in ${HCATALOG_SHARE_DIR} ; do
    (cd $DIR &&
    for j in hive-hcatalog-*.jar; do
        if [[ $j =~ hive-hcatalog-(.*)-${HIVE_VERSION}.jar ]]; then
            name=${BASH_REMATCH[1]}
            ln -s $j hive-hcatalog-$name.jar
        fi
    done)
done
We need to understand why that script didn't work as expected and rebuild the package.
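One way to investigate is to run the same loop in a scratch directory with different HIVE_VERSION values. The sketch below assumes, without confirmation from the build logs, that the -deb11 suffix reached HIVE_VERSION while the jar file names kept the plain 2.3.6 version. The function names link_hcatalog_jars and count_links are hypothetical, and the loop body is quoted slightly more defensively than the original.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Same loop as in install_hive.sh, wrapped in a function for the experiment.
# $1 = directory holding the jars, $2 = value to use as HIVE_VERSION.
link_hcatalog_jars() {
    local dir=$1 HIVE_VERSION=$2 j name
    (
        cd "$dir"
        for j in hive-hcatalog-*.jar; do
            if [[ $j =~ hive-hcatalog-(.*)-${HIVE_VERSION}.jar ]]; then
                name=${BASH_REMATCH[1]}
                ln -s "$j" "hive-hcatalog-$name.jar"
            fi
        done
    )
}

# Count the symlinks directly inside a directory.
count_links() {
    find "$1" -maxdepth 1 -type l | wc -l | tr -d ' '
}

# Try the loop with the plain version and with the suffixed version
# (the suffixed value is an assumption about what the build passed in).
for version in 2.3.6 2.3.6-deb11; do
    dir=$(mktemp -d)
    touch "$dir/hive-hcatalog-core-2.3.6.jar" "$dir/hive-hcatalog-streaming-2.3.6.jar"
    link_hcatalog_jars "$dir" "$version"
    echo "HIVE_VERSION=$version -> $(count_links "$dir") symlinks"
    rm -rf "$dir"
done
```

With the plain version, each jar gets its unversioned symlink. With the suffixed version, the regular expression matches nothing and the loop finishes without an error, which would be consistent with the silent failure seen on the bullseye host.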
At the same time as fixing this issue, we need to:
- Be on the lookout for any other implications or occurrences of this issue about missing symlinks.
- Update the instructions on Wikitech for building our Hadoop packages. Currently the best instructions are in T310643: Build Bigtop 1.5 Hadoop packages for Bullseye
- Refine the build process if necessary/possible
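To help look for other occurrences, a small check along the following lines could be run on both a buster and a bullseye host and the results compared. This is a rough sketch: the function name find_unlinked_jars is hypothetical, and it assumes the affected files follow the name-version.jar pattern seen in hive-hcatalog. Not every versioned jar is expected to have an unversioned symlink, so the difference between hosts matters more than the raw output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: print every "<name>-<version>.jar" regular file under
# the given directory that has no "<name>.jar" symlink next to it.
find_unlinked_jars() {
    local dir=$1 version=$2 jar link
    find "$dir" -type f -name "*-${version}.jar" | sort | while read -r jar; do
        # Strip the "-<version>.jar" ending to get the expected symlink path.
        link=${jar%-"$version".jar}.jar
        if [[ ! -L $link ]]; then
            echo "$jar"
        fi
    done
}

# Run the check only when a directory and a version are given as arguments.
if [[ $# -eq 2 ]]; then
    find_unlinked_jars "$1" "$2"
fi
```

Saved under a name such as check_symlinks.sh (hypothetical), it could be invoked as `bash check_symlinks.sh /usr/lib/hive-hcatalog/share/hcatalog 2.3.6` on each host.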