Build Bigtop 1.5 Hadoop packages for Bullseye
Closed, Resolved | Public | 2 Estimated Story Points

Description

This is a necessary step before we can roll out Bullseye across the analytics cluster.

The list of components that we need to build is as follows.

  • bigtop_groovy
  • bigtop_jsvc
  • bigtop-tomcat
  • bigtop-utils
  • flink
  • hadoop
  • hbase
  • hive
  • mahout
  • oozie
  • solr
  • spark
  • sqoop
  • sqoop2

Event Timeline

BTullis set the point value for this task to 2. Jul 1 2022, 11:48 AM

I'll start work on this today. I have come across this ticket: https://issues.apache.org/jira/browse/BIGTOP-3600 in which @elukey has confirmed that the build process now works with Bullseye.

I believe that the build command I need will be:

docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:trunk-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean hadoop-pkg'

I'll then transfer the built packages to apt1001 and then add them to our repo with reprepro. I'll check to see which packages are already present in the repo for buster and only add the same for bullseye.

@elukey does this procedure seem right to you?

I got a build failure from the command above.

[WARNING] collect2: error: ld returned 1 exit status
[WARNING] make[4]: *** [main/native/fuse-dfs/CMakeFiles/fuse_dfs.dir/build.make:496: main/native/fuse-dfs/fuse_dfs] Error 1
[WARNING] make[3]: *** [CMakeFiles/Makefile2:501: main/native/fuse-dfs/CMakeFiles/fuse_dfs.dir/all] Error 2
[WARNING] make[2]: *** [Makefile:114: all] Error 2

[INFO] Apache Hadoop HDFS Native Client ................... FAILURE [  2.181 s]

Continuing to investigate.
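
A minimal sketch of how the relevant lines could be pulled out of a long build log, assuming the console output has been saved to a file first. The function name and the patterns are illustrative assumptions based only on the fragments quoted in this task (linker errors, make `***` lines, Maven FAILURE/ERROR markers).

```shell
# Sketch only: print the numbered lines of a saved build log that look like failures.
# The patterns are guesses derived from the log fragments quoted in this task.
failure_lines() {
    # $1: path to a saved copy of the build output
    grep -n -E 'error:|\*\*\* \[|FAILURE|^\[ERROR\] Failed' "$1"
}

# Example (build.log is a hypothetical file name):
#   failure_lines build.log | head -n 20
```

Starting from the first match rather than the last usually helps here, because the later make and debuild errors are only consequences of the first one.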

I've added debian-11 support to the toolchain in my local branch-1.5 so that I could build a 1.5.0-debian-11 puppet image and subsequently a 1.5.0-debian-11 build-slave image.

Then I could run the command:

docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean hadoop-pkg'

However, it didn't fix the issue. I got the same error as before, although this is a slightly later fragment.

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.10.2:cmake-compile (cmake-compile) on project hadoop-hdfs-native-client: make failed with error code 2 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-hdfs-native-client
make[1]: *** [debian/rules:39: override_dh_auto_build] Error 1
make[1]: Leaving directory '/ws/output/hadoop/hadoop-2.10.2'
make: *** [debian/rules:27: build] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2
debuild: fatal error at line 1182:
dpkg-buildpackage -us -uc -ui -b
 failed

> Task :hadoop-deb FAILED
70 actionable tasks: 70 executed

FAILURE: Build failed with an exception.

* Where:
Script '/ws/packages.gradle' line: 354

* What went wrong:
Execution failed for task ':hadoop-deb'.
> Process 'command 'debuild'' finished with non-zero exit value 29

Even with the patch that worked for Buster I couldn't get this to build, so I've opened an upstream bug report.
https://issues.apache.org/jira/browse/BIGTOP-3720

One thing that we could do is to import Bigtop's Docker images into the production-images repo and have them available on build2001 to build all packages. If it is a one-off, it may also be OK to follow the above procedure (build locally + upload to apt1001), but let's ask @MoritzMuehlenhoff what is best.

There are several packages to build, though, not only the hadoop one (hive, oozie, etc.), but the procedure is similar. I can help if needed!

Ah, thanks @elukey, that makes sense.

Have we got a definitive list of what packages we built from Bigtop? You mention oozie, hive, but what else do we build?
Looking at this list and the output from sudo -i reprepro -C thirdparty/bigtop15 list buster-wikimedia I'd guess the list is:

  • alluxio (maybe not needed right now)
  • bigtop_groovy
  • bigtop_jsvc
  • bigtop-tomcat
  • bigtop-utils
  • flink
  • hadoop
  • hbase
  • hive
  • mahout
  • oozie
  • solr
  • spark
  • sqoop
  • sqoop2
  • zookeeper (This is also in Debian, so do we need it?)

There are also a few built for i386 as well on buster-wikimedia. Are these still required?

I guess I could just use the ./gradlew deb target which should build everything and then pick out what I need?
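
The comparison between what is already published for buster and what has been added for bullseye could be sketched as below. This is only an illustration: the function names are made up, and it assumes each line of a saved `reprepro list` output has the form `codename|component|arch: name version`, so the awk field may need adjusting.

```shell
# Sketch only: show package names present in one saved listing but absent from another.
# Assumed input line format: "buster-wikimedia|thirdparty/bigtop15|amd64: hadoop 2.10.2-1"

package_names() {
    # Print the unique, sorted package names found in the listing file $1.
    awk '{ print $2 }' "$1" | LC_ALL=C sort -u
}

missing_in_second() {
    # Names that appear in listing $1 but not in listing $2.
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    package_names "$1" > "$tmp_a"
    package_names "$2" > "$tmp_b"
    LC_ALL=C comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example, run on the apt host (the output file names are hypothetical):
#   sudo -i reprepro -C thirdparty/bigtop15 list buster-wikimedia   > buster.txt
#   sudo -i reprepro -C thirdparty/bigtop15 list bullseye-wikimedia > bullseye.txt
#   missing_in_second buster.txt bullseye.txt
```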

One thing that we could do is to import the bigtop's Docker images to the production-images repo, and have them available on build2001 to build all packages. If it is a one-off it may be ok also to follow the above procedure (build locally + upload to apt1001), but let's ask to @MoritzMuehlenhoff

In this case it seems fine to simply go ahead and import the locally built debs.

  • zookeeper (This is also in Debian, so do we need it?)

No, we only use the native zookeeper packages from Debian.

There are also a few built for i386 as well on buster-wikimedia. Are these still required?

No, we only use amd64, if there are any i386 ones, they can be ignored.

I'm running into a few problems with this bigtop-1.5 support, because upstream doesn't consider Debian 11 a supported operating system until versions 3.0.1 and 3.1.0
https://bigtop.apache.org/release-notes.html

My bug report was politely closed as not a problem because it's an unsupported combination, although they did helpfully point me in the direction of a patch that fixed this particular build issue.

I subsequently got another build error shortly after this, so I'm not yet sure how much work it will be to get branch-1.5 compiling on Bullseye.

I'm making more progress on this now. Upstream has backported one fix: https://github.com/apache/bigtop/commit/505b0da9e1696b2c37feccc96d2e884b209c4a82

I've made a couple more local modifications to my branch-1.5, prompted by problems building the puppetized toolchain, and now I'm trying another build.

That build of hadoop was successful.

> Task :hadoop-pkg

BUILD SUCCESSFUL in 41m 58s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop-bullseye$ ls -l output/hadoop/
total 369720
-rw-r--r-- 1 root root      3436 Jul 13 14:57 hadoop-client_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root     18028 Jul 13 14:57 hadoop-conf-pseudo_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root   5021860 Jul 13 14:57 hadoop-doc_2.10.2-1_all.deb
-rw-r--r-- 1 root root      4116 Jul 13 14:57 hadoop-hdfs-datanode_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root     21832 Jul 13 14:57 hadoop-hdfs-fuse_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root      4084 Jul 13 14:57 hadoop-hdfs-journalnode_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root      4180 Jul 13 14:57 hadoop-hdfs-namenode_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root      4072 Jul 13 14:57 hadoop-hdfs-secondarynamenode_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root      4116 Jul 13 14:57 hadoop-hdfs-zkfc_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root  30351904 Jul 13 14:58 hadoop-hdfs_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root  41081104 Jul 13 14:57 hadoop-httpfs_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root  25254100 Jul 13 14:57 hadoop-kms_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root      4036 Jul 13 14:57 hadoop-mapreduce-historyserver_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root 127197364 Jul 13 14:58 hadoop-mapreduce_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root      4072 Jul 13 14:57 hadoop-yarn-nodemanager_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root      4000 Jul 13 14:57 hadoop-yarn-proxyserver_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root      4000 Jul 13 14:57 hadoop-yarn-resourcemanager_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root      4032 Jul 13 14:57 hadoop-yarn-timelineserver_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root  49736908 Jul 13 14:57 hadoop-yarn_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root     35276 Jul 13 14:19 hadoop_2.10.2-1.debian.tar.xz
-rw-r--r-- 1 root root      2254 Jul 13 14:19 hadoop_2.10.2-1.dsc
-rw-r--r-- 1 root root  23575832 Jul 13 15:00 hadoop_2.10.2-1_amd64.build
-rw-r--r-- 1 root root     12235 Jul 13 14:58 hadoop_2.10.2-1_amd64.buildinfo
-rw-r--r-- 1 root root      8376 Jul 13 14:58 hadoop_2.10.2-1_amd64.changes
-rw-r--r-- 1 root root  29154308 Jul 13 14:57 hadoop_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root      1345 Jul 13 14:19 hadoop_2.10.2-1_source.changes
-rw-r--r-- 1 root root  46989179 Jul 13 14:19 hadoop_2.10.2.orig.tar.gz
-rw-r--r-- 1 root root      8960 Jul 13 14:57 libhdfs0-dev_2.10.2-1_amd64.deb
-rw-r--r-- 1 root root     27124 Jul 13 14:57 libhdfs0_2.10.2-1_amd64.deb

I'll work on some of the other components as well now.

Build of hive started.

btullis@marlin-wsl:~/src/bigtop-bullseye$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean hive-pkg'

The build of hive was successful.

> Task :hive-pkg

BUILD SUCCESSFUL in 22m 40s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop-bullseye$ ls -l output/hive/
total 244936
-rw-r--r-- 1 root root    109056 Jul 14 09:59 hive-hbase_2.3.6-3_all.deb
-rw-r--r-- 1 root root      3616 Jul 14 09:59 hive-hcatalog-server_2.3.6-3_all.deb
-rw-r--r-- 1 root root    476284 Jul 14 09:59 hive-hcatalog_2.3.6-3_all.deb
-rw-r--r-- 1 root root  58098780 Jul 14 09:59 hive-jdbc_2.3.6-3_all.deb
-rw-r--r-- 1 root root      3716 Jul 14 09:59 hive-metastore_2.3.6-3_all.deb
-rw-r--r-- 1 root root      3724 Jul 14 09:59 hive-server2_2.3.6-3_all.deb
-rw-r--r-- 1 root root      3592 Jul 14 09:59 hive-webhcat-server_2.3.6-3_all.deb
-rw-r--r-- 1 root root   3557712 Jul 14 09:59 hive-webhcat_2.3.6-3_all.deb
-rw-r--r-- 1 root root     20816 Jul 14 09:38 hive_2.3.6-3.debian.tar.xz
-rw-r--r-- 1 root root      1227 Jul 14 09:38 hive_2.3.6-3.dsc
-rw-r--r-- 1 root root 162318504 Jul 14 09:59 hive_2.3.6-3_all.deb
-rw-r--r-- 1 root root   5541895 Jul 14 10:00 hive_2.3.6-3_amd64.build
-rw-r--r-- 1 root root      6886 Jul 14 10:00 hive_2.3.6-3_amd64.buildinfo
-rw-r--r-- 1 root root      3719 Jul 14 10:00 hive_2.3.6-3_amd64.changes
-rw-r--r-- 1 root root      1303 Jul 14 09:38 hive_2.3.6-3_source.changes
-rw-r--r-- 1 root root  20635075 Jul 14 09:38 hive_2.3.6.orig.tar.gz

The build of oozie was successful.

> Task :oozie-pkg

BUILD SUCCESSFUL in 14m 4s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop-bullseye$ ls -l output/oozie/
total 668556
-rw-r--r-- 1 root root  11966132 Jul 14 11:32 oozie-client_4.3.0-2_all.deb
-rw-r--r-- 1 root root     16648 Jul 14 11:20 oozie_4.3.0-2.debian.tar.xz
-rw-r--r-- 1 root root       844 Jul 14 11:20 oozie_4.3.0-2.dsc
-rw-r--r-- 1 root root 660477680 Jul 14 11:32 oozie_4.3.0-2_all.deb
-rw-r--r-- 1 root root   9717065 Jul 14 11:34 oozie_4.3.0-2_amd64.build
-rw-r--r-- 1 root root      5019 Jul 14 11:32 oozie_4.3.0-2_amd64.buildinfo
-rw-r--r-- 1 root root      1255 Jul 14 11:33 oozie_4.3.0-2_amd64.changes
-rw-r--r-- 1 root root      1311 Jul 14 11:20 oozie_4.3.0-2_source.changes
-rw-r--r-- 1 root root   2386289 Jul 14 11:20 oozie_4.3.0.orig.tar.gz
btullis@marlin-wsl:~/src/bigtop-bullseye$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean bigtop-groovy-pkg'
> Task :bigtop-groovy-pkg

BUILD SUCCESSFUL in 42s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop-bullseye$ ls -l output/bigtop-groovy/
total 33836
-rw-r--r-- 1 root root     5588 Jul 14 12:20 bigtop-groovy_2.5.4-1.debian.tar.xz
-rw-r--r-- 1 root root      867 Jul 14 12:20 bigtop-groovy_2.5.4-1.dsc
-rw-r--r-- 1 root root  4846880 Jul 14 12:20 bigtop-groovy_2.5.4-1_all.deb
-rw-r--r-- 1 root root    10427 Jul 14 12:20 bigtop-groovy_2.5.4-1_amd64.build
-rw-r--r-- 1 root root     4784 Jul 14 12:20 bigtop-groovy_2.5.4-1_amd64.buildinfo
-rw-r--r-- 1 root root     1000 Jul 14 12:20 bigtop-groovy_2.5.4-1_amd64.changes
-rw-r--r-- 1 root root     1423 Jul 14 12:20 bigtop-groovy_2.5.4-1_source.changes
-rw-r--r-- 1 root root 29754360 Jul 14 12:20 bigtop-groovy_2.5.4.orig.tar.gz
btullis@marlin-wsl:~/src/bigtop-bullseye$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean bigtop-jsvc-pkg'
> Task :bigtop-jsvc-pkg

BUILD SUCCESSFUL in 29s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop-bullseye$ ls -l output/bigtop-jsvc/
total 308
-rw-r--r-- 1 root root  27016 Jul 14 12:37 bigtop-jsvc_1.0.15-1.debian.tar.xz
-rw-r--r-- 1 root root    884 Jul 14 12:37 bigtop-jsvc_1.0.15-1.dsc
-rw-r--r-- 1 root root  25758 Jul 14 12:37 bigtop-jsvc_1.0.15-1_amd64.build
-rw-r--r-- 1 root root   4780 Jul 14 12:37 bigtop-jsvc_1.0.15-1_amd64.buildinfo
-rw-r--r-- 1 root root    965 Jul 14 12:37 bigtop-jsvc_1.0.15-1_amd64.changes
-rw-r--r-- 1 root root  26464 Jul 14 12:37 bigtop-jsvc_1.0.15-1_amd64.deb
-rw-r--r-- 1 root root   1406 Jul 14 12:37 bigtop-jsvc_1.0.15-1_source.changes
-rw-r--r-- 1 root root 204944 Jul 14 12:37 bigtop-jsvc_1.0.15.orig.tar.gz
docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean bigtop-tomcat-pkg'
> Task :bigtop-tomcat-pkg

BUILD SUCCESSFUL in 1m 0s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop-bullseye$ ls -l output/bigtop-tomcat/
total 14252
-rw-r--r-- 1 root root    7128 Jul 15 11:01 bigtop-tomcat_8.5.57-1.debian.tar.xz
-rw-r--r-- 1 root root     869 Jul 15 11:01 bigtop-tomcat_8.5.57-1.dsc
-rw-r--r-- 1 root root 8754220 Jul 15 11:02 bigtop-tomcat_8.5.57-1_all.deb
-rw-r--r-- 1 root root   70902 Jul 15 11:02 bigtop-tomcat_8.5.57-1_amd64.build
-rw-r--r-- 1 root root    4788 Jul 15 11:02 bigtop-tomcat_8.5.57-1_amd64.buildinfo
-rw-r--r-- 1 root root     963 Jul 15 11:02 bigtop-tomcat_8.5.57-1_amd64.changes
-rw-r--r-- 1 root root    1434 Jul 15 11:01 bigtop-tomcat_8.5.57-1_source.changes
-rw-r--r-- 1 root root 5730658 Jul 15 11:01 bigtop-tomcat_8.5.57.orig.tar.gz
btullis@marlin-wsl:~/src/bigtop-bullseye$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean bigtop-utils-pkg'
> Task :bigtop-utils-pkg

BUILD SUCCESSFUL in 22s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop-bullseye$ ls -l output/bigtop-utils/
total 56
-rw-r--r-- 1 root root 10268 Jul 15 11:05 bigtop-utils_1.5.0-1.debian.tar.xz
-rw-r--r-- 1 root root   847 Jul 15 11:05 bigtop-utils_1.5.0-1.dsc
-rw-r--r-- 1 root root  4276 Jul 15 11:05 bigtop-utils_1.5.0-1_all.deb
-rw-r--r-- 1 root root  6113 Jul 15 11:05 bigtop-utils_1.5.0-1_amd64.build
-rw-r--r-- 1 root root  4770 Jul 15 11:05 bigtop-utils_1.5.0-1_amd64.buildinfo
-rw-r--r-- 1 root root   960 Jul 15 11:05 bigtop-utils_1.5.0-1_amd64.changes
-rw-r--r-- 1 root root  1400 Jul 15 11:05 bigtop-utils_1.5.0-1_source.changes
-rw-r--r-- 1 root root  4148 Jul 15 11:05 bigtop-utils_1.5.0.orig.tar.gz

The flink job failed.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.0.0:shade (shade-hadoop) on project flink-shaded-hadoop2-uber: Error creating shaded jar: null: IllegalArgumentException -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :flink-shaded-hadoop2-uber
make[1]: *** [debian/rules:29: override_dh_auto_build] Error 1
make[1]: Leaving directory '/ws/output/flink/flink-1.6.4'
make: *** [debian/rules:26: build] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2
debuild: fatal error at line 1182:
dpkg-buildpackage -us -uc -ui -b
 failed

> Task :flink-deb FAILED

I have a feeling I may have seen this build error before, so I'll search through the issues on the project to see if I can resolve it.

I had to create a patch file to update the version of the maven-shade-plugin that is in use.

btullis@marlin-wsl:~/src/bigtop-bullseye$ cat bigtop-packages/src/common/flink/patch1-fix-maven-shaded-plugin.diff
From 716b16106d889c0e462d74d6cfcbf92780e8ebfa Mon Sep 17 00:00:00 2001
From: Ben Tullis <btullis@wikimedia.org>
Date: Fri, 15 Jul 2022 13:33:28 +0100
Subject: [PATCH] Update maven-shaded-plugin

---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index f95fb3c0f3d..ff8a3ef2fc1 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1433,7 +1433,7 @@ under the License.
                                <plugin>
                                        <groupId>org.apache.maven.plugins</groupId>
                                        <artifactId>maven-shade-plugin</artifactId>
-                                       <version>3.0.0</version>
+                                       <version>3.1.1</version>
                                </plugin>

                                <plugin>
--
2.30.2

Then it built successfully.

btullis@marlin-wsl:~/src/bigtop-bullseye$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean flink-pkg'
> Task :flink-pkg

BUILD SUCCESSFUL in 40m 13s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop-bullseye$ ls -l output/flink/
total 159932
-rw-r--r-- 1 root root      4328 Jul 15 14:19 flink-jobmanager_1.6.4-1_all.deb
-rw-r--r-- 1 root root      4304 Jul 15 14:19 flink-taskmanager_1.6.4-1_all.deb
-rw-r--r-- 1 root root      7784 Jul 15 13:40 flink_1.6.4-1.debian.tar.xz
-rw-r--r-- 1 root root       907 Jul 15 13:40 flink_1.6.4-1.dsc
-rw-r--r-- 1 root root 135472424 Jul 15 14:20 flink_1.6.4-1_all.deb
-rw-r--r-- 1 root root  11529468 Jul 15 14:20 flink_1.6.4-1_amd64.build
-rw-r--r-- 1 root root      5300 Jul 15 14:20 flink_1.6.4-1_amd64.buildinfo
-rw-r--r-- 1 root root      1617 Jul 15 14:20 flink_1.6.4-1_amd64.changes
-rw-r--r-- 1 root root      1311 Jul 15 13:40 flink_1.6.4-1_source.changes
-rw-r--r-- 1 root root  16719593 Jul 15 13:40 flink_1.6.4.orig.tar.gz

I asked in a Slack thread whether we actually use these flink 1.6.4 packages, and the conclusion is that we probably don't. Still, it's good to know that they build anyway.

Hbase built successfully:

btullis@marlin-wsl:~/src/bigtop-bullseye$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean hbase-pkg'
> Task :hbase-pkg

BUILD SUCCESSFUL in 13m 41s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop-bullseye$ ls -l output/hbase/
total 136260
-rw-r--r-- 1 root root 19480544 Jul 15 16:13 hbase-doc_1.5.0-1_all.deb
-rw-r--r-- 1 root root    87244 Jul 15 16:12 hbase-master_1.5.0-1_all.deb
-rw-r--r-- 1 root root    89436 Jul 15 16:12 hbase-regionserver_1.5.0-1_all.deb
-rw-r--r-- 1 root root    87192 Jul 15 16:12 hbase-rest_1.5.0-1_all.deb
-rw-r--r-- 1 root root    87552 Jul 15 16:12 hbase-thrift_1.5.0-1_all.deb
-rw-r--r-- 1 root root    14088 Jul 15 16:01 hbase_1.5.0-1.debian.tar.xz
-rw-r--r-- 1 root root     1060 Jul 15 16:01 hbase_1.5.0-1.dsc
-rw-r--r-- 1 root root  7944062 Jul 15 16:14 hbase_1.5.0-1_amd64.build
-rw-r--r-- 1 root root     6072 Jul 15 16:13 hbase_1.5.0-1_amd64.buildinfo
-rw-r--r-- 1 root root     2550 Jul 15 16:13 hbase_1.5.0-1_amd64.changes
-rw-r--r-- 1 root root 99067884 Jul 15 16:13 hbase_1.5.0-1_amd64.deb
-rw-r--r-- 1 root root     1317 Jul 15 16:01 hbase_1.5.0-1_source.changes
-rw-r--r-- 1 root root 12635361 Jul 15 16:01 hbase_1.5.0.orig.tar.gz

Mahout built successfully.

btullis@marlin-wsl:~/src/bigtop-bullseye$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean mahout-pkg'
> Task :mahout-pkg

BUILD SUCCESSFUL in 1h 3m 35s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop-bullseye$
btullis@marlin-wsl:~/src/bigtop-bullseye$ ls -l output/mahout/
total 243296
-rw-r--r-- 1 root root    530400 Jul 15 18:10 mahout-doc_0.13.0-1_all.deb
-rw-r--r-- 1 root root      6984 Jul 15 17:08 mahout_0.13.0-1.debian.tar.xz
-rw-r--r-- 1 root root       854 Jul 15 17:08 mahout_0.13.0-1.dsc
-rw-r--r-- 1 root root 236996212 Jul 15 18:10 mahout_0.13.0-1_all.deb
-rw-r--r-- 1 root root   6652526 Jul 15 18:12 mahout_0.13.0-1_amd64.build
-rw-r--r-- 1 root root      5017 Jul 15 18:10 mahout_0.13.0-1_amd64.buildinfo
-rw-r--r-- 1 root root      1260 Jul 15 18:10 mahout_0.13.0-1_amd64.changes
-rw-r--r-- 1 root root      1338 Jul 15 17:08 mahout_0.13.0-1_source.changes
-rw-r--r-- 1 root root   4914767 Jul 15 17:08 mahout_0.13.0.orig.tar.gz

Solr built successfully.

btullis@marlin-wsl:~/src/bigtop$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean solr-pkg'
> Task :solr-pkg

BUILD SUCCESSFUL in 8m 57s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop$ ls -l output/solr/
total 195076
-rw-r--r-- 1 root root   4377612 Aug  2 12:42 solr-doc_6.6.6-1_all.deb
-rw-r--r-- 1 root root      3720 Aug  2 12:42 solr-server_6.6.6-1_all.deb
-rw-r--r-- 1 root root     43368 Aug  2 12:34 solr_6.6.6-1.debian.tar.xz
-rw-r--r-- 1 root root       891 Aug  2 12:34 solr_6.6.6-1.dsc
-rw-r--r-- 1 root root 139819264 Aug  2 12:42 solr_6.6.6-1_all.deb
-rw-r--r-- 1 root root    792469 Aug  2 12:42 solr_6.6.6-1_amd64.build
-rw-r--r-- 1 root root      5248 Aug  2 12:42 solr_6.6.6-1_amd64.buildinfo
-rw-r--r-- 1 root root      1535 Aug  2 12:42 solr_6.6.6-1_amd64.changes
-rw-r--r-- 1 root root      1307 Aug  2 12:34 solr_6.6.6-1_source.changes
-rw-r--r-- 1 root root  54692402 Aug  2 12:34 solr_6.6.6.orig.tar.gz

Spark built successfully.

btullis@marlin-wsl:~/src/bigtop$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean spark-pkg'
> Task :spark-pkg

BUILD SUCCESSFUL in 34m 26s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop$ ls -l output/spark/
total 237688
-rw-r--r-- 1 root root     38400 Aug  2 12:50 spark-core_2.4.5-1.debian.tar.xz
-rw-r--r-- 1 root root      1356 Aug  2 12:50 spark-core_2.4.5-1.dsc
-rw-r--r-- 1 root root 189887920 Aug  2 13:23 spark-core_2.4.5-1_all.deb
-rw-r--r-- 1 root root  10375354 Aug  2 13:24 spark-core_2.4.5-1_amd64.build
-rw-r--r-- 1 root root      7241 Aug  2 13:23 spark-core_2.4.5-1_amd64.buildinfo
-rw-r--r-- 1 root root      3947 Aug  2 13:23 spark-core_2.4.5-1_amd64.changes
-rw-r--r-- 1 root root      1387 Aug  2 12:50 spark-core_2.4.5-1_source.changes
-rw-r--r-- 1 root root  15658998 Aug  2 12:50 spark-core_2.4.5.orig.tar.gz
-rw-r--r-- 1 root root   3750520 Aug  2 13:23 spark-datanucleus_2.4.5-1_all.deb
-rw-r--r-- 1 root root   9749700 Aug  2 13:23 spark-external_2.4.5-1_all.deb
-rw-r--r-- 1 root root      3456 Aug  2 13:23 spark-history-server_2.4.5-1_all.deb
-rw-r--r-- 1 root root      3496 Aug  2 13:23 spark-master_2.4.5-1_all.deb
-rw-r--r-- 1 root root   1027884 Aug  2 13:23 spark-python_2.4.5-1_all.deb
-rw-r--r-- 1 root root   3946708 Aug  2 13:23 spark-sparkr_2.4.5-1_all.deb
-rw-r--r-- 1 root root      3504 Aug  2 13:23 spark-thriftserver_2.4.5-1_all.deb
-rw-r--r-- 1 root root      3464 Aug  2 13:23 spark-worker_2.4.5-1_all.deb
-rw-r--r-- 1 root root   8900232 Aug  2 13:23 spark-yarn-shuffle_2.4.5-1_all.deb

Only spark2, but that was of course expected.

The sqoop build failed.

BUILD FAILED
/ws/output/sqoop/sqoop-1.4.6/build.xml:1094: Execute failed: java.io.IOException: Cannot run program "python" (in directory "/ws/output/sqoop/sqoop-1.4.6"): error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at java.lang.Runtime.exec(Runtime.java:621)
        at org.apache.tools.ant.taskdefs.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58)
        at org.apache.tools.ant.taskdefs.Execute.launch(Execute.java:426)
        at org.apache.tools.ant.taskdefs.Execute.execute(Execute.java:440)
        at org.apache.tools.ant.taskdefs.ExecTask.runExecute(ExecTask.java:630)
        at org.apache.tools.ant.taskdefs.ExecTask.runExec(ExecTask.java:671)
        at org.apache.tools.ant.taskdefs.ExecTask.execute(ExecTask.java:497)
        at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:293)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
        at org.apache.tools.ant.Task.perform(Task.java:352)
        at org.apache.tools.ant.Target.execute(Target.java:437)
        at org.apache.tools.ant.Target.performTasks(Target.java:458)
        at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1406)
        at org.apache.tools.ant.Project.executeTarget(Project.java:1377)
        at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
        at org.apache.tools.ant.Project.executeTargets(Project.java:1261)
        at org.apache.tools.ant.Main.runBuild(Main.java:857)
        at org.apache.tools.ant.Main.startAnt(Main.java:236)
        at org.apache.tools.ant.launch.Launcher.run(Launcher.java:287)
        at org.apache.tools.ant.launch.Launcher.main(Launcher.java:112)
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 23 more

Total time: 8 minutes 4 seconds
make[1]: *** [debian/rules:30: override_dh_auto_build] Error 1
make[1]: Leaving directory '/ws/output/sqoop/sqoop-1.4.6'
make: *** [debian/rules:27: build] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2
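
The stack trace boils down to Ant being unable to find an executable called `python` on the PATH. A small check like the following could be run inside the build container to see which interpreter names are actually available. The function name and the candidate list are illustrative assumptions, not something taken from the Bigtop tooling.

```shell
# Sketch only: print the first of the given command names that exists on PATH.
first_available() {
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    # None of the candidates was found.
    return 1
}

# Example (candidate names are guesses):
#   first_available python python2.7 python2 python3
```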

I built sqoop2 before returning to sqoop. This was successful.

btullis@marlin-wsl:~/src/bigtop$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean sqoop2-pkg'
> Task :sqoop2-pkg

BUILD SUCCESSFUL in 3m 20s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop$ ls -l output/sqoop2/
total 23048
-rw-r--r-- 1 root root  8566596 Aug  2 15:18 sqoop2-client_1.99.4-1_all.deb
-rw-r--r-- 1 root root    15280 Aug  2 15:18 sqoop2-server_1.99.4-1_all.deb
-rw-r--r-- 1 root root    10208 Aug  2 15:15 sqoop2_1.99.4-1.debian.tar.xz
-rw-r--r-- 1 root root      874 Aug  2 15:15 sqoop2_1.99.4-1.dsc
-rw-r--r-- 1 root root 12664996 Aug  2 15:18 sqoop2_1.99.4-1_all.deb
-rw-r--r-- 1 root root  1893123 Aug  2 15:18 sqoop2_1.99.4-1_amd64.build
-rw-r--r-- 1 root root     4365 Aug  2 15:18 sqoop2_1.99.4-1_amd64.buildinfo
-rw-r--r-- 1 root root     1599 Aug  2 15:18 sqoop2_1.99.4-1_amd64.changes
-rw-r--r-- 1 root root     1336 Aug  2 15:15 sqoop2_1.99.4-1_source.changes
-rw-r--r-- 1 root root   416855 Aug  2 15:15 sqoop2_1.99.4.orig.tar.gz

Bother, I built solr, spark, and sqoop[2] in the wrong working directory, so I'm going to rebuild them.

I have rebuilt the solr, spark, and sqoop2 packages using the correct working directory and the same commands as mentioned previously.

I have also managed to fix the build of sqoop using the following patch, which specifies the python executable to use:

btullis@marlin-wsl:~/src/bigtop-bullseye$ cat bigtop-packages/src/common/sqoop/patch-specify-python-version.diff
diff --git a/build.xml b/build.xml
index d614d09..eee61ec 100644
--- a/build.xml
+++ b/build.xml
@@ -216,7 +216,7 @@
   <property name="git.hash" value="" />

   <!-- programs used -->
-  <property name="python" value="python" />
+  <property name="python" value="python2.7" />

   <!-- locations in the source tree -->
   <property name="base.src.dir" location="${basedir}/src" />
btullis@marlin-wsl:~/src/bigtop-bullseye$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean sqoop-pkg'
> Task :sqoop-pkg

BUILD SUCCESSFUL in 7m 46s
71 actionable tasks: 71 executed
btullis@marlin-wsl:~/src/bigtop-bullseye$ ls -l output/sqoop/
total 13548
-rw-r--r-- 1 root root    16080 Aug  3 14:46 sqoop-metastore_1.4.6-1_all.deb
-rw-r--r-- 1 root root     9324 Aug  3 14:39 sqoop_1.4.6-1.debian.tar.xz
-rw-r--r-- 1 root root      864 Aug  3 14:39 sqoop_1.4.6-1.dsc
-rw-r--r-- 1 root root 11380320 Aug  3 14:46 sqoop_1.4.6-1_all.deb
-rw-r--r-- 1 root root   235985 Aug  3 14:46 sqoop_1.4.6-1_amd64.build
-rw-r--r-- 1 root root     5856 Aug  3 14:46 sqoop_1.4.6-1_amd64.buildinfo
-rw-r--r-- 1 root root     1274 Aug  3 14:46 sqoop_1.4.6-1_amd64.changes
-rw-r--r-- 1 root root     1308 Aug  3 14:39 sqoop_1.4.6-1_source.changes
-rw-r--r-- 1 root root  2197063 Aug  3 14:39 sqoop_1.4.6.orig.tar.gz

That means all packages have now been built for bullseye.

I will transfer them to apt1001 and add them to apt.wikimedia.org using reprepro.
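
A rough sketch of how the import step could be prepared once the build output is on the apt host. It only prints one `reprepro include` command per `*_amd64.changes` file so that the list can be reviewed before anything is run. The component, distribution and `--ignore=wrongdistribution` flag are the ones used elsewhere in this task; the function name and directory layout are assumptions.

```shell
# Sketch only: print (do not run) a reprepro include command for every
# *_amd64.changes file found below the directory given as $1.
include_commands() {
    find "$1" -name '*_amd64.changes' | sort | while read -r changes; do
        echo "sudo -i reprepro -C thirdparty/bigtop15 --ignore=wrongdistribution include bullseye-wikimedia $changes"
    done
}

# Example (the directory name is hypothetical):
#   include_commands "$HOME/bigtop-bullseye"
```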

Change 821223 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add thirdparty/bigtop15 component to wikimedia-bullseye

https://gerrit.wikimedia.org/r/821223

Change 821223 merged by Jbond:

[operations/puppet@production] Add thirdparty/bigtop15 component to wikimedia-bullseye

https://gerrit.wikimedia.org/r/821223

So the final build command was the following:

btullis@marlin-wsl:~/src/bigtop-bullseye$ docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:1.5.0-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean hadoop-pkg hive-pkg bigtop-groovy-pkg bigtop-jsvc-pkg bigtop-tomcat-pkg bigtop-utils-pkg flink-pkg hbase-pkg mahout-pkg solr-pkg spark-pkg sqoop-pkg sqoop2-pkg'
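
As an illustration, the long task list in that command can be generated from a plain list of component names, which makes it easier to add or drop a component. The variable and function names below are made up for this sketch, and the command is only printed, not executed.

```shell
# Sketch only: assemble the build command from a list of component names.
# Component names and image tag are copied from the command quoted above.
COMPONENTS="hadoop hive bigtop-groovy bigtop-jsvc bigtop-tomcat bigtop-utils flink hbase mahout solr spark sqoop sqoop2"
IMAGE="bigtop/slaves:1.5.0-debian-11"

gradle_tasks() {
    # Turn "hadoop hive" into "hadoop-pkg hive-pkg".
    tasks=""
    for c in "$@"; do
        tasks="${tasks:+$tasks }$c-pkg"
    done
    echo "$tasks"
}

build_command() {
    # Print the docker invocation for the given components.
    echo "docker run --rm -v \`pwd\`:/ws --workdir /ws $IMAGE bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean $(gradle_tasks "$@")'"
}

# Example, from the root of the bigtop checkout:
#   build_command $COMPONENTS
```

Note that oozie, which was built earlier in this task, does not appear in the final command as quoted, so it would need to be added to the list if it is required.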

I've added the thirdparty/bigtop15 component to the wikimedia-bullseye distribution: https://gerrit.wikimedia.org/r/821223

However, I ran into an issue adding the packages with reprepro:

btullis@apt1001:~/bigtop-bullseye/bigtop-groovy$ sudo -i reprepro -C thirdparty/bigtop15 --ignore=wrongdistribution include bullseye-wikimedia `pwd`/bigtop-groovy_2.5.4-1_amd64.changes
.changes put in a distribution not listed within it!
Ignoring as --ignore=wrongdistribution given.
File "pool/thirdparty/bigtop15/b/bigtop-groovy/bigtop-groovy_2.5.4-1_all.deb" is already registered with different checksums!
md5 expected: b4f8d524e5a757952018658874e00296, got: 7823305de5ee62d63a04bef776de59c4
sha1 expected: 45f9124e2db83161121a19b64db1eab79224ccea, got: 5e84432417c1b2de4e8f73b4fbfb02d4f95365e6
sha256 expected: 2f6eb9027f69b6b36088e101e5b7172001d0de16c0a8ccb3e748200a74d500d3, got: 1ec9f772052c551a00336411bfb00f86c450d9f04466d89835be2fab4349e33d
size expected: 4848258, got: 4847044
There have been errors!

The filenames are identical between distributions, which means that we can't add them as they are.
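
As a rough sketch of why this collides: judging by the path in the error above, the pool location depends only on the component, the source package name and the filename, not on the distribution. The helper name pool_path is hypothetical, and the lib* prefix handling is an assumption based on the usual Debian archive layout:

```shell
# Hypothetical helper: where a package file lands in the shared reprepro pool.
# The distribution name is not part of the path, which is why two different
# builds with the same filename collide.
pool_path() {
  component=$1
  source_name=$2
  filename=$3
  case "$source_name" in
    # Assumption: lib* sources use a four-character prefix, as in Debian.
    lib?*) prefix=$(printf '%s' "$source_name" | cut -c1-4) ;;
    *)     prefix=$(printf '%s' "$source_name" | cut -c1) ;;
  esac
  printf 'pool/%s/%s/%s/%s\n' "$component" "$prefix" "$source_name" "$filename"
}

# The buster and bullseye builds of bigtop-groovy 2.5.4-1 both map to:
#   pool_path thirdparty/bigtop15 bigtop-groovy bigtop-groovy_2.5.4-1_all.deb
```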

We may need to create versioned components, as described here: https://wikitech.wikimedia.org/wiki/Reprepro#Multiple_versions_of_the_same_package

I'm in a bit of a quandary here, since reprepro won't allow identically named but different files across distributions.

The best solution I can think of is to add a -deb11 suffix to the version string within the package filenames.
This would be configured in the bigtop.bom file, for each component that we require.

For example, when building the hadoop component...

btullis@marlin-wsl:~/src/bigtop-bullseye$ git diff bigtop.bom
diff --git a/bigtop.bom b/bigtop.bom
index 6ae57730..40fca47f 100644
--- a/bigtop.bom
+++ b/bigtop.bom
@@ -148,7 +148,7 @@ bigtop {
     'hadoop' {
       name    = 'hadoop'
       relNotes = 'Apache Hadoop'
-      version { base = '2.10.2'; pkg = base; release = 1 }
+      version { base = '2.10.2'; pkg = base + '-deb11'; release = 1 }
       tarball { destination = "${name}-${version.base}.tar.gz"
                 source      = "${name}-${version.base}-src.tar.gz" }
       url     { download_path = "/$name/common/$name-${version.base}"

When I subsequently build the hadoop packages, they all have the suffix in the filename.
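
A quick way to confirm that before uploading might look like the following. This is only a sketch: the function name is hypothetical, and it assumes the usual name_version_arch.deb filename layout:

```shell
# Hypothetical check: list any .deb in a directory whose version lacks the suffix.
# Returns 0 when every package carries the suffix, 1 otherwise.
debs_missing_suffix() {
  dir=$1
  suffix=$2
  status=0
  for deb in "$dir"/*.deb; do
    [ -e "$deb" ] || continue          # directory contains no .deb files
    name=${deb##*/}
    version=${name#*_}                 # strip "<package>_"
    version=${version%_*}              # strip "_<arch>.deb"
    case "$version" in
      *-"$suffix"-*) ;;                # e.g. 2.10.2-deb11-1
      *) printf '%s\n' "$name"; status=1 ;;
    esac
  done
  return "$status"
}

# Possible usage:
#   debs_missing_suffix ~/bigtop-bullseye/hadoop deb11
```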

Can anyone else (like @elukey, @Ottomata , @MoritzMuehlenhoff ) think of a better solution?

Maybe someone else has done something similar or has a workaround?

I believe that this option will work. All packages were built successfully with the exception of flink, which I don't think we're using anyway.
I tested with reprepro by adding a single package, bigtop-groovy:

btullis@apt1001:~/bigtop-bullseye/bigtop-groovy$ sudo -i reprepro -C thirdparty/bigtop15 --ignore=wrongdistribution include bullseye-wikimedia `pwd`/bigtop-groovy_2.5.4-deb11-1_amd64.changes
.changes put in a distribution not listed within it!
Ignoring as --ignore=wrongdistribution given.
Exporting indices...

btullis@apt1001:~/bigtop-bullseye/bigtop-groovy$ sudo -i reprepro ls bigtop-groovy
bigtop-groovy |      2.4.10-1 |  stretch-wikimedia | amd64
bigtop-groovy |       2.5.4-1 |  stretch-wikimedia | amd64
bigtop-groovy |       2.5.4-1 |   buster-wikimedia | amd64
bigtop-groovy | 2.5.4-deb11-1 | bullseye-wikimedia | amd64, i386

I'll add all of the remaining packages and continue testing.

Here are the hadoop packages as an example:

btullis@apt1001:~/bigtop-bullseye/hadoop$ ls -l
total 370072
-rw-r--r-- 1 btullis wikidev  23919194 Aug 12 13:57 hadoop_2.10.2-deb11-1_amd64.build
-rw-r--r-- 1 btullis wikidev     12637 Aug 12 13:55 hadoop_2.10.2-deb11-1_amd64.buildinfo
-rw-r--r-- 1 btullis wikidev      8802 Aug 12 13:55 hadoop_2.10.2-deb11-1_amd64.changes
-rw-r--r-- 1 btullis wikidev  29154288 Aug 12 13:55 hadoop_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev     35296 Aug 12 12:59 hadoop_2.10.2-deb11-1.debian.tar.xz
-rw-r--r-- 1 btullis wikidev      2296 Aug 12 12:59 hadoop_2.10.2-deb11-1.dsc
-rw-r--r-- 1 btullis wikidev      1429 Aug 12 12:59 hadoop_2.10.2-deb11-1_source.changes
-rw-r--r-- 1 btullis wikidev  46989179 Aug 12 12:59 hadoop_2.10.2-deb11.orig.tar.gz
-rw-r--r-- 1 btullis wikidev      3436 Aug 12 13:55 hadoop-client_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev     18020 Aug 12 13:55 hadoop-conf-pseudo_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev   5022732 Aug 12 13:55 hadoop-doc_2.10.2-deb11-1_all.deb
-rw-r--r-- 1 btullis wikidev  30351736 Aug 12 13:55 hadoop-hdfs_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev      4128 Aug 12 13:55 hadoop-hdfs-datanode_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev     21852 Aug 12 13:55 hadoop-hdfs-fuse_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev      4096 Aug 12 13:55 hadoop-hdfs-journalnode_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev      4188 Aug 12 13:55 hadoop-hdfs-namenode_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev      4084 Aug 12 13:55 hadoop-hdfs-secondarynamenode_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev      4116 Aug 12 13:55 hadoop-hdfs-zkfc_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev  41082156 Aug 12 13:55 hadoop-httpfs_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev  25254464 Aug 12 13:55 hadoop-kms_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev 127200060 Aug 12 13:55 hadoop-mapreduce_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev      4044 Aug 12 13:55 hadoop-mapreduce-historyserver_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev  49743080 Aug 12 13:55 hadoop-yarn_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev      4076 Aug 12 13:55 hadoop-yarn-nodemanager_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev      4004 Aug 12 13:55 hadoop-yarn-proxyserver_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev      4004 Aug 12 13:55 hadoop-yarn-resourcemanager_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev      4040 Aug 12 13:55 hadoop-yarn-timelineserver_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev     27128 Aug 12 13:55 libhdfs0_2.10.2-deb11-1_amd64.deb
-rw-r--r-- 1 btullis wikidev      8964 Aug 12 13:55 libhdfs0-dev_2.10.2-deb11-1_amd64.deb

btullis@apt1001:~/bigtop-bullseye/hadoop$ sudo -i reprepro -C thirdparty/bigtop15 --ignore=wrongdistribution include bullseye-wikimedia `pwd`/hadoop_2.10.2-deb11-1_amd64.changes
.changes put in a distribution not listed within it!
Ignoring as --ignore=wrongdistribution given.
Exporting indices...
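
The same include command applies to every component directory. A dry-run sketch that only prints the commands, assuming one subdirectory per component, each containing an _amd64.changes file as in the listing above; the function name is hypothetical:

```shell
# Hypothetical dry run: print one reprepro include command per binary .changes
# file found under <base>/<component>/. Nothing is executed.
print_include_commands() {
  base=$1
  for changes in "$base"/*/*_amd64.changes; do
    [ -e "$changes" ] || continue      # no matching files
    printf 'sudo -i reprepro -C thirdparty/bigtop15 --ignore=wrongdistribution include bullseye-wikimedia %s\n' "$changes"
  done
}

# Possible usage on the apt host:
#   print_include_commands ~/bigtop-bullseye
```

Printing first and running afterwards leaves room to review the list before anything reaches the repo.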

I have finished building and adding these packages to reprepro, so I'm tentatively marking this ticket as done. We can begin testing the packages on bullseye hosts now.

Is anyone experimenting with these packages? I'm seeing some weird interactions -- specifically, installing these packages seems to make timesyncd fail in interesting ways:

Aug 15 22:14:22 clouddumps1002 systemd[85653]: systemd-timesyncd.service: Failed to set up mount namespacing: /run/systemd/unit-root/: Input/output error
Aug 15 22:14:22 clouddumps1002 systemd[85653]: systemd-timesyncd.service: Failed at step NAMESPACE spawning /lib/systemd/systemd-timesyncd: Input/output error

I'd be interested in knowing if this is somehow known or if anyone else has seen it. I don't know enough about hdfs to speculate as to why this would be happening.

Hm, could be related to the hdfs FUSE mount at /mnt/hdfs ?

removing that mount didn't seem to help but I will test more today.

Oops, I'm wrong -- I just now tried a simple 'umount /mnt/hdfs' and that seems to have fixed timesyncd. If that implies a fix then I'm not seeing it yet though.

Thanks @Andrew - I'm having a look at this issue with high priority, because I think we need to get to the bottom of it quickly, to unblock your work as much as anything.

I can confirm that it seems to be closely related to the /mnt/hdfs FUSE mount. Also, when I manually mount the HDFS directory again I see the following INFO messages.

btullis@clouddumps1002:~$ sudo mount -a
INFO ./hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/hdfs
INFO ./hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option allow_other
INFO ./hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option dev
INFO ./hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option suid

That's interesting because allow_other doesn't seem to be a valid mount option for fuse-dfs, according to this:
https://cwiki.apache.org/confluence/display/HADOOP2/MountableHDFS

We set it here: https://github.com/wikimedia/puppet/blob/production/modules/bigtop/manifests/hadoop/mount.pp#L28-L30

I'll check to see how it works on the previous hosts, to see if they report the same messages.

So far I think I have ascertained that it's not a change in behaviour of the hdfs-client itself, but rather a change in the way that namespaces are handled in the Linux kernel.

I'm making some progress by trying different systemd options in /lib/systemd/system/systemd-timesyncd.service but I haven't found a solution yet.

Change 824503 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Fix a conflict between hdfs-fuse and systemd-timesync on bullseye

https://gerrit.wikimedia.org/r/824503

I believe that I have a fix for this now. Having tried many different options in the systemd unit files, I was able to ascertain that a change to the ProtectSystem option from strict to full allowed the service to start.
Nothing else that I tried had any effect.

From the systemd.exec manual:

ProtectSystem=
Takes a boolean argument or the special values "full" or "strict". If true, mounts the /usr/ and the boot loader directories (/boot and /efi) read-only for processes invoked by this unit. If set to "full", the /etc/ directory is mounted read-only, too. If set to "strict" the entire file system hierarchy is mounted read-only, except for the API file system subtrees /dev/, /proc/ and /sys/ (protect these directories using PrivateDevices=, ProtectKernelTunables=, ProtectControlGroups=).
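
For reference, the experiment can be reproduced without editing the packaged unit file by using a drop-in. A minimal sketch, assuming the standard systemd drop-in directory; the function name and the override.conf filename are illustrative, and this is not the puppet change itself:

```shell
# Hypothetical helper: write a drop-in that relaxes ProtectSystem for timesyncd.
# The optional first argument is an alternative root, for trying it out safely.
write_timesyncd_override() {
  root=${1:-}
  dropin_dir="$root/etc/systemd/system/systemd-timesyncd.service.d"
  mkdir -p "$dropin_dir"
  cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
ProtectSystem=full
EOF
}

# When written to the real root, systemd needs to reload before it takes effect:
#   systemctl daemon-reload && systemctl restart systemd-timesyncd
```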

I think that this ultimately comes about due to the behaviour of the hdfs-fuse module and its interaction with kerberos. Even listing the mount point as root gives an error, whereas listing it with a user who has a keytab is fine.

root@clouddumps1002:/home/btullis# ls -l /mnt ; echo $?
ls: cannot access '/mnt/hdfs': Input/output error
total 0
d????????? ? ? ? ?            ? hdfs
1

root@clouddumps1002:/home/btullis# sudo -u dumpsgen ls -l /mnt ; echo $?
total 4
drwxr-xr-x 7 hdfs hadoop 4096 Jul 25  2018 hdfs
0
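
A small probe along these lines makes the difference easy to check per user. This is only a sketch; the function name is hypothetical and it relies on nothing more than ls -d failing on a path the caller cannot access:

```shell
# Hypothetical probe: report whether the current user can access a path.
# Returns 0 when accessible, 1 otherwise.
mount_is_accessible() {
  if ls -d "$1" >/dev/null 2>&1; then
    echo "accessible"
  else
    echo "inaccessible"
    return 1
  fi
}

# Possible usage, run once as root and once as a user with a keytab:
#   mount_is_accessible /mnt/hdfs
```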

I can't see a way of changing this behaviour in hdfs-fuse, and I think that the changes in namespace handling between the kernels in buster and bullseye are what have made this error apparent.

So I've proposed a change to the timesyncd module to relax the ProtectSystem option, but only when using bullseye and only when /mnt/hdfs is also defined for that host.

Change 824527 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] P:systemd::timesyncd: allow overriding the protectsystem systemd param

https://gerrit.wikimedia.org/r/824527

Change 824503 abandoned by Btullis:

[operations/puppet@production] Fix a conflict between hdfs-fuse and systemd-timesync on bullseye

Reason:

See https://gerrit.wikimedia.org/r/c/operations/puppet/+/824527 instead.

https://gerrit.wikimedia.org/r/824503

Change 824527 merged by Jbond:

[operations/puppet@production] P:systemd::timesyncd: exclude /mnt from accessible paths

https://gerrit.wikimedia.org/r/824527

BTullis removed a project: Patch-For-Review.

This issue with /mnt/hdfs has now been resolved with thanks to @jbond and others. I'll therefore close this issue again. If any more compatibility issues arise we can create new tickets to track them.

There is a similar issue with logind, probably fixable with a similar patch to the one for timesyncd. T316123

Change 828526 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] P:systemd::timedated: exclude /mnt from accessible paths

https://gerrit.wikimedia.org/r/828526

Change 828526 merged by Andrew Bogott:

[operations/puppet@production] P:systemd::timedated: exclude /mnt from accessible paths

https://gerrit.wikimedia.org/r/828526