
Test Alluxio as cache layer for Presto
Closed, ResolvedPublic

Description

In T256108 we were wondering whether or not it was worth co-locating Presto with Yarn on the Hadoop worker nodes. The alternative would be to keep separate nodes and use Alluxio as a caching layer for HDFS.

Alluxio is packaged in Bigtop, so after the upgrade we could test it and see how it performs. The current Presto bottleneck is all the data moving from the HDFS worker nodes in and out of the Presto workers (everything is network bound). Alluxio would alleviate the problem by caching HDFS data in the Presto workers' RAM.

Event Timeline

fdans triaged this task as Medium priority. Oct 29 2020, 4:47 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Nice video to keep in mind https://www.youtube.com/watch?v=Gs87KMFinYE&ab_channel=PrestoFoundation

Let's remember the Presto scheduler's soft affinity when testing this: if splits are assigned to random nodes, the same data will (more often) be fetched and replicated across Presto worker nodes by Alluxio.

I rebuilt Alluxio with Docker using the bigtop1.5 repo as baseline, and applying https://github.com/apache/bigtop/pull/724 to get the 2.4.1 version.

https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-a-Cluster.html
https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-a-HA-Cluster.html

The above are good starting points. I'll try to find some time over the next few days to play with Alluxio in the Hadoop test cluster.

To ruin all the initial fun, https://docs.alluxio.io/ee/user/2.4/en/operation/Kerberos-Security-Setup.html lists quite a few things to keep in mind when testing Alluxio :)

Change 673515 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Add alluxio keytabs on Hadoop test

https://gerrit.wikimedia.org/r/673515

Change 673515 merged by Elukey:
[operations/puppet@production] Add alluxio keytabs on Hadoop test

https://gerrit.wikimedia.org/r/673515

Part of the complexity of the security config is that, IIUC, two users need to be kerberized:

The alluxiohdfs user needs to be an HDFS superuser (like our hdfs user) and a proxy user. The Alluxio cluster will use HDFS as shared storage space, and if needed it will use it to keep files in sync with HDFS. I think this use case exists so that multiple systems (S3/Swift/Ozone/etc. as well as HDFS) can be accessible via Alluxio at the same time. For example, in the HDFS case, if a user creates a file via Alluxio, then alluxiohdfs will be able to do the same on the main HDFS file system, keeping things in sync. If this understanding is correct, it scares me a little to grant Alluxio all this power, but I'll have to research more.

Adding some notes collected in several meetings with Joseph over these past months, plus related tasks.

The architecture that we have in mind for the Alluxio/Presto cluster is the following:

  • 10 worker nodes with tiered caching (RAM/SSDs/HDDs)
  • 2 master nodes running various daemons in HA (the Job Master to move/copy data around, the Alluxio Master to manage the inodes, and the Alluxio Catalog Service)

At a very high level, Alluxio is like having another HDFS cluster. The Alluxio cluster fetches datasets from HDFS and populates various tiered caches, and Presto runs on top of it (leveraging its scheduler to get the maximum data locality possible when running jobs). To make this possible, Alluxio offers the Catalog Service, a layer on top of the Hive Metastore that Presto leverages to decide where to schedule jobs.

Last but not least, in T286591 we are tracking our hardware requirements for the new cluster.

Change 712974 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] WIP: Begin work on the alluxio puppet classes

https://gerrit.wikimedia.org/r/712974

I was unaware of some of the limitations of the community edition of Alluxio, as opposed to the enterprise edition.
https://www.alluxio.io/editions/
Particularly relating to TLS and Kerberos.

image.png (316×1 px, 36 KB)

So we can't enable TLS between workers and masters. We can't authenticate users of Alluxio via Kerberos.
I think it may be OK; in some ways it even makes the configuration simpler, because a lot of the advanced parameters mentioned in this comment (T266641#6928675) aren't available to us.

Change 713535 had a related patch set uploaded (by Btullis; author: Btullis):

[labs/private@master] Add a dummy SSH key pair for alluxio in the test cluster

https://gerrit.wikimedia.org/r/713535

Change 713535 merged by Btullis:

[labs/private@master] Add a dummy SSH key pair for alluxio in the test cluster

https://gerrit.wikimedia.org/r/713535

I have now got to the point where I believe the puppet patch that I've been working on needs to be merged in order to continue with my testing.
This is what the patch implements:

  1. Reserves the alluxio uid/gid and deploys the user to all hadoop cluster servers, including production.
  2. Defines global and cluster-specific alluxio settings in common.yaml
  3. Defines role-specific settings in their respective hiera files
  4. Applies the alluxio::master profile to the coordinator roles in the analytics test cluster
  5. Applies the alluxio::worker profile to the presto::server role in the analytics test cluster
  6. Allows alluxio masters to SSH into alluxio workers using a new key
  7. Opens required RPC firewall ports to the analytics networks
  8. Adds the alluxio user to the hadoop group, enabling impersonation of users.

The configuration isn't yet complete, but I don't believe that I can proceed with formatting the alluxio cluster in test until I have these minimum configuration elements in place.

Note that I have used the druid-analytics-test-eqiad zookeeper cluster, although that's supposed to be reserved for druid.
It was the only test zookeeper server that I had access to in the analytics network. I have filed T289056: Create analytics-test-eqiad zookeeper cluster to rectify that.

Looking more closely, I see that the sysvinit files that are provided with alluxio don't actually run the daemons as the alluxio system user; they run them as root.
It might be possible to patch them, and there is even an SVC_USER variable placeholder, but it isn't used anywhere.

I think that I would probably prefer to create systemd unit files for them, but I might remove the service definitions from the patch and start alluxio by hand as the system user until I know more.
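For reference, a minimal sketch of the kind of unit file I have in mind for the master, assuming the bigtop package layout and roughly the same JVM invocation that the packaged start scripts use (see the command pasted later in this task); the unit name, restart policy and trimmed set of JVM flags are placeholders rather than a tested design:

# Hypothetical sketch only: run the Alluxio master as the alluxio user via
# systemd instead of the root-running sysvinit script. JVM flags are trimmed.
sudo tee /etc/systemd/system/alluxio-master.service <<'EOF'
[Unit]
Description=Alluxio master
After=network.target

[Service]
User=alluxio
Group=alluxio
ExecStart=/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java \
  -cp /usr/lib/alluxio/conf/::/usr/lib/alluxio/assembly/server/target/alluxio-assembly-server-2.4.1-jar-with-dependencies.jar \
  -Dalluxio.home=/usr/lib/alluxio -Dalluxio.conf.dir=/usr/lib/alluxio/conf \
  -Dalluxio.logs.dir=/usr/lib/alluxio/logs \
  -Dlog4j.configuration=file:/usr/lib/alluxio/conf/log4j.properties \
  alluxio.master.AlluxioMaster
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload && sudo systemctl start alluxio-master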

I was unaware of some of the limitations of the community edition of Alluxio, as opposed to the enterprise edition.
https://www.alluxio.io/editions/
Particularly relating to TLS and Kerberos.

image.png (316×1 px, 36 KB)

So we can't enable TLS between workers and masters. We can't authenticate users of Alluxio via Kerberos.
I think it may be OK; in some ways it even makes the configuration simpler, because a lot of the advanced parameters mentioned in this comment (T266641#6928675) aren't available to us.

Ooooof :( I was unaware of the editions, it is really unfortunate, sigh :(
In my opinion this is a serious problem: without Kerberos we'd allow Alluxio to query any data on HDFS without any control, completely bypassing any security check. It may be ok-ish for the Presto use case, since we control the hosts and we don't allow people to ssh to them, but it may still be an issue for other tools like Superset. What we do now is something like:

  1. User is passed to Superset via HTTP headers, set by httpd after mod_cas authenticates the user (via CAS/LDAP credentials)
  2. Superset has a Kerberos keytab and is allowed to proxy other users, so in turn it queries Presto on behalf of the user.
  3. Presto has a keytab and can query on behalf of users as well, so the request from the user is proxied to HDFS to fetch the relevant data (only users in analytics-privatedata-users can query).

With Alluxio and no Kerberos authentication we'd have a problem, namely no HDFS checks to see whether a user can effectively fetch data or not. For example, Superset users not in analytics-privatedata-users (a POSIX group, even without ssh keys) would be able to see all PII/sensitive datasets (which, together with the absence of 2FA, is not great).

Another problem: people querying data with Presto from Analytics nodes wouldn't be authenticated either (since Presto would not need to authenticate to Alluxio). It's true that we allow only analytics-privatedata-users on stat100x and we have ferm rules, but if any of this changes in the future we'd lose some security features.

I'm still trying to work through all of the pieces so I could be completely wrong, but I think that we might be OK.

For a start, alluxio also has the user impersonation feature. I've configured the masters with:

'alluxio.master.security.impersonation.alluxio_user.presto.groups': 'analytics-privatedata-users'

So if I understand it correctly, that should allow the presto user to impersonate anyone from the analytics-privatedata-users group.

I believe that presto is going to be our only user of alluxio and is already configured to impersonate either the POSIX user (on the stat100x boxes) or the user from HTTP if using Superset.

Then we create a new catalog in presto which uses the Alluxio Catalog Service, as sketched further down: https://docs.alluxio.io/os/user/stable/en/core-services/Catalog.html#enabling-the-alluxio-catalog-service-with-presto

Users on the stat100x boxes would just use this new catalog.
Instead of running:
presto --catalog analytics_hive
...they would run:
presto --catalog analytics_alluxio_hive

Similarly, users of Superset would just use a new datasource, with a different SQLAlchemy URI.
Instead of this URI:
presto://an-coord1001.eqiad.wmnet:8281/analytics_hive?protocol=https
...they would use:
presto://an-coord1001.eqiad.wmnet:8281/analytics_alluxio_hive?protocol=https
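For reference, the new catalog itself would be roughly the following, based on the documentation linked above; the catalog name, the /etc/presto path and the master address are assumptions for our test setup (and would change with ZooKeeper-based HA):

# Sketch of a hypothetical analytics_alluxio_hive catalog that points the hive
# connector at the Alluxio Catalog Service instead of the Hive metastore.
sudo tee /etc/presto/catalog/analytics_alluxio_hive.properties <<'EOF'
connector.name=hive-hadoop2
hive.metastore=alluxio
hive.metastore.alluxio.master.address=an-test-coord1001.eqiad.wmnet:19998
EOF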

Instead of the three hops that you have outlined above for a Superset user connection to Hive data, there would be four:

  1. User is passed to Superset via HTTP headers, set by httpd after mod_cas authenticates the user (via CAS/LDAP credentials)
  2. Superset has a Kerberos keytab and is allowed to proxy other users, so in turn it queries Presto on behalf of the user.
  3. Presto can query on behalf of users, so the request from the user is proxied to alluxio to fetch the relevant data
  4. Alluxio has a keytab and can proxy to users in the analytics-privatedata-users group, fetching permitted data from its cache.

We don't allow any access to the alluxio service, other than through the presto servers or the coordinator with which they are co-located.

An important point to note is that this method of using the service catalog only works with read-only access. We can't write back to Hive through this method.
https://docs.alluxio.io/os/user/stable/en/compute/Presto.html#using-presto-with-the-alluxio-catalog-service

There is a different method, where we would modify the Hive metastore itself and create a table backed by Alluxio (https://docs.alluxio.io/os/user/stable/en/compute/Presto.html#create-a-hive-table-on-alluxio) but I don't think that we need to explore that option at the moment. Using the catalog service for read-only caching seems a much simpler approach to me.
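For completeness, the alternative from the linked docs looks roughly like this; the table name, schema and Alluxio path below are invented purely for illustration, and the Hive client would need the Alluxio filesystem client on its classpath:

# Hypothetical sketch: a Hive table whose data lives at an Alluxio URI, so that
# Presto reads it through Alluxio via the normal Hive metastore.
hive -e "
CREATE EXTERNAL TABLE example_events (id BIGINT, payload STRING)
STORED AS PARQUET
LOCATION 'alluxio://an-test-coord1001.eqiad.wmnet:19998/example/events/';
"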

I was unaware of some of the limitations of the community edition of Alluxio, as opposed to the enterprise edition.
https://www.alluxio.io/editions/
Particularly relating to TLS and Kerberos.

image.png (316×1 px, 36 KB)

So we can't enable TLS between workers and masters. We can't authenticate users of Alluxio via Kerberos.
I think it may be OK; in some ways it even makes the configuration simpler, because a lot of the advanced parameters mentioned in this comment (T266641#6928675) aren't available to us.

Wow this is an important find! I had completely missed it, I'm very sorry for this :(

There is something that I don't understand about the licenses though. https://www.alluxio.io/pricing/ seems to point out the following:

Alluxio Community Edition, based on the open source project, is always free and comes with community forum support. For those looking for more, we offer Enterprise Edition Subscriptions, which include both software and SLA-based technical support.

https://github.com/Alluxio/alluxio/blob/master/LICENSE also points to Apache 2.0, and I don't see any mention of community vs enterprise. Now I am wondering if the "editions" mentioned on the Alluxio website are related to the package distribution that they maintain, rather than to the whole open source project. What I mean is that since we rely on Bigtop's packaging we are not really subject to the restrictions, since all of that code is under Apache 2.0. We should follow up on this, it seems pretty crucial to understand. One of the main issues that we had in the past has been dealing with weird licenses that were not open source and hence incompatible with our guidelines. So if Alluxio allows us to use a single user for Kerberos proxying (like presto) it may work for our use case, but the overall licensing situation might still not.

It looks like the open source version has a hard-coded incompatibility with the Kerberos security method:

image.png (242×688 px, 44 KB)

Also here:

image.png (132×768 px, 13 KB)

So, presumably, the Alluxio enterprise version is not Apache 2.0 licensed, but has these options fully enabled.
It looks like we're going to have to try to make the single-user mechanism work for us, or look elsewhere for the performance improvements.

There is some additional information about caching for presto here: https://www.alluxio.io/blog/top-5-performance-tuning-tips-for-running-presto-on-alluxio-1/

... when Presto is running on collocated Alluxio service, it is possible that Alluxio can cache the input data local to Presto workers and serve it at memory-speed for the next retrieve.
In this case, Presto can leverage Alluxio to read from the local Alluxio worker storage (termed as short-circuit read) without any additional network transfer. As a result, to maximize input throughput, users should make sure task locality and Alluxio short circuit read are achieved.

We want to ensure that we get a high value for Short-circuit Read and a low value for From Remote Instances in the metrics UI. If this is not the case then our locality aware scheduling settings aren't correct.

Set node-scheduler.network-topology=flat in config.properties and set hive.force-local-scheduling=true in catalog/hive.properties

There are various other concurrency tuning parameters mentioned on that page, such as task.max-worker-threads, task.concurrency, and node-scheduler.max-splits-per-node to try to make sure that CPUs are effectively saturated when running tasks, but I don't think we need to look at these yet. We'll need to look seriously at them when promoting this setup to production.
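For when we do get there, the two locality settings quoted above would land roughly as follows; the file paths are assumptions based on our usual Presto layout, and in practice they would be managed via puppet rather than appended by hand:

# Sketch: locality-aware scheduling settings on the Presto workers, per the
# Alluxio tuning post linked above.
echo 'node-scheduler.network-topology=flat' | sudo tee -a /etc/presto/config.properties
echo 'hive.force-local-scheduling=true' | sudo tee -a /etc/presto/catalog/hive.properties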

I've updated the patch with three further changes that I believe are required to get the impersonation working the way we need.

  1. In order to achieve the client-side hadoop impersonation, I have configured the following in /etc/alluxio/conf/alluxio-site.properties

alluxio.security.login.impersonation.username=_HDFS_USER_

From here: https://docs.alluxio.io/os/user/2.4/en/operation/Security.html#client-configuration

If the property is set to an empty string or _NONE_, impersonation is disabled, and the Alluxio client will interact with Alluxio servers as the Alluxio client user.
If the property is set to _HDFS_USER_, the Alluxio client will connect to Alluxio servers as the Alluxio client user, but impersonate as the Hadoop client user when using the Hadoop compatible client.

In our case the Hadoop-compatible client that we're using is the one shipped with presto: /usr/lib/presto/plugin/hive-hadoop2/alluxio-shaded-client-2.4.1-1.jar

  2. I've set 'alluxio.master.security.impersonation.presto.users': '*'

I see now that we are OK to impersonate any user, not just those in analytics-privatedata-users.

  3. I've also added the following to /etc/hadoop/conf/core-site.xml in order to allow the alluxio user to impersonate other users.
<!-- Alluxio proxy user -->
<property>
  <name>hadoop.proxyuser.alluxio.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.alluxio.groups</name>
  <value>*</value>
</property>

As per the discussion on the patch, I too am not keen to set up passwordless SSH access from masters to workers for the alluxio user unless it is absolutely required.
Therefore I've reached out on the Alluxio slack workspace to ask for other people's experiences of running Alluxio clusters without SSH.

image.png (321×1 px, 85 KB)

For now, I'll remove the SSH parts from the patch so that we can continue our own testing without SSH.

Change 712974 merged by Btullis:

[operations/puppet@production] Install Alluxio to the test cluster

https://gerrit.wikimedia.org/r/712974

Change 724407 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Fix the ferm configuration for alluxio workers

https://gerrit.wikimedia.org/r/724407

Change 724407 merged by Btullis:

[operations/puppet@production] Fix the ferm configuration for alluxio workers

https://gerrit.wikimedia.org/r/724407

I have started running the master process manually as the alluxio user on an-test-coord1001 with the following command.

alluxio@an-test-coord1001:~$ kerberos-run-command alluxio /usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java -cp /usr/lib/alluxio/conf/::/usr/lib/alluxio/assembly/server/target/alluxio-assembly-server-2.4.1-jar-with-dependencies.jar -Dalluxio.home=/usr/lib/alluxio -Dalluxio.conf.dir=/usr/lib/alluxio/conf -Dalluxio.logs.dir=/usr/lib/alluxio/logs -Dalluxio.user.logs.dir=/usr/lib/alluxio/logs/user -Dlog4j.configuration=file:/usr/lib/alluxio/conf/log4j.properties -Dorg.apache.jasper.compiler.disablejsr199=true -Djava.net.preferIPv4Stack=true -Dorg.apache.ratis.thirdparty.io.netty.allocator.useCacheForAllThreads=false -Dalluxio.logger.type=MASTER_LOGGER -Dalluxio.master.audit.logger.type=MASTER_AUDIT_LOGGER -Xmx8g -XX:MetaspaceSize=256M alluxio.master.AlluxioMaster

Currently investigating a permissions error, which I believe to be on the local file system, as opposed to HDFS.

I have instead been running the commands with:

an-test-coord1001: bash -x /usr/lib/alluxio/bin/alluxio-start.sh master
an-test-presto1001: bash -x /usr/lib/alluxio/bin/alluxio-start.sh worker SudoMount

I have temporarily disabled puppet on an-test-presto1001 so that I can test a required sudoers entry:

alluxio ALL=(ALL) NOPASSWD: /bin/mount * /mnt/ramdisk, /bin/umount * /mnt/ramdisk, /bin/mkdir * /mnt/ramdisk, /bin/chmod * /mnt/ramdisk

I have formatted the master with the following command as the alluxio user on an-test-coord1001: alluxio formatMasters
The output was as shown below.

I'll keep looking into the warnings, as I had thought that at least the native library would have been found.

alluxio@an-test-coord1001:/etc/alluxio/conf$ alluxio formatMasters
Formatting Alluxio Master @ an-test-coord1001.eqiad.wmnet
2021-09-30 13:10:55,807 INFO  Format - Formatting master journal: hdfs://analytics-test-hadoop/wmf/alluxio/journal/
2021-09-30 13:10:55,845 INFO  ExtensionFactoryRegistry - Loading core jars from /usr/lib/alluxio/lib
2021-09-30 13:10:55,881 INFO  ExtensionFactoryRegistry - Loading extension jars from /usr/lib/alluxio/extensions
2021-09-30 13:10:55,985 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2021-09-30 13:10:56,051 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2021-09-30 13:10:56,065 INFO  ExtensionFactoryRegistry - Loading core jars from /usr/lib/alluxio/lib
2021-09-30 13:10:56,076 INFO  ExtensionFactoryRegistry - Loading extension jars from /usr/lib/alluxio/extensions
2021-09-30 13:10:56,156 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2021-09-30 13:10:56,174 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-09-30 13:10:56,186 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2021-09-30 13:10:56,188 INFO  ExtensionFactoryRegistry - Loading core jars from /usr/lib/alluxio/lib
2021-09-30 13:10:56,198 INFO  ExtensionFactoryRegistry - Loading extension jars from /usr/lib/alluxio/extensions
2021-09-30 13:10:56,276 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2021-09-30 13:10:56,295 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-09-30 13:10:56,306 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2021-09-30 13:10:56,308 INFO  ExtensionFactoryRegistry - Loading core jars from /usr/lib/alluxio/lib
2021-09-30 13:10:56,319 INFO  ExtensionFactoryRegistry - Loading extension jars from /usr/lib/alluxio/extensions
2021-09-30 13:10:56,397 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2021-09-30 13:10:56,414 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-09-30 13:10:56,425 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2021-09-30 13:10:56,427 INFO  ExtensionFactoryRegistry - Loading core jars from /usr/lib/alluxio/lib
2021-09-30 13:10:56,436 INFO  ExtensionFactoryRegistry - Loading extension jars from /usr/lib/alluxio/extensions
2021-09-30 13:10:56,583 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsAclProvider. HDFS ACLs will not be supported.
2021-09-30 13:10:56,603 WARN  NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-09-30 13:10:56,614 WARN  HdfsUnderFileSystem - Cannot create SupportedHdfsActiveSyncProvider.HDFS ActiveSync will not be supported.
2021-09-30 13:10:56,614 INFO  UfsJournal - Formatting hdfs://analytics-test-hadoop/wmf/alluxio/journal/BlockMaster/v1
2021-09-30 13:10:57,297 INFO  UfsJournal - Formatting hdfs://analytics-test-hadoop/wmf/alluxio/journal/TableMaster/v1
2021-09-30 13:10:57,789 INFO  UfsJournal - Formatting hdfs://analytics-test-hadoop/wmf/alluxio/journal/FileSystemMaster/v1
2021-09-30 13:10:58,246 INFO  UfsJournal - Formatting hdfs://analytics-test-hadoop/wmf/alluxio/journal/MetaMaster/v1
2021-09-30 13:10:58,791 INFO  UfsJournal - Formatting hdfs://analytics-test-hadoop/wmf/alluxio/journal/MetricsMaster/v1
2021-09-30 13:10:59,212 INFO  Format - Formatting complete

I was able to execute the alluxio runTests command:

alluxio@an-test-coord1001:/etc/alluxio/conf$ alluxio runTests
2021-09-30 13:33:08,712 INFO  ZkMasterInquireClient - Creating new zookeeper client for zk@an-test-druid1001.eqiad.wmnet/alluxio/leader
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:zookeeper.version=3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:host.name=an-test-coord1001.eqiad.wmnet
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:java.version=1.8.0_302
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:java.vendor=Oracle Corporation
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:java.class.path=/usr/lib/alluxio/conf/::/usr/lib/alluxio/assembly/client/target/alluxio-assembly-client-2.4.1-jar-with-dependencies.jar:/usr/lib/alluxio/lib/alluxio-integration-tools-validation-2.4.1.jar
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:java.library.path=/usr/lib/hadoop/lib/native/
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:java.io.tmpdir=/tmp
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:java.compiler=<NA>
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:os.name=Linux
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:os.arch=amd64
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:os.version=4.19.0-16-amd64
2021-09-30 13:33:08,749 INFO  ZooKeeper - Client environment:user.name=alluxio
2021-09-30 13:33:08,750 INFO  ZooKeeper - Client environment:user.home=/var/lib/alluxio
2021-09-30 13:33:08,750 INFO  ZooKeeper - Client environment:user.dir=/etc/alluxio/conf.analytics-test-hadoop
2021-09-30 13:33:08,750 INFO  ZooKeeper - Client environment:os.memory.free=1856MB
2021-09-30 13:33:08,750 INFO  ZooKeeper - Client environment:os.memory.max=27305MB
2021-09-30 13:33:08,750 INFO  ZooKeeper - Client environment:os.memory.total=1926MB
2021-09-30 13:33:08,750 INFO  Compatibility - Using emulated InjectSessionExpiration
2021-09-30 13:33:08,797 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=an-test-coord1001.eqiad.wmnet, rack=null)
2021-09-30 13:33:08,812 INFO  CuratorFrameworkImpl - Starting
2021-09-30 13:33:08,816 INFO  X509Util - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2021-09-30 13:33:08,818 INFO  ZooKeeper - Initiating client connection, connectString=an-test-druid1001.eqiad.wmnet sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@4ae3c1cd
2021-09-30 13:33:08,821 INFO  ClientCnxnSocket - jute.maxbuffer value is 4194304 Bytes
2021-09-30 13:33:08,826 INFO  ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=
2021-09-30 13:33:08,832 INFO  CuratorFrameworkImpl - Default schema
2021-09-30 13:33:08,837 INFO  ClientCnxn - Opening socket connection to server an-test-druid1001.eqiad.wmnet/10.64.53.6:2181. Will not attempt to authenticate using SASL (unknown error)
2021-09-30 13:33:08,843 INFO  ClientCnxn - Socket connection established, initiating session, client: /10.64.53.41:32994, server: an-test-druid1001.eqiad.wmnet/10.64.53.6:2181
2021-09-30 13:33:08,851 INFO  ClientCnxn - Session establishment complete on server an-test-druid1001.eqiad.wmnet/10.64.53.6:2181, sessionid = 0x1036c6f602300c4, negotiated timeout = 40000
2021-09-30 13:33:08,861 INFO  ConnectionStateManager - State change: CONNECTED
2021-09-30 13:33:08,981 INFO  NettyUtils - EPOLL_MODE is available
runTest --operation BASIC --readType CACHE_PROMOTE --writeType MUST_CACHE
2021-09-30 13:33:10,420 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_MUST_CACHE took 701 ms.
2021-09-30 13:33:10,524 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_MUST_CACHE took 104 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType CACHE_PROMOTE --writeType MUST_CACHE
2021-09-30 13:33:10,582 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_MUST_CACHE took 40 ms.
2021-09-30 13:33:10,604 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_MUST_CACHE took 21 ms.
Passed the test!
runTest --operation BASIC --readType CACHE_PROMOTE --writeType CACHE_THROUGH
2021-09-30 13:33:12,241 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_CACHE_THROUGH took 1637 ms.
2021-09-30 13:33:12,263 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_CACHE_THROUGH took 22 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType CACHE_PROMOTE --writeType CACHE_THROUGH
2021-09-30 13:33:12,445 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_CACHE_THROUGH took 155 ms.
2021-09-30 13:33:12,462 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_CACHE_THROUGH took 17 ms.
Passed the test!
runTest --operation BASIC --readType CACHE_PROMOTE --writeType THROUGH
2021-09-30 13:33:12,589 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_THROUGH took 127 ms.
2021-09-30 13:33:12,697 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_THROUGH took 108 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType CACHE_PROMOTE --writeType THROUGH
2021-09-30 13:33:12,841 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_THROUGH took 117 ms.
2021-09-30 13:33:12,883 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_THROUGH took 41 ms.
Passed the test!
runTest --operation BASIC --readType CACHE_PROMOTE --writeType ASYNC_THROUGH
2021-09-30 13:33:12,945 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_ASYNC_THROUGH took 62 ms.
2021-09-30 13:33:12,959 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_ASYNC_THROUGH took 13 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType CACHE_PROMOTE --writeType ASYNC_THROUGH
2021-09-30 13:33:13,007 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_ASYNC_THROUGH took 29 ms.
2021-09-30 13:33:13,023 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_ASYNC_THROUGH took 15 ms.
Passed the test!
runTest --operation BASIC --readType CACHE --writeType MUST_CACHE
2021-09-30 13:33:13,100 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_MUST_CACHE took 76 ms.
2021-09-30 13:33:13,113 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_MUST_CACHE took 13 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType CACHE --writeType MUST_CACHE
2021-09-30 13:33:13,226 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_MUST_CACHE took 94 ms.
2021-09-30 13:33:13,237 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_MUST_CACHE took 11 ms.
Passed the test!
runTest --operation BASIC --readType CACHE --writeType CACHE_THROUGH
2021-09-30 13:33:13,357 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_CACHE_THROUGH took 119 ms.
2021-09-30 13:33:13,369 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_CACHE_THROUGH took 12 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType CACHE --writeType CACHE_THROUGH
2021-09-30 13:33:13,537 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_CACHE_THROUGH took 144 ms.
2021-09-30 13:33:13,549 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_CACHE_THROUGH took 12 ms.
Passed the test!
runTest --operation BASIC --readType CACHE --writeType THROUGH
2021-09-30 13:33:13,652 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_THROUGH took 102 ms.
2021-09-30 13:33:13,686 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_THROUGH took 34 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType CACHE --writeType THROUGH
2021-09-30 13:33:13,806 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_THROUGH took 99 ms.
2021-09-30 13:33:13,835 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_THROUGH took 29 ms.
Passed the test!
runTest --operation BASIC --readType CACHE --writeType ASYNC_THROUGH
2021-09-30 13:33:13,871 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_ASYNC_THROUGH took 36 ms.
2021-09-30 13:33:13,884 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_ASYNC_THROUGH took 12 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType CACHE --writeType ASYNC_THROUGH
2021-09-30 13:33:13,973 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_ASYNC_THROUGH took 71 ms.
2021-09-30 13:33:13,984 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_ASYNC_THROUGH took 11 ms.
Passed the test!
runTest --operation BASIC --readType NO_CACHE --writeType MUST_CACHE
2021-09-30 13:33:14,018 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_MUST_CACHE took 34 ms.
2021-09-30 13:33:14,029 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_MUST_CACHE took 11 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType NO_CACHE --writeType MUST_CACHE
2021-09-30 13:33:14,128 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_MUST_CACHE took 83 ms.
2021-09-30 13:33:14,138 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_MUST_CACHE took 10 ms.
Passed the test!
runTest --operation BASIC --readType NO_CACHE --writeType CACHE_THROUGH
2021-09-30 13:33:14,278 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_CACHE_THROUGH took 140 ms.
2021-09-30 13:33:14,288 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_CACHE_THROUGH took 9 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType NO_CACHE --writeType CACHE_THROUGH
2021-09-30 13:33:14,425 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_CACHE_THROUGH took 119 ms.
2021-09-30 13:33:14,435 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_CACHE_THROUGH took 10 ms.
Passed the test!
runTest --operation BASIC --readType NO_CACHE --writeType THROUGH
2021-09-30 13:33:14,557 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_THROUGH took 122 ms.
2021-09-30 13:33:14,579 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_THROUGH took 22 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType NO_CACHE --writeType THROUGH
2021-09-30 13:33:14,690 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_THROUGH took 92 ms.
2021-09-30 13:33:14,715 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_THROUGH took 25 ms.
Passed the test!
runTest --operation BASIC --readType NO_CACHE --writeType ASYNC_THROUGH
2021-09-30 13:33:14,749 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_ASYNC_THROUGH took 34 ms.
2021-09-30 13:33:14,757 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_ASYNC_THROUGH took 8 ms.
Passed the test!
runTest --operation BASIC_NON_BYTE_BUFFER --readType NO_CACHE --writeType ASYNC_THROUGH
2021-09-30 13:33:14,797 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_ASYNC_THROUGH took 26 ms.
2021-09-30 13:33:14,805 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_ASYNC_THROUGH took 8 ms.
Passed the test!

Also added the job_master process on the master.

bash -x /usr/lib/alluxio/bin/alluxio-start.sh job_master

There is currently an issue with running these scripts, where they ask for a password. I'm not yet sure where it comes from, but I think it's from kinit somewhere; it's not from sudo.
The process runs even without entering a password, but the tty is still captured, so I'm still investigating this.

I have made some more progress on this, but it is still fairly slow.
Firstly, I have tried the vanilla download of Alluxio 2.6.2 instead of our packaged version.
It would appear that our packaged version somehow omits the /webui directory, so there was no web interface on the master's port 19999.

I'm currently working with an extracted tarball in /home/btullis/alluxio-2.6.2/ with conf symlinked to /etc/alluxio/conf and logs symlinked to /var/log/alluxio.
I also had to create a metadata directory in the extracted directory and configure it to be owned by alluxio:alluxio. This is because version 2.6.2 uses a RocksDB instance for its metastore: https://docs.alluxio.io/os/user/stable/en/operation/Metastore.html
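As commands, the setup above looks roughly like this (a sketch rather than an exact record; the download URL is the standard upstream location and the metastore directory name is an assumption that must match alluxio.master.metastore.dir):

# Manual Alluxio 2.6.2 setup on an-test-coord1001, re-using the puppet-managed
# config and log locations.
cd /home/btullis
wget https://downloads.alluxio.io/downloads/files/2.6.2/alluxio-2.6.2-bin.tar.gz
tar xzf alluxio-2.6.2-bin.tar.gz
cd alluxio-2.6.2
mv conf conf.dist && ln -s /etc/alluxio/conf conf
mv logs logs.dist 2>/dev/null; ln -s /var/log/alluxio logs
# 2.6.2 keeps the master metadata in an on-disk RocksDB metastore.
mkdir -p metastore
sudo chown -R alluxio:alluxio /home/btullis/alluxio-2.6.2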

The commands I am using to start and stop the services are as follows:

an-test-coord1001

Start the Master
bin/alluxio-start.sh -a master

Start the Job Master
bin/alluxio-start.sh -a job_master

Stop the master
bin/alluxio-stop.sh master

Stop the Job Master
bin/alluxio-stop.sh job_master

an-test-presto1001

Start the Worker
bin/alluxio-start.sh -a worker SudoMount

Start the Job Worker
bin/alluxio-start.sh -a job_worker

Stop the Worker
bin/alluxio-stop.sh worker

Stop the Job Worker
bin/alluxio-stop.sh job_worker

Next I am trying to attach a hive database as a UDB, following these instructions: https://docs.alluxio.io/os/user/stable/en/core-services/Catalog.html#attaching-databases

However, it's not yet working.

alluxio@an-test-presto1001:/home/btullis/alluxio-2.6.2$ bin/alluxio table attachdb --db alluxio_event hive thrift://analytics-test-hive.eqiad.wmnet:9083 event_sanitized
Failed to connect underDb for Alluxio db 'alluxio_event': Failed to get hive database event_sanitized. null

I have posted several more questions to the Slack workspace for Alluxio. I have a feeling that the error above is related to Kerberos, since we have HDFS with Kerberos working, but we have not specified a keytab to use for Hive access.

Here is the full stacktrace from the master.log file for this operation.

2021-10-06 13:59:05,016 ERROR AlluxioCatalog - Sync (during attach) failed for db 'alluxio_event'.
java.io.IOException: Failed to get hive database default. null
	at alluxio.table.under.hive.HiveDatabase.getDatabaseInfo(HiveDatabase.java:137)
	at alluxio.master.table.Database.sync(Database.java:226)
	at alluxio.master.table.AlluxioCatalog.attachDatabase(AlluxioCatalog.java:126)
	at alluxio.master.table.DefaultTableMaster.attachDatabase(DefaultTableMaster.java:85)
	at alluxio.master.table.TableMasterClientServiceHandler.lambda$attachDatabase$0(TableMasterClientServiceHandler.java:74)
	at alluxio.RpcUtils.callAndReturn(RpcUtils.java:121)
	at alluxio.RpcUtils.call(RpcUtils.java:83)
	at alluxio.RpcUtils.call(RpcUtils.java:58)
	at alluxio.master.table.TableMasterClientServiceHandler.attachDatabase(TableMasterClientServiceHandler.java:72)
	at alluxio.grpc.table.TableMasterClientServiceGrpc$MethodHandlers.invoke(TableMasterClientServiceGrpc.java:1135)
	at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
	at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
	at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
	at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
	at alluxio.security.authentication.AuthenticatedUserInjector$1.onHalfClose(AuthenticatedUserInjector.java:67)
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:797)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at alluxio.concurrent.jsr.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1378)
	at alluxio.concurrent.jsr.ForkJoinTask.doExec(ForkJoinTask.java:609)
	at alluxio.concurrent.jsr.ForkJoinPool.runWorker(ForkJoinPool.java:1356)
	at alluxio.concurrent.jsr.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:131)
Caused by: org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:782)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:769)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:1290)
	at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
	at com.sun.proxy.$Proxy74.getDatabase(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at alluxio.table.under.hive.util.CompatibleMetastoreClient.invoke(CompatibleMetastoreClient.java:39)
	at com.sun.proxy.$Proxy74.getDatabase(Unknown Source)
	at alluxio.table.under.hive.HiveDatabase.getDatabaseInfo(HiveDatabase.java:128)
	... 22 more
2021-10-06 13:59:05,040 WARN  TableMasterClientServiceHandler - Exit (Error): attachDatabase: , Error=java.io.IOException: Failed to connect underDb for Alluxio db 'alluxio_event': Failed to get hive database default. null

This is also an interesting error. We have generated a keytab for each host that is to access HDFS, but this output states that the configuration item should be identical across hosts. I don't know whether this is causing an issue or whether it is just a warning.

alluxio@an-test-coord1001:/home/btullis/alluxio-2.6.2$ bin/alluxio fsadmin doctor
Server-side configuration errors (those properties are required to be identical): 
key: alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.kerberos.client.principal
    value: alluxio/an-test-coord1001.eqiad.wmnet@WIKIMEDIA (an-test-coord1001.eqiad.wmnet:19998)
    value: alluxio/an-test-presto1001.eqiad.wmnet@WIKIMEDIA (an-test-presto1001.eqiad.wmnet:29999)
All worker storage paths are in working state.
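One way to double-check what each node actually resolves for that key would be something like the following, run on both hosts (a hedged sketch; alluxio getConf prints the locally resolved value of a property):

# Compare the resolved value of the offending key on each host.
bin/alluxio getConf \
  alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.kerberos.client.principal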

Hmm. Not looking good. The word on the street is that the Alluxio Catalog Service doesn't support kerberized Hive.

image.png (100×979 px, 28 KB)

https://app.slack.com/client/TEXALQC8J/CEXGGUBDK/thread/CEXGGUBDK-1633528717.482200

There is still the option of creating Hive tables that point to Alluxio locations, but that's a compromise compared with what we wanted to achieve.

image.png (658×997 px, 165 KB)

Instead of Alluxio as a caching layer, we might like to look at the caching features of the hive connector that is available in Trino: https://trino.io/docs/current/connector/hive-caching.html

We are already working to T266640: Decide whether to migrate from Presto to Trino (cc @razzi) so this feature of Trino might be a good differentiator.

The cache architecture section of the hive-connector for Trino states the following:

Caching can operate in two modes. The async mode provides the queried data directly and caches any objects asynchronously afterwards. Async is the default and recommended mode. The query doesn’t pay the cost of warming up the cache. The cache is populated in the background and the query bypasses the cache if the cache is not already populated. Any following queries requesting the cached objects are served directly from the cache.

The other mode is a read-through cache. In this mode, if an object is not found in the cache, it is read from the storage, placed in the cache, and then provided to the requesting query. In read-through mode, the query always reads from cache and must wait for the cache to be populated.

In both modes, objects are cached on local storage of each worker. Workers can request cached objects from other workers to avoid requests from the object storage.

The cache chunks are 1MB in size and are well suited for ORC or Parquet file formats.
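Judging by the linked docs, enabling it would be a couple of properties on the hive catalog of each Trino worker; a sketch, assuming a hypothetical Trino deployment with its catalog under /etc/trino and a cache directory that would need to exist locally on each worker:

# Sketch: enable the (default, async-mode) hive cache in Trino.
cat <<'EOF' | sudo tee -a /etc/trino/catalog/analytics_hive.properties
hive.cache.enabled=true
hive.cache.location=/srv/trino-cache
EOF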

I'll start researching whether Kerberos, user impersonation, and access control would operate in the manner we need.

Unfortunately, that's a no on all three counts.
https://trino.io/docs/current/connector/hive-caching.html#limitations

Limitations

Caching does not support user impersonation and cannot be used with HDFS secured by Kerberos. It does not take any user-specific access rights to the object storage into account.
The cached objects are simply transparent binary blobs to the caching system and full access to all content is available.

Looking at the details of the JMX monitoring and the GitHub history, it would appear that Trino merged rubix into their codebase.

The same limitations are present in their Starburst Enterprise Presto product: https://docs.starburst.io/latest/connector/hive-caching.html#limitations

Do we want to revisit the idea of T256108: Co-locate Presto with Hadoop worker nodes?

@JAllemandou spoke of an alternative solution, which was to create a second Hadoop cluster essentially co-located with the Presto nodes.
Data would then be regularly synced from (let's call it) the primary cluster to the presto cluster by some means.
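One conceivable "some means" would be a scheduled DistCp between the two clusters; the sketch below is purely illustrative, and the nameservices and dataset path are invented:

# Keep a dataset in sync from the primary cluster to the hypothetical
# presto-side cluster; -update copies only changed files, -delete removes
# files no longer present at the source.
hadoop distcp -update -delete \
  hdfs://analytics-hadoop/wmf/data/wmf/webrequest \
  hdfs://presto-hadoop/wmf/data/wmf/webrequest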

Change 731115 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove alluxio from the test cluster

https://gerrit.wikimedia.org/r/731115

Change 731115 merged by Btullis:

[operations/puppet@production] Remove alluxio resources from puppet

https://gerrit.wikimedia.org/r/731115

I have deployed a patch to remove the alluxio resources from puppet, given that it's not going to be able to meet our needs.

I will remove the packages and any remnants manually from an-test-coord1001 and an-test-presto1001.

I need to roll-restart the hadoop masters in both the analytics and test clusters, before removing the alluxio user and group in a subsequent commit.

Change 732296 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove the alluxio user and group

https://gerrit.wikimedia.org/r/732296

Change 732296 merged by Btullis:

[operations/puppet@production] Remove the alluxio user and group

https://gerrit.wikimedia.org/r/732296

Change 732719 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove all remaining references to alluxio

https://gerrit.wikimedia.org/r/732719

I have checked that puppet has run recently and successfully on all 102 servers that have the profile bigtop::alluxio::user applied.
This ensures that the user and group will have been removed by the change that absents those resources in that class.

btullis@cumin1001:~$ sudo cumin --no-progress C:bigtop::alluxio::user "/usr/local/lib/nagios/plugins/check_puppetrun -c 3600 -w 2000"
102 hosts will be targeted:
an-airflow1001.eqiad.wmnet,an-coord[1001-1002].eqiad.wmnet,an-launcher1002.eqiad.wmnet,an-master[1001-1002].eqiad.wmnet,an-test-client1001.eqiad.wmnet,an-test-coord1001.eqiad.wmnet,an-test-master[1001-1002].eqiad.wmnet,an-test-worker[1001-1003].eqiad.wmnet,an-worker[1078-1141].eqiad.wmnet,analytics[1058-1077].eqiad.wmnet,stat[1004-1008].eqiad.wmnet
Ok to proceed on 102 hosts? Enter the number of affected hosts to confirm or "q" to quit 102
===== NODE GROUP =====
(1) analytics1072.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 26 minutes ago with 0 failures
===== NODE GROUP =====
(2) an-worker[1078-1079].eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 15 minutes ago with 0 failures
===== NODE GROUP =====
(1) an-master1001.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 22 minutes ago with 0 failures
===== NODE GROUP =====
(1) an-test-worker1003.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 17 minutes ago with 0 failures
===== NODE GROUP =====
(2) an-worker[1137,1141].eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 10 minutes ago with 0 failures
===== NODE GROUP =====
(1) analytics1075.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
===== NODE GROUP =====
(3) an-launcher1002.eqiad.wmnet,an-worker1128.eqiad.wmnet,analytics1069.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 29 minutes ago with 0 failures
===== NODE GROUP =====
(4) an-master1002.eqiad.wmnet,an-worker[1085,1114].eqiad.wmnet,analytics1066.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 9 minutes ago with 0 failures
===== NODE GROUP =====
(3) an-coord1002.eqiad.wmnet,an-test-client1001.eqiad.wmnet,analytics1073.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 18 minutes ago with 0 failures
===== NODE GROUP =====
(2) an-worker1094.eqiad.wmnet,analytics1058.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
===== NODE GROUP =====
(4) an-worker[1093,1119,1139].eqiad.wmnet,analytics1068.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 11 minutes ago with 0 failures
===== NODE GROUP =====
(3) an-worker[1080,1121,1123].eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 20 minutes ago with 0 failures
===== NODE GROUP =====
(3) an-test-coord1001.eqiad.wmnet,an-worker1107.eqiad.wmnet,analytics1062.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 25 minutes ago with 0 failures
===== NODE GROUP =====
(5) an-worker[1102,1108,1133,1135].eqiad.wmnet,analytics1076.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 6 minutes ago with 0 failures
===== NODE GROUP =====
(7) an-worker[1095,1113,1117,1120,1127,1129].eqiad.wmnet,analytics1074.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 8 minutes ago with 0 failures
===== NODE GROUP =====
(7) an-test-master1001.eqiad.wmnet,an-test-worker1002.eqiad.wmnet,an-worker[1088,1091,1138].eqiad.wmnet,analytics1059.eqiad.wmnet,stat1006.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
===== NODE GROUP =====
(4) an-airflow1001.eqiad.wmnet,an-worker1122.eqiad.wmnet,analytics[1064,1067].eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 14 minutes ago with 0 failures
===== NODE GROUP =====
(2) an-worker[1110,1136].eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 12 minutes ago with 0 failures
===== NODE GROUP =====
(1) an-worker1111.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
===== NODE GROUP =====
(7) an-worker[1082,1106,1109,1115,1118,1140].eqiad.wmnet,analytics1065.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures
===== NODE GROUP =====
(1) stat1004.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
===== NODE GROUP =====
(3) an-worker1098.eqiad.wmnet,analytics[1060,1070].eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 24 minutes ago with 0 failures
===== NODE GROUP =====
(3) an-worker[1089,1100].eqiad.wmnet,stat1007.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 28 minutes ago with 0 failures
===== NODE GROUP =====
(3) an-worker[1099,1116,1132].eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 7 minutes ago with 0 failures
===== NODE GROUP =====
(7) an-worker[1083,1086-1087,1101,1103,1124].eqiad.wmnet,analytics1063.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 23 minutes ago with 0 failures
===== NODE GROUP =====
(6) an-test-master1002.eqiad.wmnet,an-test-worker1001.eqiad.wmnet,an-worker[1096,1130,1134].eqiad.wmnet,analytics1077.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 13 minutes ago with 0 failures
===== NODE GROUP =====
(5) an-coord1001.eqiad.wmnet,an-worker[1104-1105,1112].eqiad.wmnet,stat1008.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 21 minutes ago with 0 failures
===== NODE GROUP =====
(4) an-worker[1081,1090,1092,1097].eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
===== NODE GROUP =====
(7) an-worker[1084,1125-1126,1131].eqiad.wmnet,analytics[1061,1071].eqiad.wmnet,stat1005.eqiad.wmnet
----- OUTPUT of '/usr/local/lib/n... -c 3600 -w 2000' -----
OK: Puppet is currently enabled, last run 19 minutes ago with 0 failures
================
100.0% (102/102) success ratio (>= 100.0% threshold) for command: '/usr/local/lib/n... -c 3600 -w 2000'.
100.0% (102/102) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.

I've checked all ext4 file systems on all of these hosts for files owned by the uid/gid 914.
There are still a few files left around belonging to the alluxio user, specifically a Kerberos keytab and a stray /var/lib/alluxio directory.

There are a few more files still owned by the alluxio user.

btullis@cumin1001:~$ sudo cumin -x C:bigtop::alluxio::user "sudo find $(findmnt -t ext4 -o TARGET -r -n|tr '\n' ' ') -xdev -uid 914"
                                                                                                               
(1) an-test-client1001.eqiad.wmnet                                                                                                                                                                                 
----- OUTPUT of 'sudo find / /srv  -xdev -uid 914' -----                                                                                                                                                           
/var/lib/alluxio                                                                                                                                                                                                   

===== NODE GROUP =====                                                                                                                                                                                             
(1) an-test-coord1001.eqiad.wmnet                                                                                                                                                                                  
----- OUTPUT of 'sudo find / /srv  -xdev -uid 914' -----                                                                                                                                                           
/etc/security/keytabs/alluxio                                                                                                                                                                                      
/etc/security/keytabs/alluxio/alluxio.keytab                                                                                                                                                                       
/tmp/krb5cc_914

I propose to remove these manually.

As @jbond pointed out, the alluxio user and group still exist on an-test-presto1001. They have not been absented by the recent change to puppet.

btullis@an-test-presto1001:~$ grep alluxio /etc/passwd
alluxio:x:914:914:alluxio User,,,:/var/lib/alluxio:/bin/false
btullis@an-test-presto1001:~$ grep alluxio /etc/group
hadoop:x:908:yarn,mapred,hdfs,alluxio
alluxio:x:914:

I could remove these manually, but I would like to understand why puppet isn't absenting them.

I have discovered that the bigtop::alluxio::user class was never applied to the presto servers, because it wasn't included in their catalog.

We might like to think about whether we want to apply the profile::analytics::cluster::users profile to presto servers in future, but for now I am happy to remove the stray user and group manually on this server.

The same keytab file has been found on this server as on an-test-coord1001, so I will remove that manually as well.
Removing files

btullis@an-test-coord1001:~$ sudo rm /etc/security/keytabs/alluxio/alluxio.keytab && sudo rmdir /etc/security/keytabs/alluxio
btullis@an-test-coord1001:~$ sudo rm /tmp/krb5cc_914

btullis@an-test-presto1001:~$ sudo rm /etc/security/keytabs/alluxio/alluxio.keytab && sudo rmdir /etc/security/keytabs/alluxio

btullis@an-test-client1001:~$ sudo rmdir /var/lib/alluxio

Removing user and group

btullis@an-test-presto1001:~$ sudo deluser alluxio
Removing user `alluxio' ...
Warning: group `alluxio' has no more members.
Done.
btullis@an-test-presto1001:~$ sudo delgroup alluxio
The group `alluxio' does not exist.
btullis@an-test-presto1001:~$ grep alluxio /etc/group
btullis@an-test-presto1001:~$ grep alluxio /etc/passwd
btullis@an-test-presto1001:~$

Change 732952 had a related patch set uploaded (by Btullis; author: Btullis):

[labs/private@master] Remove unused dummy keytabs and an SSH key for alluxio

https://gerrit.wikimedia.org/r/732952

I have deleted the kerberos principals:

btullis@krb1001:~$ sudo manage_principals.py delete alluxio/an-test-coord1001.eqiad.wmnet@WIKIMEDIA
Principal successfully deleted.
btullis@krb1001:~$ sudo manage_principals.py delete alluxio/an-test-presto1001.eqiad.wmnet@WIKIMEDIA
Principal successfully deleted.

...and the keytabs that were generated.

root@krb1001:/srv/kerberos/keytabs# rm an-test-presto1001.eqiad.wmnet/alluxio/alluxio.keytab && rmdir an-test-presto1001.eqiad.wmnet/alluxio
root@krb1001:/srv/kerberos/keytabs# rm an-test-coord1001.eqiad.wmnet/alluxio/alluxio.keytab && rmdir an-test-coord1001.eqiad.wmnet/alluxio

I have removed these keytabs from the private puppet repository, and from the dummy puppet repository.

Change 732952 merged by Btullis:

[labs/private@master] Remove unused dummy keytabs and an SSH key for alluxio

https://gerrit.wikimedia.org/r/732952

Change 732719 merged by Btullis:

[operations/puppet@production] Remove all remaining references to alluxio

https://gerrit.wikimedia.org/r/732719