Hadoop YARN uses ZooKeeper to store state for its running and completed applications:
```
[zk: localhost:2181(CONNECTED) 3] ls /rmstore/ZKRMStateRoot
[AMRMTokenSecretManagerRoot, RMAppRoot, EpochNode, RMDTSecretManagerRoot, RMVersionNode]
```
```
[zk: localhost:2181(CONNECTED) 6] stat /rmstore/ZKRMStateRoot/RMAppRoot
cZxid = 0xc01884a67
ctime = Wed May 06 14:21:45 UTC 2015
mZxid = 0xc01884a67
mtime = Wed May 06 14:21:45 UTC 2015
pZxid = 0x30004a738f
cversion = 8835145
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 10011
```
```
[zk: localhost:2181(CONNECTED) 12] get /rmstore/ZKRMStateRoot/RMAppRoot/application_1550134620574_26075
�����-�
�����َ-6SELECT
client_ip,
COUNT(1) A...client_ip (Stage-1)nice*�
�
jobSubmitDir/job.splitmetainfow
f
hdfsanalytics-hadoop�>"I/user/analytics-search/.staging/job_1550134620574_26075/job.splitmetainfo������- (
[..]
cZxid = 0x300049610f
ctime = Fri Feb 22 12:55:35 UTC 2019
mZxid = 0x3000496121
mtime = Fri Feb 22 12:57:33 UTC 2019
pZxid = 0x3000496110
cversion = 1
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 2597
numChildren = 1
```
The recently created Hadoop test cluster already has a lot of znodes:
```
[zk: localhost:2181(CONNECTED) 14] stat /rmstore-analytics-test-hadoop/ZKRMStateRoot/RMAppRoot
cZxid = 0x3000457e04
ctime = Fri Feb 15 09:18:19 UTC 2019
mZxid = 0x3000457e04
mtime = Fri Feb 15 09:18:19 UTC 2019
pZxid = 0x30004a74af
cversion = 2545
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2545
```
After reading https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_yarn-resource-management/content/ref-c2ececdf-c68e-4095-99b5-15b4c31701ba.1.html I found that, in theory, the above state can instead be stored in HDFS via `yarn.resourcemanager.fs.state-store.uri`. My idea would be something like:
`yarn.resourcemanager.fs.state-store.uri = hdfs://${hadoop-cluster-name}/user/yarn/rmstore/${hadoop-cluster-name}/etc..`
This would give us a sort of "chroot" on HDFS, and we'd avoid relying on ZooKeeper for state storage (with its ton of znodes). There is already a tight dependency between HDFS and YARN, so I wouldn't be worried about adding another one.
To test this we should:
* change `yarn.resourcemanager.store.class` to `org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore` (this should be the default; we explicitly configured the ZooKeeper store).
* set `yarn.resourcemanager.fs.state-store.uri`
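The two steps above could look like the following `yarn-site.xml` fragment (a sketch only; `${hadoop-cluster-name}` is the same placeholder used earlier and the exact HDFS path is still an assumption to be decided):

```xml
<!-- Sketch of the proposed RM state-store change for yarn-site.xml.
     The hdfs:// path below is an assumption, not a final choice. -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <value>hdfs://${hadoop-cluster-name}/user/yarn/rmstore/${hadoop-cluster-name}</value>
</property>
```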
I'd be in favor of testing this setting ASAP on the Hadoop test cluster (before any Kerberos testing, etc.) and possibly applying it to the main one if we are happy with the result.