Hadoop YARN uses ZooKeeper to store data related to its running and past applications:
[zk: localhost:2181(CONNECTED) 3] ls /rmstore/ZKRMStateRoot
[AMRMTokenSecretManagerRoot, RMAppRoot, EpochNode, RMDTSecretManagerRoot, RMVersionNode]
[zk: localhost:2181(CONNECTED) 6] stat /rmstore/ZKRMStateRoot/RMAppRoot
cZxid = 0xc01884a67
ctime = Wed May 06 14:21:45 UTC 2015
mZxid = 0xc01884a67
mtime = Wed May 06 14:21:45 UTC 2015
pZxid = 0x30004a738f
cversion = 8835145
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 10011  <======================= each of these znodes also has children related to app attempts
[zk: localhost:2181(CONNECTED) 12] get /rmstore/ZKRMStateRoot/RMAppRoot/application_1550134620574_26075
[binary application state, containing readable fragments such as "SELECT client_ip, COUNT(1) A...client_ip (Stage-1)" and "/user/analytics-search/.staging/job_1550134620574_26075/job.splitmetainfo"]
[..]
cZxid = 0x300049610f
ctime = Fri Feb 22 12:55:35 UTC 2019
mZxid = 0x3000496121
mtime = Fri Feb 22 12:57:33 UTC 2019
pZxid = 0x3000496110
cversion = 1
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 2597
numChildren = 1
The recently set up Hadoop test cluster already has a lot of znodes:
[zk: localhost:2181(CONNECTED) 14] stat /rmstore-analytics-test-hadoop/ZKRMStateRoot/RMAppRoot
cZxid = 0x3000457e04
ctime = Fri Feb 15 09:18:19 UTC 2019
mZxid = 0x3000457e04
mtime = Fri Feb 15 09:18:19 UTC 2019
pZxid = 0x30004a74af
cversion = 2545
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2545
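As a side note, the same check can be run non-interactively on either cluster to keep an eye on how the znode count grows. A minimal sketch, assuming zkCli.sh is on the path and ZooKeeper listens on localhost:2181:

# Non-interactive stat of the RMAppRoot znode; numChildren is the number of
# application znodes currently kept in the RM state store.
zkCli.sh -server localhost:2181 stat /rmstore/ZKRMStateRoot/RMAppRoot 2>/dev/null | grep numChildren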
After reading https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_yarn-resource-management/content/ref-c2ececdf-c68e-4095-99b5-15b4c31701ba.1.html I found that, in theory, the above state can be stored in HDFS instead, via yarn.resourcemanager.fs.state-store.uri. My idea would be something like:
yarn.resourcemanager.fs.state-store.uri = hdfs://${hadoop-cluster-name}/user/yarn/rmstore/etc..
So we'd have a sort of "chroot" on HDFS and we'd avoid relying on ZooKeeper to store all this state (with its ton of znodes). There is already a tight dependency between HDFS and YARN, so I wouldn't be worried about adding another one.
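If we go this route, the state-store directory on HDFS could be pre-created and handed to the user running the ResourceManager (the RM should also be able to create it itself on startup if it has permission, so this step may be optional). A minimal sketch, assuming the RM runs as the yarn user and using the hypothetical /user/yarn/rmstore path from above:

# Pre-create the hypothetical RM state-store directory and give it to the yarn
# user (the yarn:yarn owner/group is an assumption and may differ on our clusters).
sudo -u hdfs hdfs dfs -mkdir -p /user/yarn/rmstore
sudo -u hdfs hdfs dfs -chown -R yarn:yarn /user/yarn/rmstore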
To test this we should (see the yarn-site.xml sketch after this list):
- change yarn.resourcemanager.store.class to org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore (this should be the default; we explicitly set the ZooKeeper store at the moment).
- set yarn.resourcemanager.fs.state-store.uri to the HDFS URI above.
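A minimal yarn-site.xml sketch of the two changes; the HDFS URI is the hypothetical one from above and would need to be adjusted to the real cluster name and path:

<!-- Switch the RM state store from ZooKeeper to HDFS. -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>
<!-- Hypothetical HDFS location for the RM state store; adjust cluster name and path. -->
<property>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <value>hdfs://${hadoop-cluster-name}/user/yarn/rmstore</value>
</property>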
I'd be in favor of testing this setting on the Hadoop test cluster as soon as possible (before any Kerberos etc. testing) and possibly applying it to the main cluster if we are happy with the result.