
SPIKE - Will Hadoop 3 container support help us for Airflow deployment pipelines?
Open, Needs Triage, Public

Description

Hadoop 3 supports running Docker containers.

  • Can Airflow use it to launch jobs in Hadoop?

Event Timeline

Container support has been backported to the version that we are using (2.10).

From a quick scan of the docs, it seems this would require Docker to be installed on the NodeManagers.

Also, from the docs:

Docker Container Executor runs in non-secure mode of HDFS and YARN. It will not run in secure mode, and will exit if it detects secure mode.

https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html

Secure mode == Kerberos.
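For anyone wanting to confirm which mode a cluster is in: Hadoop's secure mode is controlled by the `hadoop.security.authentication` property in `core-site.xml` (`simple` by default, `kerberos` when secured). A minimal sketch of that check, assuming the file contents are already available as a string; the function names and the sample XML are made up for illustration:

```python
import xml.etree.ElementTree as ET


def hadoop_auth_mode(core_site_xml: str) -> str:
    """Return the value of hadoop.security.authentication, or 'simple' if unset."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", "").strip()
        if name == "hadoop.security.authentication":
            return prop.findtext("value", "simple").strip().lower()
    # Hadoop falls back to 'simple' (non-secure) when the property is absent.
    return "simple"


def is_secure_mode(core_site_xml: str) -> bool:
    """True when the cluster is configured for Kerberos authentication."""
    return hadoop_auth_mode(core_site_xml) == "kerberos"


# Hypothetical sample config, not taken from any real cluster.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(is_secure_mode(SAMPLE_CORE_SITE))
```

If this returns true for our `core-site.xml`, the restriction quoted above applies to us.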

So,

Can we do it?

I think not :/

OK, thanks for looking into this!

It won't work for our Hadoop YARN setup though; we'll still hit the Kerberos barrier. Rootless Docker MAY make it possible to run Docker outside of Hadoop YARN, though.

Perhaps Hadoop 3 can do this with Kerberos?

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/data-operating-system/content/configure_yarn_for_running_docker_containers.html

Kerberos configurations are recommended for production

I can't find any docs on how this would work. But it may be worth investigating given https://phabricator.wikimedia.org/T296543#7538145
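For the investigation, here is a rough sketch of what an Airflow task might hand to `spark-submit` if the Hadoop 3 Docker runtime were enabled. In the Hadoop 3 docs the runtime is selected per application through the `YARN_CONTAINER_RUNTIME_TYPE` and `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE` environment variables, which Spark can pass via `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*`. The function name, image name, and job path are hypothetical, and whether any of this works on a Kerberized cluster is exactly the open question here:

```python
from typing import List


def docker_on_yarn_spark_submit(image: str, application: str) -> List[str]:
    """Build a spark-submit command that asks YARN to run containers in Docker.

    Assumes the NodeManagers already have Docker installed and the Docker
    runtime enabled; this only builds the client-side command line.
    """
    runtime_env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    for name, value in runtime_env.items():
        # The application master and the executors each need the variables.
        cmd += ["--conf", f"spark.yarn.appMasterEnv.{name}={value}"]
        cmd += ["--conf", f"spark.executorEnv.{name}={value}"]
    cmd.append(application)
    return cmd


if __name__ == "__main__":
    # Hypothetical image and job path, for illustration only.
    print(" ".join(docker_on_yarn_spark_submit("example/spark-job:latest", "job.py")))
```

The resulting list could be passed to something like Airflow's `BashOperator` or `subprocess.run`; nothing here has been tried against our cluster.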

Reopening for now to discuss more.

@Ottomata: Per emails from Sep18 and Oct20 and https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup , I am resetting the assignee of this task because there has not been progress lately (please correct me if I am wrong!). Resetting the assignee avoids the impression that somebody is already working on this task. It also allows others to potentially work towards fixing this task. Please claim this task again when you plan to work on it (via Add Action...Assign / Claim in the dropdown menu) - it would be welcome. Thanks for your understanding!