
Determine whether or not we need to expose an internal service for yarn
Closed, Declined (Public)

Description

We currently configure our data engineering apps with yarn.wikimedia.org as the YARN domain, which is publicly accessible.

When deploying Airflow in Kubernetes, we might need to define an external_services entry to allow egress to that domain.

The issue is that yarn.wikimedia.org is a CNAME record pointing to dyna.wikimedia.org, the domain serving our ATS reverse proxy. Allowing egress traffic to yarn.wikimedia.org would therefore effectively allow egress traffic to every domain proxied by ATS. It might be more prudent to configure Apache with an additional, internal vhost, such as yarn.discovery.wmnet.
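
To illustrate the concern, here is a minimal Python sketch that lists which other hostnames would become reachable once egress to one hostname is allowed. It assumes the egress rule ends up being enforced on resolved IP addresses, which is an assumption about the network policy and is not stated in this task. The function names and the resolver parameter are hypothetical, introduced only for this example.

```python
import socket
from typing import Callable, Iterable

# A resolver maps a hostname to the set of IP addresses it resolves to.
Resolver = Callable[[str], set[str]]


def resolve_ips(hostname: str) -> set[str]:
    """Resolve a hostname (following any CNAME) using the system resolver."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def hosts_sharing_egress(
    allowed: str,
    candidates: Iterable[str],
    resolver: Resolver = resolve_ips,
) -> list[str]:
    """Return the candidates that share at least one IP with `allowed`.

    Assumption: egress is filtered by destination IP, so any hostname
    resolving to the same address is reachable as well.
    """
    allowed_ips = resolver(allowed)
    return [host for host in candidates if resolver(host) & allowed_ips]
```

Calling hosts_sharing_egress("yarn.wikimedia.org", [...]) with a list of other hostnames served by the same reverse proxy would be expected to return those hostnames, as long as they resolve to the same addresses at the time of the check.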

Event Timeline

Hm, are we sure we need this? IIRC yarn client handles picking the hostname to talk to directly via yarn ResourceManager HA stuff.
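
For context, a rough Python sketch of where those hostnames come from under ResourceManager HA: the standard yarn-site.xml properties yarn.resourcemanager.ha.rm-ids and yarn.resourcemanager.hostname.<rm-id> list the ResourceManagers the client may contact. The sample configuration and its hostnames are hypothetical placeholders, not our actual setup, and the helper name is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical yarn-site.xml content; hostnames are placeholders.
EXAMPLE_YARN_SITE = """
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>rm1.example.internal</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>rm2.example.internal</value>
  </property>
</configuration>
"""


def resourcemanager_hosts(yarn_site_xml: str) -> dict[str, str]:
    """Map each ResourceManager id to its configured hostname."""
    root = ET.fromstring(yarn_site_xml)
    # Flatten <property><name>/<value> pairs into a dict.
    props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
    raw_ids = props.get("yarn.resourcemanager.ha.rm-ids") or ""
    rm_ids = [rm_id.strip() for rm_id in raw_ids.split(",") if rm_id.strip()]
    hosts = {}
    for rm_id in rm_ids:
        key = f"yarn.resourcemanager.hostname.{rm_id}"
        if key in props:
            hosts[rm_id] = props[key]
    return hosts
```

If the client is configured this way, the hosts that need to be reachable are the ones returned by resourcemanager_hosts, rather than the public yarn.wikimedia.org domain.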

I don't think this will be necessary after all, cf our experimentation in https://phabricator.wikimedia.org/T377602

BTullis renamed this task from "Determine whether we can expose an internal service for yarn" to "Determine whether or not we need to expose an internal service for yarn". Nov 8 2024, 1:26 PM
BTullis updated the task description.
Gehel triaged this task as High priority. Nov 8 2024, 2:23 PM

Based on https://phabricator.wikimedia.org/T377602#10304199, this is not required, as it works as-is.