Definition of done:
- The test YARN interface points to the test spark-history service
- The production YARN interface points to the spark-history service
Author: brouberol (Dec 6 2023, 1:02 PM)
| Status | Assigned | Task |
|---|---|---|
| Resolved | brouberol | T330176 [Data Platform] Deploy Spark History Service |
| Resolved | brouberol | T352863 Configure the YARN resource manager with the spark history service URL |
Change 981948 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] [yarn] Add the option to configure the spark history server address
Change 981949 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Configure the Spark History server host for the an-test yarn
Change 981950 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Configure the Spark History server host for the analytics yarn
Change 981948 merged by Brouberol:
[operations/puppet@production] [yarn] Add the option to configure the spark history server address
Change 981949 merged by Brouberol:
[operations/puppet@production] Configure the Spark History server host for the an-test yarn
Change 982656 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Revert "Configure the Spark History server host for the an-test yarn"
Change 982657 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] Revert "[yarn] Add the option to configure the spark history server address"
Change 982797 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] spark3: add option to specify spark history server address to yarn
Change 982798 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] spark3: Specify the history server endoint for the test-analytics cluster
Change 982656 merged by Brouberol:
[operations/puppet@production] Revert "Configure the Spark History server host for the an-test yarn"
Change 982657 merged by Brouberol:
[operations/puppet@production] Revert "[yarn] Add the option to configure the spark history server address"
Change 982797 merged by Brouberol:
[operations/puppet@production] spark3: add option to specify spark history server address to yarn
Change 982798 merged by Brouberol:
[operations/puppet@production] spark3: Specify the history server endoint for the test-analytics cluster
This is proving a little tricky to test on the hadoop-test cluster, because we don't have ready access to a YARN job browser UI. However, I ran the following small test, which might be helpful.
I set up an SSH tunnel to the YARN UI in test with:
    ssh -N -L 8088:an-test-master1001:8088 an-test-master1001.eqiad.wmnet
Note that the forwarding target can't be localhost here, as port 8088 does not seem to be open on the loopback interface of an-test-master1001.
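Before opening the browser, it can help to confirm that the local end of the tunnel is actually accepting connections. The following is only a minimal sketch using the Python standard library; `port_is_open` is a helper name made up for this example, and 8088 is the local port chosen in the `ssh -L` command above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # 8088 is the local end of the SSH tunnel to the YARN UI.
    print("YARN UI tunnel listening:", port_is_open("127.0.0.1", 8088))
```

The same check works for the notebook tunnel by passing 8880 instead of 8088.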
Then I started a Jupyter notebook on an-test-client1002 and tunnelled to it with:
    ssh -N an-test-client1002.eqiad.wmnet -L 8880:127.0.0.1:8880
I made sure that I had authenticated with Kerberos, then I entered the following code into the notebook.
    import wmfdata as wmf

    ss = wmf.spark.create_custom_session(
        master="yarn",
        spark_config={
            "spark.driver.memory": "2g",
            "spark.dynamicAllocation.maxExecutors": 64,
            "spark.executor.memory": "8g",
            "spark.executor.cores": 4,
            "spark.sql.shuffle.partitions": 256,
            "spark.yarn.historyServer.address": "yarn.wikimedia.org"
        }
    )

    ss.sql("""
        SELECT count(1) as count
        FROM wmf.webrequest
        WHERE year = 2023
        AND month = 12
        AND day = 12
    """).show(100)

I then checked the YARN UI for this Spark session, which is still running even though this particular query has finished.
We can see the application listed.
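Since a job browser UI is not readily available in test, the ResourceManager REST API (served under `/ws/v1/cluster/apps` on the same port as the YARN UI) may be another way to see which tracking link YARN advertises for an application. The sketch below only parses such a response; `tracking_links` is a hypothetical helper, and the sample payload is a hand-written, trimmed-down illustration rather than real output from the cluster.

```python
import json


def tracking_links(apps_json: str) -> dict:
    """Map application id -> (trackingUI, trackingUrl) from a cluster apps response."""
    payload = json.loads(apps_json)
    # "apps" may be null when no application matches, hence the "or {}".
    apps = (payload.get("apps") or {}).get("app", [])
    return {a["id"]: (a.get("trackingUI"), a.get("trackingUrl")) for a in apps}


# Hand-written sample for illustration only, not captured from the cluster.
SAMPLE = json.dumps({
    "apps": {
        "app": [
            {
                "id": "application_1702472457465_0153",
                "state": "RUNNING",
                "trackingUI": "ApplicationMaster",
                "trackingUrl": "http://an-test-master1001.eqiad.wmnet:8088"
                               "/proxy/application_1702472457465_0153/",
            }
        ]
    }
})

for app_id, (ui, url) in tracking_links(SAMPLE).items():
    print(app_id, ui, url)
```

With the tunnel from earlier in place, the real response body could be fetched from `http://localhost:8088/ws/v1/cluster/apps` and passed to the same function, assuming the API is reachable without further authentication.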
Then I stopped the Spark session:
    ss.stop()
Now we can see that the History link is shown instead of ApplicationMaster, but the address appears to be the same: http://an-test-master1001.eqiad.wmnet:8088/proxy/application_1702472457465_0153/
If I manually enter that address in the browser again, it redirects me to https://yarn.wikimedia.org/history/application_1702472457465_0153/1
I wouldn't be surprised if the HTTPS scheme here is enforced because of HSTS, but I haven't tested that.
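One way to see which hop introduces the HTTPS scheme (rather than the browser applying HSTS on its own) would be to follow the redirects one at a time outside the browser. This is a minimal sketch using only the standard library; `fetch_once` and `redirect_chain` are names made up for this example, and the URL in the usage comment assumes the tunnel on local port 8088 described earlier.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url: str):
    """Issue one GET without following redirects; return (status, Location or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def redirect_chain(url: str, fetch=fetch_once, max_hops: int = 10) -> list:
    """Return the list of URLs visited, starting with the given one."""
    chain = [url]
    for _ in range(max_hops):
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        chain.append(urljoin(chain[-1], location))
    return chain


# Usage (requires the SSH tunnel and network access):
# for hop in redirect_chain(
#     "http://localhost:8088/proxy/application_1702472457465_0153/"
# ):
#     print(hop)
```

Because no browser is involved, HSTS plays no part here: any `https://` URL in the chain comes from a server-side redirect.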
So I think that we can add a redirect for the /history URL paths in the Apache virtualhost config.
It looks like this has already been done for the MapReduce history server, but I haven't verified this yet.
Change 983192 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] spark3: set the spark history server domain as yarn.wikimedia.org
Change 983193 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] yarn: proxy the spark job history requests to the spark history service
Change 983193 abandoned by Brouberol:
[operations/puppet@production] yarn: proxy the spark job history requests to the spark history service
Reason:
Bundled with 983192 now
Change 983193 restored by Brouberol:
[operations/puppet@production] yarn: proxy the spark job history requests to the spark history service
Change 983192 merged by Brouberol:
[operations/puppet@production] spark3: set the spark history server domain for analytics-hadoop
We have set up a proxy_pass rule from https://yarn.wikimedia.org/history to https://spark-history.svc.eqiad.wmnet:30443/history.
When we start Spark jobs with spark.yarn.historyServer.address: yarn.wikimedia.org (which is evaluated as http://yarn.wikimedia.org), the http->https redirection will be taken care of by Apache.
We'll see if things work as expected.
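To make the behaviour described above concrete, here is a small sketch of how a scheme-less history server address ends up as a plain-HTTP link. It only models what was observed in this task (a redirect to `/history/<application id>/1`); it is not Spark's or YARN's actual code, and `with_default_scheme` and `history_url` are names made up for this example.

```python
def with_default_scheme(address: str, default: str = "http") -> str:
    """Prefix the address with a scheme if it does not already carry one."""
    if "://" in address:
        return address
    return f"{default}://{address}"


def history_url(address: str, app_id: str, attempt: str = "1") -> str:
    """Build the history link shape observed in the YARN redirect."""
    base = with_default_scheme(address).rstrip("/")
    return f"{base}/history/{app_id}/{attempt}"


print(history_url("yarn.wikimedia.org", "application_1702472457465_0153"))
# -> http://yarn.wikimedia.org/history/application_1702472457465_0153/1
```

The resulting link uses plain HTTP, which is why the setup relies on Apache to redirect it to HTTPS.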
Change 983712 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] yarn: configure Apache to only listen to port 80
Change 983712 merged by Brouberol:
[operations/puppet@production] yarn: configure Apache to only listen to port 80
Change 983748 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/deployment-charts@master] spark-history: enable definition of spark env vars in spark-env.sh
Change 983749 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/deployment-charts@master] spark-history: set public DNS to yarn.wikimedia.org
Change 983748 abandoned by Brouberol:
[operations/deployment-charts@master] spark-history: enable definition of spark env vars in spark-env.sh
Reason:
Experimentation has shown this does not work
Change 984127 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/deployment-charts@master] spark-history-analytics-hadoop: fix redirect and static links
Change 984128 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] httpd-yarn: proxy reqs with a /spark-history prefix to the spark-history svc
Change 983749 abandoned by Brouberol:
[operations/deployment-charts@master] spark-history: set public DNS to yarn.wikimedia.org
Reason:
Experiment has shown that this does not work as expected
Change 984127 merged by Brouberol:
[operations/deployment-charts@master] spark-history-analytics-hadoop: fix redirect and static links
Change 984128 merged by Brouberol:
[operations/puppet@production] httpd-yarn: proxy reqs with a /spark-history prefix to the spark-history svc
We had to slightly tweak the spark UI config as well as the apache config to make the whole thing work:
    ProxyPass /spark-history/ https://spark-history.svc.eqiad.wmnet:30443/
    ProxyPassReverse /spark-history/ https://spark-history.svc.eqiad.wmnet:30443/
This fixed the serving of the spark history statics.
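As a rough mental model of what these two directives do, the sketch below maps a public request path to the backend URL (the ProxyPass direction) and rewrites a backend `Location` header back to the public URL (the ProxyPassReverse direction). It is a deliberately simplified illustration, not how Apache implements it; the function names are made up, and `PUBLIC_ORIGIN` assumes the virtualhost is served as https://yarn.wikimedia.org.

```python
from typing import Optional

PUBLIC_ORIGIN = "https://yarn.wikimedia.org"  # assumed public virtualhost
PUBLIC_PREFIX = "/spark-history/"
BACKEND_BASE = "https://spark-history.svc.eqiad.wmnet:30443/"


def to_backend(public_path: str) -> Optional[str]:
    """ProxyPass direction: public path -> backend URL, or None if not proxied."""
    if not public_path.startswith(PUBLIC_PREFIX):
        return None
    return BACKEND_BASE + public_path[len(PUBLIC_PREFIX):]


def rewrite_location(location: str) -> str:
    """ProxyPassReverse direction: backend Location header -> public URL."""
    if location.startswith(BACKEND_BASE):
        return PUBLIC_ORIGIN + PUBLIC_PREFIX + location[len(BACKEND_BASE):]
    # Anything not pointing at the backend is passed through untouched.
    return location


print(to_backend("/spark-history/static/webui.css"))
print(rewrite_location(BACKEND_BASE + "history/application_1695896957545_507754/jobs/"))
```

Stripping the /spark-history/ prefix before forwarding means the backend still sees the paths it expects, such as /static/... and /history/..., which fits with the static assets being served correctly after this change.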
We also added the spark.ui.proxyRedirectUri: https://yarn.wikimedia.org/ spark parameter to tell spark to use the https://yarn.wikimedia.org/ base URL for all redirections, to make sure they get served / proxied by apache.
And with this, a YARN link to the Spark history server (e.g. https://yarn.wikimedia.org/proxy/application_1695896957545_507754/) now ping-pongs between redirections until the browser gets redirected to https://yarn.wikimedia.org/spark-history/history/application_1695896957545_507754/jobs/, which displays the following:
Change 983193 abandoned by Brouberol:
[operations/puppet@production] yarn: proxy the spark job history requests to the spark history service
Reason:
Already released
Change 981950 abandoned by Brouberol:
[operations/puppet@production] Configure the Spark History server host for the analytics yarn
Reason:
Moot