
Puppetize Spark 3 installation using conda-analytics env
Closed, Resolved · Public · 5 Estimated Story Points

Description

A new conda-analytics env is being created to install Spark 3, as well as other packages. We want to write puppetization that installs the conda-analytics .deb package being built as part of T309227: Create conda-base-env with last pyspark.

This puppetization should also take care of setting PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON correctly.
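
As a rough sketch of what the rendered environment file might contain (the /opt/conda-analytics install prefix is an assumption here, not something confirmed in this task):

# Hypothetical excerpt of a Puppet-rendered spark3-env.sh; paths are assumed.
export PYSPARK_PYTHON=/opt/conda-analytics/bin/python3
export PYSPARK_DRIVER_PYTHON=/opt/conda-analytics/bin/python3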

Automation of creation and uploading of a spark assembly file will be done in a separate task: T310578: Build and install spark3 assembly.

Event Timeline

Change 813278 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] Puppetize spark3 installation and configs using conda-analytics env

https://gerrit.wikimedia.org/r/813278

Change 813278 merged by Btullis:

[operations/puppet@production] Puppetize spark3 installation and configs using conda-analytics env

https://gerrit.wikimedia.org/r/813278

Change 821293 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] Don't hardcode /opt/conda-analytics in spark3.env.sh.erb

https://gerrit.wikimedia.org/r/821293

Heya @Antoine_Quhen @BTullis, I had to revert https://gerrit.wikimedia.org/r/c/operations/puppet/+/821326. It was not ready to be merged: it changed /etc/spark3/conf, which is being used by the existing Airflow Spark 3 jobs.

In any case, we can test the Spark 3 configuration before we merge the puppet change and make it official, even with things like test_spark_3_install. Sometimes testing by merging puppet patches is all we can do, but in this case we can do better.

To test changes to the spark 3 configuration, along with the .deb being developed in T309227: Create conda-base-env with last pyspark, we can use dpkg-deb -x to extract the deb into a local directory, and then also make a local spark3/ conf dir with the settings we need. We should try to make these work together before we merge puppet changes to /etc/spark3/conf.
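
For example, a local test could look something like the following; the .deb filename, the extracted path layout under opt/conda-analytics, and where the candidate config files come from are all placeholders and assumptions, not confirmed details:

# Extract the candidate .deb into a scratch directory (filename is a placeholder).
dpkg-deb -x conda-analytics_X.Y.Z_amd64.deb ./conda-analytics-extracted

# Assemble a local conf dir; copy in the candidate config files under the
# names Spark expects to find in a conf dir.
mkdir -p ./spark3-conf-test
cp spark-defaults.conf spark-env.sh ./spark3-conf-test/

# Point Spark at the local conf dir instead of /etc/spark3/conf and run a smoke test
# (assumes the extracted env ships a pyspark launcher under opt/conda-analytics/bin).
SPARK_CONF_DIR="$PWD/spark3-conf-test" \
  ./conda-analytics-extracted/opt/conda-analytics/bin/pyspark --version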

I hadn't merged that because of the outstanding issues with the .deb listed here: https://phabricator.wikimedia.org/T309227#8079678

Let's solve those first.

Change 821695 had a related patch set uploaded (by Aqu; author: Aqu):

[operations/puppet@production] Puppetize spark3 installation and configs using conda-analytics env V2

https://gerrit.wikimedia.org/r/821695

EChetty set the point value for this task to 5. (Aug 17 2022, 12:50 PM)

I have rolled out version 0.0.8 of conda-analytics to the five hosts where it is currently deployed:

btullis@cumin1001:~$ sudo debdeploy deploy -u 2022-08-30-conda-analytics.yaml -s hadoop-client-test
Rolling out conda-analytics:
Non-daemon update, no service restart needed

conda-analytics was updated: 0.0.7 -> 0.0.8
  an-test-client1001.eqiad.wmnet (1 hosts)

btullis@cumin1001:~$ sudo debdeploy deploy -u 2022-08-30-conda-analytics.yaml -s hadoop-worker-test
Rolling out conda-analytics:
Non-daemon update, no service restart needed

conda-analytics was updated: 0.0.7 -> 0.0.8
  an-test-worker[1001-1003].eqiad.wmnet (3 hosts)

btullis@cumin1001:~$ sudo debdeploy deploy -u 2022-08-30-conda-analytics.yaml -s hadoop-coordinator-test
Rolling out conda-analytics:
Non-daemon update, no service restart needed

conda-analytics was updated: 0.0.7 -> 0.0.8
  an-test-coord1001.eqiad.wmnet (1 hosts)

As per Antoine's request, I have begun work to roll out version 0.0.9 of the conda-analytics deb to the test cluster.

On apt1001, get the artifact from GitLab:

btullis@apt1001:~$ wget -O conda-analytics-0.0.9_amd64.deb https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/package_files/731/download
--2022-09-06 08:54:58--  https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/package_files/731/download
Resolving gitlab.wikimedia.org (gitlab.wikimedia.org)... 2620:0:861:2:208:80:154:145, 208.80.154.145
Connecting to gitlab.wikimedia.org (gitlab.wikimedia.org)|2620:0:861:2:208:80:154:145|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 530052260 (505M) [application/octet-stream]
Saving to: ‘conda-analytics-0.0.9_amd64.deb’

conda-analytics-0.0.9_amd64.deb                      100%[===================================================================================================================>] 505.50M   108MB/s    in 4.7s

2022-09-06 08:55:03 (108 MB/s) - ‘conda-analytics-0.0.9_amd64.deb’ saved [530052260/530052260]

Check the existing package details and upgrade the package in the repository:

btullis@apt1001:~$ sudo -i reprepro ls conda-analytics
conda-analytics | 0.0.8 | buster-wikimedia | amd64

btullis@apt1001:~$ sudo -i reprepro includedeb buster-wikimedia `pwd`/conda-analytics-0.0.9_amd64.deb
Exporting indices...
Deleting files no longer referenced...

btullis@apt1001:~$ sudo -i reprepro ls conda-analytics
conda-analytics | 0.0.9 | buster-wikimedia | amd64

On cumin1001, force the test cluster to run apt update:

btullis@cumin1001:~$ sudo cumin A:hadoop-test 'apt update'

Generate a debdeploy spec:

btullis@cumin1001:~$ generate-debdeploy-spec
Please enter the name of source package (e.g. openssl). type '' or 'quit' to abort
>conda-analytics
Enter an optional comment, e.g. a reference to a security advisory or a CVE ID mapping
>T312882

tool           -> The updated packages is an enduser tool, can be
                  rolled-out immediately.
daemon-direct  -> Daemons which are restarted during update, but which
                  do no affect existing users.
daemon-disrupt -> Daemons which are restarted during update, where the
                  users notice an impact. The update procedure is almost
                  identical, but displays additional warnings
library        -> After a library is updated, programs may need to be
                  restarted to fully effect the change. In addition
                  to libs, some applications may also fall under this rule,
                  e.g. when updating QEMU, you might need to restart VMs.

Please enter the update type:
>tool
Please enter the version of conda-analytics fixed in bullseye. Leave blank if no fix is available/required for bullseye.
>
Please enter the version of conda-analytics fixed in buster. Leave blank if no fix is available/required for buster.
>0.0.9
Please enter the version of conda-analytics fixed in stretch. Leave blank if no fix is available/required for stretch.
>

Usually every upgrade only modifies existing package names. There are rare exceptions
e.g. if a rebase to a new upstream release is necessary.

Enter an optional comma-separated list of binary package names
which are being switched to a new name.
Leave blank to skip
>
Spec file created as 2022-09-06-conda-analytics.yaml

Use debdeploy to roll out the packages:

btullis@cumin1001:~$ sudo debdeploy deploy -u 2022-09-06-conda-analytics.yaml -Q 'A:hadoop-worker-test or A:hadoop-client-test or A:hadoop-coordinator-test'

Rolling out conda-analytics:
Non-daemon update, no service restart needed

conda-analytics was updated: 0.0.8 -> 0.0.9
  an-test-client1001.eqiad.wmnet,an-test-coord1001.eqiad.wmnet,an-test-worker[1001-1003].eqiad.wmnet (5 hosts)

Change 821695 merged by Btullis:

[operations/puppet@production] Puppetize spark3 installation and configs using conda-analytics env V2

https://gerrit.wikimedia.org/r/821695

Change 830170 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Fix the spark3-env.sh resource

https://gerrit.wikimedia.org/r/830170

Change 830170 merged by Ottomata:

[operations/puppet@production] Fix the spark3-env.sh resource

https://gerrit.wikimedia.org/r/830170

Change 833406 had a related patch set uploaded (by Aqu; author: Aqu):

[operations/puppet@production] Deploy Spark 3 conf and debian pkg to test cluster

https://gerrit.wikimedia.org/r/833406

Change 833412 had a related patch set uploaded (by Aqu; author: Aqu):

[operations/puppet@production] Deploy Spark 3 to production

https://gerrit.wikimedia.org/r/833412

Change 833842 had a related patch set uploaded (by Aqu; author: Aqu):

[operations/puppet@production] WIP Puppet test

https://gerrit.wikimedia.org/r/833842

Change 833842 abandoned by Aqu:

[operations/puppet@production] WIP Puppet test

Reason:

This test served its purpose.

https://gerrit.wikimedia.org/r/833842

Change 833406 merged by Ottomata:

[operations/puppet@production] Deploy Spark 3 conf and debian pkg to test cluster

https://gerrit.wikimedia.org/r/833406

Change 834359 had a related patch set uploaded (by Aqu; author: Aqu):

[operations/puppet@production] Add missing Spark 3 on an-test-coord*

https://gerrit.wikimedia.org/r/834359

Change 834359 merged by Ottomata:

[operations/puppet@production] Add missing Spark 3 on an-test-coord*

https://gerrit.wikimedia.org/r/834359

Change 834365 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] debconf::set - add $owner param, set owner in conda_analytics/init.pp

https://gerrit.wikimedia.org/r/834365

Change 834365 merged by Ottomata:

[operations/puppet@production] debconf::set - add $owner param, set owner in conda_analytics/init.pp

https://gerrit.wikimedia.org/r/834365

Change 834370 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] Set spark3 config on hadoop workers, test install only on one worker

https://gerrit.wikimedia.org/r/834370

Change 834370 merged by Ottomata:

[operations/puppet@production] Set spark3 config on hadoop workers, test install only on one worker

https://gerrit.wikimedia.org/r/834370

Change 834371 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] Install spark3 via conda-analytics on all stat boxes

https://gerrit.wikimedia.org/r/834371

Change 834371 merged by Ottomata:

[operations/puppet@production] Install spark3 via conda-analytics on all stat boxes

https://gerrit.wikimedia.org/r/834371

Change 834500 had a related patch set uploaded (by Aqu; author: Aqu):

[operations/puppet@production] Deploy Spark 3 on the whole production cluster

https://gerrit.wikimedia.org/r/834500

Change 833412 abandoned by Aqu:

[operations/puppet@production] Deploy Spark 3 to production

Reason:

Replaced by https://gerrit.wikimedia.org/r/c/operations/puppet/+/834500

https://gerrit.wikimedia.org/r/833412

Change 834500 merged by Btullis:

[operations/puppet@production] Deploy Spark 3 on the whole production cluster

https://gerrit.wikimedia.org/r/834500

We have now merged this patch, and the conda-analytics package with Spark 3 is therefore being rolled out to all production Hadoop workers.
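
As a quick sanity check once Puppet has run everywhere (a sketch only; the Cumin alias and the install prefix are assumptions):

# Confirm the installed package version across the workers (alias name assumed).
sudo cumin A:hadoop-worker 'dpkg -l conda-analytics | tail -n 1'

# Spot-check that the packaged Spark 3 launcher works (install prefix assumed).
sudo cumin A:hadoop-worker '/opt/conda-analytics/bin/spark-submit --version'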

Change 821293 abandoned by Ottomata:

[operations/puppet@production] Don't hardcode /opt/conda-analytics in spark3.env.sh.erb

Reason:

Things have changed :)

https://gerrit.wikimedia.org/r/821293