Rebuild conda-analytics container on Bullseye
Closed, Resolved (Public)

Assigned To
Authored By
Stevemunene
Apr 16 2024, 12:23 PM

Description

While building the conda-analytics Debian package, we hit a Debian-related problem: the buster-backports repo is no longer available, so apt-get update fails and we cannot build any image that includes this repo.

Using docker image sha256:005e63f9e66f5f76003f3a533a99253e3a38092b3b0823fddb134b8314f0f221 for docker-registry.wikimedia.org/wikimedia-buster:20210523 with digest docker-registry.wikimedia.org/wikimedia-buster@sha256:f67057421f6653f40907d3421a9666ef3fb140ef4e642ed572053deb5cca1b31 ...
$ apt-get update
Get:1 http://mirrors.wikimedia.org/debian buster InRelease [122 kB]
Get:2 http://mirrors.wikimedia.org/debian buster-updates InRelease [56.6 kB]
Ign:3 http://mirrors.wikimedia.org/debian buster-backports InRelease
Get:4 http://apt.wikimedia.org/wikimedia buster-wikimedia InRelease [178 kB]
Err:5 http://mirrors.wikimedia.org/debian buster-backports Release
  404  Not Found [IP: 208.80.154.139 80]
Get:6 http://security.debian.org buster/updates InRelease [34.8 kB]
Get:7 http://mirrors.wikimedia.org/debian buster/main amd64 Packages [10.7 MB]
Get:8 http://mirrors.wikimedia.org/debian buster-updates/main amd64 Packages [9745 B]
Get:9 http://apt.wikimedia.org/wikimedia buster-wikimedia/main amd64 Packages [94.4 kB]
Get:10 http://security.debian.org buster/updates/main amd64 Packages [775 kB]
Reading package lists...
E: The repository 'http://mirrors.wikimedia.org/debian buster-backports Release' does not have a Release file.
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1

This is mentioned on T362518 as well.
We plan to move the build to Bullseye to avoid this; this task tracks the required changes and process.
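
As a rough illustration, the failing suite can be pulled out of an apt-get update log like the one above with a small Python helper. This is a minimal sketch, not part of the conda-analytics build: the function name missing_release_repos and the sample string are hypothetical, and the regular expression assumes apt's English error wording exactly as it appears in the log.

```python
import re

# Matches apt's fatal "no Release file" error line and captures the
# repository URL and the suite name (assumes the English message format).
_MISSING_RELEASE = re.compile(
    r"^E: The repository '(?P<url>\S+) (?P<suite>\S+) Release' "
    r"does not have a Release file\.",
    re.MULTILINE,
)


def missing_release_repos(apt_output: str) -> list[tuple[str, str]]:
    """Return (url, suite) pairs for every repository apt could not load."""
    return [(m["url"], m["suite"]) for m in _MISSING_RELEASE.finditer(apt_output)]


# Excerpt of the log from this task, used as sample input.
sample = """\
Err:5 http://mirrors.wikimedia.org/debian buster-backports Release
  404  Not Found [IP: 208.80.154.139 80]
Reading package lists...
E: The repository 'http://mirrors.wikimedia.org/debian buster-backports Release' does not have a Release file.
"""

for url, suite in missing_release_repos(sample):
    print(f"missing Release file: {suite} at {url}")
```

A check like this could run before an image build to report which suite is gone instead of only surfacing exit code 1.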

Details

| Title | Reference | Author | Source Branch | Dest Branch |
| --- | --- | --- | --- | --- |
| configure conda-analytics to use bullseye image | repos/data-engineering/conda-analytics!45 | stevemunene | change_conda_analytics_to_use_bullseye | main |

Event Timeline

Mentioned in SAL (#wikimedia-analytics) [2024-04-17T07:37:53Z] <stevemunene> disable puppet on an-test-client1002 to test new conda anaytics deb T362648

New package installs correctly and the conda functionality seems unaffected.

stevemunene@an-test-client1002:~$ conda-analytics-clone bullseye-test
Creating new cloned conda env bullseye-test...
Source:      /opt/conda-analytics
Destination: /home/stevemunene/.conda/envs/bullseye-test
The following packages cannot be cloned out of the root environment:
 - conda-forge/linux-64::conda-23.10.0-py310hff52083_1
 - conda-forge/noarch::conda-libmamba-solver-23.12.0-pyhd8ed1ab_0
Packages: 223
Files: 1248
...
Wed 17 Apr 2024 07:43:56 AM UTC Created user conda environment bullseye-test

To activate this environment with vanilla conda run:
  source /opt/conda-analytics/etc/profile.d/conda.sh
  conda activate bullseye-test

Alternatively, you can use the conda-analytic helper script:
  source conda-analytics-activate bullseye-test

Mentioned in SAL (#wikimedia-analytics) [2024-04-17T08:00:51Z] <stevemunene> enable puppet on an-test-client1002 done testing new conda anaytics deb T362648

Could we also run a Spark sanity test? See T344910#9331963 for an example that just runs two Spark SQL queries.

@xcollazo I just ran the sanity test on an-test-client1002, following the guide in the linked comment, and it looks good to me.

cd /home/stevemunene/.conda/envs/spark34t/lib/python3.10/site-packages/pyspark/jars

zip -r ~/artifacts/spark-3.4.1-assembly.zip .

hdfs dfs -mkdir -p /user/stevemunene/artifacts

hdfs dfs -copyFromLocal ~/artifacts/spark-3.4.1-assembly.zip /user/stevemunene/artifacts
hdfs dfs -chmod +r /user/stevemunene/artifacts/spark-3.4.1-assembly.zip

Notebook config

%env SPARK_HOME=/home/stevemunene/.conda/envs/spark34t/lib/python3.10/site-packages/pyspark
%env SPARK_CONF_DIR=/etc/spark3/conf
import wmfdata

spark = wmfdata.spark.create_custom_session(
    master='yarn',
    spark_config={
        "spark.shuffle.service.name": 'spark_shuffle_3_4',
        "spark.shuffle.service.port": '7339',
        "spark.yarn.archive": "hdfs:///user/stevemunene/artifacts/spark-3.4.1-assembly.zip",
        "spark.dynamicAllocation.maxExecutors": 128,
        ##
        # extras to make Iceberg work on 3.4.1:
        ##
        "spark.jars.packages": "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.1",
        "spark.jars.ivySettings": "/etc/maven/ivysettings.xml",  # fix jar pulling
    }
)

Thank you!

Mentioned in SAL (#wikimedia-analytics) [2024-04-23T12:13:38Z] <stevemunene> deploy conda-analytics v 0.0.29 to hadoop test cluster T362648

To upgrade the version:

Downloaded the built package to the apt server:

stevemunene@apt1002:~$ dpkg-deb --info conda-analytics-0.0.29_amd64.deb 
 new Debian package, version 2.0.
 size 1129593128 bytes: control archive=940 bytes.
     570 bytes,    13 lines      control              
      49 bytes,     1 lines      files                
     645 bytes,    19 lines   *  postinst             #!/usr/bin/env
 Package: conda-analytics
 Version: 0.0.29
 Architecture: amd64
 Maintainer: Aqu (WMF) <aquhen@wikimedia.org>
 Installed-Size: 4361641
 Depends: bash, default-jre-headless | java8-runtime-headless, libsasl2-2, libmariadb-dev
 Section: python
 Priority: optional
 Homepage: https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics
 Description: conda packed environment with pyspark for WMF
  This package contains the standalone environment conda-analytics.
  This package is intended to be installed across the analytics cluster to provide Spark 3 and other
  libraries.

Updated the version of conda-analytics in the apt repo:

stevemunene@apt1002:~$ sudo -i reprepro includedeb buster-wikimedia `pwd`/conda-analytics-0.0.29_amd64.deb
Exporting indices...
stevemunene@apt1002:~$ sudo -i reprepro includedeb bullseye-wikimedia `pwd`/conda-analytics-0.0.29_amd64.deb
Exporting indices...
Deleting files no longer referenced..
stevemunene@apt1002:~$ sudo -i reprepro ls conda-analytics
conda-analytics | 0.0.29 |   buster-wikimedia | amd64
conda-analytics | 0.0.29 | bullseye-wikimedia | amd64
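
To double-check that both distributions carry the same version, the reprepro ls listing above can be parsed with a few lines of Python. This is only a sketch: parse_reprepro_ls is a hypothetical helper, and it assumes the four pipe-separated columns (package, version, distribution, architecture) seen in the output above.

```python
def parse_reprepro_ls(output: str) -> dict[str, str]:
    """Map distribution name to package version from `reprepro ls` output."""
    versions = {}
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Skip anything that is not a 4-column listing row.
        if len(fields) != 4:
            continue
        _package, version, distribution, _arch = fields
        versions[distribution] = version
    return versions


# The listing from this task, used as sample input.
sample = """\
conda-analytics | 0.0.29 |   buster-wikimedia | amd64
conda-analytics | 0.0.29 | bullseye-wikimedia | amd64
"""

versions = parse_reprepro_ls(sample)
# A single distinct version means every distribution is in sync.
print("in sync" if len(set(versions.values())) == 1 else "version mismatch", versions)
```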

Next, using the debdeploy tool, generate the spec file:

stevemunene@cumin1002:~$ generate-debdeploy-spec
Please enter the name of source package (e.g. openssl). type '' or 'quit' to abort
>conda-analytics
Enter an optional comment, e.g. a reference to a security advisory or a CVE ID mapping
>T356231

tool           -> The updated packages is an enduser tool, can be
                  rolled-out immediately.
daemon-direct  -> Daemons which are restarted during update, but which
                  do no affect existing users.
daemon-disrupt -> Daemons which are restarted during update, where the
                  users notice an impact. The update procedure is almost
                  identical, but displays additional warnings
library        -> After a library is updated, programs may need to be
                  restarted to fully effect the change. In addition
                  to libs, some applications may also fall under this rule,
                  e.g. when updating QEMU, you might need to restart VMs.

Please enter the update type:
>tool
Please enter the version of conda-analytics fixed in bookworm. Leave blank if no fix is available/required for bookworm.
>
Please enter the version of conda-analytics fixed in bullseye. Leave blank if no fix is available/required for bullseye.
>0.0.29
Please enter the version of conda-analytics fixed in buster. Leave blank if no fix is available/required for buster.
>0.0.29

Usually every upgrade only modifies existing package names. There are rare exceptions
e.g. if a rebase to a new upstream release is necessary.

Enter an optional comma-separated list of binary package names
which are being switched to a new name.
Leave blank to skip
>
Spec file created as 2024-04-23-conda-analytics.yaml

Run an apt update on the test cluster first:

stevemunene@cumin1002:~$ sudo cumin A:hadoop-test 'apt update'
8 hosts will be targeted:
an-test-client1002.eqiad.wmnet,an-test-coord1001.eqiad.wmnet,an-test-master[1001-1002].eqiad.wmnet,an-test-ui1001.eqiad.wmnet,an-test-worker[1001-1003].eqiad.wmnet

New version available

stevemunene@an-test-client1002:~$ apt-cache policy conda-analytics
conda-analytics:
  Installed: 0.0.28
  Candidate: 0.0.29
  Version table:
     0.0.29 1001
       1001 http://apt.wikimedia.org/wikimedia bullseye-wikimedia/main amd64 Packages
 *** 0.0.28 100
        100 /var/lib/dpkg/status

Deploying to the test cluster:

stevemunene@cumin1002:~$ sudo debdeploy deploy -u 2024-04-23-conda-analytics.yaml -s hadoop-test
Rolling out conda-analytics:
Non-daemon update, no service restart needed

conda-analytics was updated: 0.0.28 -> 0.0.29
  an-test-client1002.eqiad.wmnet,an-test-coord1001.eqiad.wmnet,an-test-master[1001-1002].eqiad.wmnet,an-test-worker[1001-1003].eqiad.wmnet (7 hosts)

The package to be updated isn't installed on these hosts:
  an-test-ui1001.eqiad.wmnet (1 hosts)

Verified

stevemunene@an-test-client1002:~$ apt-cache policy conda-analytics
conda-analytics:
  Installed: 0.0.29
  Candidate: 0.0.29
  Version table:
 *** 0.0.29 1001
       1001 http://apt.wikimedia.org/wikimedia bullseye-wikimedia/main amd64 Packages
        100 /var/lib/dpkg/status
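
The before/after comparison of apt-cache policy output can also be scripted. The following is a minimal sketch under the assumption that the output contains the Installed: and Candidate: lines shown above; installed_and_candidate and is_up_to_date are hypothetical helper names, not existing tooling.

```python
import re


def installed_and_candidate(policy_output: str) -> tuple[str, str]:
    """Extract the Installed and Candidate versions from apt-cache policy output."""
    installed = re.search(r"^\s*Installed:\s*(\S+)", policy_output, re.MULTILINE)
    candidate = re.search(r"^\s*Candidate:\s*(\S+)", policy_output, re.MULTILINE)
    if installed is None or candidate is None:
        raise ValueError("no Installed/Candidate lines found in policy output")
    return installed.group(1), candidate.group(1)


def is_up_to_date(policy_output: str) -> bool:
    """True when the installed version equals the candidate version."""
    installed, candidate = installed_and_candidate(policy_output)
    return installed == candidate


# Outputs from this task, before and after the debdeploy run.
before = """\
conda-analytics:
  Installed: 0.0.28
  Candidate: 0.0.29
"""
after = """\
conda-analytics:
  Installed: 0.0.29
  Candidate: 0.0.29
"""

print(is_up_to_date(before), is_up_to_date(after))
```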

Moving on to production hosts

Mentioned in SAL (#wikimedia-analytics) [2024-04-23T12:50:03Z] <stevemunene> deploy conda-analytics v 0.0.29 to analytics stat hosts T362648

Mentioned in SAL (#wikimedia-analytics) [2024-04-23T12:59:52Z] <stevemunene> deploy conda-analytics v 0.0.29 to analytics-airflow hosts T362648

Run an apt update on the hadoop workers:

stevemunene@cumin1002:~$ sudo cumin A:hadoop-worker 'apt update'
106 hosts will be targeted:
an-worker[1078-1175].eqiad.wmnet,analytics[1070-1077].eqiad.wmnet

stevemunene@cumin1002:~$ sudo debdeploy deploy -u 2024-04-23-conda-analytics.yaml -s hadoop-worker
Rolling out conda-analytics:
Non-daemon update, no service restart needed

conda-analytics was updated: 0.0.28 -> 0.0.29
  an-worker[1078-1175].eqiad.wmnet,analytics[1070-1077].eqiad.wmnet
(106 hosts)

Next, upgrade the coordinators:

stevemunene@cumin1002:~$ sudo cumin A:hadoop-coordinator 'apt update'
2 hosts will be targeted:
an-coord[1003-1004].eqiad.wmnet

stevemunene@cumin1002:~$ sudo debdeploy deploy -u 2024-04-23-conda-analytics.yaml -s hadoop-coordinator
Rolling out conda-analytics:
Non-daemon update, no service restart needed

conda-analytics was updated: 0.0.28 -> 0.0.29
  an-coord[1003-1004].eqiad.wmnet (2 hosts)

Next are the stat hosts

stevemunene@cumin1002:~$ sudo debdeploy deploy -u 2024-04-23-conda-analytics.yaml -s stat
Rolling out conda-analytics:
Non-daemon update, no service restart needed

conda-analytics was updated: 0.0.27 -> 0.0.29
  stat[1004-1009].eqiad.wmnet (6 hosts)

conda-analytics was updated: 0.0.28 -> 0.0.29
  stat[1010-1011].eqiad.wmnet (2 hosts)

stevemunene@cumin1002:~$ sudo cumin A:stat 'systemctl restart jupyterhub-conda.service'
8 hosts will be targeted:
stat[1004-1011].eqiad.wmnet
OK to proceed on 8 hosts? Enter the number of affected hosts to confirm or "q" to quit: 8
===== NO OUTPUT =====                                                                                             
PASS |████████████████████████████████████████████████████████████████████| 100% (8/8) [00:01<00:00,  4.54hosts/s]
FAIL |                                                                            |   0% (0/8) [00:01<?, ?hosts/s]
100.0% (8/8) success ratio (>= 100.0% threshold) for command: 'systemctl restar...ub-conda.service'.
100.0% (8/8) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.

Then we have the airflow hosts

stevemunene@cumin1002:~$ sudo debdeploy deploy -u 2024-04-23-conda-analytics.yaml -s analytics-airflow
Rolling out conda-analytics:
Non-daemon update, no service restart needed

conda-analytics was updated: 0.0.27 -> 0.0.29
  an-airflow[1002,1004-1007].eqiad.wmnet,an-launcher1002.eqiad.wmnet
(6 hosts)
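
The host counts reported by cumin and debdeploy above can be cross-checked by expanding the bracketed host ranges. This is a simplified sketch written for this note, not the parser cumin itself uses; expand_hosts and split_top_level are hypothetical helpers that only handle a single bracket group per host pattern, as in the outputs above.

```python
import re


def split_top_level(expr: str) -> list[str]:
    """Split on commas that are not inside square brackets."""
    parts, current, depth = [], [], 0
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


# prefix[body]suffix, e.g. an-worker[1078-1175].eqiad.wmnet
_BRACKET = re.compile(r"^(?P<prefix>[^\[]*)\[(?P<body>[^\]]+)\](?P<suffix>.*)$")


def expand_hosts(expr: str) -> list[str]:
    """Expand patterns like 'an-airflow[1002,1004-1007].eqiad.wmnet'."""
    hosts = []
    for part in split_top_level(expr):
        match = _BRACKET.match(part)
        if match is None:
            hosts.append(part)  # plain hostname, nothing to expand
            continue
        for item in match["body"].split(","):
            start, _, end = item.partition("-")
            end = end or start  # a single number is a range of one
            width = len(start)  # keep any zero padding
            for number in range(int(start), int(end) + 1):
                hosts.append(f"{match['prefix']}{number:0{width}d}{match['suffix']}")
    return hosts


workers = expand_hosts("an-worker[1078-1175].eqiad.wmnet,analytics[1070-1077].eqiad.wmnet")
airflow = expand_hosts("an-airflow[1002,1004-1007].eqiad.wmnet,an-launcher1002.eqiad.wmnet")
print(len(workers), len(airflow))
```

The two counts line up with the "(106 hosts)" and "(6 hosts)" totals in the debdeploy output.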

All hosts are now running on v0.0.29
