
Move the Analytics infrastructure to Debian Buster
Open · Medium · Public · 0 Estimated Story Points

Description

Tracking task for upgrading Analytics systems to Debian Buster.

  • Q4 2019/2020
  • Matomo (probably worth creating matomo1002?) - IN PROGRESS
  • Archiva (probably worth creating archiva1002?) - IN PROGRESS
  • Notebooks - easier after T243934; we will probably decommission them (decom tracked in T249752)
  • Druid Analytics - First Druid nodes on Buster in T252771 - IN PROGRESS
  • Druid Public
  • Q1 2020/2021
  • stat100[4,6,7]
  • Kafka Jumbo - First Kafka brokers on Buster in T252675 - IN PROGRESS
  • Event Schema hosts
  • kafkamon - IN PROGRESS (shared with the SRE Observability team)
  • Q2 2020/2021
  • Hadoop (needs BigTop 1.5 deployed)
  • Yarn and Hue (analytics-tool1001) - this will probably need packaging Hue for Python 3, see T233073
  • Q3 2020/2021
  • AQS
  • Eventlogging (we may decommission the host before then due to T238230)
  • Thorium - the host needs to be refreshed next FY, so we can possibly couple the two things

Relevant tasks:

  • find a partman recipe able to preserve /srv - T252027
  • new ganeti nodes - T228924


Event Timeline

elukey triaged this task as Medium priority. Oct 4 2019, 2:06 PM
elukey created this task.
elukey updated the task description.

For the stat100x, Kafka and Druid nodes it would be great if the partman recipe left /srv intact, but it doesn't seem feasible at the moment: https://phabricator.wikimedia.org/T252027

In the past we removed the partman recipe from netboot.cfg and drove the Debian install manually, but this doesn't seem like the best option: once the d-i process is stopped, the next steps follow the standard upstream Debian install process rather than ours (for example, the DNS resolver config would be missed, etc.).
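For reference, preserving /srv via partman would look something like the following preseed fragment. This is a hypothetical sketch (recipe name and sizes invented) using partman-auto's `method{ keep }` / `$reusemethod{ }` markers to reuse an existing partition; per T252027 this does not currently work reliably for our layout:

```
# Hypothetical sketch, not a working recipe (see T252027):
# format /, but try to reuse the existing /srv partition untouched.
d-i partman-auto/expert_recipe string \
  reuse-srv :: \
    30000 50 50000 ext4 \
      $primary{ } $bootable{ } method{ format } format{ } \
      use_filesystem{ } filesystem{ ext4 } mountpoint{ / } \
    . \
    100000 100 -1 ext4 \
      $reusemethod{ } method{ keep } \
      use_filesystem{ } filesystem{ ext4 } mountpoint{ /srv } \
    .
```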

elukey updated the task description.

@Ottomata today Miriam asked me for some info about why pyspark on stat100[5,8] was yielding version issues (3.7 on the driver vs 3.5 on the workers), and I found https://gerrit.wikimedia.org/r/#/c/operations/debs/spark2/+/562651/. Do we have another workaround, or should we revamp that patch? I didn't follow the python versioning issues at the time :(

Cc: @EBernhardson @Miriam

Ah yes! https://phabricator.wikimedia.org/T229347#5439259

PYSPARK_PYTHON=python3.7 pyspark2 --master yarn

I added https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark#Set_the_python_version_pyspark_should_use to document this.

We should probably change the spark2 deb to automatically set PYSPARK_PYTHON to the default python version on the launching node.
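A minimal sketch of what such a wrapper change could do (hypothetical; function and variable names are assumed, and the actual change is the Gerrit patch below): detect the launching node's default python3 minor version and export it as PYSPARK_PYTHON, so Yarn workers resolve the same versioned executable as the driver.

```shell
#!/bin/bash
# Hypothetical sketch of a spark2 wrapper default, not the actual deb change:
# if the user didn't set PYSPARK_PYTHON, derive the exact versioned python
# executable from the launching node's default python3.
set_pyspark_python() {
  if [ -z "${PYSPARK_PYTHON:-}" ]; then
    # e.g. "3.7" on a Buster driver node, "3.5" on a Stretch one
    minor="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
    export PYSPARK_PYTHON="python${minor}"
  fi
}

set_pyspark_python
echo "PYSPARK_PYTHON=${PYSPARK_PYTHON}"
```

With something like this in place, a plain `pyspark2 --master yarn` would behave like the explicit `PYSPARK_PYTHON=python3.7 pyspark2 --master yarn` invocation above, without users needing to know their node's python version.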

Ping @Miriam

Oh this is great, thanks so much @Ottomata !

Change 602386 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/debs/spark2@debian] Default PYSPARK_PYTHON to exact versioned python executable used on driver.

https://gerrit.wikimedia.org/r/602386

Change 602386 merged by Ottomata:
[operations/debs/spark2@debian] Default PYSPARK_PYTHON to exact versioned python executable used on driver.

https://gerrit.wikimedia.org/r/602386

elukey updated the task description.