
Move the Analytics infrastructure to Debian Buster
Open, Medium · Public · 0 Estimated Story Points

Description

Tracking task for upgrading Analytics systems to Debian Buster.

  • Q4 2019/2020
      • Matomo (probably worth creating matomo1002?) - IN PROGRESS
      • Archiva (probably worth creating archiva1002?) - IN PROGRESS
      • Notebooks - easier after T243934; we are probably going to decommission them (decom tracked in T249752)
      • Druid Analytics - First Druid nodes on Buster in T252771 - IN PROGRESS
      • Druid Public
  • Q1 2020/2021
      • stat100[4,6,7]
      • Kafka Jumbo - First Kafka brokers on Buster in T252675 - IN PROGRESS
      • Event Schema hosts
      • kafkamon - IN PROGRESS (shared with the SRE Observability team)
  • Q2 2020/2021
      • Hadoop (needs BigTop 1.5 deployed)
      • AQS
      • Eventlogging (we may decom the host ahead of time due to T238230)
      • Thorium - the host needs to be refreshed next FY, so we can possibly couple the two things
      • Yarn and Hue (analytics-tool1001) - this will probably need Hue packaged for Python 3, see T233073

Relevant tasks:

  • find a partman recipe able to preserve /srv - T252027
  • new ganeti nodes T228924

Event Timeline

elukey triaged this task as Medium priority. Oct 4 2019, 2:06 PM
elukey created this task.
Restricted Application added a subscriber: Aklapper. Oct 4 2019, 2:06 PM
Milimetric added a project: Analytics-Kanban.
Milimetric moved this task from Next Up to Parent Tasks on the Analytics-Kanban board.
elukey updated the task description. Feb 18 2020, 12:55 PM
elukey changed the status of subtask T231067: Install Debian Buster on Hadoop from Open to Stalled. Feb 18 2020, 2:36 PM
elukey updated the task description. Feb 18 2020, 2:57 PM
elukey updated the task description.
elukey updated the task description.
elukey updated the task description. Feb 20 2020, 7:08 AM
elukey updated the task description.
elukey updated the task description. Feb 28 2020, 2:05 PM
elukey updated the task description. Mar 13 2020, 8:56 AM
elukey updated the task description. May 14 2020, 6:56 AM
elukey updated the task description. May 14 2020, 1:23 PM
elukey updated the task description. May 14 2020, 3:20 PM

For the stat100x, Kafka and Druid nodes it would be great if the partman recipe left /srv intact, but that doesn't seem feasible at the moment: https://phabricator.wikimedia.org/T252027

In the past we used to remove the partman recipe from netboot.cfg and drive the Debian install manually, but that doesn't seem to be the best option: once the automated d-i process is stopped, the remaining steps follow the standard upstream Debian install process rather than ours (for example, the DNS resolver config will be missing).

elukey updated the task description. May 25 2020, 6:11 AM
elukey updated the task description. Jun 3 2020, 2:57 PM
elukey updated the task description.
elukey updated the task description. Jun 3 2020, 3:08 PM
elukey added a comment. Edited Jun 4 2020, 9:53 AM

@Ottomata today Miriam asked me for some info about why pyspark on stat100[5,8] was yielding version issues (3.7 on the driver vs 3.5 on the workers), and I found https://gerrit.wikimedia.org/r/#/c/operations/debs/spark2/+/562651/. Do we have another workaround, or should we revamp that patch? I didn't follow the Python versioning issues at the time :(

Cc: @EBernhardson @Miriam

Ottomata added a comment. Edited Jun 4 2020, 2:16 PM

Ah yes! https://phabricator.wikimedia.org/T229347#5439259

PYSPARK_PYTHON=python3.7 pyspark2 --master yarn

I added https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark#Set_the_python_version_pyspark_should_use to document this.

We should probably change the spark2 deb to automatically set PYSPARK_PYTHON to the default python version on the launching node.
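
A minimal sketch of that idea, for illustration only: it derives the versioned executable name (for example python3.7) from the interpreter running on the launching node and uses it as the default for PYSPARK_PYTHON, without overriding a value the user has already set. The function names are hypothetical, and this is not the code of the spark2 deb or of any patch referenced in this task.

```python
import os
import sys


def versioned_python_name(version_info=sys.version_info):
    """Return an executable name such as 'python3.7' for the given version."""
    return "python{}.{}".format(version_info[0], version_info[1])


def pyspark_env(base_env=None, version_info=sys.version_info):
    """Return a copy of the environment with PYSPARK_PYTHON defaulted.

    Hypothetical helper: an explicit PYSPARK_PYTHON in base_env wins,
    otherwise the launching interpreter's major.minor version is used.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("PYSPARK_PYTHON", versioned_python_name(version_info))
    return env


if __name__ == "__main__":
    # Show what would be exported before launching pyspark2.
    print(pyspark_env()["PYSPARK_PYTHON"])
```

The workers then need an executable with that exact name on their PATH; if they only ship a different minor version, the mismatch described above would still occur.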

Ping @Miriam

Miriam added a comment. Jun 4 2020, 2:34 PM

Oh this is great, thanks so much @Ottomata !

Change 602386 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/debs/spark2@debian] Default PYSPARK_PYTHON to exact versioned python executable used on driver.

https://gerrit.wikimedia.org/r/602386

elukey updated the task description. Jun 5 2020, 10:26 AM
elukey updated the task description. Jun 5 2020, 5:54 PM
elukey updated the task description. Jun 9 2020, 8:45 AM
elukey updated the task description. Jun 18 2020, 12:32 PM
elukey updated the task description. Jul 2 2020, 7:52 AM

Change 602386 merged by Ottomata:
[operations/debs/spark2@debian] Default PYSPARK_PYTHON to exact versioned python executable used on driver.

https://gerrit.wikimedia.org/r/602386