
Create Plan for Spark 2 Deprecation
Closed, Resolved · Public · 5 Estimated Story Points

Description

To deprecate Spark 2 and move fully to Spark 3, all current Spark 2 jobs need to be migrated. This task is to create a migration plan and to communicate it to the teams that own Spark 2 jobs, so they can budget time for the migration.
Key Tasks:
  • Create an inventory of all remaining jobs that still use Spark 2, including their owners and stakeholders
  • Mark the jobs that Data Engineering (DE) is responsible for and migrate them
  • Write a statement of intent announcing the move to Spark 3, including:
    • Changes to workflows
    • Changes to data
    • Features and benefits of the move
    • End date for Spark 2 support
  • Finalizations
    • Send out the final communication to the Analytics group

Event Timeline

Here's the link to the spreadsheet listing the Spark 2 jobs still running on the Hadoop cluster:
https://docs.google.com/spreadsheets/d/1j1HzjebGU61mDRRMS8Lyx7DpYiqQrxaM09HL5QRPRfc

Thanks @mforns for creating the list. Would it be helpful to add a risk vs. complexity column, similar to the one in the Oozie migration spreadsheet, for both our jobs and other teams' jobs?

@Miriam, could you please help Marcel review the inventory of your team's remaining jobs that still use Spark 2? Thanks a lot!
https://docs.google.com/spreadsheets/d/1j1HzjebGU61mDRRMS8Lyx7DpYiqQrxaM09HL5QRPRfc

EChetty set the point value for this task to 5. (Oct 28 2022, 1:13 PM)

@mpopov, could you please help Marcel review the inventory of your team's remaining jobs that still use Spark 2? Thanks a lot!
https://docs.google.com/spreadsheets/d/1j1HzjebGU61mDRRMS8Lyx7DpYiqQrxaM09HL5QRPRfc

And thank you as well, @Miriam! I saw your team added a couple of jobs. Is that all of them? If so, I will close the list :-)

Thank you for the ping, @mforns! We are almost there; could you give us another 24 hours?

Of course, @Miriam! I just wanted to know whether those were all. Let me know if I can help! Cheers

Spark 2 deprecation was announced via Slack and analytics-alerts:

Hi all, the Data Engineering team is upgrading to Spark 3 and will no longer support Spark 2 on our Hadoop cluster after March 31st, 2023. If your team owns Spark 2 jobs in production, please plan for the time needed to upgrade them to Spark 3.

You can find more information about the upgrade at: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark/Migration_to_Spark_3. Please add any missing jobs to the migration list on that page. If you need help from the Data Engineering team, you can reach out to @Jackeline Argüello or join the Data Engineering office hours.