Migrate the Mediawiki-geoeditors jobs to Airflow
Closed, Resolved | Public | 3 Estimated Story Points

Description

Goal:
Migrate the mediawiki-geoeditors load, monthly, and yearly jobs to Airflow.

Job Details:

Type | Input | Processing | Output
Load | HDFS | Hive | Hive
Export | Hive | Hive | Hive & Archive

Success Criteria:

  • Have the 1 Load job migrated (SLA 42 days)
  • Have the 2 Export jobs migrated: Monthly (SLA 36 days), Yearly (SLA 367 days)
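
As a rough illustration of the SLA figures above, the sketch below encodes them as `timedelta` values and computes the latest acceptable completion time for a run. The job keys, the function names, and the assumption that each SLA is counted from the start of the period a run covers are all illustrative and not taken from the actual DAG definitions.

```python
from datetime import datetime, timedelta

# SLA values copied from the success criteria above.
# The dictionary keys are illustrative names, not real DAG ids.
SLAS = {
    "load": timedelta(days=42),
    "monthly": timedelta(days=36),
    "yearly": timedelta(days=367),
}


def sla_deadline(job: str, period_start: datetime) -> datetime:
    """Latest acceptable completion time for the run covering period_start.

    Assumption: the SLA is counted from the start of the covered period.
    """
    return period_start + SLAS[job]


def is_late(job: str, period_start: datetime, finished_at: datetime) -> bool:
    """True if the run finished after its SLA deadline."""
    return finished_at > sla_deadline(job, period_start)
```

Under that assumption, the monthly export for September 2022 would be due by `sla_deadline("monthly", datetime(2022, 9, 1))`, a few days after the month closes.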

Event Timeline

EChetty set the point value for this task to 3. (Aug 16 2022, 3:17 PM)
EChetty moved this task from Discussed (Radar) to Sprint 00 on the Data Pipelines board.
EChetty edited projects, added Data Pipelines (Sprint 00); removed Data Pipelines.
xcollazo triaged this task as High priority. (Sep 9 2022, 1:50 PM)

Change 831639 had a related patch set uploaded (by Xcollazo; author: Xcollazo):

[analytics/refinery@master] Modify geoeditor SQL scripts to play nice with Spark3

https://gerrit.wikimedia.org/r/831639

Change 831639 merged by Joal:

[analytics/refinery@master] Modify geoeditor SQL scripts to play nice with Spark3

https://gerrit.wikimedia.org/r/831639

Killed the following Oozie jobs after the Airflow migration:

0088610-220613130955581-oozie-oozi-C     mediawiki-geoeditors-public_monthly-coord   RUNNING   1    MONTH        2022-07-01 00:00 GMT    2022-10-01 00:00 GMT
0019086-210107075406929-oozie-oozi-C     mediawiki-geoeditors-yearly-coord           RUNNING   12   MONTH        2021-01-01 00:00 GMT    2023-01-01 00:00 GMT
0021850-201202074829419-oozie-oozi-C     mediawiki-geoeditors-load-coord             RUNNING   1    MONTH        2020-12-01 00:00 GMT    2022-10-01 00:00 GMT
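
For anyone repeating this step, coordinators like the ones listed above can be killed with the Oozie CLI's `job -kill` subcommand. The sketch below only builds and prints the commands and does not run them. The `OOZIE_URL` value is a hypothetical placeholder, and the task does not say how the jobs were actually killed.

```python
import shlex

# Hypothetical server URL; substitute the real Oozie endpoint.
OOZIE_URL = "http://oozie.example.org:11000/oozie"

# Coordinator ids copied from the listing above.
COORDINATOR_IDS = [
    "0088610-220613130955581-oozie-oozi-C",
    "0019086-210107075406929-oozie-oozi-C",
    "0021850-201202074829419-oozie-oozi-C",
]


def kill_command(job_id: str, oozie_url: str = OOZIE_URL) -> list[str]:
    """Build the argv for killing one Oozie job (nothing is executed)."""
    return ["oozie", "job", "-oozie", oozie_url, "-kill", job_id]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for job_id in COORDINATOR_IDS:
        print(shlex.join(kill_command(job_id)))
```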

This has been merged. However, we still need to do a scap deploy. Since a couple more changes would be picked up by that deploy, I will wait until stand-up to coordinate the deployment.

Piggybacked on today's analytics Airflow deployment. All done here.