The merge request at https://gitlab.wikimedia.org/repos/generated-data-platform/datapipelines/-/merge_requests/51 is the key step in onboarding the image suggestions data pipeline, as per https://www.mediawiki.org/wiki/Platform_Engineering_Team./Data_Value_Stream/Data_Pipeline_Onboarding/#Onboarding.
The following list tracks the pending tasks, in addition to T307362 and T307371.
Tasks
---
- ~~[ ] The cleanup script fails due to missing Spark session: run it as a non-Spark task~~ - **not needed anymore, superseded by {T307983}**
- [x] fix the `schedule_interval` cron expression
- [x] add explicit descriptions in `DataFrame.write` calls for better monitoring on https://yarn.wikimedia.org/cluster/scheduler - **see https://gitlab.wikimedia.org/repos/generated-data-platform/datapipelines/-/commit/4d0e458fa9030ebccb59a3c39e8c3ef13699fbdc **
- [x] fix the Hive connection error, see https://gitlab.wikimedia.org/repos/generated-data-platform/datapipelines/-/merge_requests/55#note_6700 - **caused by the issue described at https://stackoverflow.com/a/30707252/10719765, fixed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/791612 **