Section-Topics already has an optional filter that handles media links. It can replace one section alignment component, namely the script that extracts section images.
This is an opportunity to remove duplicate behavior and consolidate shared logic.
NOTE: this ticket covers only one section alignment input; the model that learns alignments is a separate task.
Tasks
- merge the more fine-grained logic from section alignment's article_images.py into Section-Topics' handle_media
- remove article_images.py from the section alignment pipeline
- remove the corresponding task in the section alignment DAG
- make sure that, in the section alignment DAG, section alignment's recommendation.py takes Section-Topics' image dataset as input
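
The DAG change in the last two tasks can be sketched as follows. This is only an illustration in plain Python, not the actual DAG code: the task names (`section_topics_images`, `article_images`, `recommendation`) and the `rewire` helper are hypothetical, and the real pipeline's task ids and scheduler API may differ.

```python
# Illustrative only: a DAG modelled as {task: [upstream tasks]}.
# All task names here are hypothetical stand-ins for the real DAG's task ids.

def rewire(dag, removed, replacement):
    """Return a copy of `dag` without `removed`, with its consumers
    depending on `replacement` instead."""
    new_dag = {}
    for task, upstream in dag.items():
        if task == removed:
            continue  # drop the obsolete task entirely
        new_upstream = []
        for dep in upstream:
            dep = replacement if dep == removed else dep
            if dep not in new_upstream:  # avoid duplicate dependencies
                new_upstream.append(dep)
        new_dag[task] = new_upstream
    return new_dag


before = {
    "section_topics_images": [],          # Section-Topics image dataset
    "article_images": [],                 # task to be removed
    "recommendation": ["article_images"],
}

after = rewire(before, removed="article_images",
               replacement="section_topics_images")
print(after)
```

After rewiring, `recommendation` depends only on the Section-Topics image dataset and the `article_images` task no longer appears in the graph.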