We now have duplicate data between section-topics, section-image-recs and image-suggestions.
In this task we should:
- Compare data between these repos. See https://phabricator.wikimedia.org/T333699#8931094
  - wikipedias.txt (section-topics) & wikipedias.json (section-image-recs)
  - section_titles_denylist.json (image-suggestions & section-topics)
- For any identical data, come up with a strategy to share it from a single source. Some ideas:
  - keep it in one repo, delete it from the other, and copy it to the other at compile time
  - keep it in one repo, delete it from the other, and pull it into the other as a git submodule
  - have a 'common' repo where these files reside
  - merge the scripts into a monorepo <- PREFERRED OPTION
Note: there is some overlap with T339120; both should follow a similar resolution.
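
As a starting point for the comparison step, here is a minimal sketch that diffs two of the files. It assumes wikipedias.txt holds one entry per line and wikipedias.json is a flat JSON array of strings. Neither format is confirmed in this task, so adjust the loaders to the real layout. The helper names (`load_txt`, `load_json_list`, `compare`) are hypothetical and do not exist in any of the repos.

```python
import json
from pathlib import Path


def load_txt(path):
    """Read one entry per line, skipping blank lines (assumed file format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def load_json_list(path):
    """Read a flat JSON array of strings (assumed file format)."""
    return set(json.loads(Path(path).read_text(encoding="utf-8")))


def compare(a, b):
    """Return the entries only in a, only in b, and shared by both, sorted."""
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(a & b),
    }
```

For example, `compare(load_txt("wikipedias.txt"), load_json_list("wikipedias.json"))` would list the entries that differ between the two files. If both "only" lists are empty, the data is identical and a candidate for sharing from one source.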