Instead of using a separate Python script (https://github.com/schana/wikimedia-utils/blob/master/get_sitelink_pageviews.py), add its functionality to the Spark job.
https://github.com/schana/recommendation-translation/commit/91a2acca70fa361324c9c3f5e11d064ec52b5c4f
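
For reference, a rough sketch of the kind of logic that would move into the job. The script's exact behaviour is not described here, so this assumes it sums pageviews over all sitelinks of a Wikidata item. The function name, the tuple layouts, and the sample data are hypothetical, and it is written in plain Python rather than against the Spark API; in the job the same steps would presumably become a keyed join followed by a per-item sum.

```python
from collections import defaultdict


def sum_sitelink_pageviews(sitelinks, pageviews):
    """Sum pageviews per item across all of its sitelinks.

    Assumed (hypothetical) input shapes:
      sitelinks: iterable of (item_id, wiki, title)
      pageviews: iterable of (wiki, title, views)
    """
    # Step 1: total views per (wiki, title) page.
    views_by_page = defaultdict(int)
    for wiki, title, views in pageviews:
        views_by_page[(wiki, title)] += views

    # Step 2: join sitelinks to pages and sum per item.
    # Pages with no recorded views contribute 0.
    totals = defaultdict(int)
    for item_id, wiki, title in sitelinks:
        totals[item_id] += views_by_page.get((wiki, title), 0)
    return dict(totals)


# Made-up sample data, for illustration only.
sample_sitelinks = [
    ("Q1", "enwiki", "Apple"),
    ("Q1", "dewiki", "Apfel"),
    ("Q2", "enwiki", "Pear"),
]
sample_pageviews = [
    ("enwiki", "Apple", 10),
    ("dewiki", "Apfel", 5),
]
result = sum_sitelink_pageviews(sample_sitelinks, sample_pageviews)
```

Doing this inside the job would avoid a separate script run and an intermediate file between the two steps.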