Parse the wiki dumps and extract redirect information for one small wiki, Romanian.
In MediaWiki history we provide information on whether a page is a redirect, but not on how that status changed over time. The dumps, however, have this information, so we can parse them and extract
the redirect status of each historical revision of a page. The catch is that, since MediaWiki is multilingual, the redirect keyword depends on the wiki's language.
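
To make the language issue concrete, here is a minimal sketch of a language-aware redirect check. The keyword list for Romanian and the names `REDIRECT_KEYWORDS`, `build_redirect_pattern` and `redirect_target` are assumptions for illustration; the real aliases should be taken from the wiki's configuration, and the regex is a simplification of how MediaWiki itself recognises redirects.

```python
import re

# Assumed keyword list: the canonical "#REDIRECT" plus a localized alias.
# Verify against the wiki's actual magic-word configuration before use.
REDIRECT_KEYWORDS = {
    "rowiki": ["#REDIRECT", "#REDIRECTEAZA"],
}


def build_redirect_pattern(keywords):
    """Compile a regex that matches any of the keywords at the start of the text."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    # Keyword, optional colon, then the link target up to "]", "|" or "#".
    return re.compile(
        r"\s*(?:" + alternatives + r")\s*:?\s*\[\[([^\]|#]+)",
        re.IGNORECASE,
    )


def redirect_target(text, pattern):
    """Return the redirect target title, or None if the text is not a redirect."""
    match = pattern.match(text or "")
    return match.group(1).strip() if match else None


pattern = build_redirect_pattern(REDIRECT_KEYWORDS["rowiki"])
print(redirect_target("#REDIRECTEAZA [[București]]", pattern))
print(redirect_target("An ordinary article.", pattern))
```

Keeping the keywords in a per-wiki table means adding another wiki later is a data change rather than a code change.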
This work should be implemented in a distributed fashion, in PySpark or similar, using data in Hadoop, rather than as a one-off job on the stats machine.
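
As a rough sketch of what the per-record logic could look like, the generator below takes an iterable of page XML fragments and yields one row per revision. It assumes each record is a single un-namespaced `<page>` element with the usual `id`, `revision`, `timestamp` and `text` children; how the dump is actually split and stored in Hadoop is not specified here. The names `revision_rows` and `default_is_redirect` are hypothetical, and the default check only knows the canonical keyword.

```python
import xml.etree.ElementTree as ET


def default_is_redirect(text):
    # Placeholder: canonical keyword only. Swap in a language-aware check.
    return (text or "").lstrip().upper().startswith("#REDIRECT")


def revision_rows(page_xml_records, is_redirect=default_is_redirect):
    """Yield (page_id, rev_id, timestamp, is_redirect) for every revision."""
    for page_xml in page_xml_records:
        page = ET.fromstring(page_xml)
        page_id = int(page.findtext("id"))  # direct child of <page>
        for rev in page.iter("revision"):
            yield (
                page_id,
                int(rev.findtext("id")),
                rev.findtext("timestamp"),
                is_redirect(rev.findtext("text")),
            )


sample = (
    "<page><title>A</title><ns>0</ns><id>7</id>"
    "<revision><id>1</id><timestamp>2010-01-01T00:00:00Z</timestamp>"
    "<text>Some article text</text></revision>"
    "<revision><id>2</id><timestamp>2011-01-01T00:00:00Z</timestamp>"
    "<text>#REDIRECT [[B]]</text></revision></page>"
)
for row in revision_rows([sample]):
    print(row)
```

Because the function consumes and produces iterators, it has the shape that PySpark's `RDD.mapPartitions` expects, so the same logic could run locally for testing and on the cluster for the full dump.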