Jul 15 2020
RE redirects - We don't reuse the redirect tables that @ArielGlenn dumps every few weeks because we need more up-to-date data, but we do something similar to what he described. We typically parse article pages for redirect templates and attach the information extracted from each redirect page to the page it redirects to.
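For illustration only, here is a rough sketch of that kind of redirect extraction. It is not our actual pipeline: it assumes the standard `#REDIRECT [[Target]]` wikitext syntax, and the names `redirect_target` and `attach_redirects` are made up for this example.

```python
import re
from collections import defaultdict

# Assumption: redirect pages use the standard MediaWiki syntax,
# e.g. "#REDIRECT [[Target#Section|label]]". Localized keywords are not handled.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]\|#]+)", re.IGNORECASE)


def redirect_target(wikitext):
    """Return the redirect target title, or None if the page is not a redirect."""
    match = REDIRECT_RE.match(wikitext)
    if match is None:
        return None
    # Normalize underscores to spaces so titles compare consistently.
    return match.group(1).strip().replace("_", " ")


def attach_redirects(pages):
    """Map each target title to the sorted titles of pages redirecting to it.

    `pages` is an iterable of (title, wikitext) pairs (hypothetical input shape).
    """
    redirects = defaultdict(list)
    for title, text in pages:
        target = redirect_target(text)
        if target is not None:
            redirects[target].append(title)
    return {target: sorted(titles) for target, titles in redirects.items()}
```

Usage would look something like `attach_redirects([("NYC", "#REDIRECT [[New York City]]")])`, which groups the redirect title `NYC` under `New York City`. A real pipeline would also need to resolve redirect chains and namespace prefixes.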
Jul 10 2020
+1 on publishing the dataset as a small number of large files compressed with a splittable format. This helps with both downloading and distributed data processing.
Apr 1 2019
Thanks for the summary, Ariel.
Feb 20 2019
I'm also interested in the specific reasons why the update frequency needs to be changed, i.e. besides streamlining the monthly workload on the Wikimedia machines.