Some modularization is still needed in the Wikidata Analytics core codebase:
- functions that work with the Wikibase API (e.g. to fetch Wikidata labels);
- functions that run SPARQL/GAS programs against WDQS;
- OS-level operations for HDFS I/O and for recomposing large files;
- some hand-crafted batch operations that produce large co-occurrence matrices;
- and other miscellaneous utilities.
All of the above, plus anything else that turns up, needs to be refactored so that all Wikidata Analytics components use one and the same set of functions.
This implies developing an internal R package to be installed and used on the Analytics Clients (the stat100* machines).
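
As a rough illustration of what one shared function in such a package might look like, here is a minimal sketch for the "fetch Wikidata labels" case. The function names `wd_labels_url()` and `wd_extract_labels()`, their arguments, and the split into a URL builder and a response parser are assumptions made for this example, not existing code. The sketch uses the Wikibase `wbgetentities` API action and base R only. The 50-ID batch limit reflects the usual API limit for regular users and may need adjusting.

```r
# Hypothetical helpers for a shared internal package; names are illustrative.

# Build a wbgetentities request URL for one batch of entity IDs.
wd_labels_url <- function(ids, language = "en",
                          endpoint = "https://www.wikidata.org/w/api.php") {
  # Assumed batch limit of 50 IDs per request.
  stopifnot(is.character(ids), length(ids) >= 1, length(ids) <= 50)
  query <- c(
    action    = "wbgetentities",
    ids       = paste(ids, collapse = "|"),
    props     = "labels",
    languages = language,
    format    = "json"
  )
  # Percent-encode each value, then join the pairs as key=value&key=value.
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(endpoint, "?", paste0(names(query), "=", encoded, collapse = "&"))
}

# Turn an already-parsed wbgetentities response (a nested list) into a
# data.frame of IDs and labels; entities without a label get NA.
wd_extract_labels <- function(parsed, language = "en") {
  entities <- parsed[["entities"]]
  ids <- names(entities)
  if (is.null(ids)) ids <- character(0)
  labels <- vapply(entities, function(e) {
    value <- e[["labels"]][[language]][["value"]]
    if (is.null(value)) NA_character_ else value
  }, character(1))
  data.frame(id = ids, label = unname(labels), stringsAsFactors = FALSE)
}

# Possible usage, assuming the jsonlite package is installed:
# parsed <- jsonlite::fromJSON(wd_labels_url(c("Q42", "Q64")),
#                              simplifyVector = FALSE)
# wd_extract_labels(parsed)
```

Keeping request construction separate from response parsing means the parser can be unit-tested against canned responses without network access, which may matter on the Analytics Clients.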