Nov 5 2017
Oct 9 2017
Jul 15 2017
I have developed a full Bash+Python3 framework that makes it possible to compute PageRank on any Wikipedia language edition (even on low-cost hardware). By default, the input is the latest Wikipedia dump and the output lists each page's Q-id together with its ranking score. The software is licensed under GPL v3 and can be accessed at the following URL:
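To illustrate the kind of "Q-id plus ranking score" output described above, here is a minimal power-iteration PageRank sketch over a toy link graph keyed by Q-ids. The graph, damping factor, and iteration count are illustrative assumptions, not the framework's actual code or parameters.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank.
    links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {t for ts in links.values() for t in ts}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page gets the teleportation share up front
        new = {p: (1.0 - damping) / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling page: spread its rank evenly
                for p in pages:
                    new[p] += damping * rank[src] / n
        rank = new
    return rank

# Hypothetical mini link graph between Wikidata items
links = {"Q42": ["Q5"], "Q5": ["Q42", "Q64"], "Q64": ["Q42"]}
scores = pagerank(links)
```

The real framework operates on full dump-scale graphs, but the Q-id-to-score mapping it emits has the same shape as `scores` here.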
Mar 17 2017
@Smalyshev, I think we should first check whether this type of output is of any use to you. Most of the details (e.g. input/output formats) are at http://people.aifb.kit.edu/ath/#Wikidata_PageRank. It does not run on Hadoop and requires fairly few resources (it can in fact be optimized to run on a laptop with 16 GB of RAM). Currently no optimizations are in place and we use about 200 GB of RAM (processing power is not the bottleneck). If good use cases exist and the current output has been verified to be useful, I would consider the following next steps:
- transform the actual link datasets of Wikipedia into a processable format (similar to the output of the DBpedia pagelinks dataset)
- develop the processing pipeline as a Dockerfile and make all source code available under a free license
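The first step above could be sketched as follows, assuming raw (source, target) page-title pairs as input. The IRI scheme and the `wikiPageWikiLink` predicate follow the conventions of DBpedia's pagelinks dump; the helper name and input format are assumptions for illustration.

```python
from urllib.parse import quote

# Predicate used by the DBpedia pagelinks dataset
PRED = "<http://dbpedia.org/ontology/wikiPageWikiLink>"

def to_pagelink_triples(pairs):
    """Turn (source_title, target_title) pairs into
    DBpedia-pagelinks-style N-Triples lines (a hypothetical sketch)."""
    def iri(title):
        # DBpedia replaces spaces with underscores, then percent-encodes
        return f"<http://dbpedia.org/resource/{quote(title.replace(' ', '_'))}>"
    return [f"{iri(s)} {PRED} {iri(t)} ." for s, t in pairs]
```

For example, `to_pagelink_triples([("Douglas Adams", "Earth")])` yields one triple linking the two resource IRIs via `wikiPageWikiLink`.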
Nov 4 2016
I don't really think N-Triples (nt) adds much value. If you produce valid Turtle, there are tools such as the Raptor RDF Syntax Library that easily convert between different RDF serializations. Anyone who really needs nt can do this fairly easily themselves.
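In practice one would use Raptor's `rapper` tool or an RDF library for the conversion; purely to show why the conversion is mechanical, here is a toy converter handling only a tiny Turtle subset (one triple per line, `@prefix` declarations, no `;`/`,` shorthand). It is a sketch, not a replacement for a real parser.

```python
import re

def turtle_to_ntriples(turtle: str) -> str:
    """Convert a very restricted Turtle subset to N-Triples:
    one triple per line, @prefix declarations, no ';' or ',' lists."""
    prefixes = {}
    out = []
    for line in turtle.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = re.match(r"@prefix\s+(\w*):\s+<([^>]+)>\s*\.\s*$", line)
        if m:
            prefixes[m.group(1)] = m.group(2)
            continue
        terms = line.rstrip(" .").split(None, 2)
        expanded = []
        for t in terms:
            if t.startswith("<") or t.startswith('"'):
                expanded.append(t)  # already a full IRI or a literal
            else:
                pfx, _, local = t.partition(":")
                expanded.append(f"<{prefixes[pfx]}{local}>")
        out.append(" ".join(expanded) + " .")
    return "\n".join(out)
```

Real Turtle (literals ending in a dot, blank nodes, collections) needs a proper parser, which is exactly what Raptor provides.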
Sep 19 2016
We were recently discussing a Wikipedia PageRank solution (or a combination of that ranking with other features). I could contribute these scores and would also be ready to implement some of the integration (with some help).