
thalhamm
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 19 2016, 7:28 PM (405 w, 5 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Thalhamm

Recent Activity

Sat, Jun 8

thalhamm added a comment to T366996: Large number wiki dumps with reduced/changed db schema for pagelinks.sql.gz.

It's a feature! Thanks for clarifying.

Sat, Jun 8, 9:14 PM · Dumps-Generation
thalhamm created T366996: Large number wiki dumps with reduced/changed db schema for pagelinks.sql.gz.
Sat, Jun 8, 7:28 PM · Dumps-Generation

Nov 5 2017

thalhamm placed T143424: Explore the Entity Relevancy Scoring for Wikidata up for grabs.
Nov 5 2017, 1:21 PM · Wikidata

Oct 9 2017

thalhamm updated subscribers of T143424: Explore the Entity Relevancy Scoring for Wikidata.

@Lydia_Pintscher @Smalyshev Using the Wikidata PageRank scores, my former colleague Steffen Thoma (KIT) and I developed a Wikidata autocomplete prototype based on Apache Solr. Please see here:

Oct 9 2017, 6:54 PM · Wikidata

Jul 15 2017

thalhamm added a comment to T143424: Explore the Entity Relevancy Scoring for Wikidata.

I have developed a full Bash+Python3 framework that makes it possible to compute PageRank on any Wikipedia language edition (even on low-cost hardware). By default, the input is the latest wiki dump, and the output lists each page's Q-ID together with its ranking score. The software is licensed under GPL v3 and can be accessed at the following URL:
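The core of such a framework can be sketched in a few lines. This is a minimal, hypothetical illustration of power-iteration PageRank over an already-extracted link graph, not the framework's actual code; the damping factor, iteration count, and edge format are illustrative assumptions.

```python
# Minimal power-iteration PageRank sketch (illustrative, not the
# framework's real implementation). Input: (source, target) edges
# for an already-extracted wiki link graph.

def pagerank(edges, damping=0.85, iterations=50):
    """Compute PageRank scores over a list of (source, target) edges."""
    nodes = {n for edge in edges for n in edge}
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node keeps the teleportation share (1 - d) / n.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank uniformly.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank

# Toy graph: A -> B, A -> C, B -> C, C -> A
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
scores = pagerank(edges)
```

On this toy graph, "C" (linked from both "A" and "B") ends up with the highest score, and the scores sum to 1.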

Jul 15 2017, 5:30 PM · Wikidata

Mar 17 2017

thalhamm added a comment to T143424: Explore the Entity Relevancy Scoring for Wikidata.

@Smalyshev, I think we should first check whether this type of output is of any use to you. Most info (e.g. output/input format) is at http://people.aifb.kit.edu/ath/#Wikidata_PageRank. It does not run on Hadoop and requires fairly few resources (it could actually be optimized to run on a laptop with 16 GB of RAM). Currently, no optimizations are in place and we use about 200 GB of RAM (processing power doesn't matter much). If good use cases exist and the current output has been verified to be useful, I would consider the following next steps:

  • transform the actual link datasets of Wikipedia into a processable format (similar to the output of the DBpedia pagelinks dataset)
  • develop the processing pipeline as a Dockerfile and make all source code available under a free license
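The first step above, turning raw link data into a processable format, can be sketched as follows. This is a hypothetical illustration: it assumes the links have already been exported as tab-separated (source title, target title) pairs, one per line, and maps them to a compact integer edge list of the kind a PageRank step could consume. The input layout and function name are assumptions, not the real pipeline.

```python
# Hypothetical "processable format" step: map tab-separated page-title
# link pairs to a compact integer edge list. The TSV layout is an
# assumption for illustration, not the actual dump format.

def titles_to_edge_list(lines):
    """Assign each title a stable integer ID and emit ID-based edges."""
    ids = {}       # title -> integer ID, in order of first appearance
    edges = []     # (source_id, target_id) pairs
    for line in lines:
        src, dst = line.rstrip("\n").split("\t")
        for title in (src, dst):
            ids.setdefault(title, len(ids))
        edges.append((ids[src], ids[dst]))
    return ids, edges

lines = ["Berlin\tGermany", "Paris\tFrance", "Germany\tBerlin"]
ids, edges = titles_to_edge_list(lines)
```

Working with integer IDs instead of titles keeps the edge list small enough to process on modest hardware, which matches the low-resource goal described above.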
Mar 17 2017, 10:32 PM · Wikidata
thalhamm added a comment to T143424: Explore the Entity Relevancy Scoring for Wikidata.
Mar 17 2017, 10:24 PM · Wikidata

Nov 4 2016

thalhamm added a comment to T144103: Create .nt (NTriples) dumps for wikidata data.

I don't really think N-Triples adds much value. If you produce valid Turtle, tools such as the Raptor RDF Syntax Library can easily convert between different RDF syntaxes. Anyone who really needs N-Triples can do the conversion fairly easily themselves, e.g.:
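The conversion the comment describes is a one-liner with `rapper`, the command-line tool that ships with the Raptor RDF Syntax Library; the file names here are placeholders.

```shell
# Convert a Turtle dump to N-Triples with Raptor's rapper tool.
# File names are placeholders, not actual Wikidata dump names.
rapper -i turtle -o ntriples wikidata-dump.ttl > wikidata-dump.nt
```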

Nov 4 2016, 11:37 AM · Patch-For-Review, User-Smalyshev, Discovery-ARCHIVED, Wikidata-Query-Service, Wikidata

Sep 19 2016

thalhamm claimed T143424: Explore the Entity Relevancy Scoring for Wikidata.

We were recently discussing a Wikipedia PageRank solution (or combining that ranking with other features). I could contribute these scores and would also be ready to implement some integration (with some help).

Sep 19 2016, 7:32 PM · Wikidata