[Epic] querying for lexicographical data
Closed, Resolved · Public


We want to be able to query lexicographical data to find out all the things.

Open questions:

  • Do we add Lexemes to the main Wikidata dump or make a separate dump (or both)?
    • We may need a separate dump, at least for the initial data load.

Open TODOs (that are not in subtasks)

Event Timeline

Restricted Application added projects: Wikidata, Discovery. May 2 2018, 2:40 PM
Restricted Application added a subscriber: Aklapper.
Lydia_Pintscher triaged this task as High priority. May 27 2018, 3:27 PM
Smalyshev updated the task description. Aug 2 2018, 6:31 PM
Yurik added a subscriber: Yurik. Aug 24 2018, 2:32 PM

I think lex data dumps should be available independently of the other Wikidata data. For example, shows all Russian noun declensions (30,000+), and I think such sites can greatly benefit from the community work.
P.S. I have begun a discussion with the site authors, trying to get them to donate their database to Wikidata.

@Yurik Yes, we're going in that direction, also because having items+Lexemes in one dump would be waaay too big :)

Thanks for your work! If you need any support in your discussion with this organization, feel free to contact my colleague @johl, who's an expert in partnerships and data donations.

Right now the full lexeme dump is just 2.1M compressed, so adding it to the main dump would not be a big deal for dump size. However, without a separate dump, you'd always have to download the huge one, of course. That's why I still support the separate dump route.

Smalyshev updated the task description. Oct 15 2018, 6:02 PM
Lydia_Pintscher closed this task as Resolved. Nov 2 2018, 3:08 PM