
Lexical Data user scenario: Make it possible to export lexical data as Apertium dictionaries
Open, Needs Triage · Public


Apertium is a Free Software system for rule-based machine translation. It is primarily aimed at pairs of languages. For every language pair it includes a monolingual dictionary for each language, a bilingual dictionary that maps word correspondences between the languages, and lists of grammatical rules for transforming one language into the other. It stores this information in files of its own format. Modifying these files currently requires skills in programming, source code control systems, compilation, etc.

It sounds like Lexical Wikidata stores a lot of information that can be used almost directly in Apertium, and editing this information will be as easy as editing Wikidata. This is probably easier than editing source code files, so it would be possible to use Lexical Wikidata to crowdsource the building of dictionaries for Apertium.

(Apertium is one particular open source machine translation system that I know, but the exported data could also be used by any other comparable system.)

Event Timeline

Amire80 created this task. Feb 3 2018, 3:48 PM
Restricted Application added a project: Wikidata. Feb 3 2018, 3:48 PM
Restricted Application added a subscriber: Aklapper.

I talked to Mikel Forcada from the Apertium project about this at the Prehackathon in Olot.

The summary is that this project appears to be feasible, but it will require storing all the information that Apertium needs about each lexical entity in Lexical Wikidata. This is information such as part of speech (noun, verb, pronoun, etc.), gender, declension group / paradigm, etc. It is probably planned to be in Lexical Wikidata anyway, but I asked Mikel to write here what tags are needed. The set of required lexical entity properties will also probably be different for each language.
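
As a rough illustration of what such an export could look like, here is a minimal Python sketch that turns one lexeme record into a single entry of an Apertium monolingual dictionary. The input record and its field names (`lemma`, `stem`, `paradigm`) are hypothetical and do not reflect the actual Lexical Wikidata data model; the function name is also made up. The output uses the `<e>`, `<i>` and `<par>` elements of Apertium's dictionary format, where the part of speech and gender tags live in the paradigm definition that the entry refers to.

```python
import xml.etree.ElementTree as ET


def lexeme_to_monodix_entry(lexeme):
    """Build one Apertium monolingual dictionary <e> element.

    `lexeme` is a hypothetical dict with the keys "lemma", "stem"
    and "paradigm"; this is an assumption, not a real Wikidata schema.
    """
    # <e lm="..."> is one dictionary entry, labelled with its lemma.
    entry = ET.Element("e", {"lm": lexeme["lemma"]})
    # <i> holds the invariant part of the word (the stem).
    ET.SubElement(entry, "i").text = lexeme["stem"]
    # <par n="..."/> names the paradigm that supplies the endings and tags.
    ET.SubElement(entry, "par", {"n": lexeme["paradigm"]})
    return entry


# Hypothetical record for the English noun "house".
example = {"lemma": "house", "stem": "house", "paradigm": "house__n"}
print(ET.tostring(lexeme_to_monodix_entry(example), encoding="unicode"))
```

A real exporter would also have to generate the paradigm definitions themselves and the bilingual dictionary, which is where the per-language list of required tags matters.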

In theory, importing Apertium data into Lexeme is also possible, although there may be a licensing issue for importing in this direction (GPL vs CC0).

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Jan 4 2019, 10:30 AM