
An external actor contributes with corrections
Open, Needs Triage, Public

Description

Corrections need to be entered through a well-defined API, which also makes it possible for external actors to contribute corrections. Make sure that the pronunciation lexicon and the process for accepting corrections are accessible through an API.

This task was originally created for Wikispeech 2016.

Event Timeline

@HaraldBerthelsen, @HannaLindgren, @NikolajLindberg: Is it possible to add all the entries from one lexicon to another, creating a combination of the two? In the simplest case, only "new" words (those present in only one of the lexicons) would be added. Is it feasible to also add new pronunciations to already existing entries?

Currently, an Entry can only belong to a single lexicon, but there can be any number of different lexica in the same database instance.

If the entries of two different lexicons live in the same database instance, moving entries between lexicons is merely a matter of updating the lexiconId field of the Entry in the database. There is currently no API call to move an entry from one lexicon to another this way, but that should be easy to implement.
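As a rough illustration of the lexiconId update described above, here is a minimal sketch using Python's sqlite3 module. The table name `entry`, its columns (`id`, `lexiconId`, `strn`) and the helper `move_entries` are assumptions made for the example, not the actual schema or API of the lexicon server.

```python
import sqlite3


def move_entries(conn, entry_ids, to_lexicon_id):
    """Reassign the given entries to another lexicon by updating lexiconId."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.executemany(
            "UPDATE entry SET lexiconId = ? WHERE id = ?",
            [(to_lexicon_id, entry_id) for entry_id in entry_ids],
        )


# In-memory demo with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, lexiconId INTEGER, strn TEXT)"
)
conn.executemany(
    "INSERT INTO entry (id, lexiconId, strn) VALUES (?, ?, ?)",
    [(1, 1, "hund"), (2, 1, "katt"), (3, 2, "hund")],
)

# Move entry 1 from lexicon 1 to lexicon 2.
move_entries(conn, [1], to_lexicon_id=2)
rows = conn.execute("SELECT id, lexiconId FROM entry ORDER BY id").fetchall()
```

In this demo, lexicon 2 ends up with two entries for "hund" (ids 1 and 3), since the update does not look at what the target lexicon already contains.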

The moved entries will be added to the lexicon to which they are moved. In other words, other entries with the same orthographic string already present in the lexicon will not be affected. This means that if a more or less identical entry already existed, there will be duplicates. If the moved words are totally new (no homographs exist), then moving the entries should be unproblematic.

Another way to move entries would be to add an entry from the first lexicon to the second, and then delete it from the first one. (This way, they don't have to be in the same database instance.)

Merging entries from different lexica is a different story, though. Generally, I believe that this has to be done manually, but there may be cases for which we can add code to do batch updates.

If the entries of the lexicon to be added are originally copied from the base lexicon and have subsequently been updated, there could be a mechanism for updating an entry in the base lexicon with a new one. (The problem is how to know what entry to replace. We would need to keep a reference to the original entry when copying it to another lexicon.)
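One way the "reference to the original entry" idea could look is sketched below. The `Entry` class, the `origin_id` field and both helper functions are hypothetical and only illustrate the bookkeeping; the transcription strings are placeholders, not real lexicon data.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Entry:
    id: int
    strn: str  # orthography
    transcription: str
    origin_id: Optional[int] = None  # hypothetical: id of the base entry this was copied from


def copy_entry(entry, new_id):
    """Copy an entry to another lexicon, remembering which entry it came from."""
    return replace(entry, id=new_id, origin_id=entry.id)


def merge_back(base, edited):
    """Return the base entries, each replaced by its edited copy if one exists."""
    by_origin = {e.origin_id: e for e in edited if e.origin_id is not None}
    merged = []
    for b in base:
        e = by_origin.get(b.id)
        if e is not None:
            # Keep the base entry's id so references to it stay valid.
            merged.append(replace(e, id=b.id, origin_id=None))
        else:
            merged.append(b)
    return merged


base = [Entry(1, "hund", "h u n d"), Entry(2, "katt", "k a t")]
copied = copy_entry(base[0], new_id=101)
edited = [replace(copied, transcription="h u n: d")]
merged = merge_back(base, edited)
```

Entries in the edited lexicon without an `origin_id` (that is, entries created there from scratch) are ignored by `merge_back`; they would have to be handled as new entries instead.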

Maybe we could sketch a few scenarios of when and how one would like to be able to move new entries to an existing lexicon and when and how one would like to update or replace entries in batch?

(There is an 'update' call to the lexicon API, to update an existing entry.)

There is now an HTTP call to the lexicon server for moving entries from one lexicon to another (in the same db instance*):

lexicon/move_new_entries?from_lexicon=A&to_lexicon=B&source=SRC&status=STAT

For each entry in lexicon A with an orthography (lex.Entry.Strn) that is not found in lexicon B, this call will move that entry from lexicon A to lexicon B. The moved entries will get a status named STAT, with the source value set to SRC.
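To make the behaviour concrete, here is a small sketch that builds the request URL and models the described semantics on plain Python lists. The base URL, the dict layout of an entry and both function names are assumptions for illustration; the real server works on the database and may differ in detail.

```python
from urllib.parse import urlencode


def move_new_entries_url(base_url, from_lexicon, to_lexicon, source, status):
    """Build the move_new_entries request URL (base_url is a placeholder)."""
    query = urlencode(
        {
            "from_lexicon": from_lexicon,
            "to_lexicon": to_lexicon,
            "source": source,
            "status": status,
        }
    )
    return f"{base_url}/lexicon/move_new_entries?{query}"


def move_new_entries(lex_a, lex_b, source, status):
    """In-memory model: move entries whose orthography is absent from lex_b.

    Both lists are modified in place; returns the number of moved entries.
    """
    known = {e["strn"] for e in lex_b}  # orthographies in B before the move
    moved = [e for e in lex_a if e["strn"] not in known]
    lex_a[:] = [e for e in lex_a if e["strn"] in known]
    for e in moved:
        e["status"] = {"name": status, "source": source}
        lex_b.append(e)
    return len(moved)


url = move_new_entries_url("http://localhost:8787", "A", "B", "SRC", "STAT")

lex_a = [{"strn": "hund"}, {"strn": "katt"}, {"strn": "katt"}]
lex_b = [{"strn": "hund"}]
n_moved = move_new_entries(lex_a, lex_b, source="SRC", status="STAT")
```

In this model, the set of orthographies in B is computed once before anything is moved, so homographs in A (the two "katt" entries) are all moved together, while "hund" stays in A because B already has that orthography.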

Consider it a temporary test thing: you probably want a more elaborate solution.

*So far, we only deal with a single db instance, where all lexical entries live. However, in the future, you most likely would like to separate lexica into different databases (Sqlite db files, to begin with).

Sebastian_Berlin-WMSE renamed this task from [Task] An external actor contributes with corrections (Wikispeech) to An external actor contributes with corrections. Nov 4 2019, 10:44 AM
Sebastian_Berlin-WMSE updated the task description.