T245821 Move transcription converter from Pronlex to a separate API

		Status	Subtype	Assigned	Task
		Resolved		Sebastian_Berlin-WMSE	T243811 Investigate porting Pronlex to MediaWiki
		Resolved		HannaLindgren	T245821 Move transcription converter from Pronlex to a separate API

HannaLindgren created this task. Feb 21 2020, 12:11 PM

HannaLindgren moved this task from Backlog to In progress on the Wikispeech-Jobrunner (Sprint) board. Feb 21 2020, 12:39 PM

HannaLindgren moved this task from Unsorted to Lexicon on the Wikispeech-Text-to-Speech board.

Definitions
SymbolSet - a symbol set definition for a given language - https://github.com/stts-se/pronlex/tree/master/symbolset
Mapper - map transcriptions between different symbol sets for a given language - https://github.com/stts-se/pronlex/tree/master/symbolset/mapper
Converter - map transcriptions between different languages - https://github.com/stts-se/pronlex/tree/master/symbolset/converter
Validation - component for validating transcriptions partly based on a pre-defined symbolset - https://github.com/stts-se/pronlex/tree/master/validation

When we say "transcription converter", we mean both the mapper and the converter. Both depend heavily on the symbol set package, so that package will probably have to move with them. [TODO: decide]

To move the transcription conversion calls from the pronlex API to a separate API, we need to decide what to do with the validation package. It depends on the symbol set (but not really on the mapper or the converter). Can it stay in the pronlex repository? [TODO: decide]

If we keep the validation package in pronlex, along with the command line tools for importing, converting and validating lexicon files and entries, the pronlex repository will depend on the new package. Is this OK? [TODO: decide]

The symbolset has a dependency on lex.Entry but I think that could easily be removed if we put that logic somewhere else, and have the symbolset handle only single transcription strings (instead of full entries).
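The decoupling suggested above could look roughly like the sketch below. It is written in Python only to match the client snippets quoted later in this task; the real packages are Go, and every name here (SymbolSet, Entry, invalid_symbols, invalid_symbols_in_entry) is hypothetical and not taken from the pronlex code. The point is only the direction of the dependency: the symbol set sees plain transcription strings, and the entry-level loop lives with the code that owns entries.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SymbolSet:
    """Hypothetical symbol set that only knows about transcription strings."""
    name: str
    symbols: FrozenSet[str]
    delimiter: str = " "

    def split(self, transcription: str) -> List[str]:
        # Split on the delimiter and drop empty pieces from repeated delimiters.
        return [s for s in transcription.split(self.delimiter) if s]

    def invalid_symbols(self, transcription: str) -> List[str]:
        # Return the symbols in one transcription that are not in this set.
        return [s for s in self.split(transcription) if s not in self.symbols]


@dataclass
class Entry:
    """Hypothetical stand-in for a lexicon entry; lives outside the symbol set."""
    orth: str
    transcriptions: List[str]


def invalid_symbols_in_entry(symbol_set: SymbolSet, entry: Entry) -> List[str]:
    # Entry-level logic kept on the lexicon side: it loops over the entry's
    # transcriptions and hands each plain string to the symbol set.
    result: List[str] = []
    for trans in entry.transcriptions:
        result.extend(symbol_set.invalid_symbols(trans))
    return result
```

With this split, the symbol set side needs no import of the entry type, so it could move to its own repository without pulling lexicon code along.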

In the pronlex repository, code related to the symbolset/mapper/converter packages is used primarily in the following components.

Component used	Location/Package	Comment
SymbolSet	converter/
SymbolSet	mapper/
SymbolSet	validation/
SymbolSet	dbapi/dbapi_test	used to prepare lexicon import (lexicon import is part of the dbapi)
SymbolSet	dbapi/validation_test	used to prepare validation tests (validation is part of the dbapi)
SymbolSet	lexserver API	API calls to mapper, symbolset, validation
Mapper	lexserver API	API calls to mapper
Converter	lexserver API	API calls to converter
Mapper	cmd/lexio/convert/	converting transcriptions (and file format) from an external lexicon file to the Wikispeech default format

A complete list of usages:

Use of package symbolset:

cmd/lexio/importLex/
cmd/test_validator/
cmd/validate_lex_file/
dbapi/dbapi_test.go
dbapi/validation_test.go
lexserver/lexserver.go
lexserver/mapper.go
lexserver/symbolset.go
lexserver/validation.go
symbolset/converter/
symbolset/mapper/
validation/

Use of package symbolset/mapper:

cmd/lexio/convert/CMU2WS/
cmd/lexio/convert/csCzPhword2WS/
cmd/lexio/convert/nbNoNST2WS/
cmd/lexio/convert/svSeNST2WS/
lexserver/mapper.go

Use of package symbolset/converter:

lexserver/converter.go

Use of the mapper/converter service in the current version of wikispeech_mockup:

API URL	Called by component	Comment
mapper/maptable	mapper_client.py	Used in initialization tests
mapper/map	mapper_client.py	Used for mapping between phonetic symbol sets
mapper/map	marytts_adapter.py	Used for mapping between phonetic symbol sets -- TODO: what differs from the mapper_client call?

wikispeech_server/adapters/mapper_client.py:16:         self.base_url = "%s/mapper" % config.config.get("Services", "lexicon")
wikispeech_server/adapters/mapper_client.py:22:         url = "%s/%s/%s/%s" % (self.base_url, "maptable", self.from_symbol_set, self.to_symbol_set)
wikispeech_server/adapters/mapper_client.py:40:         url = "%s/%s/%s/%s/%s" % (self.base_url, "map", self.from_symbol_set, self.to_symbol_set, string)
wikispeech_server/adapters/marytts_adapter.py:21:       mapper_url = config.config.get("Services", "lexicon")
wikispeech_server/adapters/marytts_adapter.py:610:      url = mapper_url+"/mapper/map/%s/%s/%s" % (from_symbol_set, to_symbol_set, quote(trans))
wikispeech_server/adapters/marytts_adapter.py:649:      url = mapper_url+"/mapper/map/%s/%s/%s" % (from_symbol_set, to_symbol_set, quote(trans))
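The two URL shapes in the lines above can be summarised in a small sketch. Judging only from those quoted lines, both callers hit the same mapper/map route; the marytts_adapter lines wrap the transcription in quote(), while the mapper_client line passes `string` as it is (it may be encoded elsewhere, which is not visible here). The function names, the host and the symbol set names below are assumptions for illustration, not part of wikispeech_mockup.

```python
from urllib.parse import quote


def mapper_base_url(lexicon_url):
    """Return the mapper endpoint under the configured lexicon service URL."""
    return "%s/mapper" % lexicon_url


def maptable_url(lexicon_url, from_symbol_set, to_symbol_set):
    """Build the URL for fetching the mapping table between two symbol sets."""
    return "%s/maptable/%s/%s" % (
        mapper_base_url(lexicon_url), from_symbol_set, to_symbol_set)


def map_url(lexicon_url, from_symbol_set, to_symbol_set, trans):
    """Build the URL for mapping one transcription between two symbol sets."""
    # quote() percent-encodes spaces, quotation marks and other characters
    # that are not safe in a URL path segment.
    return "%s/map/%s/%s/%s" % (
        mapper_base_url(lexicon_url), from_symbol_set, to_symbol_set,
        quote(trans))


if __name__ == "__main__":
    lexicon = "http://localhost:8787"  # hypothetical lexicon service URL
    # Hypothetical symbol set names and transcription.
    print(maptable_url(lexicon, "sv-se_ws-sampa", "sv-se_sampa_mary"))
    print(map_url(lexicon, "sv-se_ws-sampa", "sv-se_sampa_mary", '" f O . n'))
```

If the mapper moves to a separate API, only the base URL should need to change on the client side, provided the route names stay the same.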

A subtask automatically inherits all project tags and subscribers. Just cleaning these up a bit.

HannaLindgren updated the task description. Feb 25 2020, 2:21 PM

HannaLindgren updated the task description.

HannaLindgren mentioned this in T245819: Define required core parts of the Pronlex API.

HannaLindgren updated the task description. Feb 26 2020, 11:22 AM

HannaLindgren updated the task description. Mar 2 2020, 3:14 PM

HannaLindgren updated the task description.

HannaLindgren updated the task description. Mar 2 2020, 3:16 PM

HannaLindgren updated the task description.

HannaLindgren updated the task description. Mar 2 2020, 3:21 PM

HannaLindgren updated the task description. Mar 3 2020, 9:41 AM

HannaLindgren updated the task description. Mar 3 2020, 10:44 AM

HannaLindgren updated the task description. Mar 3 2020, 11:16 AM

HannaLindgren updated the task description.

HannaLindgren updated the task description. Mar 3 2020, 11:25 AM

HannaLindgren updated the task description. Mar 3 2020, 11:29 AM

HannaLindgren updated the task description.

HannaLindgren updated the task description. Mar 3 2020, 12:44 PM

HannaLindgren updated the task description. Mar 3 2020, 1:16 PM

HannaLindgren moved this task from In progress to Done on the Wikispeech-Jobrunner (Sprint) board.

HannaLindgren closed this task as Resolved. Mar 24 2020, 11:44 AM

@Sebastian_Berlin-WMSE @kalle @Lokal_Profil I found a note saying that we should inform you guys when this task is ready for testing from your side, so here we go: The updates have been pushed to master, and can be tested now (and docs updated).

The new repo: https://github.com/stts-se/symbolset

Our Wikispeech summary page: http://stts-se.github.io/wikispeech/
