Investigate porting Pronlex to MediaWiki
Closed, Resolved (Public)

Description

This is a result of T233287: WMSE site visit at WMDE for exchange/collaboration around Wikispeech

Background

One of the two stateless features of the Wikispeech backend (Speechoid) is the lexicon of pronunciations. This lexicon is part of the Pronlex component, which manages the lexicon and handles its use during speech synthesis.

Porting Pronlex (or the required bits thereof) to the MediaWiki extension part of Wikispeech has been suggested as a solution to this.

This has the additional benefit of being a good preparatory step for adding the option of using Wikidata (or any Wikibase instance) instead of the lexicon [out of scope for this task]. It would also make use of mechanisms for handling databases built into MediaWiki.

Since Pronlex is the only component written in Go, this would also have the benefit of reducing the number of languages used by Wikispeech.

An identified downside of the porting is that Pronlex is today usable even without MediaWiki. Porting it would essentially fork the project, likely resulting in both versions having to be maintained in parallel.

If the porting does not take place, Pronlex would still have to be updated to bring it in line with Wikimedia/MediaWiki expectations, e.g. going from SQLite3 to MySQL and setting up a mechanism where database reads can be done from a replica server [out of scope for this task].
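
As a rough illustration of the read/write split mentioned above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `LexiconStore` class, the `entry` table and its columns, and the placeholder transcription are hypothetical and are not taken from Pronlex. SQLite stands in for MySQL, and a read-only connection to the same file stands in for a replica server, so no real replication is involved.

```python
import os
import sqlite3
import tempfile


class LexiconStore:
    """Hypothetical store: writes go to the primary, reads go to the replica."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def add_entry(self, lang, word, transcription):
        # Writes always use the primary connection; the context manager commits.
        with self.primary:
            self.primary.execute(
                "INSERT INTO entry (lang, word, transcription) VALUES (?, ?, ?)",
                (lang, word, transcription),
            )

    def lookup(self, lang, word):
        # Reads use the replica connection only.
        rows = self.replica.execute(
            "SELECT transcription FROM entry WHERE lang = ? AND word = ?",
            (lang, word),
        ).fetchall()
        return [row[0] for row in rows]


def open_demo_store(path):
    """Open a primary and a read-only stand-in replica on the same SQLite file."""
    primary = sqlite3.connect(path)
    with primary:
        # Assumed schema, for illustration only.
        primary.execute(
            "CREATE TABLE IF NOT EXISTS entry ("
            "lang TEXT NOT NULL, word TEXT NOT NULL, transcription TEXT NOT NULL)"
        )
    # mode=ro makes SQLite reject any write attempted through this connection.
    replica = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    return LexiconStore(primary, replica)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = open_demo_store(os.path.join(tmp, "lexicon.db"))
        store.add_entry("sv", "hej", "h E j")  # placeholder transcription
        print(store.lookup("sv", "hej"))
        store.replica.close()
        store.primary.close()
```

In an actual deployment the two connections would point at different servers, and replication lag would mean that a freshly written entry may not be visible to readers immediately.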

To investigate

What has to be clarified in order to make a decision on porting is:

  1. exactly what jobs are done by Pronlex as part of the speech rendering cycle,
  2. an estimate of the time/effort required for porting the required parts of Pronlex,
  3. an estimate of the time/effort required for updating wikispeech_mockup to expect lexicon handling to have been done prior to being called.

Notes and mockups from the internal meetings can be found here.

Event Timeline

@Sebastian_Berlin-WMSE can you give the description a once-over to ensure I captured the main reasoning as well as the most prominent questions?

I added one potential benefit (using MediaWiki's database mechanisms), but otherwise it looks good.

Conclusion from 20-02-21 meeting (translated from Swedish):
STTS defines which parts of the Pronlex API are mandatory
WMSE checks with WMDE that the implementation is “reasonable”
STTS will review whether parts of Logic A/B should be moved into Pronlex, to abstract what Wikispeech mockup expects.
WMSE implements the mandatory parts in MW
WMSE implements a maintenance script to initially create and populate the database

Left to do in this task.

Create an umbrella task for the re-architecting and add the missing WMSE tasks:

WMSE implements the mandatory parts in MW
WMSE implements a maintenance script to initially create and populate the database

Document the decision in the Project Log (T246086) and consider an ADR (needs task)

I got a negative initial reply when I checked the viability of this on the WMF side. Will update here when I have more info.

Leaving this open until the next Pronlex meeting with STTS

Awaiting clarifications from @Addshore

Just so I don't miss this, where am I clarifying? :)

No worries. This was my follow-up e-mail about whether the resulting DBs would live together with other SQL DBs and be managed by the DBAs.