Draft planning for Manuscript component
Closed, Resolved (Public)

Description

From the application: Users who want to make recordings for specific purposes can use the manuscript creator to create manuscripts, i.e. texts for recording. These can, for example, be phonetically balanced in order to create new voices for speech synthesis, or be Wikipedia texts for recording entire articles.


Start by writing down a summary of what we know and what we think about the component's required functionality.

Event Timeline

The component for generating manuscripts for recording sentences (rather than full articles):

  • Corpus of text in a normalised format
  • First iteration: sentences rather than full text
  • Filtering of unwanted text:
    • strange words
    • sentences that are too long or too short
    • unwanted characters or mark-up
    • ...
  • Filtering for desired text features:
    • common words
    • words from a special word list (important words, names, domains, etc.)
  • Selection of (phonetically balanced) subsets of the filtered corpus
    • feature extraction from sentences (word frequencies, character n-gram frequencies)
    • scoring functionality:
      • comparing sentences to see which ones have the most new desired features, compared to the already selected features
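
As a starting point for discussion, the filtering and scoring steps above could be sketched roughly as below. This is only a minimal illustration, not a proposed design: the function names (`keep`, `features`, `select`), the length thresholds, the allowed character set and the use of character bigrams are all assumptions made for the example. It uses a simple greedy strategy, repeatedly picking the sentence that adds the most features not yet covered by the already selected sentences. Weighting by a special word list or by word frequency is left out.

```python
def keep(sentence, min_words=3, max_words=20, allowed_extra=" .,;:!?'-"):
    """Hypothetical filter: reject sentences by length or unwanted characters."""
    n_words = len(sentence.split())
    if not min_words <= n_words <= max_words:
        return False
    # Anything that is not a letter or basic punctuation is treated as
    # an unwanted character (e.g. left-over mark-up).
    return all(ch.isalpha() or ch in allowed_extra for ch in sentence)


def features(sentence, n=2):
    """Return the set of lowercase words and character n-grams of a sentence."""
    text = sentence.lower()
    words = {w.strip(".,;:!?\"'()") for w in text.split()}
    words.discard("")
    ngrams = {text[i:i + n] for i in range(len(text) - n + 1)}
    return {("word", w) for w in words} | {("ngram", g) for g in ngrams}


def select(sentences, k):
    """Greedily pick up to k sentences, each adding the most unseen features."""
    candidates = [s for s in sentences if keep(s)]
    feats = {s: features(s) for s in candidates}
    covered = set()   # features of the sentences selected so far
    chosen = []
    while candidates and len(chosen) < k:
        # Score = number of features not yet covered by the selection.
        best = max(candidates, key=lambda s: len(feats[s] - covered))
        if not feats[best] - covered:
            break     # no remaining sentence adds anything new
        chosen.append(best)
        covered |= feats[best]
        candidates.remove(best)
    return chosen
```

For example, `select(corpus, 2)` on a small list of sentences would first drop sentences that fail `keep`, and then return the two sentences that together cover the most distinct words and bigrams. A real implementation would probably score phonetic units rather than raw characters, which this sketch does not attempt.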