Map TTS response to page HTML
Closed, Resolved | Public | 1.5 Estimated Story Points

Description

In order to highlight the text being recited (T122158), skip by token (T140089, T133687), etc., the timestamp and token information returned from the TTS must be mapped to the HTML on the page.

This HTML is passed to the TTS via the Cleaner, which removes some elements, hence the need for a mapping.

Expected result:
Map the tokens in the TTS response to the words in the HTML.

Ideas:

  1. Add markup for elements removed by the Cleaner and make the TTS ignore these for audio generation but keep them in the response.
  2. Make the Cleaner add a marker to the page HTML for any skipped elements. These can then be ignored when doing sequential mapping of tokens.
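
A minimal sketch of idea 2, assuming the page text has already been split into segments and that the Cleaner flags the ones it skipped. The names `Segment`, `TokenPosition` and `map_tokens`, and the sample data, are hypothetical and only illustrate the sequential mapping; they are not taken from the Wikispeech code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    skipped: bool = False  # hypothetical marker set by the Cleaner


@dataclass
class TokenPosition:
    orth: str
    segment: int  # index into the list of segments
    start: int    # offset of the first character within the segment
    end: int      # offset just past the last character


def map_tokens(segments, orths):
    """Map each token string to its first match after the previous token."""
    positions = []
    seg_index, offset = 0, 0
    for orth in orths:
        if not orth:
            continue  # tokens without text have nothing to map to
        while seg_index < len(segments):
            segment = segments[seg_index]
            # Segments marked as skipped are never searched.
            found = -1 if segment.skipped else segment.text.find(orth, offset)
            if found != -1:
                end = found + len(orth)
                positions.append(TokenPosition(orth, seg_index, found, end))
                offset = end
                break
            seg_index, offset = seg_index + 1, 0
        else:
            raise ValueError(f"token {orth!r} not found in remaining text")
    return positions


# Hypothetical page content: the middle segment was skipped by the Cleaner.
segments = [
    Segment("Born in "),
    Segment("[1]", skipped=True),
    Segment("1965."),
]
positions = map_tokens(segments, ["Born", "in", "1965", "."])
```

Because the search only ever moves forward, a word that occurs several times on the page is matched in reading order, and text inside a skipped segment can never be picked by mistake.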

Event Timeline

Worked on in: 2016-08-10:

  • Initial investigation

To do in: Sprint 2016-08-24:

  • Remainder

There is a problem when a token in the response from the TTS server doesn't match the request, e.g. the input "1965" gives the token ["nineteen sixty five", 1.43] (there are also a bunch of newlines, but I'm assuming they aren't supposed to be there). Wikispeech-STTS: Would it be possible to also return the string that is the input for a token, i.e. ["nineteen sixty five", 1.43, "1965"]?

Yes.
https://morf.se/wikispeech/?lang=en&input=1965
now returns:

{
   audio: "http://morf.se//wikispeech_mockup/tmp/tmpciiev6_n.opus",
   tokens: [
      {
         endtime: 1.43,
         expanded: "nineteen sixty five",
         orth: "1965"
      },
      {
         endtime: 1.645,
         orth: ""
      }
   ]
}

So the player can use "orth" or, if it exists, "expanded", depending on the use.
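
A small sketch of how a client might choose between the two fields, assuming the response is valid JSON with the same shape as the example above. The helper names `spoken_form` and `written_form` and the sample `page_text` are hypothetical.

```python
import json

# Same shape as the response above, with keys quoted so that it parses as JSON.
response = json.loads("""
{
  "audio": "http://morf.se//wikispeech_mockup/tmp/tmpciiev6_n.opus",
  "tokens": [
    {"endtime": 1.43, "expanded": "nineteen sixty five", "orth": "1965"},
    {"endtime": 1.645, "orth": ""}
  ]
}
""")


def spoken_form(token):
    """Text as pronounced: "expanded" when present, otherwise "orth"."""
    return token.get("expanded", token["orth"])


def written_form(token):
    """Text as it appeared in the input sent to the TTS."""
    return token["orth"]


page_text = "Born in 1965"  # hypothetical page content
first = response["tokens"][0]

# Only the written form can be located in the page text.
written_at = page_text.find(written_form(first))
spoken_at = page_text.find(spoken_form(first))
```

Here `written_at` is 8 and `spoken_at` is -1, which is why the mapping to the page needs "orth" rather than the expanded string.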

Worked on in Wikispeech (Sprint 2016-08-24):

  • Initial mapping almost complete

To do in Wikispeech (Sprint 2016-09-07):

  • Ensure coverage of identified cases
  • First commit

Story points kept at 5 to reflect the remaining work.

Lokal_Profil changed the point value for this task from 5 to 3.5. Sep 21 2016, 8:38 AM

Worked on in Wikispeech (Sprint 2016-09-07):

  • Pre-existing tests rebased and running

To do in Wikispeech (Sprint 2016-09-21):

  • Code cleanup
  • One identified case still needs coverage

Lokal_Profil changed the point value for this task from 3.5 to 1.5. Oct 4 2016, 12:01 PM

Worked on in Wikispeech (Sprint 2016-09-21):

  • Re-factoring and code cleanup
  • Last case handled

To do in Wikispeech (Sprint 2016-10-05):

  • Identify meaningful names for the last variables
  • 1st Review

Leaving 1.5 points as the size of the patch likely means a longer review cycle.

Change 314237 had a related patch set uploaded (by Sebastian Berlin (WMSE)):
Map tokens from TTS responses to HTML

https://gerrit.wikimedia.org/r/314237

For this task, only the positions of HTML substrings are stored in the mapping. Time stamps are introduced in T140089.

Sebastian_Berlin-WMSE changed the point value for this task from 1.5 to 1.

Worked on in Wikispeech (Sprint 2016-10-05):

  • Clean up of code.
  • Committed for review.

To do in Wikispeech (Sprint 2016-10-19):

  • Review.

Lokal_Profil changed the point value for this task from 1 to 2.5.

Worked on in Wikispeech (Sprint 2016-10-19):

  • Reviewed

To do in Wikispeech (Sprint 2016-10-19):

  • Determine follow-up tasks (separate from the current patch)

The new point estimate reflects the various issues encountered during review but may change as follow-up tasks crystallize.

Worked on in Wikispeech (Sprint 2016-10-19):

  • Reviewed
  • Implementation based on review started.

To do in Wikispeech (Sprint 2016-11-02):

  • Finish implementation based on review.
  • Add tasks for any remaining issues that aren't naturally covered by the patch for this task.

Sebastian_Berlin-WMSE changed the point value for this task from 2.5 to 1.5.

Worked on in Wikispeech (Sprint 2016-11-02):

  • Implementation based on first review.
  • Second review.

To do in Wikispeech (Sprint 2016-11-16):

  • Finish review.

Change 314237 merged by jenkins-bot:
Map tokens from TTS responses to HTML

https://gerrit.wikimedia.org/r/314237