
Use API action to preprocess text
Closed, Resolved | Public | 8 Estimated Story Points

Description

Instead of preprocessing text (running cleaner and segmenter) in onParserAfterTidy(), this should be done by the API action (which will be created in T164250). The response from the API (probably a JSON object) will be stored and used instead of the HTML elements (<utterances> and children) in the current implementation.
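A minimal sketch of what storing the response could look like, assuming the API returns a JSON array of segments. The field names (`content`, `startOffset`, `endOffset`) and the `SegmentStore` class are hypothetical and chosen only for illustration; the actual response format is whatever the API action from T164250 ends up returning.

```typescript
// Hypothetical shape of one segment; the real field names may differ.
interface Segment {
  content: string;
  startOffset: number;
  endOffset: number;
}

// Keeps the parsed API response in memory, so playback and navigation
// no longer need to read <utterances> elements from the page HTML.
class SegmentStore {
  private segments: Segment[] = [];

  // Parse a JSON string and replace the stored segments.
  load(json: string): void {
    const parsed: unknown = JSON.parse(json);
    if (!Array.isArray(parsed)) {
      throw new Error("Expected a JSON array of segments");
    }
    this.segments = parsed as Segment[];
  }

  get length(): number {
    return this.segments.length;
  }

  // Segment at `index`, or undefined if out of range.
  get(index: number): Segment | undefined {
    return this.segments[index];
  }

  // Index of the segment following `index`, or -1 at the end.
  next(index: number): number {
    return index + 1 < this.segments.length ? index + 1 : -1;
  }
}
```

With something like this in place, the player would only need an index into the store rather than a reference to a DOM node.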

Since the text preprocessing will be run after the page is loaded, it should be possible to not process the whole page in one go. Instead, maybe process one paragraph at a time.
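Processing one paragraph at a time could be sketched roughly as below. This assumes plain text with blank lines between paragraphs, which is a simplification of real page content, and the `Preprocess` callback is a hypothetical stand-in for whatever performs cleaning and segmenting (for example, a request to the API action).

```typescript
// Hypothetical stand-in for the cleaning and segmenting step.
type Preprocess<T> = (paragraph: string) => T;

// Split plain text into paragraphs on blank lines, dropping empty ones.
function splitParagraphs(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Preprocesses one paragraph per call. Paragraphs after the current
// position are left untouched until the caller asks for them.
class IncrementalPreprocessor<T> {
  private readonly paragraphs: string[];
  private position = 0;

  constructor(text: string, private readonly preprocess: Preprocess<T>) {
    this.paragraphs = splitParagraphs(text);
  }

  hasNext(): boolean {
    return this.position < this.paragraphs.length;
  }

  // Process exactly one paragraph and advance to the next.
  processNext(): T {
    if (!this.hasNext()) {
      throw new Error("No paragraphs left to process");
    }
    const result = this.preprocess(this.paragraphs[this.position]);
    this.position += 1;
    return result;
  }
}
```

A caller could invoke `processNext()` once on page load and again each time playback approaches the end of the processed text, so the cost of preprocessing is spread out.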


Event Timeline

Lokal_Profil set the point value for this task to 20.

In the first implementation, have JavaScript trigger preprocessing on page load. As a follow-up task, make this behaviour configurable (preprocess on play, user setting, URL parameter, etc.).

While the API should accept parameters for cleaning and segmenting, these settings should still be kept in the config, both so that JS can read them when making the request and for T164252: Keep functionality for preprocessing without javascript.
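As a rough illustration of JS reading the settings from the config to make the request, the sketch below builds a query string from a config object, with optional per-request overrides. The action name `example-segment`, the parameter names `removetags` and `segmentbreakingtags`, and the `PreprocessConfig` shape are all placeholders, not the confirmed API. The only conventions taken from MediaWiki itself are `format=json` and pipe-separated multi-value parameters.

```typescript
// Hypothetical config shape; in the extension these values would come
// from the wiki configuration exposed to JS.
interface PreprocessConfig {
  removeTags: string[];
  segmentBreakingTags: string[];
}

// Placeholder action name, not the real API module name.
const HYPOTHETICAL_ACTION = "example-segment";

// Build a query string for a segmenting request. Values from `overrides`
// take precedence over the config, which acts as the default.
function buildSegmentQuery(
  page: string,
  config: PreprocessConfig,
  overrides: Partial<PreprocessConfig> = {}
): string {
  const effective: PreprocessConfig = { ...config, ...overrides };
  const params: Array<[string, string]> = [
    ["action", HYPOTHETICAL_ACTION],
    ["format", "json"],
    ["page", page],
    // MediaWiki multi-value parameters are pipe-separated.
    ["removetags", effective.removeTags.join("|")],
    ["segmentbreakingtags", effective.segmentBreakingTags.join("|")],
  ];
  return params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}
```

Keeping the config as the default and the request parameters as overrides means the same settings can serve both the JS path and the no-JavaScript path from T164252.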

Change 358378 had a related patch set uploaded (by Sebastian Berlin (WMSE); owner: Sebastian Berlin (WMSE)):
[mediawiki/extensions/Wikispeech@master] Add API for segmenting text

https://gerrit.wikimedia.org/r/358378

Sebastian_Berlin-WMSE changed the point value for this task from 20 to 12. Jun 20 2017, 1:02 PM

Worked on in Wikispeech (Sprint 2017-06-07):

  • Most of requesting segments, playing/stopping and navigation reimplemented.
  • Started on rewriting tests.

To do in Wikispeech (Sprint 2017-06-21):

  • Rewrite remaining tests.
  • Rebase, comment and clean up code.

Lokal_Profil changed the point value for this task from 12 to 8. Jul 4 2017, 9:17 AM

Worked on in Wikispeech (Sprint 2017-06-21):

  • Tests largely rewritten.
  • Reimplemented some logic because the API response is structured differently from the HTML.

To do in Wikispeech (Sprint 2017-07-05):

  • Rebase, comment and clean up code.

This is now implemented locally and seems to run fine with the new API. "Just" need to clean up the code now.

Note to self from the review of the previous patch:

  • Check what the default removeTags and segmentBreakingTags are when accessing the API directly. Specifically, do they default to the config settings?
  • Check that section [edit] links are handled correctly.

Change 365563 had a related patch set uploaded (by Sebastian Berlin (WMSE); owner: Sebastian Berlin (WMSE)):
[mediawiki/extensions/Wikispeech@master] Use API to retrieve utterances

https://gerrit.wikimedia.org/r/365563

Change 365563 merged by jenkins-bot:
[mediawiki/extensions/Wikispeech@master] Use API to retrieve utterances

https://gerrit.wikimedia.org/r/365563