Create API action for getting segmented text
Closed, Resolved | Public | 6 Estimated Story Points

Description

Create an API action that, given a page title, returns the segmented text of that page.

While Wikispeech only needs the segmented text, if it's not too much work, also add the option to get only the cleaned text. This could be useful for other tools. For the same reason, the action should take parameters, such as which HTML tags to remove, rather than reading these directly from extension.json.

Functions

  • Get segmented text
  • Get cleaned text [not required for Wikispeech]
    • Remove tags (same as above)
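
A minimal sketch of how the two functions could fit together, written in Python for illustration only. It is not the Wikispeech implementation: the names `clean_text`, `segment_text`, and `remove_tags` are hypothetical, and the sentence-splitting rule is a naive assumption. The point it shows is the one made in the description: the tags to remove are passed in as a parameter rather than read from configuration.

```python
import re
from html.parser import HTMLParser


class _Cleaner(HTMLParser):
    """Collect text, dropping everything inside the tags in remove_tags.

    Assumes remove_tags names non-void elements (ones with a closing tag).
    """

    def __init__(self, remove_tags):
        super().__init__(convert_charrefs=True)
        self.remove_tags = set(remove_tags)
        self.skip_depth = 0  # > 0 while inside a removed element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.remove_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.remove_tags and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def clean_text(html, remove_tags=()):
    """Return the text of html without markup and without removed elements."""
    cleaner = _Cleaner(remove_tags)
    cleaner.feed(html)
    cleaner.close()
    # Collapse runs of whitespace left behind by the stripped markup.
    return re.sub(r"\s+", " ", "".join(cleaner.parts)).strip()


def segment_text(html, remove_tags=()):
    """Clean html, then split it into sentence-like segments."""
    text = clean_text(html, remove_tags)
    # Naive rule: a segment ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
```

Calling `segment_text(page_html, remove_tags=["sup"])` would, under these assumptions, drop reference markers before segmenting, while `clean_text` alone covers the optional "cleaned text only" case.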

Event Timeline

Lokal_Profil set the point value for this task to 16.
Sebastian_Berlin-WMSE renamed this task from "Create API action for getting cleaned and segmented text" to "Create API action for getting segmented text". May 24 2017, 7:24 AM
Sebastian_Berlin-WMSE updated the task description.
Sebastian_Berlin-WMSE changed the point value for this task from 16 to 6. May 24 2017, 7:47 AM

Worked on in Wikispeech (Sprint 2017-05-10):

  • Implemented API, runs locally.

To do in Wikispeech (Sprint 2017-05-24):

  • Clean up code.
  • Start review.
Sebastian_Berlin-WMSE updated the task description.

Worked on in Wikispeech (Sprint 2017-05-24):

  • Cleaned up code.

To do in Wikispeech (Sprint 2017-06-07):

Change 358378 had a related patch set uploaded (by Sebastian Berlin (WMSE); owner: Sebastian Berlin (WMSE)):
[mediawiki/extensions/Wikispeech@master] Add API for segmenting text

https://gerrit.wikimedia.org/r/358378

Sebastian_Berlin-WMSE changed the point value for this task from 6 to 4.

Worked on in Wikispeech (Sprint 2017-06-07):

  • Finished implementation.
  • Uploaded patch.

To do in Wikispeech (Sprint 2017-06-21):

  • Review.
  • Investigate if we need to disable translatewiki.net harvesting i18n.
Lokal_Profil changed the point value for this task from 4 to 6.

Worked on in Wikispeech (Sprint 2017-06-21):

  • First Review

To do in Wikispeech (Sprint 2017-07-05):

  • Implement review
    • Investigate APITestCase
    • Investigate "per value" i18n strings
  • Investigate if we need to disable translatewiki.net harvesting i18n.

We already have i18n files which are not showing up in translatewiki.net, so it should be OK to just go on and then ask to have it added once we are ready.

Change 358378 merged by jenkins-bot:
[mediawiki/extensions/Wikispeech@master] Add API for segmenting text

https://gerrit.wikimedia.org/r/358378