Create an API action that, given a page title, returns the segmented text of that page.
While Wikispeech only needs the segmented text, if it's not too much work, also add an option to get only the cleaned text. This could be useful for other tools. For the same reason, the action should take parameters, such as which HTML tags to remove, rather than reading them directly from extension.json.
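To show what "take parameters rather than read from extension.json" could look like for a caller, here is a minimal Python sketch that builds a request URL for such an action. The action name `wikispeech-segment` and the parameter names `page`, `removetags` and `segmentbreakingtags` are hypothetical placeholders, not an existing API. Only `format=json` and the `|`-separated multi-value convention follow standard MediaWiki Action API practice.

```python
from urllib.parse import urlencode


def build_segment_request(api_url, title, remove_tags, segment_break_tags):
    """Build a GET URL for a HYPOTHETICAL segmentation action.

    The action and parameter names are placeholders; the point is that
    the tag lists come from the caller, not from extension.json.
    """
    params = {
        "action": "wikispeech-segment",  # hypothetical action name
        "format": "json",
        "page": title,  # hypothetical parameter name
        # MediaWiki API convention: multiple values separated by "|".
        "removetags": "|".join(remove_tags),  # hypothetical
        "segmentbreakingtags": "|".join(segment_break_tags),  # hypothetical
    }
    return api_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_segment_request(
        "https://example.org/w/api.php",
        "Stockholm",
        remove_tags=["sup", "table"],
        segment_break_tags=["p", "h2"],
    ))
```

A tool that only needs cleaned text could call the same kind of action with its own tag lists, without depending on Wikispeech's configuration.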
Functions
- Get segmented text
  - Segment by tags: takes a list of tags that should be used to separate segments. Requires T149091: Segment by tags.
  - Remove tags: takes a list of tags that should be removed entirely (WikispeechRemoveTags in the config).
  - How to get the title? This should be handled in T161097: Recitation should include the article title.
- Get cleaned text [not required for Wikispeech]
  - Remove tags (same as above)
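The two functions above can be sketched in Python with the standard library's `html.parser`. This is a hypothetical illustration, not the Wikispeech implementation: the names `SegmentingCleaner`, `segment_text` and `clean_text` are made up, tags listed for removal are assumed to be non-void elements with closing tags, and whitespace is collapsed to single spaces.

```python
from html.parser import HTMLParser


class SegmentingCleaner(HTMLParser):
    """Hypothetical sketch: drops removed tags (with their content) and
    starts a new segment at each segment-breaking tag."""

    def __init__(self, remove_tags, segment_break_tags):
        super().__init__()  # convert_charrefs=True: entities decoded in data
        self.remove_tags = {t.lower() for t in remove_tags}
        self.segment_break_tags = {t.lower() for t in segment_break_tags}
        self.remove_depth = 0  # > 0 while inside a tag whose content is dropped
        self.segments = [[]]  # each segment is a list of text fragments

    def _break(self):
        # Start a new segment only if the current one holds real text.
        if "".join(self.segments[-1]).strip():
            self.segments.append([])

    def handle_starttag(self, tag, attrs):
        if tag in self.remove_tags:
            self.remove_depth += 1
        elif self.remove_depth == 0 and tag in self.segment_break_tags:
            self._break()

    def handle_endtag(self, tag):
        if tag in self.remove_tags:
            self.remove_depth = max(0, self.remove_depth - 1)
        elif self.remove_depth == 0 and tag in self.segment_break_tags:
            self._break()

    def handle_data(self, data):
        if self.remove_depth == 0:
            self.segments[-1].append(data)


def segment_text(html, remove_tags, segment_break_tags):
    """Get segmented text: a list of whitespace-normalized segments."""
    parser = SegmentingCleaner(remove_tags, segment_break_tags)
    parser.feed(html)
    parser.close()
    segments = []
    for parts in parser.segments:
        text = " ".join("".join(parts).split())
        if text:
            segments.append(text)
    return segments


def clean_text(html, remove_tags):
    """Get cleaned text: same tag removal, but no segmentation."""
    return " ".join(segment_text(html, remove_tags, segment_break_tags=[]))


if __name__ == "__main__":
    page = ('<h2>History</h2>\n'
            '<p>Stockholm was founded<sup class="reference">[1]</sup>'
            ' in 1252.</p>\n<p>It is the capital.</p>')
    print(segment_text(page, ["sup"], ["h2", "p"]))
    print(clean_text(page, ["sup"]))
```

Because `clean_text()` does not segment, it relies on whitespace already present in the HTML between block elements to keep text from adjacent elements apart.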