User request
I would like LinguaLibre to allow the production of audiobooks using texts from Wikisource or even other projects and then add the recordings at the correct places.
Avenue
- Step 4 > tweak the « External tool » list loader to accept MediaWiki URLs
- Convert the URL into an API query returning raw text
- Inject the raw text as content to be recorded
- T370618: when in poem recording mode, the whole text will be treated as a single text to record.
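The URL-to-query step above can be sketched as follows. This is only an illustration: the function name `page_url_to_extract_query` is hypothetical, and it assumes the Wikimedia URL layout (`/wiki/<title>` for pages, `/w/api.php` for the API) and the TextExtracts parameters used in the example query in the Exploration section.

```python
from urllib.parse import urlsplit, unquote, urlencode


def page_url_to_extract_query(page_url: str) -> str:
    """Turn a wiki page URL into a plain-text extract API query URL.

    Assumes the Wikimedia layout: pages under /wiki/, API at /w/api.php.
    """
    parts = urlsplit(page_url)
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not a /wiki/ page URL: {page_url}")

    # The title may arrive percent-encoded or as raw UTF-8; normalise it.
    title = unquote(parts.path[len(prefix):])

    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": "1",
        "format": "json",
        "utf8": "1",
        "origin": "*",  # needed for cross-origin calls from a browser
        "titles": title,
    }
    return f"{parts.scheme}://{parts.netloc}/w/api.php?{urlencode(params)}"
```

Non-`/wiki/` URLs (for example `index.php?title=...`) are rejected rather than guessed at, so the loader can show an explicit error to the user.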
Exploration
Wikipedia offers an easy path to extract the plain text; it only requires light automatic formatting (or UTF-8 handling).
Wikisource makes heavy use of cascading template inclusion, so the plain text of the target page is often just templates pointing to the sub-pages. This makes it difficult to integrate, even though it is the main request.
- https://fr.wikisource.org/wiki/Élégies,_Marie_et_romances/Romances
- https://fr.wikisource.org/w/api.php?action=query&prop=extracts&explaintext&format=json&utf8=1&origin=*&titles=Élégies,_Marie_et_romances/Romances
- https://github.com/wikimedia/ws-export
- https://ws-export.wmcloud.org/?lang=fr&page=Élégies,_Marie_et_romances/Romances&format=txt&fonts=&credits=false&images=false (limited to one query / minute)
ws-export is actively maintained; we can get in touch with its maintainers.
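As a rough sketch of what handling the `prop=extracts` response could look like, the helper below pulls the plain text out of an already-decoded JSON reply and fails loudly when the extract is empty, which is the symptom expected for Wikisource pages that only include sub-pages. The function name `extract_plain_text` is hypothetical, and the response shape (`query` > `pages` > page id > `extract`) is assumed to be the default JSON format of the Action API.

```python
def extract_plain_text(api_response: dict) -> str:
    """Return the plain-text extract(s) from a decoded prop=extracts reply.

    Raises ValueError when no page carries any text, e.g. when the page
    is only a shell of templates including its sub-pages.
    """
    pages = api_response.get("query", {}).get("pages", {})

    extracts = []
    for page in pages.values():
        # Missing pages have no "extract" key; treat them as empty.
        text = page.get("extract", "").strip()
        if text:
            extracts.append(text)

    if not extracts:
        raise ValueError(
            "no plain text in response; the page may only include sub-pages"
        )
    return "\n\n".join(extracts)
```

Raising instead of returning an empty string keeps an empty text from silently reaching the recording step.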
Observation
Wikisource: not straightforward. The development cost is significant.
Wikipedia: not straightforward either. Micro-templates are so ubiquitous that the extracted text is likely full of gaps that are only noticed when reading, which prevents quality recording.
I don't recommend tackling this issue. - Yug
Solution
@Poslovitch's Dicotheque successfully pulls text from Wikisource by fetching the HTML. The Dicotheque code can be reused.
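For illustration only, here is a minimal sketch of the general "fetch HTML, then strip it down to text" approach. It is not Dicotheque's code, which has not been inspected here; the names `html_to_text` and `_TextCollector` are hypothetical, and the HTML is assumed to have been fetched already (for instance from the rendered page, where templates are already expanded).

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, inserting line breaks around block elements."""

    SKIPPED = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "dd", "dt"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        # Ignore the content of <script> and <style> elements.
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Reduce an HTML fragment to one line of text per block, e.g. per verse."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    raw = "".join(collector.chunks)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

Keeping one line per block element preserves verse breaks, which matters if the whole poem is to be recorded as a single text.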