If an utterance contains non-ASCII (multibyte) Unicode characters, the highlighting extends past the end of the utterance. This is likely caused by how Segmenter calculates endOffset.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Calculate correct offsets for unicode characters | mediawiki/extensions/Wikispeech | master | +302 -97
Related Objects
- Mentioned In
  - rEWISeab37b863c9c: Calculate correct offsets for unicode characters
  - rEWIS108d5b654f18: Calculate correct offsets for unicode characters
  - rEWIS9eccc4040f3d: Calculate correct offsets for unicode characters
  - rEWIS62100113fc9b: Calculate correct offsets for unicode characters
  - rEWISda248ddd4586: Calculate correct offsets for unicode characters
  - T159811: Overlapping highlighting
  - T159809: Highlighting fails on bold/italicized utterance
  - T159671: IndexSizeError triggered and cascades into failing buttons
- Mentioned Here
  - rEWIS64cbd9648dc6: Merge "Calculate correct offsets for unicode characters"
  - T159810: Player button fails to return to stop upon completed recitation of page
  - T158954: Use XPath to get text nodes related to utterances
  - T159669: Empty <text> elements created
  - T159671: IndexSizeError triggered and cascades into failing buttons
  - T159809: Highlighting fails on bold/italicized utterance
  - T159811: Overlapping highlighting
Event Timeline
I found that many (all?) string functions have a multibyte version. Switching to these will hopefully be enough.
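To illustrate why byte-based string functions can push the end offset past the utterance, here is a minimal sketch in Python. It is not the Segmenter code: the function names and the sample text are hypothetical, and the offset convention (end offset as the index of the last character) is an assumption. In PHP the same contrast exists between `strlen`, which counts bytes, and `mb_strlen`, which counts characters.

```python
# Hypothetical illustration: byte-based vs. character-based end offsets.
# Not the actual Wikispeech Segmenter implementation.

def byte_end_offset(text: str, utterance: str) -> int:
    """End offset derived from UTF-8 byte counts (the suspected faulty approach)."""
    encoded_text = text.encode("utf-8")
    encoded_utterance = utterance.encode("utf-8")
    start = encoded_text.find(encoded_utterance)
    return start + len(encoded_utterance) - 1


def char_end_offset(text: str, utterance: str) -> int:
    """End offset derived from character counts (what highlighting needs)."""
    start = text.find(utterance)
    return start + len(utterance) - 1


text = "Blåbärssoppa är gott. Nästa mening."
utterance = "Blåbärssoppa är gott."

char_end = char_end_offset(text, utterance)  # 20
byte_end = byte_end_offset(text, utterance)  # 23: å, ä, ä take two bytes each

# Using the byte-based offset as a character index overshoots the utterance.
print(repr(text[: char_end + 1]))  # 'Blåbärssoppa är gott.'
print(repr(text[: byte_end + 1]))  # 'Blåbärssoppa är gott. Nä'
```

Each character that needs two bytes in UTF-8 shifts the byte-based offset one position further right, which matches the symptom of highlighting ending beyond the utterance.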
This is implemented in a local branch. Review will wait until T158954: Use XPath to get text nodes related to utterances is done. In solving this, a fair bit of the segmenting was rewritten, which also solves the following tasks:
and the following may well be solved (and should be rechecked) when both this and T158954 are done:
Worked on in Wikispeech (Sprint 2017-03-08):
- Local implementation.
To do in Wikispeech (Sprint 2017-03-22):
- Upload patch to gerrit.
- Review.
Change 344616 had a related patch set uploaded (by Lokal Profil; owner: Sebastian Berlin (WMSE)):
[mediawiki/extensions/Wikispeech@master] Calculate correct offsets for unicode characters
Change 344616 had a related patch set uploaded (by Sebastian Berlin (WMSE)):
[mediawiki/extensions/Wikispeech@master] Calculate correct offsets for unicode characters
Change 344616 merged by jenkins-bot:
[mediawiki/extensions/Wikispeech@master] Calculate correct offsets for unicode characters
Mentioned in SAL (#wikimedia-labs) [2017-03-30T10:15:39Z] <Sebastian-WMSE> Deploy latest from Git master: 64cbd96 (T159545, T159811, T159809)