Unicode characters increase length of highlighting
Closed, Resolved | Public | 2.5 Estimated Story Points

Assigned To

Authored By

	Sebastian_Berlin-WMSE
	Mar 3 2017, 4:14 PM

Description

If an utterance contains Unicode characters, the highlighting extends past the end of the utterance. This is likely caused by how Segmenter calculates endOffset.
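The suspected cause can be illustrated with a minimal sketch. It assumes (this is not confirmed by the task) that the end offset was derived from a byte count of the UTF-8 string rather than a character count. The function names are hypothetical, and Python is used here only as a stand-in for the extension's own code.

```python
def utf8_length(text: str) -> int:
    """Length in bytes of the UTF-8 encoding (what a byte-oriented length would report)."""
    return len(text.encode("utf-8"))


def char_length(text: str) -> int:
    """Length in Unicode code points (what a highlighter counting characters expects)."""
    return len(text)


# Hypothetical utterance: "ö" and "å" each take two bytes in UTF-8.
utterance = "smörgås"

# Number of positions by which a byte-based end offset would overshoot.
overshoot = utf8_length(utterance) - char_length(utterance)

print(char_length(utterance), utf8_length(utterance), overshoot)
```

Under that assumption, every non-ASCII character pushes the end of the highlighting at least one position further past the utterance.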

Details

	Subject	Repo	Branch	Lines +/-
	Calculate correct offsets for unicode characters	mediawiki/extensions/Wikispeech	master	+302 -97


Related Objects

Mentioned In: rEWISeab37b863c9c: Calculate correct offsets for unicode characters
rEWIS108d5b654f18: Calculate correct offsets for unicode characters
rEWIS9eccc4040f3d: Calculate correct offsets for unicode characters
rEWIS62100113fc9b: Calculate correct offsets for unicode characters
rEWISda248ddd4586: Calculate correct offsets for unicode characters
T159811: Overlapping highlighting
T159809: Highlighting fails on bold/italicized utterance
T159671: IndexSizeError triggered and cascades into failing buttons
Mentioned Here: rEWIS64cbd9648dc6: Merge "Calculate correct offsets for unicode characters"
T159810: Player button fails to return to stop upon completed recitation of page
T158954: Use XPath to get text nodes related to utterances
T159669: Empty <text> elements created
T159671: IndexSizeError triggered and cascades into failing buttons
T159809: Highlighting fails on bold/italicized utterance
T159811: Overlapping highlighting

Event Timeline

Sebastian_Berlin-WMSE created this task. Mar 3 2017, 4:14 PM

Sebastian_Berlin-WMSE moved this task from Backlog to This Week on the User-Sebastian_Berlin-WMSE board.

Lokal_Profil mentioned this in T159671: IndexSizeError triggered and cascades into failing buttons. Mar 6 2017, 9:14 AM

Sebastian_Berlin-WMSE moved this task from Incoming to Proposed for next sprint on the Wikispeech board. Mar 8 2017, 9:04 AM

Lokal_Profil edited projects, added Wikispeech (Sprint 2017-03-08); removed Wikispeech. Mar 8 2017, 10:07 AM

Lokal_Profil set the point value for this task to 4.5.

Sebastian_Berlin-WMSE renamed this task from Non ASCII characters increase length of highlighting to Unicode characters increase length of highlighting. Mar 10 2017, 7:22 AM

Sebastian_Berlin-WMSE updated the task description.

I found that many (all?) string functions have a multibyte version. Switching to these will hopefully be enough.
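PHP does provide multibyte counterparts for common string functions, for example mb_strlen, mb_strpos and mb_substr. Which of them the patch ended up using is not stated here, so the following is only a sketch of the difference between a byte-oriented and a character-oriented search, written in Python with hypothetical function names and a made-up sample text.

```python
def find_bytewise(text: str, needle: str) -> int:
    """Position of needle counted in UTF-8 bytes (analogous to a byte-oriented search)."""
    return text.encode("utf-8").find(needle.encode("utf-8"))


def find_charwise(text: str, needle: str) -> int:
    """Position of needle counted in characters (analogous to a multibyte-aware search)."""
    return text.find(needle)


# Hypothetical text with two sentences; "Å" and "ä" are two bytes each in UTF-8.
text = "Åre är en ort. Den ligger i Jämtland."

byte_pos = find_bytewise(text, ".")
char_pos = find_charwise(text, ".")

print(byte_pos, char_pos)
```

The two positions agree only for pure ASCII text; with non-ASCII characters before the match, the byte-based position is larger, which is consistent with highlighting that ends too late.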

Sebastian_Berlin-WMSE moved this task from Backlog to In progress on the Wikispeech (Sprint 2017-03-08) board. Mar 13 2017, 9:52 AM

This is implemented in a local branch. Review will wait until T158954: Use XPath to get text nodes related to utterances is done. In solving this, a fair bit of the segmenting was rewritten, which also solves the following tasks:

and the following may well be solved (and should be rechecked) when both this and T158954 are done:

Worked on in Wikispeech (Sprint 2017-03-08):

Local implementation.

To do in Wikispeech (Sprint 2017-03-22):

Upload patch to gerrit.
Review.

Sebastian_Berlin-WMSE moved this task from Backlog to In progress on the Wikispeech (Sprint 2017-03-22) board. Mar 22 2017, 9:47 AM

Sebastian_Berlin-WMSE mentioned this in T159809: Highlighting fails on bold/italicized utterance. Mar 22 2017, 11:10 AM

Lokal_Profil mentioned this in T159811: Overlapping highlighting. Mar 22 2017, 11:10 AM

Sebastian_Berlin-WMSE mentioned this in rEWISda248ddd4586: Calculate correct offsets for unicode characters. Mar 24 2017, 2:16 PM

Lokal_Profil mentioned this in rEWIS62100113fc9b: Calculate correct offsets for unicode characters. Mar 24 2017, 4:28 PM

Change 344616 had a related patch set uploaded (by Lokal Profil; owner: Sebastian Berlin (WMSE)):
[mediawiki/extensions/Wikispeech@master] Calculate correct offsets for unicode characters

https://gerrit.wikimedia.org/r/344616

gerritbot added a project: Patch-For-Review. Mar 24 2017, 4:29 PM

Sebastian_Berlin-WMSE mentioned this in rEWIS9eccc4040f3d: Calculate correct offsets for unicode characters. Mar 24 2017, 10:12 PM

Change 344616 had a related patch set uploaded (by Sebastian Berlin (WMSE)):
[mediawiki/extensions/Wikispeech@master] Calculate correct offsets for unicode characters

https://gerrit.wikimedia.org/r/344616

Sebastian_Berlin-WMSE mentioned this in rEWIS108d5b654f18: Calculate correct offsets for unicode characters. Mar 24 2017, 10:13 PM

Sebastian_Berlin-WMSE mentioned this in rEWISeab37b863c9c: Calculate correct offsets for unicode characters. Mar 30 2017, 7:43 AM

Change 344616 merged by jenkins-bot:
[mediawiki/extensions/Wikispeech@master] Calculate correct offsets for unicode characters

https://gerrit.wikimedia.org/r/344616

Mentioned in SAL (#wikimedia-labs) [2017-03-30T10:15:39Z] <Sebastian-WMSE> Deploy latest from Git master: 64cbd96 (T159545, T159811, T159809)

Sebastian_Berlin-WMSE moved this task from This Week to Done on the User-Sebastian_Berlin-WMSE board. Mar 30 2017, 10:16 AM

Sebastian_Berlin-WMSE moved this task from In progress to Done on the Wikispeech (Sprint 2017-03-22) board. Mar 30 2017, 12:10 PM

Sebastian_Berlin-WMSE closed this task as Resolved. Apr 5 2017, 8:39 AM

Sebastian_Berlin-WMSE claimed this task.