Use XPath to get text nodes related to utterances
Closed, Resolved | Public | 1.5 Estimated Story Points

Description

The current method of getting the text nodes related to an utterance uses a path made up of child indices, e.g. [1, 0, 3]. It looks like it should be possible to replace this with XPath expressions, with all the benefits of using a standard implementation.

So let's do that.
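
To illustrate the difference between the two addressing schemes, here is a minimal sketch in Python using the standard library's `xml.dom.minidom`. It is not the Wikispeech implementation. The function names `node_from_index_path` and `xpath_for_text_node` and the sample markup are made up for this example. The first function follows a list of child indices down the tree. The second builds an equivalent XPath expression for the same text node.

```python
from xml.dom import Node, minidom


def node_from_index_path(root, index_path):
    """Follow a list of child indices (0-based) down from root."""
    node = root
    for index in index_path:
        node = node.childNodes[index]
    return node


def xpath_for_text_node(text_node):
    """Build an absolute XPath expression that selects text_node.

    XPath positions are 1-based and count only siblings of the same
    kind (text nodes, or elements with the same tag name).
    """
    # Position of the text node among its text-node siblings.
    position = 1
    sibling = text_node.previousSibling
    while sibling is not None:
        if sibling.nodeType == Node.TEXT_NODE:
            position += 1
        sibling = sibling.previousSibling
    steps = ["text()[%d]" % position]

    # Walk up through the ancestor elements, adding one step for each.
    element = text_node.parentNode
    while element is not None and element.nodeType == Node.ELEMENT_NODE:
        position = 1
        sibling = element.previousSibling
        while sibling is not None:
            if (sibling.nodeType == Node.ELEMENT_NODE
                    and sibling.tagName == element.tagName):
                position += 1
            sibling = sibling.previousSibling
        steps.append("%s[%d]" % (element.tagName, position))
        element = element.parentNode

    return "/" + "/".join(reversed(steps))


# Hypothetical sample markup, only for illustration.
document = minidom.parseString(
    "<div><p>Intro.</p><p>Hello <b>bold</b> world.</p></div>"
)
root = document.documentElement

# Index path [1, 2]: second child of <div>, then third child of that <p>.
node = node_from_index_path(root, [1, 2])
xpath = xpath_for_text_node(node)
print(repr(node.data), xpath)
```

The index path depends on every child node being counted, while the XPath expression names the elements it passes through and can be evaluated by any standard XPath engine (for instance `document.evaluate` in a browser), which this sketch does not do.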

Event Timeline

While not necessary for T148623: Highlight recited word, I think it's better to take a look at this first.

I have an implementation of this that works, but still needs a bit of clean up and updating of tests. This will wait until T148622: Highlight recited sentence is done, to minimize extra work with merging.

A highlight of using XPath is that all the code in Cleaner that deals with extracting tags from the HTML is no longer needed.

Lokal_Profil renamed this task from "Investigate whether XPath can be used to get text nodes related to utterances" to "Use XPath to get text nodes related to utterances". Mar 8 2017, 9:24 AM
Lokal_Profil updated the task description.
Lokal_Profil set the point value for this task to 7.5.
Lokal_Profil subscribed.

Note the change of scope for the task.

Worked on in Wikispeech (Sprint 2017-02-22):

  • Preliminary investigation and basic implementation

To do in Wikispeech (Sprint 2017-03-08):

  • Update tests
  • Proper implementation

While not directly related to this task, I discovered that CleanedTags are currently not needed. They were previously required for calculating character positions in the original HTML. Some representation of tags is likely needed for T133689: Recognise certain tags, notify user and allow interaction (was: Pling on navigation), but only for the elements that should give some kind of feedback.

Change 342024 had a related patch set uploaded (by Sebastian Berlin (WMSE)):
[mediawiki/extensions/Wikispeech] Use XPath to get text nodes for utterances.

https://gerrit.wikimedia.org/r/342024

Worked on in Wikispeech (Sprint 2017-03-08):

  • Properly implemented.
  • First review.

To do in Wikispeech (Sprint 2017-03-22):

  • Finish reviewing.

Change 342024 merged by jenkins-bot:
[mediawiki/extensions/Wikispeech@master] Use XPath to get text nodes for utterances.

https://gerrit.wikimedia.org/r/342024