Segment by tags
Closed, Resolved | Public | 2 Estimated Story Points

Description

Text should be segmented at certain tags, even if the text before them doesn't end in a full stop. Currently these should include:

  • Headers (<h1>, <h2> ...)
  • <p>
  • <br />

Later, when/if they are no longer removed, the list should also include:

  • <li>
  • <td>/<th>

This should be configurable, like the list of removed tags.

Event Timeline

Sebastian_Berlin-WMSE updated the task description. (Show Details)
Sebastian_Berlin-WMSE set the point value for this task to 2.

Segmenting on tags should be followed by the ordinary segmenter.

So:

<h1> Heading. The Story. </h1>
Text.

should give:
[ Heading., The Story., Text.]
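
The behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the Wikispeech implementation: `SEGMENT_BREAK_TAGS`, `TagSegmenter`, `split_sentences` and `segment` are hypothetical names, and the regex-based sentence splitter is a simplified stand-in for the ordinary segmenter.

```python
import re
from html.parser import HTMLParser

# Hypothetical, configurable set of tags that force a segment break.
SEGMENT_BREAK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "br"}


class TagSegmenter(HTMLParser):
    """Collect text chunks, cutting at opening and closing break tags."""

    def __init__(self, break_tags=SEGMENT_BREAK_TAGS):
        super().__init__()
        self.break_tags = set(break_tags)
        self.chunks = [""]

    def _cut(self, tag):
        # Start a new chunk only if the current one already holds text.
        if tag in self.break_tags and self.chunks[-1]:
            self.chunks.append("")

    def handle_starttag(self, tag, attrs):
        self._cut(tag)

    def handle_endtag(self, tag):
        self._cut(tag)

    def handle_data(self, data):
        self.chunks[-1] += data


def split_sentences(text):
    """Simplified stand-in for the ordinary segmenter: split after ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def segment(html, break_tags=SEGMENT_BREAK_TAGS):
    """Cut by tags first, then run the sentence splitter on each chunk."""
    parser = TagSegmenter(break_tags)
    parser.feed(html)
    parser.close()
    segments = []
    for chunk in parser.chunks:
        segments.extend(split_sentences(chunk))
    return segments


print(segment("<h1> Heading. The Story. </h1>\nText."))
```

Passing a different `break_tags` set (for example one that includes `li`) changes where the cuts happen, which is one way the "configurable" requirement could be met.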

Removing from sprint due to prioritisation in T151786

Lokal_Profil changed the point value for this task from 2 to 8. (Feb 7 2017, 3:01 PM)

Changed story points to new system

This is now (much more) noticeable with sentence highlighting.

When T158954: Use XPath to get text nodes related to utterances is done, this should be doable by adding "segment break" objects during cleaning and cutting segments by these.
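
A minimal sketch of the "segment break" idea, assuming the cleaner emits a flat list of text items with break markers between them. The `Text` and `SegmentBreak` classes, the `path` field and `cut_by_breaks` are hypothetical names for illustration, written in Python rather than the extension's own language; the actual patch may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Text:
    string: str
    path: str  # hypothetical: XPath-like path to the source text node


class SegmentBreak:
    """Marker inserted by the cleaner where a break tag was seen."""


def cut_by_breaks(cleaned):
    """Group consecutive Text items, starting a new group at each break."""
    groups, current = [], []
    for item in cleaned:
        if isinstance(item, SegmentBreak):
            # Ignore breaks that would produce an empty group.
            if current:
                groups.append(current)
                current = []
        else:
            current.append(item)
    if current:
        groups.append(current)
    return groups


cleaned = [
    Text("Heading", "./h1/text()"),
    SegmentBreak(),
    Text("Body ", "./p/text()[1]"),
    Text("link", "./p/a/text()"),
    SegmentBreak(),
]
groups = cut_by_breaks(cleaned)
```

Each resulting group could then be handed to the ordinary segmenter, so a sentence never spans a break marker.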

Sebastian_Berlin-WMSE changed the task status from Open to Stalled. (May 4 2017, 9:23 AM)

Implemented locally. Will wait for the patches currently under review to be finished before rebasing and submitting this for review.

Lokal_Profil changed the point value for this task from 8 to 4.

Worked on in Wikispeech (Sprint 2017-04-25):

  • Implemented locally
  • Prepared for review

To do in Wikispeech (Sprint 2017-05-10):

  • Rebase and push patch
  • Review

Change 356827 had a related patch set uploaded (by Sebastian Berlin (WMSE); owner: Sebastian Berlin (WMSE)):
[mediawiki/extensions/Wikispeech@master] Segment by tags

https://gerrit.wikimedia.org/r/356827

Sebastian_Berlin-WMSE changed the task status from Stalled to Open. (Jun 2 2017, 1:05 PM)

Decided to upload this even though T148623: Highlight recited word is still under review, since there is minimal overlap in the changed code. The rebasing should be relatively painless.

Lokal_Profil changed the point value for this task from 3 to 2.

Worked on in Wikispeech (Sprint 2017-06-21):

  • First review cycle complete

To do in Wikispeech (Sprint 2017-07-05):

  • Implement pre-sprint discussion comments.
  • Review

Change 356827 merged by jenkins-bot:
[mediawiki/extensions/Wikispeech@master] Segment by tags

https://gerrit.wikimedia.org/r/356827

Mentioned in SAL (#wikimedia-cloud) [2017-07-06T11:30:48Z] <Sebastian-WMSE> Deploy latest from Git master: c766368, 5abcb1f, 63af3be (T149091)