
Segment by tags
143d7fc6e0b3 (Unpublished)


Not On Permanent Ref: This commit is not an ancestor of any permanent ref.
This commit no longer exists in the repository. It may have been part of a branch which was deleted. It is no longer reachable from any branch, tag, or ref.

Description

Segment by tags

Cleaned text is now split at certain tags, which are specified by
the config variable WikispeechSegmentBreakingTags. By default, these
tags are h*, p, br and li. Removed ol and ul from
WikispeechRemoveTags, since lists are now recited reasonably well.

During cleaning, SegmentBreak objects are added where the specified
tags are encountered (the tags themselves are still removed). During
segmenting, when a SegmentBreak is encountered, a new segment is
created.
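
The two passes described above can be sketched roughly as follows. This is
an illustrative Python sketch, not the extension's actual code: the names
clean(), segment() and is_breaking() are hypothetical, "h*" is assumed to
mean h1 through h6, and a break is assumed to be added at both the start
and the end of a breaking tag.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser

# Assumed default, mirroring the tags named in the commit message.
SEGMENT_BREAKING_TAGS = ["h*", "p", "br", "li"]


@dataclass
class CleanedText:
    """An item holding text that survived cleaning."""
    string: str


@dataclass
class SegmentBreak:
    """An item marking where a breaking tag was removed."""


def is_breaking(tag, patterns=SEGMENT_BREAKING_TAGS):
    """Return True if tag matches a pattern (assumption: "h*" is h1..h6)."""
    for pattern in patterns:
        if pattern == "h*":
            if re.fullmatch(r"h[1-6]", tag):
                return True
        elif pattern == tag:
            return True
    return False


class _Cleaner(HTMLParser):
    """Drops all tags; records a SegmentBreak where a breaking tag stood."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        if is_breaking(tag):
            self.items.append(SegmentBreak())

    def handle_endtag(self, tag):
        if is_breaking(tag):
            self.items.append(SegmentBreak())

    def handle_data(self, data):
        if data:
            self.items.append(CleanedText(data))


def clean(html):
    """Cleaning pass: HTML in, list of CleanedText/SegmentBreak items out."""
    cleaner = _Cleaner()
    cleaner.feed(html)
    cleaner.close()  # flush any trailing text
    return cleaner.items


def segment(items):
    """Segmenting pass: start a new segment at every SegmentBreak."""
    segments = []
    current = []
    for item in items:
        if isinstance(item, SegmentBreak):
            if current:  # consecutive breaks do not create empty segments
                segments.append(current)
                current = []
        else:
            current.append(item)
    if current:
        segments.append(current)
    return segments
```

With this sketch, segment(clean("<ul><li>One</li><li>Two</li></ul>"))
yields two segments, one per list item. The sketch covers only the
tag-based breaks and does no further splitting within a segment.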

Renamed the "things" in content (CleanedText and SegmentBreak) to
"items".

Change-Id: I688f20f6e4a662efb4a74eb2e3e94996b231445f
Bug: T149091

Details

Provenance
Sebastian_Berlin-WMSE authored on May 3 2017, 12:50 PM
ChangeId
I688f20f6e4a662efb4a74eb2e3e94996b231445f
