Segment by tags
Cleaned text is now split at certain tags, which are specified by
the config variable WikispeechSegmentBreakingTags. By default, these
tags are h*, p, br and li. Also removed ol and ul from
WikispeechRemoveTags, since lists are now recited reasonably well.
During cleaning, SegmentBreak objects are added where the specified
tags are encountered (the tags themselves are still removed). During
segmenting, when a SegmentBreak is encountered, a new segment is
created.
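The segmenting step described above can be sketched roughly as follows.
This is an illustration in Python, not the extension's actual code: the
segment() function and the "string" field are hypothetical names, and
skipping empty segments is an assumption. Only the break handling is
shown; any other splitting the segmenter does is left out.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class CleanedText:
    """A piece of text that survived cleaning (field name is hypothetical)."""
    string: str


@dataclass
class SegmentBreak:
    """Marker left where a segment-breaking tag (e.g. p, br, li) was removed."""


Item = Union[CleanedText, SegmentBreak]


def segment(items: List[Item]) -> List[List[CleanedText]]:
    """Group items into segments, starting a new one at each SegmentBreak."""
    segments: List[List[CleanedText]] = []
    current: List[CleanedText] = []
    for item in items:
        if isinstance(item, SegmentBreak):
            # Close the current segment; skipping empty ones is an assumption.
            if current:
                segments.append(current)
                current = []
        else:
            current.append(item)
    # Flush whatever text follows the last break.
    if current:
        segments.append(current)
    return segments


items = [
    CleanedText("Heading"),
    SegmentBreak(),
    CleanedText("First "),
    CleanedText("item"),
    SegmentBreak(),
    SegmentBreak(),
    CleanedText("Second item"),
]
result = segment(items)
```

In this sketch, two consecutive breaks (as might follow a closing li
and a closing list) do not produce an empty segment between them.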
The elements of content (CleanedText and SegmentBreak objects) are now
called "items".
Change-Id: I688f20f6e4a662efb4a74eb2e3e94996b231445f
Bug: T149091