Segment by tags
Cleaned text is now split into segments at certain tags. These are
specified by the config variable WikispeechSegmentBreakingTags. By
default, these tags are h*, p, br and li. Also removed ol and ul from
WikispeechRemoveTags, since lists are now recited reasonably well.
During cleaning, SegmentBreak objects are added where the specified
tags are encountered (the tags themselves are still removed). During
segmenting, when a SegmentBreak is encountered, a new segment is
created.
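
As an illustration of the two passes described above, here is a rough
sketch in Python. It is not the extension's code. The names Cleaner,
clean and segment, the BREAKING_TAGS pattern, and the choice to collapse
consecutive breaks are all assumptions made for the example; only
CleanedText, SegmentBreak and the default tag list come from this
change.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class CleanedText:
    string: str


@dataclass
class SegmentBreak:
    pass


# Stand-in for the default WikispeechSegmentBreakingTags (h*, p, br, li).
BREAKING_TAGS = re.compile(r"^(h[1-6]|p|br|li)$")


class Cleaner(HTMLParser):
    """Drops all tags, keeps text, and marks where breaking tags were."""

    def __init__(self):
        super().__init__()
        self.items = []

    def _add_break(self, tag):
        # Assumption: skip leading and repeated breaks to avoid empty segments.
        if (
            BREAKING_TAGS.match(tag)
            and self.items
            and not isinstance(self.items[-1], SegmentBreak)
        ):
            self.items.append(SegmentBreak())

    def handle_starttag(self, tag, attrs):
        self._add_break(tag)

    def handle_endtag(self, tag):
        self._add_break(tag)

    def handle_data(self, data):
        if data.strip():
            self.items.append(CleanedText(data))


def clean(html):
    """Cleaning pass: return a list of CleanedText and SegmentBreak items."""
    cleaner = Cleaner()
    cleaner.feed(html)
    cleaner.close()
    return cleaner.items


def segment(items):
    """Segmenting pass: start a new segment at every SegmentBreak."""
    segments = [[]]
    for item in items:
        if isinstance(item, SegmentBreak):
            if segments[-1]:
                segments.append([])
        else:
            segments[-1].append(item.string)
    if not segments[-1]:
        segments.pop()
    return segments
```

Under these assumptions, segment(clean("<p>One <b>bold</b> word.</p>"))
keeps the text inside b in the same segment as its surroundings, since b
is not a breaking tag, while each li in a list ends up in a segment of
its own.
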
The elements of content (CleanedText and SegmentBreak) are now called
"items".
Bug: T149091
Change-Id: I688f20f6e4a662efb4a74eb2e3e94996b231445f