Map tokens from TTS responses to HTML
Added a mapping from the tokens received from the TTS server to the "words"
in the HTML. Tokens are stored in the utterance elements and are assigned a
position attribute: the index at which the corresponding HTML substring
starts. Removed HTML tags are stored in the tokens element.