Text preprocessing (running the cleaner and segmenter) should no longer happen in onParserAfterTidy(). Instead, it should be done by the API action that will be created in T164250. The API response (probably a JSON object) will be stored and used in place of the HTML elements (<utterances> and its children) used by the current implementation.
Since text preprocessing will run after the page has loaded, the whole page does not have to be processed in one go. It could instead be processed in smaller pieces, for example one paragraph at a time.
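
A rough sketch of the paragraph-at-a-time idea, written in Python only to illustrate the control flow (the real code would live in the extension and its client-side module). Everything named here is an assumption: `fake_segment_api` stands in for the API action from T164250, whose name, parameters, and response format are not yet decided, and `ParagraphPreprocessor` and the `utterances`/`content` keys are hypothetical.

```python
import json


def fake_segment_api(paragraph):
    """Hypothetical stand-in for the API action planned in T164250.

    Splits on periods as a placeholder for the real cleaner and segmenter
    and returns a JSON string, since the response is probably a JSON object.
    """
    utterances = [
        {"content": part.strip()}
        for part in paragraph.split(".")
        if part.strip()
    ]
    return json.dumps({"utterances": utterances})


class ParagraphPreprocessor:
    """Preprocess paragraphs on demand and store the parsed responses."""

    def __init__(self, paragraphs, api):
        self._paragraphs = paragraphs
        self._api = api    # callable: paragraph text -> JSON string
        self._cache = {}   # paragraph index -> list of utterances

    def utterances_for(self, index):
        # Only call the API the first time a paragraph is needed;
        # afterwards the stored response is reused.
        if index not in self._cache:
            response = json.loads(self._api(self._paragraphs[index]))
            self._cache[index] = response["utterances"]
        return self._cache[index]
```

With this shape, a paragraph is only sent for preprocessing when it is first requested, and the stored response takes the role that the <utterances> elements have today.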