I've turned up a Flow revision whose text in external store has a ^H (\b, the backspace character U+0008) embedded in it.
Details on the revision:
page: https://www.mediawiki.org/wiki/Extension_talk:LinkedWiki
topic: Notice: Undefined index: Beschrijving in /var/www/wikifarm-mw1.19/extensions/LinkedWiki/LinkedWiki.php on line 283
post: https://www.mediawiki.org/w/index.php?title=Topic:Ret7qp83fy2cwmjd&topic_showPostId=rfb0t2cr56qwgrp5#flow-post-rfb0t2cr56qwgrp5
rev id (alnum): rfb0t2cr56qwgrp5
flags: utf-8,gzip,html,external
url: DB://cluster25/650451
After decompression, the content has a ^H in the following line:
<binding name="Beschrijving"><literal>Het product Liaan e-<span typeof="mw:Entity" data-parsoid='{"src":"&#8;","srcContent":"\b","dsr":[694,698,null,null]}'></span>Dienstverlening is ontwikkeld om uw organisatie uitgebreid te ondersteunen bij het implementeren van digitale dienstverlening / e-Formulieren.
The ^H sits between null]}'> and </span>.
I verified this by pulling the specific blob_text from external store and decompressing it, then running it through od -c.
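For anyone who wants to repeat the check without od, here is a minimal Python sketch of the same idea. It assumes the "gzip" flag means a raw DEFLATE stream (what PHP's gzdeflate() produces), which should be confirmed against the storage code. The helper names and the sample string are hypothetical; the sample is a stand-in, not the real blob.

```python
import zlib


def inflate_raw(blob: bytes) -> bytes:
    """Inflate a raw DEFLATE stream (no zlib or gzip header).

    Assumption: the 'gzip' flag on the revision means raw DEFLATE.
    """
    return zlib.decompress(blob, -zlib.MAX_WBITS)


def find_control_chars(text: str, context: int = 20):
    """Return (offset, code point, surrounding text) for every C0 control
    character other than tab, line feed and carriage return."""
    allowed = {"\t", "\n", "\r"}
    hits = []
    for i, ch in enumerate(text):
        if ord(ch) < 0x20 and ch not in allowed:
            snippet = text[max(0, i - context): i + context + 1]
            hits.append((i, f"U+{ord(ch):04X}", snippet))
    return hits


# Stand-in for the blob_text pulled from external store (hypothetical data).
sample = "null]}'>\b</span>Dienstverlening"
compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
blob = compressor.compress(sample.encode("utf-8")) + compressor.flush()

hits = find_control_chars(inflate_raw(blob).decode("utf-8"))
for offset, codepoint, snippet in hits:
    print(offset, codepoint, repr(snippet))
```

Printing the snippet with repr() makes the \b visible instead of letting the terminal interpret it.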
This bad character is duly written out in the Flow XML dumps. U+0008 is not a legal character in XML 1.0, so it breaks XMLReader() when we try to reuse these dumps for prefetch.
So there are two problems: 1) the ^H in the revision, and 2) bad character data (characters not allowed in XML) isn't stripped out before the revision content is written to the dump file.
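As an illustration of the second problem, here is a rough Python sketch of the kind of filter that could run before content is written out. It is not the dump code itself: the function name is hypothetical, and the character ranges are taken from the Char production in the XML 1.0 specification. The small element used here is a made-up stand-in for a dump record.

```python
import re
import xml.etree.ElementTree as ET

# Everything outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Drop characters that XML 1.0 does not allow (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub("", text)


# Made-up stand-in for revision content as it would appear in a dump.
raw = "<text>e-\bDienstverlening</text>"

try:
    ET.fromstring(raw)
    rejected = False
except ET.ParseError:
    # A conforming parser refuses the embedded backspace.
    rejected = True

cleaned = strip_invalid_xml_chars(raw)
parsed_text = ET.fromstring(cleaned).text
print(rejected, parsed_text)
```

Stripping is only one option; replacing the character with U+FFFD would preserve a visible marker that something was removed. Either way, the stored revision itself still contains the ^H until it is fixed at the source.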