Look at the following wikitext:
<div><span>
x </span> <span>
y </span>
</div>
and the corresponding Parsoid output:
[subbu@earth parsoid] echo "<div><span>\nx </span> <span>\ny </span>\n</div>" | parse.js --normalize
<div><span> <p>x <span> y </span></p> </span></div>
So, in that Parsoid output, the p-wrapping is broken because the first <span> opens on the same line as the <div>: the <p> ends up inside the first <span>, and the second <span> ends up nested inside that <p>. If the first <span> happens to have a "white-space:nowrap;" style on it, this leads to ugliness like this: https://parsoid-vd-tests.wikimedia.org/visualdiff/pngs/shwiki/Gonfaron.parsoid.png
It turns out that the PHP parser emits similar output on that snippet as well:
vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.
<div><span> <p>x </span> <span> y </span> </p> </div>
But, here is what happens when you tidy it!
vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.
<div> <p><span>x</span> <span>y</span></p> </div>
Here are outputs with the various Tidy-replacement solutions:
RemexHTML:
vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.
<div><span> <p>x <span> y </span> </p> </span></div>
Balancer:
vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.
<div><span> <p>x <span> y </span> </p> </span></div>
Depurate:
vagrant@mediawiki-vagrant:/vagrant/mediawiki/maintenance$ mwscript parse.php --wiki=enwiki --tidy < /tmp/wt
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.
<div><span> <p>x <span> y </span> </p> </span></div>
So, it looks like this is a difference between Tidy and an HTML5-based parsing solution. The broader problem here seems to be that p-wrapping is done partially on strings/tokens and partially on the DOM. With a purely DOM-based wrapping solution, there would not be a p-wrapper around any of the span content, since it is inside a <div>. That would be okay, since no <span> would then end up split and wrapped around a paragraph.
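To make the "purely DOM wrapping" idea concrete, here is a rough sketch in Python. It is not Parsoid's or the PHP parser's algorithm. The function name p_wrap_top_level, the small BLOCK_TAGS set, and the rule "only wrap inline runs at the top level, never descend into block elements" are all assumptions made for illustration, and the input must be a well-formed fragment because it is parsed with an XML parser.

```python
import xml.etree.ElementTree as ET

# Assumption: a small illustrative set of block tags, not the parsers' real list.
BLOCK_TAGS = {"div", "p", "table", "ul", "ol", "blockquote", "pre"}


def p_wrap_top_level(fragment):
    """Wrap runs of top-level inline content in <p>; leave block elements alone.

    Hypothetical helper: it works on an already-built tree, so it can never
    open a <p> inside a <span> the way token-level wrapping does.
    """
    root = ET.fromstring(f"<body>{fragment}</body>")
    out = []
    current_p = None

    def paragraph():
        # Return the currently open <p>, creating one if needed.
        nonlocal current_p
        if current_p is None:
            current_p = ET.Element("p")
            out.append(current_p)
        return current_p

    # Non-blank text directly under the root starts a paragraph.
    if root.text and root.text.strip():
        paragraph().text = root.text

    for child in list(root):
        tail, child.tail = child.tail, None
        if child.tag in BLOCK_TAGS:
            # A block element closes any open paragraph and is emitted untouched.
            current_p = None
            out.append(child)
            if tail and tail.strip():
                paragraph().text = tail
            else:
                child.tail = tail
        else:
            # Inline elements (and the text after them) join the open paragraph.
            paragraph().append(child)
            child.tail = tail

    return "".join(ET.tostring(el, encoding="unicode") for el in out)


snippet = "<div><span>\nx </span> <span>\ny </span>\n</div>"
print(p_wrap_top_level(snippet))  # the <div> and its spans come back unchanged
print(p_wrap_top_level("<span>x</span> <span>y</span>"))
```

With the snippet from this task, everything sits inside the <div>, so the sketch adds no <p> at all and neither <span> is split. Whether content inside a <div> should still get paragraphs in some cases is a separate design question that this sketch deliberately ignores.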
One temporary hack / solution would be to edit the template to emit a newline after the <div>, but we need to explore a better p-wrapping solution in the parsers.