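Custom element names are supported in HTML5, and the parser treats them like ordinary inline (phrasing) elements. Unlike <figure>, a <figure-inline> start tag does not implicitly close an open <p>, and an unknown element gets the CSS initial value display: inline. So instead of replacing <figure> with <span> in the Parsoid DOM spec for inline figures, we could replace it with <figure-inline>. That would give better semantic matching than abusing <span>s.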
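To make the parsing difference concrete, here is a minimal Python model of the relevant rule from the HTML tree-construction algorithm ("in body" insertion mode). It is a sketch, not parser or Parsoid code: the helper names are hypothetical, the tag set is only the main group of start tags that close an open <p>, and the custom-name check is simplified to ASCII.

```python
import re

# Main group of start tags that, in the HTML "in body" insertion mode,
# close an open <p> ("close a p element" if one is in button scope).
# The spec also does this for headings, pre, hr, li/dd/dt, table and a few
# others (some under extra conditions); they are omitted here for brevity.
CLOSES_P = {
    "address", "article", "aside", "blockquote", "center", "details",
    "dialog", "dir", "div", "dl", "fieldset", "figcaption", "figure",
    "footer", "header", "hgroup", "main", "menu", "nav", "ol", "p",
    "search", "section", "summary", "ul",
}

# Hyphenated names the spec reserves, so they are not valid custom elements.
RESERVED = {
    "annotation-xml", "color-profile", "font-face", "font-face-src",
    "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
}


def start_tag_closes_p(name: str) -> bool:
    """True if this start tag implicitly ends an open paragraph.

    Unknown and custom names fall through to the spec's "any other
    start tag" rule, which inserts the element without closing the <p>.
    """
    return name in CLOSES_P


def is_valid_custom_element_name(name: str) -> bool:
    """Simplified check: lowercase ASCII start, at least one hyphen,
    not reserved. (The real grammar also allows many non-ASCII chars.)"""
    return (
        re.fullmatch(r"[a-z][a-z0-9._]*-[a-z0-9._-]*", name) is not None
        and name not in RESERVED
    )


for tag in ("figure", "figure-inline", "span"):
    print(tag, "closes <p>:", start_tag_closes_p(tag))
```

In a real parser this is why `<p>Text <figure>...</figure> more</p>` gets split around the figure, while the same markup with <figure-inline> stays inside the paragraph.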
We'd still have to protect block-level content in the figure caption, of course. I think we already move the caption into data-mw for inline media, but if we didn't, we'd have to use <figcaption-inline> (or some such), since <figcaption> is also a block-level element.
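A rough sketch of the rename plus caption move, using Python's xml.etree on a well-formed fragment. The function name to_figure_inline and the {"caption": ...} shape of the data-mw JSON are assumptions for illustration only, not Parsoid's actual implementation or spec.

```python
import json
import xml.etree.ElementTree as ET


def to_figure_inline(fig: ET.Element) -> ET.Element:
    """Hypothetical: rename an inline <figure> to <figure-inline> and move
    any <figcaption> into a JSON data-mw attribute, so the element keeps
    no block-level child. The {"caption": ...} shape is assumed."""
    fig.tag = "figure-inline"
    cap = fig.find("figcaption")  # direct child only
    if cap is not None:
        # Caption inner HTML: leading text plus each serialized child
        # (ET.tostring includes a child's tail text, which is what we want).
        inner = (cap.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in cap
        )
        # Keep any text that followed the caption (usually whitespace).
        children = list(fig)
        i = children.index(cap)
        if cap.tail:
            if i > 0:
                prev = children[i - 1]
                prev.tail = (prev.tail or "") + cap.tail
            else:
                fig.text = (fig.text or "") + cap.tail
        fig.remove(cap)
        fig.set("data-mw", json.dumps({"caption": inner}))
    return fig


src = ('<figure typeof="mw:Image"><img src="a.png"/>'
       '<figcaption>A <b>bold</b> cap</figcaption></figure>')
out = to_figure_inline(ET.fromstring(src))
print(ET.tostring(out, encoding="unicode"))
```

Storing the caption as an HTML string in an attribute means its block-level content never appears in the inline context at all, which is the point of moving it into data-mw rather than inventing <figcaption-inline>.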