To stop shipping the older styles in content.thumbnails-*.less, content that mimics the parser's media output needs to be migrated to the new structure.
However, the styles in content.media-*.less target typeof="mw:File" annotations, which might not make sense semantically for that content (and would likely be sanitized away). Perhaps we need to add a class to the stylesheet for this content to target instead? But note that T314097 discouraged adding classes that MediaWiki core isn't generating:
I don't think it makes sense to put a class here that's not generated by MediaWiki core however. At the very least not without a huge inline comment explaining why.
EDIT: from T318433#10397735 below:
We discussed this task during the MW engineering offsite. The proposed solution, inspired by T204370, is to add a new parser function {{#media}}, which in its full form will be a replacement for the [[File:]] syntax, but with cleaner option handling (e.g., requiring caption=... instead of parsing any unrecognized option as the caption). {{#media|src=Foo.jpg}} would be the equivalent of [[File:Foo.jpg]], but instead of src you could pass content=... as an option to embed arbitrary user-generated content, using the structure described above in:
(We probably don't actually need typeof=mw:UserContent on the figcaption, since the caption is always user-generated content; this isn't new in any way. See T331655 for a fuller discussion of marking extension-generated content.)
The implementation should call a Parsoid API so that it shares as much code as possible with the existing Parsoid image handling code.
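
To make the "cleaner option handling" point concrete, here is a minimal Python sketch contrasting the two styles. It is not MediaWiki or Parsoid code: the function names, the small flag set, and the rule that the last unrecognized option wins are all assumptions made for illustration, and real [[File:]] parsing recognizes many more options (sizes, alt=, link=, and so on).

```python
# Illustrative only: hypothetical helpers, not MediaWiki or Parsoid APIs.
KNOWN_FLAGS = {"thumb", "frame", "frameless", "border", "left", "right", "center", "none"}
NAMED_OPTIONS = {"src", "content", "caption"}  # option names taken from the proposal


def parse_legacy_options(parts):
    """Mimic the [[File:...]] convention: an unrecognized option becomes the caption."""
    options = {"flags": [], "caption": None}
    for part in parts:
        if part in KNOWN_FLAGS:
            options["flags"].append(part)
        else:
            # Anything unrecognized is treated as the caption (last one wins here).
            options["caption"] = part
    return options


def parse_strict_options(parts):
    """Sketch of the stricter style: only named options and known flags are accepted."""
    options = {"flags": [], "caption": None}
    for part in parts:
        key, sep, value = part.partition("=")
        if sep and key in NAMED_OPTIONS:
            options[key] = value
        elif part in KNOWN_FLAGS:
            options["flags"].append(part)
        else:
            # A typo such as "thum" is reported instead of silently becoming a caption.
            raise ValueError(f"unrecognized option: {part!r}")
    return options
```

With the legacy-style helper, a misspelled flag such as "thum" silently ends up as the caption; with the strict helper, the same input raises an error, and the caption can only be set through an explicit caption= option.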