The generic media structure looks like this:
<wrapper>            figure, span
  <link>             a, span
    <media element>  audio, video, img, etc.
and the LESS mirrors it:
figure[ typeof~='mw:File' ] > *:first-child > img,
The child combinators are necessary because media can be nested in the figcaption,
<wrapper> <link>...</link> <caption>...</caption> </wrapper>
and the styles should not apply there.
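The Python sketch below shows what the child combinators buy. It emulates only the single selector above (figure[typeof~='mw:File'] > *:first-child > img) and compares it with a plain descendant selector. The function names and the simplified sample markup are hypothetical. The sketch parses the snippet as XML for brevity, whereas real Parsoid output is HTML and would need an HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one block image, plus an inline image
# nested inside the figcaption (which the styles must NOT target).
SAMPLE = (
    '<figure typeof="mw:File">'
    '<a href="./File:A.jpg"><img src="a.jpg"/></a>'
    '<figcaption>Caption with '
    '<span typeof="mw:File"><a href="./File:B.jpg"><img src="b.jpg"/></a></span>'
    '</figcaption></figure>'
)


def typeof_contains(el: ET.Element, token: str) -> bool:
    """Emulate CSS [typeof~='token']: exact match against whitespace-separated words."""
    return token in el.get("typeof", "").split()


def matching_imgs(markup: str) -> list[str]:
    """Emulate figure[typeof~='mw:File'] > *:first-child > img; return matched srcs."""
    root = ET.fromstring(markup)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    hits = []
    for img in root.iter("img"):
        link = parent_of.get(img)  # the "*" step: img's direct parent
        wrapper = parent_of.get(link) if link is not None else None
        if (
            wrapper is not None
            and wrapper.tag == "figure"
            and typeof_contains(wrapper, "mw:File")
            and wrapper[0] is link  # :first-child among element children
        ):
            hits.append(img.get("src"))
    return hits


def descendant_imgs(markup: str) -> list[str]:
    """Contrast: figure[typeof~='mw:File'] img also reaches into the figcaption."""
    root = ET.fromstring(markup)
    figures = [el for el in root.iter("figure") if typeof_contains(el, "mw:File")]
    return [img.get("src") for fig in figures for img in fig.iter("img")]


if __name__ == "__main__":
    print("child combinators:", matching_imgs(SAMPLE))
    print("descendant only:  ", descendant_imgs(SAMPLE))
```

With child combinators only a.jpg matches, while the descendant version also picks up the b.jpg nested in the caption. Because the check requires the img to be exactly two levels below the figure, any extra element inserted in between also defeats it.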
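Unfortunately, when active formatting elements are reopened inside the wrapper, they end up between the wrapper and the link. The img is then no longer a grandchild of the figure, so the styles are not applied. For example, <p>'''''[[File:Foobar.jpg|thumb]]'''''</p> renders as: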
<p data-parsoid='{"stx":"html","autoInsertedEnd":true}'><i data-parsoid='{"autoInsertedEnd":true}'><b data-parsoid='{"autoInsertedEnd":true}'></b></i></p><figure class="mw-default-size" typeof="mw:Image/Thumb"><i data-parsoid='{"autoInsertedStart":true,"autoInsertedEnd":true}'><b data-parsoid='{"autoInsertedStart":true,"autoInsertedEnd":true}'><a href="./File:Foobar.jpg" class="mw-file-description"><img resource="./File:Foobar.jpg" src="http://example.com/images/thumb/3/3a/Foobar.jpg/180px-Foobar.jpg" decoding="async" data-file-width="1941" data-file-height="220" data-file-type="bitmap" height="20" width="180" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/270px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/360px-Foobar.jpg 2x"/></a><figcaption></figcaption></b></i></figure><p class="mw-empty-elt" data-parsoid='{"autoInsertedStart":true,"stx":"html"}'></p>
Let's add some logging or metrics to see how common this is and under what conditions editors produce it.
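As a possible starting point for that measurement, here is a minimal Python sketch. It assumes the audit runs offline over stored Parsoid HTML. It flags media wrappers whose first element child is not the expected link (a or span) and counts (wrapper tag, unexpected child tag) pairs. The class and function names, the typeof prefixes treated as media (mw:File, mw:Image), and the counting scheme are all hypothetical choices, not existing Parsoid or MediaWiki code.

```python
from collections import Counter
from html.parser import HTMLParser

WRAPPER_TAGS = {"figure", "span"}  # from the generic structure above
LINK_TAGS = {"a", "span"}          # expected first child of a wrapper
# Assumption: typeof tokens with these prefixes mark a media wrapper.
MEDIA_TYPEOF_PREFIXES = ("mw:File", "mw:Image")


class WrapperChildAudit(HTMLParser):
    """Count media wrappers whose first element child is not a link (hypothetical)."""

    def __init__(self) -> None:
        super().__init__()
        self.awaiting = None  # tag of the wrapper whose first child is still unseen
        self.unexpected = Counter()

    def handle_starttag(self, tag, attrs):
        # The first start tag after a wrapper opens is its first element child.
        if self.awaiting is not None:
            if tag not in LINK_TAGS:
                self.unexpected[(self.awaiting, tag)] += 1
            self.awaiting = None
        typeof = (dict(attrs).get("typeof") or "").split()
        if tag in WRAPPER_TAGS and any(t.startswith(MEDIA_TYPEOF_PREFIXES) for t in typeof):
            self.awaiting = tag

    def handle_endtag(self, tag):
        # An end tag before any child start tag means the wrapper was empty.
        self.awaiting = None


def audit(html: str) -> Counter:
    parser = WrapperChildAudit()
    parser.feed(html)
    parser.close()
    return parser.unexpected


if __name__ == "__main__":
    # Abbreviated version of the rendered example above.
    sample = (
        '<p><i><b></b></i></p>'
        '<figure class="mw-default-size" typeof="mw:Image/Thumb"><i><b>'
        '<a href="./File:Foobar.jpg" class="mw-file-description">'
        '<img src="180px-Foobar.jpg"/></a><figcaption></figcaption>'
        '</b></i></figure><p class="mw-empty-elt"></p>'
    )
    print(audit(sample))
```

Keying the counter on the offending child tag gives a first breakdown of which formatting elements (i, b, s, and so on) leak into wrappers. Attributing cases to pages or edit patterns would need extra context that this sketch does not capture.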