From T216003#7836958,
This is a bit of a mess. The format (framed, thumbnail, etc.) is first-one-wins. Dimensions (width, height) are last-one-wins. Horizontal and vertical alignment (left, right, etc.) are first-one-wins. The caption is last-one-wins. The legacy parser and Parsoid agree on all of that. For any other media option, however, Parsoid is first-one-wins and the legacy parser is last-one-wins. This is where the discrepancy with upright originates.
https://github.com/wikimedia/mediawiki/blob/master/includes/parser/Parser.php#L5353-L5442
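To make the rules above concrete, here is a minimal sketch of how duplicated options would resolve under each parser. It is not the code from Parser.php or Parsoid: the function `resolve`, the category names, and the `other_policy` parameter are all hypothetical, and real option parsing is considerably more involved.

```python
# Illustrative model only; names and categories are assumptions,
# not identifiers from Parser.php or Parsoid.

# Categories on which both parsers agree, per the description above.
FIRST_WINS = {"format", "halign", "valign"}
LAST_WINS = {"size", "caption"}


def resolve(options, other_policy):
    """Resolve duplicated media options.

    options: list of (category, value) pairs in wikitext order.
    other_policy: "first" (Parsoid) or "last" (legacy parser); applied
    to every category outside the two sets above, e.g. upright.
    """
    resolved = {}
    for category, value in options:
        if category in FIRST_WINS:
            keep_first = True
        elif category in LAST_WINS:
            keep_first = False
        else:
            keep_first = other_policy == "first"
        if keep_first:
            # Only the first occurrence is recorded.
            resolved.setdefault(category, value)
        else:
            # Later occurrences overwrite earlier ones.
            resolved[category] = value
    return resolved


# A made-up option list in which every category is duplicated.
opts = [
    ("format", "thumb"), ("format", "frame"),
    ("size", "100px"), ("size", "200px"),
    ("upright", "0.5"), ("upright", "1.5"),
    ("caption", "first caption"), ("caption", "second caption"),
]

parsoid = resolve(opts, other_policy="first")
legacy = resolve(opts, other_policy="last")
```

In this model the two results agree on format, size and caption and differ only on `upright`, which is the discrepancy described above.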
Parsoid has been linting bogus media options for quite some time, so hopefully this duplication is now rare in practice, and making a breaking change so that this can all be consistent won't be too disruptive.
There are about 24k such lints on enwiki:
https://en.wikipedia.org/wiki/Special:LintErrors/bogus-image-options