The multipart content model should support a "main" part containing wikitext, plus any number of "attachments". The parts would be bundled together in an envelope structure, serialized as something like JSON, XML, or MIME multipart.
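As a rough illustration of the envelope idea, here is a minimal sketch that assumes JSON is chosen as the envelope format. The function names and the field names ("main", "attachments", "model", "data") are hypothetical and only serve the example; nothing here is an existing MediaWiki API.

```python
import json


def pack_envelope(main_wikitext, attachments):
    """Bundle the main wikitext part and named attachments into one JSON blob.

    `attachments` maps a part name to a (content_model, data) tuple.
    All field names are assumptions made for this sketch.
    """
    envelope = {
        "main": {"model": "wikitext", "data": main_wikitext},
        "attachments": {
            name: {"model": model, "data": data}
            for name, (model, data) in attachments.items()
        },
    }
    # sort_keys gives a stable serialization, which keeps blobs comparable
    return json.dumps(envelope, sort_keys=True)


def unpack_envelope(blob):
    """Parse a blob produced by pack_envelope back into a dict."""
    return json.loads(blob)


blob = pack_envelope(
    "Some '''wikitext'''.",
    {"metadata": ("json", '{"license": "CC0"}')},
)
parts = unpack_envelope(blob)
```

Binary attachments would need an encoding step (for example base64) before they could be stored in a JSON envelope; the sketch only covers textual parts.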
By default, only the main (text) part of the content should be exposed via EditPage and action=edit. Some other parts can be accessed and edited via EditPage and action=edit by requesting them specifically. Some parts, depending on their content model, may not be editable via the text-based interfaces at all.
Note that exposing only the textual "main" part via action=edit breaks the assumption that a client can grab a revision's content, modify it, and save it back. Unless requested otherwise, the revision's content would be the full blob containing all parts, whereas action=edit would by default expose only the main part.
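The round-trip problem can be sketched as follows. This assumes a JSON envelope with hypothetical "main" and "attachments" fields, and a hypothetical set of text-editable content models. It shows one possible way to handle a save, namely merging the edited part back into the stored blob; it is not a description of how MediaWiki does or should do this.

```python
import json

# Assumption: which content models the text-based interfaces can edit.
TEXT_EDITABLE_MODELS = {"wikitext", "json"}


def _select_part(envelope, part):
    """Return the dict for the requested part of a parsed envelope."""
    if part == "main":
        return envelope["main"]
    return envelope["attachments"][part]


def text_for_edit(blob, part="main"):
    """Return the text that an edit interface would expose for one part."""
    selected = _select_part(json.loads(blob), part)
    if selected["model"] not in TEXT_EDITABLE_MODELS:
        raise ValueError("part %r is not editable as text" % part)
    return selected["data"]


def save_edited_text(stored_blob, edited_text, part="main"):
    """Merge an edited part into the stored blob, keeping all other parts."""
    envelope = json.loads(stored_blob)
    _select_part(envelope, part)["data"] = edited_text
    return json.dumps(envelope, sort_keys=True)


stored = json.dumps({
    "main": {"model": "wikitext", "data": "Old text"},
    "attachments": {"metadata": {"model": "json", "data": "{}"}},
})

edited = text_for_edit(stored) + " plus an edit"

# Merging preserves the attachment. Storing `edited` directly as the new
# revision content would silently drop it, which is the broken assumption.
new_blob = save_edited_text(stored, edited)
```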