In the context of the move to store and retrieve data-mw separately (T78676), the question of which data VisualEditor needs *in principle* to initialize a basic edit interface has come up repeatedly. The question is not about limitations of the current implementation, but about which data any implementation will need. This matters for performance: some of the improvements envisioned from splitting out data-mw can only be realized if VE can render a basic edit view from the HTML alone.
The main sticking point seems to be references:
For references, we need to know (a) the provenance of the reference (where it was generated, local or remote) and (b) the content (what template name, if any, so we can provide the right editing tool). For reference lists, we need the contents of the references to render them in place correctly (and dynamically).
To non-VE experts, it is not clear how (a) affects the rendering. The content for (b) tends to be available in the HTML.
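
To make the claim about (b) concrete, here is a minimal sketch of recovering reference contents from the HTML alone, without consulting data-mw. It assumes that inline reference markers carry `typeof="mw:Extension/ref"` and link to list items whose `id` starts with `cite_note`; the sample markup, `RefScanner`, and `scan_references` are hypothetical and simplified for illustration, not VE's or Parsoid's actual code. It does not attempt to determine provenance (a), which is the open question.

```python
from html.parser import HTMLParser

# Simplified, hypothetical sample modeled on Parsoid-style output.
# Real markup carries more attributes and may differ in detail.
SAMPLE_HTML = """
<p>Water is wet.<sup typeof="mw:Extension/ref" about="#mwt1"><a href="#cite_note-1">[1]</a></sup></p>
<ol typeof="mw:Extension/references" about="#mwt2">
<li id="cite_note-1"><span class="mw-reference-text">Smith, <i>On Water</i>, 2010.</span></li>
</ol>
"""

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class RefScanner(HTMLParser):
    """Collects reference markers and reference texts from HTML only."""

    def __init__(self):
        super().__init__()
        self.ref_targets = []  # ids that inline reference markers point to
        self.ref_texts = {}    # cite_note id -> plain-text content
        self._stack = []       # one marker per open element: "ref", "note:<id>" or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        marker = None
        typeof = (attrs.get("typeof") or "").split()
        note_id = attrs.get("id") or ""
        if "mw:Extension/ref" in typeof:
            marker = "ref"
        elif tag == "li" and note_id.startswith("cite_note"):
            marker = "note:" + note_id
            self.ref_texts[note_id] = ""
        elif tag == "a" and "ref" in self._stack:
            # Link inside an inline reference marker: record its target id.
            href = attrs.get("href") or ""
            if href.startswith("#"):
                self.ref_targets.append(href[1:])
        self._stack.append(marker)

    def handle_endtag(self, tag):
        if tag not in VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost enclosing reference list item, if any.
        for marker in reversed(self._stack):
            if marker and marker.startswith("note:"):
                self.ref_texts[marker[len("note:"):]] += data
                break


def scan_references(html):
    """Return (marker targets, reference texts) found in the given HTML."""
    scanner = RefScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.ref_targets, scanner.ref_texts


if __name__ == "__main__":
    targets, texts = scan_references(SAMPLE_HTML)
    for target in targets:
        print(target, "->", texts.get(target))
```

If the assumptions about the markup hold, a reference list could be rendered in place from this information; anything that only data-mw records, such as the template that produced a reference, would still have to be fetched before the matching editing tool can be offered.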