We want to use sandboxed iframes for executing untrusted or risky JavaScript code, generally for T169027: Provide iframe sandboxing for rich-media extensions (defense in depth) and more immediately for T222807: Sandbox Graph extension into an iframe. Modern browsers partition the HTTP cache as part of their state partitioning efforts: an embedded cross-domain iframe will use a cache that's not shared with the parent page (see this Chrome blog post explaining the concept, and the Partition the HTTP Cache and Replacing Frame Site with "is-cross-site" in the HTTP Cache Partitioning Scheme chromestatus.com pages for more technical details and the current state of cache partitioning). For a sandboxed iframe, at least in present-day Chrome, this effectively means no caching (sandboxed iframes are more or less treated as having a unique random one-off domain name, so a sandboxed iframe will never be considered to be on the same domain or in the same partition as anything else).
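To make the partitioning behaviour concrete, here is a toy model of a cache keyed on (top-level site, frame site, URL). It is a deliberate simplification and not how any browser is actually implemented; `PartitionedCache`, `opaqueSite` and the example URL are made up for illustration. The only assumption taken from the description above is that each sandboxed iframe gets a fresh one-off frame site.

```typescript
// Toy model only: real browsers key on more state than this.
type FrameSite = string; // a site such as "https://wikipedia.org", or a one-off opaque token

let opaqueCounter = 0;

// Assumption: a sandboxed iframe (without allow-same-origin) gets a new,
// never-repeated frame site every time it is created.
function opaqueSite(): FrameSite {
  opaqueCounter += 1;
  return 'opaque-' + opaqueCounter;
}

class PartitionedCache {
  private entries: Record<string, string> = {};

  private key(topSite: string, frameSite: FrameSite, url: string): string {
    return JSON.stringify([topSite, frameSite, url]);
  }

  // Returns true on a cache hit; on a miss, stores the entry and returns false.
  fetch(topSite: string, frameSite: FrameSite, url: string): boolean {
    const k = this.key(topSite, frameSite, url);
    if (k in this.entries) {
      return true;
    }
    this.entries[k] = 'body of ' + url;
    return false;
  }
}

const cache = new PartitionedCache();
const wiki = 'https://wikipedia.org';
const asset = 'https://example.org/load.php?modules=example'; // hypothetical URL

cache.fetch(wiki, wiki, asset);         // false: first load by the page itself
cache.fetch(wiki, wiki, asset);         // true: same partition, cache hit
cache.fetch(wiki, opaqueSite(), asset); // false: first sandboxed iframe
cache.fetch(wiki, opaqueSite(), asset); // false: second sandboxed iframe, new partition again
```

In this model every sandboxed iframe misses, no matter how often the same asset was fetched before, which is the "effectively no caching" situation described above.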
Without caching, the load on Wikimedia servers increases, and latency and bandwidth use increase for the user. In the specific case of the Graph extension, I think the impact on servers is not very significant (it's mainly ResourceLoader load.php requests, which are cached in Varnish and scale well) but the bandwidth increase is quite large, about 0.3 MB per graph shown on the screen (that's comparable to the total payload size of visiting a wiki page with no images on it, so could be a 100% bandwidth increase per graph). This makes it infeasible to use iframe sandboxing without some kind of performance workaround for at least ResourceLoader modules (which are the majority of the uncached requests). We want to identify possible workarounds, evaluate their feasibility and estimate the effort required. Specifically for T346291: Re-enable the Graph Extension for use at all Wikimedia Wikis, we also want to check whether there are acceptable workarounds which still keep iframe sandboxing a relatively low-effort path for having graphs (since that was the assumption based on which we decided to use iframes; if it turns out to be false, we might want to reevaluate what the best way forward is).
These are the ideas that came up so far:
- Use an iframe with a real domain (not sandboxed, or sandboxed but with the allow-same-origin flag). For security reasons this cannot share a parent domain with the wikis, so we'd have to register a separate domain name (something like wikimedia-usercontent.org). The new domain would serve static content, and the parent document could inject the actual content (the two documents would not be able to interact directly due to same-origin restrictions, but the static content could include a script exposing a postMessage-based API for injecting arbitrary HTML, for example). The cache would still be partitioned, so modules would have to be loaded twice, but that overhead would only be incurred once per browsing session, not once per displayed graph (and the embedding document wouldn't need assets which are only required for rendering the iframe, e.g. the Vega library for Graph, so the overhead would be much smaller).
- Same as the previous idea, but the new domain would generate the sandboxed content on the server side. This would mean the request is routed to the same MediaWiki instance that's generating the embedding document, just via a different domain; Wikimedia configuration would have to be updated to handle that, and MediaWiki would have to be able to detect such requests and restrict the available actions to just rendering the iframe contents.
- Do the asset loading in the embedding document, where caching works normally and the localStorage-based ResourceLoader store is available. Pass the assets using postMessage; the iframe would use a shim mw.loader which just evaluates the passed-in assets. It's probably infeasible for the embedding document to predict what assets are needed (e.g. something might be requested by an mw.loader.using call), so the shim mw.loader would have to be able to request arbitrary RL modules from the parent document, and the trusted JS code in the parent would have to be able to determine whether a given asset is safe to pass in (i.e. whether it depends on user identity or not).
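To give a feel for the last idea, here is a minimal sketch of what the message exchange between the shim loader and the parent might look like. Everything in it is hypothetical: the message shapes, `handleModuleRequest`, `ShimLoader`, and the `isSafe` / `getSource` / `execute` callbacks are invented for illustration and are not existing MediaWiki APIs. The transport is abstracted into a `post` callback so that the logic can be read (and run) without a browser, and the loader uses plain callbacks, although a real shim would probably also need to return a promise the way mw.loader.using does.

```typescript
// All names below are hypothetical; this is not an existing MediaWiki API.
interface ModuleRequest {
  type: 'rl-request';
  id: number;
  modules: string[];
}

interface ModuleResponse {
  type: 'rl-response';
  id: number;
  // Module name -> source text; left empty when the request was refused.
  sources: Record<string, string>;
  // Modules the parent refused to pass in.
  rejected: string[];
}

// Parent side: answer a request only if every requested module is deemed safe.
function handleModuleRequest(
  req: ModuleRequest,
  isSafe: (name: string) => boolean,
  getSource: (name: string) => string
): ModuleResponse {
  const rejected = req.modules.filter((name) => !isSafe(name));
  const sources: Record<string, string> = {};
  if (rejected.length === 0) {
    for (const name of req.modules) {
      sources[name] = getSource(name);
    }
  }
  return { type: 'rl-response', id: req.id, sources, rejected };
}

// Iframe side: a stand-in for mw.loader that forwards requests to the parent.
class ShimLoader {
  private nextId = 1;
  private pending: Record<number, { ready: () => void; error: (rejected: string[]) => void }> = {};
  private post: (req: ModuleRequest) => void;
  private execute: (name: string, source: string) => void;

  constructor(
    post: (req: ModuleRequest) => void,
    execute: (name: string, source: string) => void
  ) {
    this.post = post;
    this.execute = execute;
  }

  using(modules: string[], ready: () => void, error: (rejected: string[]) => void): void {
    const id = this.nextId++;
    // Register the callbacks before posting, in case the reply arrives synchronously.
    this.pending[id] = { ready, error };
    this.post({ type: 'rl-request', id, modules });
  }

  // Meant to be called from the iframe's "message" event listener.
  receive(res: ModuleResponse): void {
    const entry = this.pending[res.id];
    if (!entry) {
      return; // Unknown or already-handled response; ignore it.
    }
    delete this.pending[res.id];
    if (res.rejected.length > 0) {
      entry.error(res.rejected);
      return;
    }
    for (const name of Object.keys(res.sources)) {
      this.execute(name, res.sources[name]);
    }
    entry.ready();
  }
}
```

In a browser, `post` would presumably wrap window.parent.postMessage and `receive` would be driven by a message event listener in the iframe. On the parent side, the origin of a sandboxed iframe is opaque, so the handler would likely have to identify the sender by comparing event.source with the iframe's contentWindow rather than by checking event.origin. The hard part, deciding what `isSafe` should return for a given module, is not addressed by this sketch.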