
Work around cache partitioning in iframe sandboxing
Open, Needs Triage, Public

Description

We want to use sandboxed iframes for executing untrusted or risky JavaScript code, generally for T169027: Provide iframe sandboxing for rich-media extensions (defense in depth) and more immediately for T222807: Sandbox Graph extension into an iframe. Modern browsers partition the HTTP cache as part of their state partitioning efforts: an embedded cross-domain iframe uses a cache that's not shared with the parent page (see this Chrome blog post explaining the concept, and the Partition the HTTP Cache and Replacing Frame Site with "is-cross-site" in the HTTP Cache Partitioning Scheme chromestatus.com pages for more technical details and the current state of cache partitioning). For a sandboxed iframe, at least in present-day Chrome, this effectively means no caching: sandboxed iframes are more or less treated as having a unique, random, one-off origin, so a sandboxed iframe will never be considered to be on the same domain or in the same partition as anything else.

Without caching, the load on Wikimedia servers increases, and latency and bandwidth use increase for the user. In the specific case of the Graph extension, I think the impact on servers is not very significant (it's mainly ResourceLoader load.php requests, which are cached in Varnish and scale well), but the bandwidth increase is quite large, about 0.3 MB per graph shown on the screen (that's comparable to the total payload size of visiting a Wikipedia page with no images on it, so it could be a 100% bandwidth increase per graph). This makes it infeasible to use iframe sandboxing without some kind of performance workaround, at least for ResourceLoader modules (which make up the majority of the uncached requests). We want to identify possible workarounds, evaluate their feasibility, and estimate the effort required. Specifically for T346291: Re-enable the Graph Extension for use at all Wikimedia Wikis, we also want to check whether there are acceptable workarounds which still keep iframe sandboxing a relatively low-effort path for having graphs (since that was the assumption based on which we decided to use iframes; if it turns out to be false, we might want to reevaluate the best way forward).

These are the ideas that came up so far:

  • Use an iframe with a real domain (not sandboxed, or sandboxed but with the allow-same-origin flag). For security reasons this cannot share a parent domain with the wikis, so we'd have to register a new top-level domain name (something like wikimedia-usercontent.org). The new domain would serve static content, and the parent document could inject the actual content (the two documents would not be able to interact directly due to same-origin restrictions, but the static content could include a script exposing a postMessage-based API for injecting arbitrary HTML, for example; see the sketch after this list). The cache would still be partitioned, so modules would have to be loaded twice, but that overhead would only be incurred once per browsing session, not once per displayed graph (and the embedding document wouldn't need assets which are only required for rendering the iframe, e.g. the Vega library for Graph, so the overhead would be much smaller).
  • Same, but the new domain would generate the sandboxed content on the server side. This would mean the request is routed to the same MediaWiki instance that generates the embedding document, just via a different domain; Wikimedia configuration would have to be updated to handle that, and MediaWiki would have to be able to detect it and restrict the available actions to just rendering the iframe contents.
  • Do the asset loading in the embedding document, where caching works normally and the localStorage-based ResourceLoader store is available. Pass the assets using postMessage; the iframe would use a shim mw.loader which just evaluates the passed-in assets. It's probably infeasible for the embedding document to predict what assets are needed (e.g. something might be requested by an mw.loader.using call), so the shim mw.loader would have to be able to request arbitrary RL modules from the parent document, and the trusted JS code in the parent would have to be able to determine whether a given asset is safe to pass in (i.e. whether it depends on user identity or not).
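To make the first option more concrete, here is a minimal sketch of the postMessage-based injection API it describes. The message shape, element ID, origin check and the graphHtml variable are all illustrative assumptions, not an existing interface:

```
// Script shipped inline with the static page on the hypothetical
// wikimedia-usercontent.org domain. The parent document cannot touch
// the iframe's DOM directly (cross-origin), so it sends the HTML to
// display via postMessage instead.
window.addEventListener( 'message', function ( event ) {
	// Illustrative origin check; a real implementation would validate
	// event.origin against a configured list of wiki domains.
	if ( !event.origin.endsWith( '.wikipedia.org' ) ) {
		return;
	}
	if ( event.data && event.data.type === 'injectHtml' ) {
		document.body.innerHTML = event.data.html;
	}
} );

// In the embedding wiki page (trusted code), after the iframe has loaded.
// graphHtml is assumed to hold the pre-rendered graph markup.
var frame = document.getElementById( 'usercontent-frame' );
frame.contentWindow.postMessage(
	{ type: 'injectHtml', html: graphHtml },
	'https://wikimedia-usercontent.org'
);
```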

Event Timeline

Tgr added a subscriber: Krinkle.

So if I understand correctly, caching works correctly when loading the initial iframe document, just not for any subresources of that document.

This leads to option number 4: use iframe sandboxing, but inline all resources into the HTML document, so that there are no subresources. Hence the whole thing can be cached normally.

A variant of that idea would be to construct the iframe document dynamically on the client side as a fully complete document, and inject it as a srcdoc iframe.

Neither of these is all that appealing if we want to reuse MW code to generate all this. Although I suppose for option 4, we could make some sort of proxy layer that substitutes all needed resources (that are statically known).
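For the srcdoc variant above, a minimal sketch of the client-side injection, assuming a companion module has already compiled the complete iframe document into a string (the fullDocument variable and container ID are illustrative):

```
// fullDocument is assumed to be a complete, self-contained HTML document
// compiled on the client side by a companion JS module.
var frame = document.createElement( 'iframe' );
frame.setAttribute( 'sandbox', 'allow-scripts' );
// srcdoc avoids any network request for the document itself; the iframe
// stays sandboxed with an opaque origin.
frame.srcdoc = fullDocument;
document.getElementById( 'graph-container' ).appendChild( frame );
```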

This leads to option number 4: use iframe sandboxing, but inline all resources into the HTML document, so that there are no subresources. Hence the whole thing can be cached normally.

From the user's perspective, that's not really caching - the article size is just increased by 0.3 MB times the number of graphs in it. And it doesn't work for dynamically loaded ResourceLoader modules.

Although I suppose for option 4, we could make some sort of proxy layer that substitutes all needed resources (that are statically known).

Yeah, that's option 3 in the task description (if I understand you correctly). The resources wouldn't necessarily have to be statically known; the proxy layer just needs to be able to determine whether they are safe to load. For RL modules, that's just checking whether they are user-specific (in which case there's no legitimate reason to load them anyway) or cacheable. For other static assets, it should be generally safe as long as the request is stripped of query parameters and such. For images, there is a theoretical information-leak channel, but it's probably fine to ignore in practice, and it's not relevant to the Graph use case anyway.
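A rough sketch of what that trusted check-and-forward logic in the parent could look like. isCacheableModule, the iframe selector and the message shape are hypothetical; only=scripts is an existing load.php parameter and mw.util.wikiScript() is a real helper:

```
// Trusted code in the embedding page: answer module requests coming from
// the shim mw.loader inside the sandboxed iframe.
var frame = document.querySelector( 'iframe.mw-sandboxed' );

// Hypothetical helper: whether a module is cacheable and safe to hand to
// untrusted code (i.e. does not depend on user identity). The real check
// would need support from ResourceLoader; this is a placeholder.
function isCacheableModule( moduleName ) {
	return moduleName !== 'user' && !/^user\./.test( moduleName );
}

window.addEventListener( 'message', async function ( event ) {
	// Only react to module requests from our own iframe.
	if ( event.source !== frame.contentWindow || !event.data.module ) {
		return;
	}
	var moduleName = event.data.module;
	if ( !isCacheableModule( moduleName ) ) {
		return;
	}
	// Fetch the module code through the normal, cacheable load.php
	// endpoint of the parent document, where HTTP caching works as usual.
	var url = mw.util.wikiScript( 'load' ) +
		'?modules=' + encodeURIComponent( moduleName ) + '&only=scripts';
	var code = await ( await fetch( url ) ).text();
	// A sandboxed iframe has an opaque origin, so '*' is the only usable
	// target origin here.
	event.source.postMessage( { module: moduleName, code: code }, '*' );
} );
```

On the iframe side, the shim mw.loader would then resolve the corresponding module promise and evaluate the received code when the reply message arrives.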

A variant of that idea would be to construct the iframe document dynamically on the client side as a fully complete document, and inject it as a srcdoc iframe.

Any extension tag that outputs sandboxed content would then have to come with a companion JS module to compile that content on the client side. And it still wouldn't work for any code that wants to load ResourceLoader modules dynamically.

From the user's perspective, that's not really caching - the article size is just increased by 0.3 MB times the number of graphs in it. And it doesn't work for dynamically loaded ResourceLoader modules.

What I mean, more specifically, is: assume that most graphs load the same JS, with only a small part (the Vega JSON document) changing between different graphs.

You could have something like <iframe src="vegaLoader.htm" sandbox="allow-scripts">, where vegaLoader.htm contains the 0.3 MB of JavaScript that supports graph rendering directly inline, and then just send the much smaller Vega JSON document via window.postMessage. The vegaLoader.htm page gets cached, since the caching issue only applies to subresources of the iframed document, not to vegaLoader.htm itself.

It's still not a very appealing option because, as you said, you can't use RL dynamically with it, and it would be very hard to reuse the existing MW infrastructure to make a loader like that.
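To illustrate the idea anyway, the inline script in such a vegaLoader.htm could look roughly like this, using the standard Vega view API (the message shape and container selector are assumptions):

```
// Inline script in vegaLoader.htm; the Vega library itself is also
// inlined into this document, so the iframe has no subresources.
window.addEventListener( 'message', function ( event ) {
	// The parent only sends the small Vega JSON spec; the heavy JS is
	// already part of this (cacheable) document.
	var view = new vega.View( vega.parse( event.data.spec ), {
		renderer: 'canvas',
		container: '#graph'
	} );
	view.runAsync();
} );
```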

Per T334940#9537862, this will not be worked on in the context of the Graph extension. I think it's still meaningful in the wider context of T169027: Provide iframe sandboxing for rich-media extensions (defense in depth) so we can leave the task open.

In the specific case of the Graph extension, I think the impact on servers is not very significant (it's mainly ResourceLoader load.php requests, which are cached in Varnish and scale well), but the bandwidth increase is quite large, about 0.3 MB per graph shown on the screen (that's comparable to the total payload size of visiting a Wikipedia page with no images on it, so it could be a 100% bandwidth increase per graph).

Isn't that still an acceptable price to pay compared to continuing to not have graphs?