
Provide iframe sandboxing for rich-media extensions (defense in depth)
Open, MediumPublic

Description

Media-rich extensions like TimedMediaHandler, Graphs, Maps, the 3D file viewer, etc. pull in fairly large JavaScript libraries, give them a bunch of user-provided input, and let them manipulate the DOM. This gives them powerful capabilities, but increases our security attack surface.

For defense in depth, it may be wise to isolate these sorts of utilities in <iframe>s using the sandbox attribute with allow-scripts, and/or some CSP settings to lock things down, and/or a separate domain for the frame contents.

Since media rendering tends to happen in a defined rectangle, and the ability to export for <iframe> embedding on other sites is already desirable, it seems wise to bake the sandboxing into the embedding export. Some possible overlap with T169026.

This can be extended further towards fancier sandboxing of user-provided scripts as in T131436, but would be beneficial even just with same-origin exclusion to keep XSS attacks from accessing the main context.

This is conceived as providing a standard interface & configuration var for setting up this kind of sandboxing, which relevant extensions can opt in to in order to improve their security defense in depth.

Basic requirements

  • PHP config vars for alternate domain if supported (otherwise must rely on sandbox attribute or CSP)
    • future applications that *require* sandboxing of user code will need to query what security guarantees are available!
  • PHP class to wrap around outputting data in an isolated iframe (cf. the existing iframe output in TMH); see the sketch after this list
    • bare HTML skin; use standard RL to load relevant modules in the frame context
    • suitable interface for setting up fallback behavior when JS not available
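
As a very rough illustration of how these pieces could fit together, here is a minimal sketch; every name in it ($wgIframeSandboxDomain, IframeSandbox and its methods) is a hypothetical placeholder, not settled API:

// Hypothetical config var (LocalSettings.php): alternate, cookie-less domain for
// frame content; null would mean falling back to the sandbox attribute / CSP only.
$wgIframeSandboxDomain = 'https://frames.example.org';

// Hypothetical extension-side usage of the wrapper class:
$sandbox = new IframeSandbox( $outputPage );
$sandbox->addModules( [ 'ext.example.frameContent' ] ); // loaded via standard RL inside the frame
$sandbox->setFallbackHtml( '<img src="poster.jpg">' );  // shown when JS is unavailable
$guarantees = $sandbox->getSecurityGuarantees();        // e.g. separate origin vs. attribute-only
$outputPage->addHTML( $sandbox->getHtml( $frameBodyHtml ) );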

Considerations

On non-public sites, we need to authenticate the user at the PHP level (to output an iframe) but not at the JS level (so an XSS attack can't take over your wiki account). Consider ways to do this.

Related Objects

Event Timeline


Pinging the WMDE technical wishlist team, since they had at least one related wish.

Gigantic +1 to this whole thing

On non-public sites, we need to authenticate the user at the PHP level (to output an iframe) but not at the JS level (so an XSS attack can't take over your wiki account). Consider ways to do this.

At first glance, I think using the sandbox attribute would accomplish this, provided that allow-same-origin is not one of the tokens provided to the attribute.
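
For illustration, a minimal sketch of that using MediaWiki's Html helper (the srcdoc content, dimensions and variables are made up; the relevant part is the sandbox value with no allow-same-origin token):

use MediaWiki\Html\Html;

// allow-scripts lets the framed code run JavaScript, but omitting allow-same-origin
// gives the frame an opaque origin: it cannot read wiki cookies, localStorage, or
// the parent document, even though it is served from the same host.
$iframe = Html::rawElement( 'iframe', [
    'sandbox' => 'allow-scripts',
    'srcdoc' => $frameHtml, // pre-rendered frame document (hypothetical variable)
    'width' => 500,
    'height' => 300,
], 'Your browser does not support embedded frames.' );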

Agree this would be great to have. I think this should be an API-level rather than an extension-level feature, because in many cases an extension would need to bridge the communication between inside and outside the frame. The system should allow:

  • set up a separate initial configuration blob
  • set up a separate resource loading for the iframe
  • designate some resources as only available via the iframe loading (to restrict accidental loading by the primary site); see the strawman sketch after this list
  • establish a clear usage pattern for extensions to follow (this is more a matter of documentation than coding)
  • ...?
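
As a strawman for the iframe-only resources and the separate config blob (the 'iframeOnly' flag and the 'IframeSandboxConfig' hook below are invented purely to illustrate the shape such an API could take; neither exists today):

// Register a module that should only ever be served into the sandboxed frame,
// never into the primary page context.
$wgResourceModules['ext.example.frameContent'] = [
    'scripts' => [ 'resources/frameContent.js' ],
    'iframeOnly' => true, // hypothetical flag enforced by the iframe loader
];

// Hypothetical hook letting an extension supply the frame's own initial config
// blob (mw.config inside the frame would only expose these values).
$wgHooks['IframeSandboxConfig'][] = static function ( array &$vars ) {
    $vars['wgExampleFrameData'] = [ /* data the frame needs */ ];
};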

Hello,
I have a student who is going to do her end-of-degree research adding content on advanced physics to Wikipedia and, at the same time, creating some widgets/applets to explore those physical concepts interactively, so the viewer can see how changing a parameter produces different results. She will be working in Python.

As far as I know, we can’t embed applets in Wikimedia pages, but we can add a fixed image and link it to an external source where the applet would be stored. BUT this is something we should start considering, as the hands-on idea is very relevant in some scientific areas.

Some days ago I discovered this: https://commons.wikimedia.org/wiki/Commons:WikiProject_WikiWidgets. So yes, inserting Widgets is possible on any given Wikimedia site. And as far as I understand, this can be done via JavaScript. So, the question is: is it really possible to add applets, or do we not have the required technology yet and these two examples at WikiWidgets are just examples?

Some days ago I discovered this: https://commons.wikimedia.org/wiki/Commons:WikiProject_WikiWidgets. So yes, inserting Widgets is possible on any given Wikimedia site. And as far as I understand, this can be done via JavaScript. So, the question is: is it really possible to add applets, or do we not have the required technology yet and these two examples at WikiWidgets are just examples?

Javascript being a programming language, with full access to the page, you can add any kind of applet in it. Whether that is wise or safe or sustainable is a different matter. T190015, T71445, T39230 and T121470 describe some of the problems involved.

The problem is still the same: what is the correct way to implement this? We can have applets that make things interactive (for example, the position of Jupiter's moons), but for that it seems that we need some code review, doesn't it? What would the correct workflow be for this?

@Theklan - The Security-Team would probably want to at least perform a concept review (if not an entire "readiness" review) of anything like what's being discussed here. At the very least, we'd want to enumerate the risk and transfer ownership of said risk to the developers/maintainers of the code. Here's our current security review policy, which discusses both concept reviews and more formal "readiness" reviews: https://www.mediawiki.org/wiki/Security/SOP/Security_Readiness_Reviews.

It makes sense, yes. Let's assume we have some good applets to learn physics running on Python somewhere. What would be the steps to follow so we can iframe them on any given Wikipedia?

@Theklan - This is probably getting outside the original scope of this task, so it might make sense to create another task to further discuss your specific proposal. Initially, there are several concerns which come to mind here. I don't believe the technical pieces are that limiting so much as current policy and best practices along with what the future of CSP and similar protections might look like. Currently, external, third party code is not allowed to be embedded via frame, iframe, object, embed or applet tags on any production project pages as it would violate Wikimedia's privacy policy. And even if it were allowed, once Wikimedia projects begin enforcing CSP, such third party code would no longer work. So this approach is most likely a non-starter. Some possible paths forward would include the creation of a production service or MediaWiki extension which would be able to manage and render the content you are describing on Wikimedia project pages, though this can be a long path forward, especially without the sponsorship of a Wikimedia development team.

Presumably, if the sandbox system mentioned in this task existed, it would just be a matter of having a proxy server between the user and the third-party service.

Presumably, if the sandbox system mentioned in this task existed, it would just be a matter of having a proxy server between the user and the third-party service.

I would say, even with an iframe sandbox system, we would still want code to be served from some repo we control, instead of just proxying it. An iframe sandbox (that allows JS) stops some threats, but certainly not all of them (e.g. cryptominers, certain user tracking is probably still possible, etc).

It makes sense, yes. Let's assume we have some good applets to learn physics running on Python somewhere. What would be the steps to follow so we can iframe them on any given Wikipedia?

For starters, having the "applet" be 100% client side (other than some loading/glue code, maybe as a MW extension written in PHP) and written in JavaScript would probably be more politically palatable than anything that involves server interaction (if for no other reason, fewer moving parts = fewer people you have to get on board = less red tape).

chasemp triaged this task as Medium priority. Dec 9 2019, 5:16 PM

T222807#8858868 has some notes on a possible implementation.

Change 948186 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] WIP: Add OutputPage::createIFrame() for creating iframe sandboxes

https://gerrit.wikimedia.org/r/948186

Change 973283 had a related patch set uploaded (by Gergő Tisza; author: C. Scott Ananian):

[mediawiki/core@master] WIP: Output: Add IframeSandbox class

https://gerrit.wikimedia.org/r/973283

Change 973439 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/CentralAuth@master] Do not throw error when cookies are not readable

https://gerrit.wikimedia.org/r/973439

Change 973435 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] Hide empty page titles in SkinApi

https://gerrit.wikimedia.org/r/973435

Change 973885 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] MediaWikiIntegrationTestCase: Set SiteLookup

https://gerrit.wikimedia.org/r/973885

Change 973438 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] mediawiki.cookie: Do not throw error when cookies are not readable

https://gerrit.wikimedia.org/r/973438

Change 973429 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] ContentSecurityPolicy: Clear hooks during tests

https://gerrit.wikimedia.org/r/973429

Change 973431 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] ContentSecurityPolicy: Add test for sendHeaders

https://gerrit.wikimedia.org/r/973431

Change 973432 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] ContentSecurityPolicy: Expose directives

https://gerrit.wikimedia.org/r/973432

Change 973434 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] OutputPage: Make it possible to add CSP as meta tags

https://gerrit.wikimedia.org/r/973434

Change 973430 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] ContentSecurityPolicy: Use ServiceOptions

https://gerrit.wikimedia.org/r/973430

Change 973435 merged by jenkins-bot:

[mediawiki/core@master] Hide empty page titles in SkinApi

https://gerrit.wikimedia.org/r/973435

Change 973430 abandoned by Gergő Tisza:

[mediawiki/core@master] ContentSecurityPolicy: Use ServiceOptions

Reason:

This would require fixing all the tests that mock an OutputPage. Also, it might change behavior in subtle ways because ServiceOptions values get copied when the object is created so changes to globals after that have no effect. On the whole, probably not worth the effort.

https://gerrit.wikimedia.org/r/973430

Change 973438 merged by jenkins-bot:

[mediawiki/core@master] mediawiki.cookie: Do not throw error when cookies are not readable

https://gerrit.wikimedia.org/r/973438

Change 973439 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Do not throw error when cookies are not readable

https://gerrit.wikimedia.org/r/973439

Change 973429 merged by jenkins-bot:

[mediawiki/core@master] ContentSecurityPolicy: Clear hooks during tests

https://gerrit.wikimedia.org/r/973429

Change 973431 merged by jenkins-bot:

[mediawiki/core@master] ContentSecurityPolicy: Add test for sendHeaders

https://gerrit.wikimedia.org/r/973431

Change 973432 merged by jenkins-bot:

[mediawiki/core@master] ContentSecurityPolicy: Expose directives

https://gerrit.wikimedia.org/r/973432

Change 974707 had a related patch set uploaded (by Gergő Tisza; author: C. Scott Ananian):

[mediawiki/core@master] IframeSandbox: Add integration tests

https://gerrit.wikimedia.org/r/974707

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/973283 generates a fullblown MediaWiki page (using OutputPage and SkinApi) and puts it into the srcdoc of a sandboxed iframe. The performance characteristics are rather terrible: at least in Chrome, requests initiated within the iframe are not cached at all. For the Graph extension (using this patch), that's about 1.25MB times the number of graphs on the page. (I didn't make an effort to pare down ResourceLoader dependencies from things normally loaded on every wiki page to things required to display a graph in the iframe, but I don't think it would make much difference - most of the payload is Vega.) So I think that's a no-go in current form.

I am not 100% sure what's going on. The fetch spec leaves network partitioning (which I think is the only relevant caching restriction in that spec) intentionally unspecified. Chrome has a Partition the HTTP Cache feature which has been enabled for everyone for a while, which uses a two part [top-level registrable domain, frame registrable domain] key for caching, but it doesn't say how opaque origins should be handled. ("Opaque origin" is how the WHATWG HTML spec refers to documents with no real origin, and it says both using srcdoc and using a sandboxed iframe without the allow-same-origin flag will cause the resulting document to have an opaque origin.) But my impression is that an opaque origin is more or less handled as a unique random string (it is only equal to itself) rendering these requests essentially uncacheable. (And the ResourceLoader store is disabled by the sandboxing, which disallows localStorage access.)

I don't have any great idea what to do about this, but these are some options that come to mind:

  • Try to use some equivalent construction that maybe isn't excluded from caching, such as <iframe src="data:..."> or <iframe src="blob:...">. This is very unlikely to work.
  • Use a real iframe instead of an srcdoc-based iframe (and hope that the caching was broken by srcdoc and not the sandboxing). The challenge then will be how to prevent the attacker from just making the victim load the page directly. We could presumably have some kind of "sandbox mode" that needs to be present in the URL (maybe a query parameter, or by using a dedicated special page like Special:Iframe or Special:Graph) and is required for MediaWiki to return risky content like Vega graphs. On modern browsers, such URLs could then be defended by CSP sandboxing (an optional part of the CSP 1 spec and normal part of CSP 2; caniuse suggests it's supported by all browsers since 2013 or so, including IE 10-11 if one uses the X-Content-Security-Policy header, so all grade A and C browsers), or by checking the Sec-Fetch-Dest header on the server side (less browser support, although still decent, so only good as defense in depth). On old/weird browsers you can probably mitigate the worst by refusing to work in the presence of authentication cookies (which would still allow for IP exfiltration). (See the sketch after this list.)
  • Use a separate but unsandboxed domain (en.wikipedia.wikimedia-usercontent.org or something like that) that internally resolves to the same wiki (with similar effects to the "sandbox mode" mentioned above, except it wouldn't do sandboxing, just relying on the origin / registrable domain being different). This would of course vastly increase the effort required (buying the domain, setting up DNS rules, certificates, a bunch of changes to site configuration...).
  • Do most of the asset loading outside the iframe. Implement some sort of proxy ResourceLoader in the iframe, which uses postMessage to request the asset from the embedding window. This would only work for things loaded via JavaScript (so scripts and some styles, but not style modules or images) but that's the bulk of the data so that's probably acceptable. Security-wise it's somewhat scary: the module string would be generated in a context where the user is known, so there is a risk of the sandboxed malicious code being able to simply request a module that embeds the user's identity. ResourceLoader (client-side ResourceLoader, specifically, I think, since we'd want to get the modules from the RL store if possible) would need some very robust and reliable way of differentiating between modules that do / do not depend on user identity.
  • Wait it out. Chrome has a planned change in their caching logic, and their intent to experiment announcement says "Also, one specific side-effect of this change is that iframes with opaque origins (for instance, those created using data: URLs) may now be eligible to have their resources added to the HTTP cache. We aim to measure what changes in performance result from this." (and the Chrome source code suggests that they will just use the same cache for all opaque-origin iframes embedded in the same site). So that would probably resolve this. The feature is in developer trial though, with no movement in the last half year.
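
To make the "sandbox mode" option above a bit more concrete, here is a hedged sketch of the checks such a dedicated entry point could perform; Special:Graph, the message key and the exact CSP directives are placeholders and would need proper security review:

// Hypothetical special page that serves only the risky, sandboxed content.
class SpecialGraph extends SpecialPage {
    public function __construct() {
        parent::__construct( 'Graph' );
    }

    public function execute( $par ) {
        $request = $this->getRequest();

        // Defense in depth: if the browser reports that this is not an iframe load,
        // refuse to render (older browsers don't send Sec-Fetch-Dest at all).
        $dest = $request->getHeader( 'Sec-Fetch-Dest' );
        if ( $dest !== false && $dest !== 'iframe' ) {
            throw new ErrorPageError( 'errorpagetitle', 'graph-not-embedded' );
        }

        // CSP sandboxing so that even a direct navigation runs with an opaque origin.
        $request->response()->header( 'Content-Security-Policy: sandbox allow-scripts' );

        // ... render the graph payload into $this->getOutput() ...
    }
}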

Issue 1319056: Sandboxed iframe doesn't cache resources seems to be the relevant Chromium bug (wontfixed).

Unrelated to the whole caching issue, one thing that's probably worth noting is that an srcdoc-based iframe can read the top-level document's URL.

I tested various ways of loading an iframe in Chrome (using this gist); sandboxing disables caching, regardless of other details. How the iframe is loaded (srcdoc / data URI / blob URI / normal src) doesn't seem to have any effect, assets are cached as long as the iframe is not sandboxed (or has the allow-same-origin sandboxing flag).

... that's about 1.25MB times the number of graphs on the page.

330K actually (output compression wasn't properly configured on my local setup).

Use a separate but unsandboxed domain (en.wikipedia.wikimedia-usercontent.org or something like that) that internally resolves to the same wiki (with similar effects to the "sandbox mode" mentioned above, except it wouldn't do sandboxing, just relying on the origin / registrable domain being different). This would of course vastly increase the effort required (buying the domain, setting up DNS rules, certificates, a bunch of changes to site configuration...).

Seems like a good idea. A cookie-less domain would be good for various things. It would make requests more cacheable and safer if you don't need the session. Like, for example, when you load JS/CSS for gadgets and user scripts (importScript should probably default to this no-cookie domain).

So I think, using your example, something like this seems like a good idea to me:

<iframe src="https://en.wikipedia.wikimedia-usercontent.org/wiki/graph.php?params=x" sandbox="allow-scripts allow-same-origin">your browser doesn't support graphs</iframe>

Note that you can reduce costs by using Let's Encrypt wildcard certificates.

I also found this in the WHATWG spec:

<iframe sandbox src="https://usercontent.example.net/getusercontent.cgi?id=12193"></iframe>
[Warning!] It is important to use a separate domain so that if the attacker convinces the user to visit that page directly, the page doesn't run in the context of the site's origin, which would make the user vulnerable to any attack found in the page.

So it seems like Chrome marked the issue as won't-fix to encourage devs to use a separate domain. They've implemented similar strategies in the past (for example, most new APIs require HTTPS even if it makes no sense security-wise; also TLS deprecation). Therefore, it's possible that the Chrome team may not intend to change this caching behavior in the future, as it promotes the use of a separate domain in iframes.

Issue 1319056: Sandboxed iframe doesn't cache resources seems to be the relevant Chromium bug (wontfixed).

Unrelated to the whole caching issue, one thing that's probably worth noting is that an srcdoc-based iframe can read the top-level document's URL.

So if I understand it correctly, when sandboxing is applied, the frame always gets a transient origin, which means that on each request the origin effectively changes and thus there is no caching, right? And the only way around it is using a separate domain without sandboxing?

That is unfortunate and really complicates things a lot more than expected. I guess it would mean that IF we were to use the sandbox approach after all, the iframe would have to be lazy loaded at the very least. The external domain does seem like a better long term direction, but as stated, requires a lot more setup.

So if I understand it correctly, when sandboxing is applied, the frame always gets a transient origin, which means that on each request the origin effectively changes and thus there is no caching, right? And the only way around it is using a separate domain without sandboxing?

No, not always. You can do sandboxing, but to get caching to work in Chrome you need to use sandbox="allow-scripts allow-same-origin". Firefox works fine even without those options, but Chrome needs both for the cache to work.

The problem with those options is that when you use both allow-scripts and allow-same-origin, the sandbox only works when the domain/origin is different.

https://html.spec.whatwg.org/multipage/iframe-embed-object.html#attr-iframe-sandbox
Setting both the allow-scripts and allow-same-origin keywords together when the embedded page has the same origin as the page containing the iframe allows the embedded page to simply remove the sandbox attribute and then reload itself, effectively breaking out of the sandbox altogether.

Hence it would only be fully safe if on enwiki you used a different domain, like this:

<iframe src="https://en.wikipedia.wikimedia-usercontent.org/wiki/graph.php?params=x" sandbox="allow-scripts allow-same-origin">your browser doesn't support graphs</iframe>

This will also work with caching, and will start sandboxed, but will not be safe:

<iframe src="https://en.wikipedia.org/wiki/graph.php?params=x" sandbox="allow-scripts allow-same-origin">your browser doesn't support graphs</iframe>

I believe we are walking up to a crossroads in our content architecture, and I worry that we're taking a turn for the worse with no regard for the impact of that decision on the future potential of our platform. It severely limits our audience, technical choices, and assumptions going forward, and I think it negatively impacts our mission and reach in a way that directly contradicts and violates principles that thus far have been accepted without question. We can talk about questioning those principles, but I think it's important that we do so while well informed of what that choice actually entails, and of what the costs and benefits of the alternative look like. I.e., what are the benefits (and costs) of resolving the issue at hand in a way that doesn't break deep assumptions in the status quo and doesn't violate currently held principles, versus a way that removes access to, and reduces the reach and resiliency of, our content.

Context: Principles

I'll quote some principles, and how I interpret these.

From our guiding principles:

information in Wikimedia projects can be freely shared, freely distributed, freely modified and freely used for any purpose […] in perpetuity.

All material in our projects is available in free formats that do not require the user to execute proprietary software.

[…] we support the right to […] make […] copies […] of Wikimedia content.

Wikimedia Foundation aims to make material […] broadly accessible. […] In prioritizing new products and features, our goal is to impact the largest-possible number of readers and contributors, and to eliminate barriers that could preclude people from accessing […] our projects. We endeavour to create the structural support and the necessary preconditions for bottom-up innovation by others. […] Where possible, we aim to preserve and support frictionless use of the material in the projects, so that people can share it widely and easily.

I consider "information" to be "distributable" if you can download and archive it from our website, and can open that copy in a way that is similarly accessible as the original, yet independent of the online copy. That is, it does not change or stop working if something happens on the website. If it does not meet these criteria, then really all we offer is an (inefficient) way to share a link to view content on our own website.

In practice, this means if you save the HTML (i.e. from Parsoid, Parse API, action=render, Enterprise API, or some other API or scraping) and recursively download referenced media files, you have (short of the skin and optional interactivity) a functionally equivalent copy of the same information as presented on the site.

  • allow efficient interaction with wiki content, […] easily processed and reused in bulk, […]
  • data […] SHOULD be based on widely used open standards.
  • […] the need for distributing our content though APIs and as dumps, as discussed in the Resilience essay.

This emphasises the need for bulk processing. For example, viewing each online article in a browser and exporting it somehow, would not be considered "easily processed in bulk". And if the export method were "Print to PDF", the resulting PDF would not be "easily reused in bulk" given that PDFs are significantly harder to index, extract, or render in different ways than e.g. HTML-formatted rich text. Imagine using a PDF to offer a decent responsive experience for mobile and desktop viewports, or to extract from a PDF a summary and first image to display in a Siri Knowledge panel.

This also reiterates the need to provide the content in re-usable form as-is. So if others are theoretically able to convert the content into something reusable, but we don't do so proactively ourselves, that would still violate this principle. It is of vital importance that distributors can focus on their own audience-specific problems, and that "getting" the content doesn't involve obstacles in the form of costly and unnecessary/un-batchable rendering or format conversions.

As a practical example, consider how even the most mature and complex web archivers (like Internet Archive's Wayback Machine), despite going to great lengths to not only download referenced media but also rewrite huge amounts of dynamic JavaScript code, consistently fail to archive chained, lazy-loaded, or otherwise non-trivial JavaScript pipelines. Even if that did work one day, it is well into the realm of non-trivial processing and would present a very hard filter on who can and will distribute Wikimedia content out into the world.

  • These principles […] enable people, […] and machines to […] access the sum of all knowledge, on and off the Wikimedia sites.

Examples include enabling machines like Kiwix, IPFS, and Apple Dictionary to render copies of our HTML, including images, audio, and video.

As well as for search engines like DuckDuckGo, Bing, and Google to be able to serve up all our photos, maps, timelines, and other visualisations.

From Compatibility § Browsers

Every web page starts in Basic mode, where only the HTML is rendered. CSS can be assumed to succeed for visual readers and should be used for presentation. The Modern layer defines optional enhancements and interactions written in client-side JavaScript. This layer may fail to load, arrive later, or not at all; including in modern browsers.

And, lastly, from Frontend practices

Performance principles, in order of their relative importance: Users […], Developers […], Servers […].

[…] We aim to be fairly aggressive in raising JavaScript requirements for modern browsers, which reduces costs of development and maintenance, and also reduces [page weight]. This aim is only achievable when components start out with a solid and functional Basic experience, with server-rendered access to information, […].

Embrace that every page starts with basic HTML and CSS, and that JavaScript […] may not arrive [due to]:

  • outside our canonical website:
    • offline re-use of our content, such as Kiwix, archive.org, archive.today, and IPFS.
    • external re-use, such as Apple Dictionary/Lookup, and Apple Siri, third-party mobile apps for Wikipedia, and alternative Wikipedia reader sites
    • search engines
  • personal circumstance or upstream vendor decisions:
    • device age.
    • browser choice. […]
    • intervention by browser
    • network speed
    • […]

This further reiterates that there can be no JavaScript requirement for fundamental access to content.

Note that this principle is deeply integrated into other principles. It goes well beyond our own sites. For example, even in a hypothetical future where the performance of client-side rendering is acceptable, and where somehow either it no longer costs resources to support older browsers, or where all devices/browsers are magically up-to-date, we still could not require JavaScript to render content. Why? Because if the content required JS execution to render, it wouldn't be accessible in bulk, or indexable by search engines, or available in an open format, or archivable. And even if we'd drop the "bulk" and "machine readability" principles, and invest heavily into a version of the feature that works through a single standalone, archive-friendly inline script tag within the HTML content; it remains unlikely that most content re-users would even permit or have the ability to offer JavaScript execution in the places where they currently interpret or embed Wikipedia HTML.

The moment the map on https://en.wikipedia.org/wiki/South_West_(Western_Australia) ceases to be an <img> in favour of a script tag, external iframe, or empty div; it will disappear from most re-users that display this content.

Context: Applying these principles

To my knowledge, there exists no content in our platform today that was funded and planned to knowingly violate any of these principles. As such, I believe I can say uncontroversially that it is the status quo to follow them and that we would need significant discussion and evaluation to deviate from that.

The continued failure of the Graph extension

There exists exactly 1 feature in our platform today that violates the quoted principles. It is the Graph extension. It wasn't that way originally, though. Like its sibling, the Maps extension powered by Kartographer, the Graph extension also used to render an <img> tag, and used only optional JavaScript enhancements when viewed on Wikipedia.org, to add interactivity.

In 2020, the unmaintained status of the Graphoid backend service led to its undeployment (T242855). This started the situation we have today, where for all intents and purposes, Graph content no longer "exists". As of 2020, when you read articles about Covid in the Apple Dictionary app, there are giant holes where the graphs once were. Similarly in any modern browser where JS fails or doesn't arrive in time. They also no longer surface via Google Image search. They no longer appear in the Internet Archive. They also no longer appear when viewing content offline in our own official Wikipedia Android app, or in Kiwix which is important to communities that rely on peer-to-peer distributions of Wikipedia (as further elaborated on in the Resiliency essay).

Let us review other media integrations:

  • Video (TMH). <video>
  • Music notes (Score) <img><audio>
  • Maps (Kartographer) <img>
  • STL models (3D) <img>

These all render early, quickly, and cheaply in external contexts, apps, and browsers regardless of support level, can be shared by URL in messenger applications, securely copied into a blog or other website to host, found via Image Search, etc. (T272530#8794703)

A bright future for Graph

In addition to the above deeply-rooted principles that inform our general approach to content features (and that ensure graphs are not broken in the official Wikipedia apps when viewing saved articles, and not broken for search engines and other third-party re-users), there is another significant concern unique to the Graph extension.

The Graph extension is currently implemented using Vega specs, and does so in a way that exposes the non-standard Vega format directly to wikitext. This is a fundamental design mistake, and is the sole reason the Graph extension was disabled in its entirety this year. The extension was disabled due to a JavaScript security vulnerability that was several years old. We were unable to upgrade Vega in the Graph extension, because our graph descriptions were tied to a specific old version of Vega. It appears the problem we are trying to solve right now, is how to render the old and insecure version of Vega, instead of focussing on the problem of why we were unable to easily upgrade Vega in the first place, like we would do with any other JavaScript library.

For example, we use jQuery, Vue, Moment, Mustache, Pako, and lots of other libraries. Placing these in a sandboxed iframe is in my view not a generalisable or scalable solution to the security problems.

I propose instead that we, like we already do for everything that is not Graph:

  • Ensure JS is optional, so that it doesn't break Wikipedia apps, Kiwix, Internet Archive, image search, etc.
  • Ensure JS is optional, so that in security emergencies it can be turned off.
  • Ensure content is not tied to individual versions of JS libs, so that upgrades can be planned and executed in a timely manner.

There was at no point an informed and intentional management decision to break graphs and violate these principles. As such, I recommend that if there is now (unlike in 2020) funding and interest in making graphs well-supported in our platform, to funnel that desire toward T249419: RFC: Render data visualizations on the server. Specifically in a way that:

  • uses a stable format that is independent of a specific library or version thereof.
  • renders it server-side first (e.g. in PHP, or via shellbox as-needed); see the sketch after this list.
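
A hedged sketch of what the server-side-first bullet could look like, assuming a hypothetical command-line renderer ("chart-renderer") invoked through MediaWiki's shell framework (which can be routed through Shellbox); the actual design belongs to T249419:

use MediaWiki\Shell\Shell;

// At parse time, turn a stable, library-independent chart description into
// static SVG, so the article ships a plain image and JavaScript remains an
// optional enhancement layer on top.
$result = Shell::command( 'chart-renderer', '--format=svg' ) // hypothetical binary
    ->input( json_encode( $chartSpec ) ) // $chartSpec: the on-wiki chart definition
    ->execute();

if ( $result->getExitCode() === 0 ) {
    $svg = $result->getStdout(); // embed inline or store as a rendered asset
}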

The extension was disabled due to a JavaScript security vulnerability that was several years old. We were unable to upgrade Vega in the Graph extension, because our graph descriptions were tied to a specific old version of Vega. It appears the problem we are trying to solve right now, is how to render the old and insecure version of Vega, instead of focussing on the problem of why we were unable to easily upgrade Vega in the first place, like we would do with any other JavaScript library.

This isn't entirely accurate. While years-old vulnerabilities found within outdated versions of Vega were the initial impetus for the recent Graph disablement, further 0-day vulnerabilities were reported to us within current 5.x versions of Vega, which thus far remain unaddressed upstream. And, in my opinion and others', fixing them is difficult or impossible and likely outside of Vega's intended use cases and threat model.

I can't speak for other folks currently involved in the Graph update project, but the sandboxed iframe approach was discussed as a generalizable, middle-ground solution that would support the same system folks had become accustomed to on the projects (this is incredibly important given most proposed migration paths for affected graph content) while providing reasonable security around known-insecure third-party code, with the effort level being a substantial reduction from what T249419 and similar proposals will likely require.

I would like to emphatically support Timo in T169027#9362252 here. And just to re-state what I think is the most critical part of the argument:

We do not need Vega. It was never asked for, and it doesn't serve a fundamental purpose for anyone except a tiny number of graphs that I've promised to personally hand-migrate to a better system. Vega is not the solution we're looking for here, it's something that a much more mature MediaWiki might be good at in the future. Right now, we should focus on doing good product work and building with our communities. If we ignore this and continue forward with Vega in an iframe, we are just adding fundamental technical debt not just to our code but to the content.

So if I understand it correctly, when sandboxing is applied, the frame always gets a transient origin, which means that on each request the origin effectively changes and thus there is no caching, right? And the only way around it is using a separate domain without sandboxing?

Yes, although there might be ways to avoid needing requests in the first place. I think this topic is complex enough to deserve its own discussion thread so I've filed T352227: Work around cache partitioning in iframe sandboxing.

Change 973434 merged by jenkins-bot:

[mediawiki/core@master] OutputPage: Make it possible to add CSP as meta tags

https://gerrit.wikimedia.org/r/973434

Change 973283 merged by jenkins-bot:

[mediawiki/core@master] Output: Add IframeSandbox class

https://gerrit.wikimedia.org/r/973283

Re Krinkle In T169027#9362252

All that is compelling in a lot of ways. I think there is a bit of an unanswered product question in what we are trying to accomplish with graphs, which confuses a lot of the discussion.

Reading some of the original writings on the Graph extension, it seems somewhat less about actual graphs and more about an attempt to get interactive content (broadly defined) on to Wikipedia. See for example [[User:Yurik/I_Dream_of_Content|Yurik's essay]]. I think that idea persists to this day in the background for a lot of people.

Use cases I've seen surrounding this topic:

  1. Displaying static graphs
  2. Displaying interactive graphs. e.g. Being able to click on something and drill down [like the flame graphs we use for perf monitoring], having a slider to adjust the map over time.
  3. Big data dreams. By which I mean, we collect large data sets, then visualize them in an open-ended way, and then users somehow get something out of it via self-exploration [This is my least favourite use case. People get really inspired by this idea, but I think the value is a mirage and it essentially amounts to at best shoving a primary source in the user's face and at worst an impossible-to-interpret art project]
  4. Interactive learning applets. e.g. Allowing physics demonstrations. Showing how things work in a way you can virtually touch (like 1). Animations you can interact with.

I think use cases 3 and 4 are essentially what inspires people, but use case 1 is essentially what "graphs" without context means. All the problematic stuff that Graphs does is essentially in support of use cases 3 and 4. The overly complex Vega language. Allowing calling out to the MW API. Allowing calling out to the pageview API. Allowing calling out to the Wikidata Query Service. JsonConfig.

If all we want to do is solve problem 1 (and maybe 2), we would probably be building something like the Timeline extension, but more modern. Some SVG, with maybe a tiny bit of (not user controllable) js to support minimal interactivity.

Use case 4 is essentially: let's allow users to make maximally complex interactive rich multimedia documents. Or to put it in internet terms - let's give users the ability to do the equivalent of Flash applets. There is no way to do that in a scripting-optional way (of course, JavaScript does not necessarily need to be the scripting language). There is no way to do that where it will be compatible with all reusers.

Sorry, this turned into a bit of a rant. I think the point I'm trying to make here is - I think we got to this point because we were pretending to do use cases 1 & 2, but were actually trying to do use cases 3 & 4. Or we were trying to use use cases 1 & 2 as a backdoor way to get 3 & 4 on the site. Whatever the future of graphs is, I think it's important to be very clear what we are trying to do and what we are not trying to do, to avoid ending up in the same spot. If we are trying to solve just use cases 1 & 2, then we should do that and just that, in the best way possible. If we ever try and solve use case 4 (which personally I think would be cool, but that's beside the point), then it's important to be up front that that is the goal.

IMO option 4 is cool but it requires a developer (paid or volunteer) to hand-craft every single Graph (or InteractiveLearningThingy or whatever the successor) invocation, and that manpower just doesn't exist. It wouldn't work even if interactive use cases weren't encumbered with the Vega syntax.

Whatever the future of graphs is, I think it's important to be very clear what we are trying to do and what we are not trying to do, to avoid ending up in the same spot.

This is an important point, but I'm pretty sure it's well understood by everyone who needs to understand it. I think the frustratingly slow progress on this issue is in large part because deciding what we are / should be trying to do is a lot harder than just fixing a thing that used to work but doesn't work anymore.

(I'd also add sharing data between wikis and making it machine-readable as a use case that Graph supports but doesn't support very well.)

IMO option 4 is cool but it requires a developer (paid or volunteer) to hand-craft every single Graph (or InteractiveLearningThingy or whatever the successor) invocation, and that manpower just doesn't exist.

This is a bit off topic, but I actually do believe this is in theory viable, if it is something that can be delegated to untrusted users. Wikipedians spend thousands of hours crafting featured articles. Wikipedians also write complex programs in the form of lua modules. I fully believe if properly enabled, an ecosystem of on-wiki animators could make interactive animations for high profile articles. No doubt though, enabling such an ecosystem would be a huge undertaking.

It's not that we would like to have 3 and 4. It's that we decided that in the 2030 strategy discussion. We need it.

Change 1008466 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] Revert "Output: Add IframeSandbox class"

https://gerrit.wikimedia.org/r/1008466

Change 974707 abandoned by Gergő Tisza:

[mediawiki/core@master] IframeSandbox: Add integration tests

Reason:

per I85a63fc57f96d5

https://gerrit.wikimedia.org/r/974707

Change 1008466 merged by jenkins-bot:

[mediawiki/core@master] Revert "Output: Add IframeSandbox class"

https://gerrit.wikimedia.org/r/1008466