Adding a graph to a page doubles JS payload on mobile and desktop
Closed, Resolved, Public

Description

When graphoid was undeployed, we witnessed a large spike in JS assets (179.8 KiB of JS) for the Facebook page because of the single graph that was included in the page as documented by @phuedx in T271495#6748186

facebook_regression.png (1×2 px, 346 KB)

Per Sam, that 179.8 KiB of JS is introduced by the Graph extension when $wgGraphImgServiceUrl is falsy and there's at least one graph on the page in the form of the ext.graph.vega2 RL module and the wgGraphSpecs JS config variable.

This is an unfortunate download for users that do not interact with the graph, especially considering this is more bytes than all of the images on the page combined.

Expected: The graph code should be loaded when I click on the graph or later in the page's execution (looks in the direction of the performance team for the recommendation).

Event Timeline

Milimetric subscribed.

To clarify on my broken heart, this is what I explained would happen in my RFC to replace graphoid: T249419. I really hope we can prioritize pushing that forward.

This is not a surprise, indeed. We knew that moving this feature client-side only would have this impact, since Vega is huge and monolithic.

My recommendation would be to look into ways the damage can be mitigated.

First of all, how dated are the versions of Vega we're using for this? Maybe newer ones are leaner or more modular.

Is the JS only required for interaction? I thought it was needed for rendering too. If only needed for interaction, then waiting for user interaction makes total sense.

If needed for rendering, I think that an intersection observer, with look-ahead distance, should be used on browsers that support it, in order to delay the load of the bulk of the JS involved here.

And on browsers that don't support intersection observers, I think that a delay beyond the load event is also desirable. I think it's probably uncommon for a graph to be above the fold. We shouldn't overdo the extra delay for those rare articles where a graph IS above the fold, though.
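The two strategies above could be sketched roughly as follows. This is only an illustration, not the Graph extension's code: `scheduleGraphLoad`, `loadGraphModule`, the `env` parameter, and the default look-ahead (500px) and delay (2000ms) are all hypothetical names and values. In a browser, `env` would be `window`; it is passed in here so the sketch can be exercised without one.

```javascript
// Hypothetical sketch: defer loading the heavy graph code until the
// placeholder is near the viewport, or until shortly after the load event.
// `loadGraphModule` stands in for whatever actually fetches Vega;
// it is an assumption, not an existing API.
function scheduleGraphLoad(placeholder, loadGraphModule, env, options) {
    var settings = options || {};
    var lookAhead = settings.lookAheadPx === undefined ? 500 : settings.lookAheadPx;
    var fallbackDelay = settings.fallbackDelayMs === undefined ? 2000 : settings.fallbackDelayMs;
    var loaded = false;
    var observer;

    function loadOnce() {
        if (!loaded) {
            loaded = true;
            loadGraphModule(placeholder);
        }
    }

    if (typeof env.IntersectionObserver === 'function') {
        // rootMargin grows the viewport box vertically, so the callback fires
        // while the graph is still `lookAhead` pixels away from being visible.
        observer = new env.IntersectionObserver(function (entries) {
            entries.forEach(function (entry) {
                if (entry.isIntersecting) {
                    observer.disconnect();
                    loadOnce();
                }
            });
        }, { rootMargin: lookAhead + 'px 0px' });
        observer.observe(placeholder);
        return 'observer';
    }

    // No IntersectionObserver: wait for the load event, then a little longer.
    if (env.document && env.document.readyState === 'complete') {
        // The load event has already fired, so only the extra delay applies.
        env.setTimeout(loadOnce, fallbackDelay);
    } else {
        env.addEventListener('load', function () {
            env.setTimeout(loadOnce, fallbackDelay);
        });
    }
    return 'delay';
}
```

In a page this might be called as `scheduleGraphLoad(element, loader, window)`. The fallback delay is kept short by default so that the rare above-the-fold graph is not held back for long.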

To add to Gilles's points:

  • I still think there's good reason to deprecate all old versions of Vega and support only the latest versions of Vega and Vega lite. They should be backwards compatible for the foreseeable future. I can help migrate graph definitions if that's a problem.
  • Vega is not modular at all, and it is used for rendering, not just interaction.
  • I agree with always loading it with a delay, and ideally using intersection observers, as Gilles describes. The placeholder could describe what's happening, should be usable enough.

But of course we need a longer-term solution, as this content is getting pulled all over (Alexa, Siri, Google, etc.). It's relevant to both Knowledge as a Service and Rich Media.

First of all, how dated are the versions of Vega we're using for this? Maybe newer ones are leaner or more modular.

Is the JS only required for interaction? I thought it was needed for rendering too. If only needed for interaction, then waiting for user interaction makes total sense.

At the time of writing, the current version of Vega is needed to render the graph(s) as well:

Screenshot 2021-02-04 at 12.18.59.png (1×1 px, 341 KB)

If needed for rendering, I think that an intersection observer, with look-ahead distance, should be used on browsers that support it, in order to delay the load of the bulk of the JS involved here.

Alternatively, we could explore requiring the user to click on an overlay in order to load the graph.

Intersection observer makes sense to me as does a generic graph graphic that can be clicked to turn it on.

loading it with a delay, and ideally using intersection observers, as Gilles describes. The placeholder could describe what's happening, should be usable enough.
Alternatively, we could explore requiring the user to click on an overlay in order to load the graph.

What seems to be forgotten is that graphs currently have no size. Or rather, the size is only known after rendering, and the server cannot know it (this is one of the things that graphoid did). This causes page reflows, which will be especially noticeable to end users if we do not load the graph immediately.

To fix that, we would have to detect discrepancies and actively notify editors during page views, to input the correct width and height into wikicode. Somewhat laborious.
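As a rough illustration of the detection step, a check along these lines could run after a graph renders. `findSizeDiscrepancy`, its argument shapes, and the 1px default tolerance are hypothetical; how the result would be surfaced to editors is left open.

```javascript
// Hypothetical helper: compare the size declared in the wikitext with the
// size the graph actually rendered at. Returns null when they agree within
// the tolerance, otherwise an object with a suggested correction.
function findSizeDiscrepancy(declared, rendered, tolerancePx) {
    var tolerance = tolerancePx === undefined ? 1 : tolerancePx;

    function differs(declaredValue, renderedValue) {
        // A missing declared size always counts as a discrepancy.
        return typeof declaredValue !== 'number' ||
            Math.abs(declaredValue - renderedValue) > tolerance;
    }

    if (!differs(declared.width, rendered.width) &&
        !differs(declared.height, rendered.height)) {
        return null;
    }
    return {
        declared: { width: declared.width, height: declared.height },
        suggested: {
            width: Math.round(rendered.width),
            height: Math.round(rendered.height)
        }
    };
}
```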

Change 909758 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/extensions/Graph@master] Introduce click to load with error handling for unsupported graphs

https://gerrit.wikimedia.org/r/909758
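For readers unfamiliar with the pattern, click to load with error handling can be sketched as below. This is not the code from the change above: `attachClickToLoad`, `loadAndRender`, and the `messages` object are hypothetical, and `loadAndRender` is assumed to return a promise-like object that rejects when the browser cannot run the graph code and that replaces the placeholder content on success.

```javascript
// Hypothetical sketch of click-to-load with error handling.
// `loadAndRender(placeholder)` is assumed to return a thenable that
// fulfils once the graph is drawn and rejects on unsupported browsers.
function attachClickToLoad(placeholder, loadAndRender, messages) {
    var state = { status: 'idle' };

    placeholder.addEventListener('click', function () {
        if (state.status === 'loading' || state.status === 'loaded') {
            return; // ignore repeat clicks while loading or once loaded
        }
        state.status = 'loading';
        placeholder.textContent = messages.loading;
        loadAndRender(placeholder).then(function () {
            state.status = 'loaded';
        }, function () {
            // Show a readable message instead of a blank rectangle.
            // Clicking again retries the load.
            state.status = 'error';
            placeholder.textContent = messages.error;
        });
    });

    return state;
}
```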

@Jdlrobson btw, I think users will really dislike this. Why would you click multiple nondescript rectangles in an article to load, view and compare mostly static graphs? I'd skip the entire article and go straight to another website to view the covid graphs.

Click to load really cannot work if there is not at least a serverside preview being generated. A compromise might be lazyloading as something comes into view, but... even that seems like it would be unpopular to me.

This is another thing that lazyloaded iframes would solve btw.

I think the performance penalty here is pretty bad for those that never interact with graphs. Given we have no graphs now, it seems like a good time to try this if it may be unpopular :-).

Long term, we can move to use IntersectionObserver at a later date so that it appears when they scroll into view, but this would still be needed for resilience (similar to how we do images on mobile). What do you think?

I think the performance penalty here is pretty bad for those that never interact with graphs. Given we have no graphs now, it seems like a good time to try this if it may be unpopular :-).

Long term, we can move to use IntersectionObserver at a later date so that it appears when they scroll into view, but this would still be needed for resilience (similar to how we do images on mobile). What do you think?

I think you might as well remove them immediately :)

Honestly. Think of the user stories here. It's like clicking to load the wikitext. It makes no sense whatsoever. It'd be like Netflix with no series/film thumbnails.

I think the performance penalty here is pretty bad for those that never interact with graphs. Given we have no graphs now, it seems like a good time to try this if it may be unpopular :-).

Long term, we can move to use IntersectionObserver at a later date so that it appears when they scroll into view, but this would still be needed for resilience. What do you think?

Okay you twisted my arm. I'll work on lazy loading now as part of this. It's a pretty small change. :-)

On Wikipedia, we historically drew and edited graphs offline, and uploaded them as media files to Commons, to embed in articles. I think that is the baseline we need to operate from. This baseline includes treating the visuals as first-class reusable content.

"Reusable" in this context means (as partly defined by Arch Principles) authored and exportable in a freely-licensed open standard format, which makes it legally hostable and distributable. However, for that to be meaningful it must first be technically capable of being presented outside the narrow case of "online, on our canonical website, in a Grade A browser, with a secure and up-to-date copy of our client-side JavaScript modules, and said JavaScript eventually arrived over a stable network, and executed successfully without browser interference". The long tail we serve here is detailed under General approach, but this includes for example Kiwix offline browsing and DuckDuckGo/Google Image search. I haven't tried our official Android app recently, but I imagine it would affect offline reading there, too. Siri seems to take semantic HTML and render in a native way with equivalent components (or maybe a WebView with Parsoid-like HTML).

It would pose a major security hazard, not to mention prove very poor coupling on our part, if we imposed limitations on how Apple Dictionary or Kiwix could present this content, through content depending on optional MW and skin-controlled JavaScript modules. The Parsoid HTML specification, for example, includes stylesheets, but not script tags. JS is a luxury we can inherently only offer in a secure way if it is in an online first-party context.

Let us review other media integrations:

  • Video (TMH). <video>
  • Music notes (Score) <img><audio>
  • Maps (Kartographer) <img>
  • STL models (3D) <img>

Each of these provides a cached server-rendered static image, referenced in the base HTML. The images render early, quickly, and cheaply in external contexts and in browsers regardless of support level, can be shared by URL in messenger applications, securely copied into a blog or other website to host, found via Internet-wide Image Search, etc.

The only reason the Graph extension was worth deploying in the first place, is that it added to this baseline — without taking away from even these basic principles. It enabled a wider audience to create and edit graph visualisations without needing desktop software, including mobile editing. However editing convenience cannot come at the cost of compromising access.

All that stopped the moment the Graphoid service disappeared in 2020 (T211881, T242855). For the past 24 months, we have served articles with holes in them where graphs should be in all these contexts, including e.g. someone with a freshly purchased MacBook on default settings looking up Covid in Siri Knowledge or Apple Dictionary app. The rest of us are seeing those same holes on Wikipedia.org as of yesterday.

If we are to invest anything in graphs at all, I think we need to start at our baseline. We have no precedent for success in content model or media integrations where we started at the other end of the spectrum. It seems possible in theory, but it'd be a pretty expensive way to get where we need to be, with each step increasing costs and effectively a losing battle for accessibility and performance. Starting at the baseline creates an incentive to work on adding interactivity and doing so in a way that we're motivated to keep (added) maintenance and client costs low.

The closest sibling to Graph is Maps. It renders an <img> that we can set loading=lazy on, with an optional click handler that attempts to render an interactive canvas on-demand. For Graph, the situation has been even less attractive: unlike Maps, where OpenStreetMaps API is designed from the ground up for efficient client rendering of maps in native apps, Graph would involve downloading potentially megabytes of raw data and 200KB of JavaScript, and waiting however long that takes to download and process, just to get the same image on the screen. All of that is uncached browser compute.
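To make the Maps-style baseline concrete, the static markup could be produced roughly like this. It is a sketch under assumptions: `staticGraphImgHtml`, `escapeAttr`, and the example URL are hypothetical, and it presumes a server-rendered image with known dimensions already exists. Declaring `width` and `height` lets the browser reserve space before the image arrives.

```javascript
// Hypothetical helper: escape a value for use inside a double-quoted
// HTML attribute.
function escapeAttr(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Build a static, lazily loaded <img> for a pre-rendered graph.
// Interactivity, if any, would be layered on top by a separate script.
function staticGraphImgHtml(graph) {
    return '<img src="' + escapeAttr(graph.src) + '"' +
        ' alt="' + escapeAttr(graph.alt) + '"' +
        ' width="' + escapeAttr(graph.width) + '"' +
        ' height="' + escapeAttr(graph.height) + '"' +
        ' loading="lazy">';
}

// Example with a made-up image URL:
var html = staticGraphImgHtml({
    src: '/graphs/a.png?x=1&y=2',
    alt: 'Users "per" year',
    width: 400,
    height: 300
});
// html is:
// <img src="/graphs/a.png?x=1&amp;y=2" alt="Users &quot;per&quot; year" width="400" height="300" loading="lazy">
```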

Imagine if New York Times articles (try before JS / with JS off) would serve their graphs that way. Their SEO would likely plummet if those graphs took seconds to load and didn't appear in third-party news distributions, apps, feed readers, newsletters, etc. Their graphs are served as embedded SVG with <details> for collapsibility. In the first-party online context, they add a standalone Svelte-compiled script for interactivity. Given the pre-compiled and standalone nature of such a script, it likely wouldn't be worth deferring to after a click. Most interactivity is right there through SVG and CSS, with generated effectively-vanilla JS to connect the dots.

My suggestion would be that we recommend the community revisit our baseline, which many editors never stopped doing given limitations in the Graph extension — until there is an owner for the Graph feature that is committed to at least meeting the baseline (e.g. T249419). The owner can then best decide how and whether to restart this project. I'm not sure we should make major changes on our own right now, especially if doing so reduces access even further.

Yes, the best long-term solution here would be to render img tags that become interactive, but right now we're pushing for an upgrade to Vega 5. Two issues with the Vega 5 approach are 1) we might need to ship the code unminified (I'm working with Roan on that to hopefully avoid it) and 2) since Vega.js has a different browser support matrix to us, we'll need better error handling (which is captured in the patch). The solution in https://gerrit.wikimedia.org/r/c/909758 can be extended in future to lazy load image tags like you suggest.

Change 909758 merged by jenkins-bot:

[mediawiki/extensions/Graph@master] Introduce click to load with error handling for unsupported graphs

https://gerrit.wikimedia.org/r/909758

Jdlrobson changed the task status from Open to Stalled. Apr 24 2023, 11:50 PM

The Facebook article still uses graphs so we can verify the fix here if and when we deploy the newly updated version of the Graph extension.