
Uncaught TypeError: Cannot read property 'items' of null / TypeError: null is not an object (evaluating 'view.model().scene().items')
Closed, ResolvedPublic

Description

New bug introduced beginning on 30th November/1st December
https://logstash.wikimedia.org/goto/aa703c04fc4978b557c3b645303b49af

Screen Shot 2020-12-02 at 2.47.42 PM.png (520×1 px, 69 KB)

The bug is mostly seen on Turkish Wikipedia, which is still on wmf18, so it possibly relates to an existing error that is being surfaced by user-generated content. Given the connection to COVID-19 (the Turkish article contains a COVID-related graph), the large volume of errors, and the fact that they originate in Vega, I think this should be considered UBN until at least the cause and impact are diagnosed.

at extendEvent  https://tr.m.wikipedia.org/w/load.php?lang=tr&modules=ext.eventLogging%7Cext.graph.vega2%7Cjquery%7Cmediawiki.page.watch.ajax&skin=minerva&version=1nyqp:579:253
at View.<anonymous>  https://tr.m.wikipedia.org/w/load.php?lang=tr&modules=ext.eventLogging%7Cext.graph.vega2%7Cjquery%7Cmediawiki.page.watch.ajax&skin=minerva&version=1nyqp:578:96
at CanvasHandler.prototype.fire  https://tr.m.wikipedia.org/w/load.php?lang=tr&modules=ext.eventLogging%7Cext.graph.vega2%7Cjquery%7Cmediawiki.page.watch.ajax&skin=minerva&version=1nyqp:479:576
at CanvasHandler.prototype.touchmove  https://tr.m.wikipedia.org/w/load.php?lang=tr&modules=ext.eventLogging%7Cext.graph.vega2%7Cjquery%7Cmediawiki.page.watch.ajax&skin=minerva&version=1nyqp:479:289
at HTMLCanvasElement.<anonymous>  https://tr.m.wikipedia.org/w/load.php?lang=tr&modules=ext.eventLogging%7Cext.graph.vega2%7Cjquery%7Cmediawiki.page.watch.ajax&skin=minerva&version=1nyqp:477:854

Event Timeline

Jdlrobson triaged this task as Unbreak Now! priority. Dec 2 2020, 10:50 PM
Jdlrobson updated the task description.
Jdlrobson added a subscriber: Milimetric.

Seems to mostly be impacting mobile traffic.
I can replicate this on my Samsung Galaxy S8. The graph does not render on https://tr.m.wikipedia.org/wiki/COVID-19_pandemisi. This error should at least be handled, or a fallback shown:

Screenshot_20201203-094028_Chrome.jpg (2×1 px, 216 KB)

I'm not sure who owns this extension from https://www.mediawiki.org/wiki/Extension:Graph, so tagging @Milimetric and teams that may be able to help.

Is anybody actively looking into a fix here? The error rate is still high which is worrying going into the holiday period.

I can replicate this on a slow connection on a mobile device by triggering mouse move events on the graph in https://zh.m.wikipedia.org/wiki/2019%E5%86%A0%E7%8B%80%E7%97%85%E6%AF%92%E7%97%85%E7%96%AB%E6%83%85%E6%99%82%E9%96%93%E8%BB%B8 before it has fully loaded. That is usually an indication that event handlers are being bound too early, before the code they depend on has loaded.
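To illustrate the kind of guard that would avoid this class of error, here is a minimal sketch. It assumes the handler can look up the scene lazily; `createGuardedHandler` and `fakeView` are hypothetical names for illustration and are not part of Vega or the Graph extension.

```javascript
// Hypothetical sketch: wrap an event handler so it is a no-op until the
// scene it depends on exists. None of these names come from Vega itself.
function createGuardedHandler(getScene, handler) {
  return function guarded(event) {
    const scene = getScene();
    // Bail out quietly while the scene has not been built yet,
    // instead of dereferencing null and throwing a TypeError.
    if (!scene || !scene.items) {
      return undefined;
    }
    return handler(event, scene.items);
  };
}

// Stand-in for a view whose scene is null until loading finishes.
const fakeView = { scene: null };
const onTouchMove = createGuardedHandler(
  () => fakeView.scene,
  (event, items) => `${event.type}:${items.length}`
);

// Event fired before the graph has loaded: ignored, no exception.
const beforeLoad = onTouchMove({ type: "touchmove" });

// Once the scene exists, the same handler runs normally.
fakeView.scene = { items: [{}, {}] };
const afterLoad = onTouchMove({ type: "touchmove" });
```

The alternative is to bind the handlers only after loading completes; the guard is shown here because it also covers a scene that becomes null again later.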

Also by simply navigating to https://zh.wikipedia.org/wiki/Template:Interactive_COVID-19_maps/Per_capita_confirmed_cases
I think this is another example of the longstanding T258426

A short-term solution, if possible, might be to disable Vega error handling.

Clarakosi subscribed.

The maintainers' page says the Editing team is responsible for this extension, so I'm untagging Platform Engineering.

The large volume might be misleading when trying to estimate the severity of the issue, unless the error logging is smarter than I think. I just reproduced the problem on one of the pages and got 700+ instances of this error, with another one happening every time I moved my mouse cursor.

> The large volume might be misleading when trying to estimate the severity of the issue, unless the error logging is smarter than I think.

The error logger logs at most 5 errors per page view (we introduced this after T257872 was overloading things).
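As a rough sketch of what such a per-page-view cap looks like (this is not the actual error logging code; `createCappedLogger` and its parameters are hypothetical):

```javascript
// Hypothetical sketch of a per-page-view cap on reported errors.
// `send` stands in for whatever actually ships the error to the backend.
function createCappedLogger(send, maxPerPageView = 5) {
  let sent = 0;
  return function logError(error) {
    if (sent >= maxPerPageView) {
      return false; // over the cap: drop the error
    }
    sent += 1;
    send(error);
    return true;
  };
}

// Simulate 700 errors from one page view, as in the reproduction above.
const delivered = [];
const logError = createCappedLogger((e) => delivered.push(e.message));
let accepted = 0;
for (let i = 0; i < 700; i++) {
  if (logError(new Error("Cannot read property 'items' of null"))) {
    accepted += 1;
  }
}
```

With a cap like this, 700 errors in one page view contribute only 5 log entries, so the logged volume tracks the number of affected page views rather than the number of mouse movements.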

The concern with this bug is that the addition of a graph to a single page created a spike. On larger wikis, if a graph is added to a popular page, the volume could be catastrophic and overload the service. We will be deploying to English Wikipedia in January, and I'd rather we be proactive than reactive here.

> I just reproduced the problem on one of the pages and got 700+ instances of this error, with another one happening every time I moved my mouse cursor.

Could you add some reproduction steps to the task? I've been unable to do this myself. This bug does sound a lot like a race condition similar to T257872 and might be happening more often on slower connections.

@Jseddon is going to work on this per the conversation he and @marcella had offline.

I did some poking around and right now I think the cause of the errors IS on the content side of stuff.

I was able to consistently force this on testwiki just by using two slightly different variants of a template:

Without error (sourced from en-wiki): https://test.wikipedia.org/wiki/Template:Interactive_COVID-19_maps/Cumulative_confirmed_cases-NOBUG
With error (sourced from ms.wiki): https://test.wikipedia.org/wiki/Template:Interactive_COVID-19_maps/Cumulative_confirmed_cases-BUG

A single change, providing a default value, seems to fix this:

https://test.wikipedia.org/w/index.php?title=Template:SeddonCovid&diff=460762&oldid=460729

This doesn't resolve the fact that it is possible to produce large amounts of production errors due to errors in content.

Jdlrobson lowered the priority of this task from Unbreak Now! to High. Dec 16 2020, 4:27 PM (edited)

Thanks @Jseddon for looking into this but I haven't seen any change in the logs. Perhaps caching is a factor?

Screen Shot 2020-12-16 at 8.21.50 AM.png (810×2 px, 185 KB)

https://logstash.wikimedia.org/goto/42672492758843d4c6c09452da939a9b

I think the majority of open graph bugs do relate to user content in some way. Is there a way we could validate such input and present feedback to the editor/reader rather than rely on the error logs? Do these graphs have a specific JSON specification we can validate with?
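A minimal sketch of what pre-render validation could look like is below. The checks are illustrative assumptions only and are not the Vega schema; `validateGraphSpec` is a hypothetical helper, and the required keys (`width`, `height`, `marks`) are chosen just to show the shape of the approach.

```javascript
// Hypothetical sketch: validate a graph spec before rendering and return
// human-readable problems that could be shown to the editor or reader.
// The specific rules below are assumptions, not Vega's real requirements.
function validateGraphSpec(jsonText) {
  let spec;
  try {
    spec = JSON.parse(jsonText);
  } catch (e) {
    return { ok: false, problems: ["Spec is not valid JSON: " + e.message] };
  }
  if (spec === null || typeof spec !== "object" || Array.isArray(spec)) {
    return { ok: false, problems: ["Spec must be a JSON object."] };
  }
  const problems = [];
  for (const key of ["width", "height"]) {
    if (typeof spec[key] !== "number" || !Number.isFinite(spec[key])) {
      problems.push(`"${key}" must be a finite number.`);
    }
  }
  if (!Array.isArray(spec.marks)) {
    problems.push('"marks" must be an array.');
  }
  return { ok: problems.length === 0, problems };
}

const good = validateGraphSpec('{"width": 400, "height": 200, "marks": []}');
const bad = validateGraphSpec('{"width": "", "marks": null}');
const broken = validateGraphSpec("{oops");
```

Returning a list of problems rather than throwing makes it possible to render a fallback message in place of the graph instead of sending the failure to the error logs.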

The numbers today are lower than they have historically been, at under 1,000 (presumably because of lower page views), so I'm lowering this from UBN to High.

Yesterday, I forced a purge of the cache for both the pages and the template and it seems to have had a profound effect on the number of errors.
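For reference, one way to force such a purge is the MediaWiki Action API's `action=purge` module. The sketch below only builds the request and makes no claim about how the purge was actually performed here; `buildPurgeRequest` is a hypothetical helper, and the wiki origin and title are taken from the test templates mentioned earlier purely as an illustration.

```javascript
// Hypothetical helper: build a POST request for the MediaWiki Action API
// purge module. `forcerecursivelinkupdate` also refreshes pages that
// transclude the purged title, which matters when purging a template.
function buildPurgeRequest(wikiOrigin, titles) {
  const body = new URLSearchParams({
    action: "purge",
    titles: titles.join("|"), // the API separates multiple titles with "|"
    forcerecursivelinkupdate: "1",
    format: "json",
  });
  return {
    url: `${wikiOrigin}/w/api.php`,
    method: "POST", // the purge module requires POST
    body: body.toString(),
  };
}

const purgeRequest = buildPurgeRequest("https://test.wikipedia.org", [
  "Template:Interactive_COVID-19_maps/Cumulative_confirmed_cases-BUG",
]);
```

The returned object can be passed to any HTTP client; the body is form-encoded.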

Screen Shot 2020-12-17 at 8.48.55 AM.png (396×1 px, 65 KB)

You did it! W00t!

Does a ticket exist for a general solution to bad user input? If so this can be resolved.

(I'm only tangentially involved with client-side graphs as they relate to the graphoid replacement I'm trying to prioritize. So I'm happy to see @Jseddon slaying this bug. And I'm working with product to prioritize a bigger effort to maintain graphs going forward; progress is very slow there)

Thanks @Jseddon I appreciate it!

@Milimetric we should talk. I'm also trying to prioritize this but from a different angle :)