
TypeError: can't access dead object
Closed, ResolvedPublic

Description

Error began today

Screen Shot 2021-05-11 at 8.28.47 AM.png (582×2 px, 122 KB)

14,340 since
3,021 in last hour = unbreak now.

Much of the stack trace points to the onVisibilitychange or setActive methods of sessionTick (T248987) in modules/ext.wikimediaEvents/sessionTick.js
https://codesearch.wmcloud.org/search/?q=onVisibilitychange&i=nope&files=&excludeFiles=&repos=

https://logstash.wikimedia.org/app/dashboards#/doc/logstash-*/logstash-2021.05.11?id=7w0LXHkBWKe2MTdRS2k1

at $.cookie URL1:338:169
at get URL1:449:241
at run URL1:102:146
at setActive URL1:102:613
at onVisibilitychange URL1:102:951

URL1: https://en.wikipedia.org/w/load.php?lang=en&modules=ext.centralNotice.bannerHistoryLogger%2CchoiceData%2Cdisplay%2CgeoIP%2CimpressionDiet%2CkvStore%2ClargeBannerLimit%2ClegacySupport%2CstartUp%7Cext.centralauth.centralautologin%7Cext.cite.ux-enhancements%7Cext.cx.eventlogging.campaigns%7Cext.eventLogging%2CnavigationTiming%2Cpopups%2CwikimediaEvents%7Cext.growthExperiments.SuggestedEditSession%7Cext.uls.common%2Ccompactlinks%2Cinterface%2Cpreferences%2Cwebfonts%7Cjquery%2Coojs%2Coojs-router%2Csite%7Cjquery.client%2Ccookie%2CmakeCollapsible%2CtextSelection%7Cjquery.uls.data%7Cmediawiki.String%2CTitle%2CUri%2Capi%2Cbase%2Ccldr%2Ccookie%2Cexperiments%2CjqueryMsg%2Clanguage%2Cstorage%2Ctoc%2Cuser%2Cutil%7Cmediawiki.editfont.styles%7Cmediawiki.libs.pluralruleparser%7Cmediawiki.page.ready%7Cmediawiki.ui.button%2Cicon%7Cmmv.bootstrap%2Chead%7Cmmv.bootstrap.autostart%7Cskins.vector.legacy.js%7Cuser.defaults&skin=vector&version=p6kbr

Event Timeline

Jdlrobson triaged this task as Unbreak Now! priority.May 11 2021, 3:35 PM

338:169 @ URL1 is

var cookies=document.cookie.split('; ')

Something's trying to access the cookie jar while the document is being destroyed?

Looks like a Firefox-specific error: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Errors/Dead_object

The JavaScript exception "can't access dead object" occurs when Firefox disallows add-ons to keep strong references to DOM objects after their parent document has been destroyed to improve in memory usage and to prevent memory leaks.

We're not in an add-on here, though...

I've bumped the alerting threshold for now to give us more time to fix since this doesn't seem to be impacting users.

I think it makes sense to wrap mw.cookie.set( lastTickTime, now ); in a try/catch block for now.
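
For illustration, the proposed wrap could look roughly like the sketch below. It is not the actual sessionTick.js code: `safeSetCookie` and the `cookieApi` parameter are hypothetical names, with `cookieApi` standing in for `mw.cookie` so the snippet can run outside a browser.

```javascript
// Hypothetical helper: cookieApi stands in for mw.cookie (an assumption made
// for this sketch), so the example can run without a browser.
function safeSetCookie( cookieApi, key, value ) {
	try {
		cookieApi.set( key, value );
		return true;
	} catch ( e ) {
		// Swallow only the "dead object" TypeError; rethrow anything else.
		if ( e && e.name === 'TypeError' && /dead object/.test( e.message ) ) {
			return false;
		}
		throw e;
	}
}

// Stub that mimics the failure seen in the stack trace.
var deadCookieApi = {
	set: function () {
		throw new TypeError( "can't access dead object" );
	}
};

var stored = safeSetCookie( deadCookieApi, 'lastTickTime', Date.now() );
console.log( stored ); // false
```

Returning a boolean, rather than silently continuing, would let the caller stop ticking when the write fails.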

I noticed that, for editing URLs, the users editing pages were rollbackers. Date.now().toString is not safe in this scenario, so PageTriage (T272904) could possibly be to blame, but that wouldn't explain the dead object message.

It's possible an ad blocker or privacy tool is removing $.cookie, but I don't think there's much value in debugging this further.

Additional notes:

I think it makes sense to wrap mw.cookie.set( lastTickTime, now ); in a try/catch block for now.

I don't think we should modify production code to accommodate unsupported/unsupportable use cases. Especially not in a way that allows code to continue executing in uncharted waters. In an unknown condition like this, the specific module should not continue executing. Exceptions exist for this reason, and clearly the developers of Firefox also thought it made sense to throw in that case so as to halt further execution of related code. For example, do we know that try-catching this won't cause the surrounding code to make incorrect assumptions and continue in a way that may create worse outcomes for the data and/or end-user experience? (not per se in a way that results in an exception)

If we want to ignore certain errors, ignore them in Logstash. But let's not start a habit of try-catching everything just because a user sent some spam to our logs. The singularity this results in is a lot of wasted staff time, more brittle code that executes in ways we've never thought about or tested, inevitable confusion during refactors for why something exists, and of course the circle of life in which these refactors do make the sensible decision to remove impossible conditions and then we start again.

It's possible an ad blocker or privacy tool is removing $.cookie, but I don't think there's much value in debugging this further.

Per Sam's comment before this, the stack trace reports a line inside that method, with code that matches our source. I think that means the method exists just fine and was not modified. The error is thrown from the document.cookie accessor, not from accessing $.cookie (there is no throw statement in that method).
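
A small sketch of that distinction, assuming nothing about Firefox internals: `fakeDocument` and `readCookies` are made-up stand-ins for `document` and the `$.cookie` read path. When a property getter throws, the exception propagates out of the method that read the property, even though that method contains no throw statement and is itself intact.

```javascript
// fakeDocument is a hypothetical stand-in for a document whose cookie
// accessor has become unusable; it is not how Firefox implements this.
var fakeDocument = {};
Object.defineProperty( fakeDocument, 'cookie', {
	get: function () {
		throw new TypeError( "can't access dead object" );
	}
} );

// Mirrors the shape of the line at 338:169; there is no throw statement here.
function readCookies( doc ) {
	return doc.cookie.split( '; ' );
}

var caught = null;
try {
	readCookies( fakeDocument );
} catch ( e ) {
	caught = e;
}

console.log( caught.message ); // can't access dead object
console.log( typeof readCookies ); // function
```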

It affects Firefox on Linux, macOS and Windows, and seems diverse across users, countries, and wikis. It also happens at a rate that does not seem insignificant, considering it is unsampled and in relation to the expected pageview traffic from Firefox. Do we actually know that there isn't a problem? Even if it turns out to be caused by a browser extension, at this rate it seems possible that it is unintentional and that we need to reach out to the authors of such an extension to understand what the intended behaviour/outcome should be.

I have a theory. I opened a tab yesterday, made sure I was bucketed and minimized it.

I just maximized that tab today and experienced severe resource thrashing. I watched it set > 1000 cookies in a short period of time (now it's at 8000 and counting). Since the error relates to preventing memory leaks, I suspect there's a point at which Firefox throws it as a protective measure (I haven't got to that stage yet, but then my computer is probably more capable than most).

Screen Shot 2021-05-13 at 9.23.21 PM.png (936×2 px, 635 KB)

I should note it also made > 1000 beacon pings in a short space of time:

Screen Shot 2021-05-13 at 9.31.44 PM.png (1×1 px, 300 KB)

I think this script needs to be restricted to clear the interval it sets up at some point, rather than run indefinitely...
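
One way to bound the timer, sketched under stated assumptions: `createBoundedTicker`, `maxTicks`, and the injected `timers` object are hypothetical names and do not come from sessionTick.js. The ticker clears its own interval once a tick budget is spent.

```javascript
// Hypothetical sketch: timers is injected so the logic can be exercised
// without a real clock. In a browser it could be an object wrapping the
// global setInterval and clearInterval functions.
function createBoundedTicker( onTick, intervalMs, maxTicks, timers ) {
	var count = 0;
	var id = null;

	function stop() {
		if ( id !== null ) {
			timers.clearInterval( id );
			id = null;
		}
	}

	function start() {
		// Do nothing if already running or if the budget is already spent.
		if ( id !== null || count >= maxTicks ) {
			return;
		}
		id = timers.setInterval( function () {
			count++;
			onTick( count );
			if ( count >= maxTicks ) {
				stop();
			}
		}, intervalMs );
	}

	return {
		start: start,
		stop: stop,
		isRunning: function () {
			return id !== null;
		}
	};
}
```

A caller could also invoke `stop()` when the page becomes hidden, so that no interval survives while the tab is in the background.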

I've minimized the tab again. I will report back if I hit the error.

That aside, it doesn't explain why it started spiking suddenly.

Jdlrobson lowered the priority of this task from Unbreak Now! to Needs Triage.May 14 2021, 8:32 PM

The error rate has tailed off. It's still our 2nd most prominent error, after the upstream https://github.com/SamsungInternet/support/issues/56

Such unthrottled dispatching of cookies and beacons seems like a clear bug in the code and a performance problem indeed due to thrashing of CPU and network resources.
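
A sketch of one possible throttle, assuming the burst comes from catching up on ticks missed while the tab was hidden (the thread does not confirm the exact mechanism). `ticksToSend`, `tickMs`, and `maxTicks` are illustrative names, not taken from the extension.

```javascript
// Hypothetical helper: compute how many ticks to report after a gap, capped
// so that a long-hidden tab cannot trigger thousands of writes or beacons.
function ticksToSend( lastTickTime, now, tickMs, maxTicks ) {
	var elapsed = Math.max( 0, now - lastTickTime );
	var missed = Math.floor( elapsed / tickMs );
	return Math.min( missed, maxTicks );
}

// A tab hidden for 24 hours with a 1-minute tick would owe 1440 ticks;
// the cap limits that to one bounded batch.
var owed = ticksToSend( 0, 24 * 60 * 60 * 1000, 60 * 1000, 30 );
console.log( owed ); // 30
```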

@MSantos Could this be prioritised in the current quarter?

This task appears to have fallen into a re-org gap and has not been migrated to a new workboard. I'm not sure whether this is an isolated issue, and I don't know if event instrumentations moved to Product Infra or Product Analytics.

There have only been 69 instances of this error in the last 90 days. We can reopen if the rate increases again.

Sorry, I was using the wrong filter. There were 14031 instances in the last 90 days.
https://logstash.wikimedia.org/goto/d341db097aa24dea158f63f9ee2f9d1d

Jdlrobson claimed this task.

Only 14 instances of this error in last 7 days.