Page MenuHomePhabricator

TypeError: Cannot read property 'top' of undefined
Closed, ResolvedPublic

Description

@Jdlrobson reported an issue in #talk-to-fundraising that I'm pasting here; please fill in any details I missed.

TypeError: Cannot read property 'top' of undefined
at Object.frb.initNag  <anonymous>:2328:38
at HTMLDocument.<anonymous>  <anonymous>:2444:13

At the time of his posting around 22 UTC today 2021-04-29, there were ~ 200 errors on Spanish and Portuguese Wikipedia.

jdlrobson, on slack, you made it seem like the error was new; could you please confirm that? We did put two new banners up a couple hours ago - B2021_0429_ptBR_m_p1_lg_txt_cnt and B2021_0429_ptBR_m_p1_lg_txt_contextual - but neither of them contain that frb.initNag object.

Event Timeline

Thanks for raising the ticket! https://grafana.wikimedia.org/d/000000566/overview?viewPanel=16&orgId=1 shows the uptick for this error compared to normal in case you are interested in impact.

I think it might be this one ? https://meta.wikimedia.org/wiki/MediaWiki:Centralnotice-template-B2021_120500_en6C_dsk_p1_lg_rcr_paypal_twin2 or something loosely based on that code. offset() can return undefined for elements it cannot find.

I found a related error "TypeError: undefined is not an object (evaluating 'hash.offset().top')"

Huh, I thought we'd fixed this at some point but I guess not. It only happens for desktop large and you're correct it's when there's a fragment in the URL which can't be found: either a section that has been deleted/renamed, or a lot of the URLs appear to be MediaViewer

Pcoombe claimed this task.

Okay, this is fixed in current best (diff) plus all the currently live banners in Brazil, Mexico, and Sweden.

Thanks @Pcoombe the errors seem to have stopped now!

Jdlrobson reopened this task as Open.EditedFri, Apr 30, 9:54 PM
Jdlrobson triaged this task as Unbreak Now! priority.
Jdlrobson added a subscriber: thcipriani.

I spoke too soon. This is still active and just triggered our alerting systems. There have been 12,721 errors in the last 12hrs. cc @thcipriani in case you were wondering what's going on.

https://logstash.wikimedia.org/goto/545fd39842e3c16cfa78a0bd057b652f

@Jdlrobson This isn't banners now. There are lots of errors on zh.m.wikipedia.org where we don't even run fundraising banners.

Most of the stack traces look like this, so it looks like something to do with RelatedArticles? I opened a new task T281611

at Object.isElementInViewport  <anonymous>:231:41753
at Object.isElementCloseToViewport  <anonymous>:232:54
at loadRelatedArticles  <anonymous>:103:7018
at HTMLDocument.showReadMore  <anonymous>:104:36
at mightThrow  https://fr.m.wikipedia.org/w/load.php?lang=fr&modules=jquery%2Coojs-ui-core%2Coojs-ui-widgets&skin=minerva&version=udccx:49:833
at process  https://fr.m.wikipedia.org/w/load.php?lang=fr&modules=jquery%2Coojs-ui-core%2Coojs-ui-widgets&skin=minerva&version=udccx:50:498
Jdlrobson reopened this task as Open.EditedFri, Apr 30, 10:27 PM

I don't thin we can discount banners just yet. The isElementInViewport bug is a known bug but usually comes in small numbers. As far as I can see it relates to incompatible scroll events being registered, so if there's a banner registering a scroll event this could be the cause.

A large amount of these errors seem to have a stack trace originating in RelatedArticles but that extension hasn't been touched since 2016.
Given there was no train this week to English Wikipedia, I suspect the banner (same or maybe a different one?) is having side effects on other extensions around scrolling. There are also bugs appearing in other products which use scroll events such as RevisionSlider. Given RelatedArticles is run on every page view, that's the reason the majority would be coming from there.

If you look at the time these errors started they all begin at the same time and since nothing got backported there, I feel this has to be a banner .https://wikitech.wikimedia.org/wiki/Deployments#Thursday,_April_29

Note the banners here seem to be running on mobile and https://meta.wikimedia.org/wiki/MediaWiki:Centralnotice-template-B2021_120500_en6C_dsk_p1_lg_rcr_paypal_twin2 seems to be desktop only so I think that was a red herring.

I'm not 100% sure what's going on here and where the fix lives, but I'm keen to investigate more and how a banner and RelatedArticles could be incompatible. To do that though I need to know what banners went live, where the code is, and how I can replicate them in my browser.

Update: Correction - the 1.37.0-wmf.3 train did roll out in the end (misread Lars's email). I'm currently looking at core code to see if anything may have changed there.

@Jdforrester-WMF could your changes as part of T250068 have impacted the jQuery.offset function ? Would that have rolled out this week?

Another possibility, is something went wrong in T277958 and as a result the HTML is loading but not the javascript

Jdlrobson renamed this task from Error with frb.initNag in some banner(s) on Spanish and Portugese Wikipedias to TypeError: Cannot read property 'top' of undefined.Fri, Apr 30, 10:45 PM

There's no banner changes that line up with that time. Since 16:00 UTC today, our only banners have been on English and Portuguese Wikipedias in Brazil. Here are preview links for the mobile ones:

Other than that, in the past weeks we have only been running banners on Spanish and Swedish Wikipedias.

Okay thanks for ruling that out @Pcoombe I guess that tells us at this point this does relate to the last train.

Change 683999 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/extensions/RelatedArticles@master] Hotfix: loadRelatedArticles should consider existence of container element

https://gerrit.wikimedia.org/r/683999

@phuedx @Jdrewniak my day is wrapping up but my best theory is we might have a caching problem here (but i have no idea how). I recommend we backport the above on Monday at earliest opportunity. I can replicate the error when I do the following:

$( '.read-more-container' ).remove();
var readMore = $( '.read-more-container' ).get( 0 );
mw.viewport.isElementCloseToViewport( readMore, 300 );

That said, I have not seen this in the wild and I have no idea why this would suddenly be happening at such a high rate without any changes to the RelatedArticles client and it's frustrating not knowing the answer to that. If anyone can solve this mystery I will be eternally grateful.

Change 683999 merged by jenkins-bot:

[mediawiki/extensions/RelatedArticles@master] Hotfix: loadRelatedArticles should consider existence of container element

https://gerrit.wikimedia.org/r/683999

Change 684393 had a related patch set uploaded (by Jdrewniak; author: Jdlrobson):

[mediawiki/extensions/RelatedArticles@wmf/1.37.0-wmf.3] Hotfix: loadRelatedArticles should consider existence of container element

https://gerrit.wikimedia.org/r/684393

Change 684393 merged by jenkins-bot:

[mediawiki/extensions/RelatedArticles@wmf/1.37.0-wmf.3] Hotfix: loadRelatedArticles should consider existence of container element

https://gerrit.wikimedia.org/r/684393

Mentioned in SAL (#wikimedia-operations) [2021-05-03T18:19:49Z] <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: cf9d9da3bf272d33c2d9b29d9172b1c81bfd8beb: Hotfix: loadRelatedArticles should consider existence of container element (T281547) (duration: 00m 57s)

Jdlrobson lowered the priority of this task from Unbreak Now! to Medium.Mon, May 3, 9:03 PM


The bug has disappeared (note the small spikes to the right are false positives as there are other bits of code that match the error message.
Will check in tomorrow to ensure these have completely disappeared. Thanks for getting that deployed Jan I really appreciate it! <3

28 errors in last 12hrs which I am going to attribute to caching. Am filtering this out as logspam in the dashboard.