Page MenuHomePhabricator

Some pages start to load on Commons and Wikipedia but then go blank
Closed, ResolvedPublic

Description

Author: Eugene.Zelenko

Description:
Hi!

User and user talk pages are slow to load on Commons for several last days. Probably problem relates to Commons Gadgets.

See also http://commons.wikimedia.org/wiki/Commons:Village_pump#Pageview problems

Eugene.


Version: unspecified
Severity: major

Details

Reference
bz47457

Event Timeline

bzimport raised the priority of this task from to Unbreak Now!.Nov 22 2014, 1:23 AM
bzimport set Reference to bz47457.

bidgee-wiki wrote:

Example of a userpage that loads but then redirects to wyciwyg://000/url

Also see http://commons.wikimedia.org/wiki/Commons:Village_pump#Anyone_else_have_problem_with_opening_user_pages_in_Firefox.3F

Attached:

Screen_Shot_2013-04-21_at_11.03.23_PM.png (162×553 px, 47 KB)

Using the "Firebug" tool in Firefox, http://commons.wikimedia.org/wiki/User:Huntster opens within 726ms here.
Any chance to use a debug tool that displays which parts load slowly?

Plus after reading the thread on Commons:VP I'm not convinced that this is a server-side problem if this only happens in Firefox... :)

Being logged in to Wikipedia, I could reproduce the problem in Firefox 18 by going to https://commons.wikimedia.org/wiki/Commons:Village_pump#Anyone_else_have_problem_with_opening_user_pages_in_Firefox.3F and trying to open User:Bidgee and User:Moogsi
It was reproducible until I enabled "Preferences > Privacy > Tracking > Tell websites I do not want to be tracked" and disabled that Firefox setting again.

Also, when I could still reproduce the problem I had the feeling that the page started to load for a moment and then loading stopped. When I opened the page in a new tab and tried to start Firebug ("F12" key) in that tab immediately, it seemed to load successfully.
Note that on this affected machine, I use the "RefControl 0.8.16" Firefox extension ("Control what gets sent as the HTTP Referer on a per-site basis") with a default of "<Block>".

On another machine with Firefox 20 and "Tell websites I do not want to be tracked" enabled, going to https://commons.wikimedia.org/wiki/Commons:Village_pump#Anyone_else_have_problem_with_opening_user_pages_in_Firefox.3F and trying to open User:Bidgee and User:Moogsi I have no problems.

Note that two users on the VP thread also report issues with Safari resp. IE.

Personally I was thinking of a bad side effect of bug 25886 ("Make bits.wikimedia.org Cross-Origin Resource Sharing (CORS) compatible") but that is not merged yet. Bug 47202 ("wikibits: FF2Fixes.css should not be loaded in Firefox 20") is not merged yet either.

camjsb7j9g wrote:

I can still reproduce the problem in Firefox 20.0.1. It appears to depend on a combination of gadgets enabled in preferences, see http://commons.wikimedia.org/wiki/Commons:Village_pump#Workaround

bidgee-wiki wrote:

Disabling WikiLove in the "Preferences" seems to fix the issue (for both Firefox and Safari). "Tell
websites I do not want to be tracked" was only a temporary fix, since the issue returned, whether it was selected or not.

Disabling WikiLove in my Wikimedia Commons preferences solved all my problems on the Commons with opening user pages and user talk pages in the same tab or in a new tab. WikiLove can be disabled in preferences on the miscellaneous tab:
*[[commons:Special:Preferences]]

This works for me in all browsers tested whether logged in or not. I have tested Firefox, Explorer, Chrome, and Opera browsers.

In Firefox I have the box checked for "Tell websites I do not want to be tracked".

Again summarizing:

To reproduce: Start Firefox, be logged in, go to
https://commons.wikimedia.org/wiki/Commons:Village_pump#Anyone_else_have_problem_with_opening_user_pages_in_Firefox.3F
and search for the string

User:Bidgee and User:Moogsi

Make both links open in a new tab.
They are shown for a short moment but then get stuck reloading.
The same happens in "Privacy mode" in Firefox, but only after logging into Wikipedia.

This does not happen for the pages

User:Mattbuck, and User:Rillke

(Search for that string too), so this likely has to do with stuff(TM) on the specific user pages.

I cannot reproduce in Opera 12.15 and Google Chrome 24.

Refering to https://commons.wikimedia.org/wiki/Commons:Village_pump ,
basically all users use Firefox (on Windows, Linux, Mac OS).
One user states he can reproduce in IE10.
One user states it happens in Safari, but no other mentions of Safari.

There are a number of people that state that "disabling WikiLove in preferences solves all problems".

A screenshot about this bug would just be an empty loading "Connecting..." tab in Firefox, hence not helpful.

No idea what's causing this, but there have been no significant changes to WikiLove since January.

I can't reproduce this bug. I followed the reproduction steps exactly. Both User:Moogsi and User:Bidgee's user pages load in 2-3 seconds for me in new tabs in Firefox. And I do have WikiLove turned on.

I'm in Firefox 20.0 on MacOSX.

Can anyone reproduce this bug with a completely new Commons account? If so, please post your OS and browser version so I can reproduce and start debugging. Any other help narrowing it down would be appreciated.

I was able to reproduce this bug only on Firefox 20.0 with https enabled. Using FF17, Chrome, or FF20 via http all work, but https + ff20.0 gets stuck.

Insert the following into User:yourusername/common.js
<code>mw.loader.using('ext.gadget.editDropdown', function () {

mw.libs.commons.ui.addEditLink(
        '//toolserver.org', 
        "Toolserver", 
        'p-Toolserverlink', 
        "Toolserver. Driven by Wikimedia Germany"
);

});</code>

I'm not entirely sure why this is happening yet, but its something with ff20 + https. Will look more into it.

Commons VP is reporting that this problem disappears if users disable WikiLove

"After I disabled WikiLove in Special:Preferences#mw-prefsection-misc, it is also working."

From the native FF web console (With wikiLove enabled):
[00:54:51.281] GET https://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=false&lang=en&modules=ext.wikiLove.local&skin=vector&version=20130419T023547Z&* [HTTP/1.1 304 Not Modified 155ms]
[00:54:51.402] ReferenceError: mw is not defined @ https://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=false&lang=en&modules=ext.wikiLove.local&skin=vector&version=20130419T023547Z&*:1

The problem now also popped up on en.wp
https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Hanging_userpages

Interesting: If I open in a new tab in Firefox and quickly switch to that tab to give it the focus, the page *does* render. If I don't, it doesn't.

I have provided some more info at Bug 47607 (especially the HTML source is interesting). Didn't notice this one.

  • Bug 47607 has been marked as a duplicate of this bug. ***

sorry unduping. initial report is something else.

(In reply to comment #18)
After reading http://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&oldid=94870901#Pageview_problems I am not entirely sure whether it is about slow pages or pages that go blank after loading.

There definitely is something weird going on.

Perhaps something is overwriting mw object, or maybe something in FF 20's load sequence has changed causing the document.write of our bootloader to break ?

(the fact that it often occurs when opening as a tab could be explained by the fact that redraw's are likely deferred when you open in another tab, which would significantly effect execution time/order of scripts I presume).

The fact that it is only wikiLove could possible be explained by the fact that mw is not passed into the closure for wikilove (whereas most of our scripts do have that in the closure). That means that if something contaminates the mw global, that it would break (per Krinkle's wisdom).

It's a complicated issue for sure.

(In reply to comment #20)

something is overwriting mw object

Don't agree. The script affected in Bug 47607 uses an IIFE passing "mediaWiki" and "jQuery" in, and an error occurs: "jQuery is undefined"
This would also not redirect to a WYCIWYG URL nor would it explain that page loading seems never to finish.

that it is only wikiLove

Again, as described in Bug 47607, it also occurs at other places, though it is harder to (re)produce.

something contaminates the mw global

... and setting it to undefined? No. Really.

It's a complicated issue for sure.

Please don't mystify it :P

This is also occurring on English Wikipedia:
[[Wikipedia:Village_pump_(technical)#Hanging_userpages]]

I am seeing this bug in Firefox 20 on Xubuntu linux but Konqueror 4.8.5 on the same system everything works as it should do. It does not affect viewing my own talk page, but it does every other one I've tried.

The above is referencing the English Wikipedia. English Wiktionary seems unafected.

Spent some more time debugging this, I still dont know exactly whats going on but I shrank it down to the following test case.

To get this test case shown in the brower i used paros to perform an SSL MITM on myself and replace the page contents. If i inline the first <script> tag twords the bottom it has if (isCompatible()) { document.write(...); }. Commenting out that document.write also stops the bug from occuring localy.

I can't fix this, but hopefully it gets someone else closer.

<!DOCTYPE html>
<html>
<head>
<title>Bug demo</title>
<!-- Bug only occurs with debug=false -->
<script src="https://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=false&amp;lang=en&amp;modules=startup&amp;only=scripts&amp;skin=vector&amp;*"></script>
<script>if(window.mw){ mw.loader.load(["mediawiki.page.startup"]); }</script>
</head>
<body>
Bug Demo
<script>
if(window.mw){

// Bug is only triggered when initializing wiki Love
mw.loader.load(["ext.wikiLove.init"],null,true);
// Bug also seems to only be triggered when this using statement exists
mw.loader.using('ext.gadget.editDropdown', function () {
    // Intentionally blank, the using statement is needed to trigger bug
    // but no content is necessary
});

}
</script>
</body>
</html>

I forgot to mention the steps to reproduce for above.

  1. Use FF20
  2. Clear the cache completely
  3. Tell firefox to use your MITM proxy
  4. Visit any page on commons, replace the content with the above test case via proxy. Dont replace the headers, we need future loads to return 304 Not Modified
  5. First time through it will load just fine and display 'Bug Demo' on page
  6. Mouse into url bar, push enter to reload page
  7. At this point the page should go blank with the loading spinner running but nothing happening.

One thing that is somewhat unique about WikiLove is that it fires its own mw.loader.using() from ext.wikiLove.init.js (to load the local WikiLove settings).

I can reproduce it easily enough in Firefox 20 on Commons. When it occurs, Firebug's "net" window shows 7 connections to various https://commons.wikimedia.org/w/index.php?title=$TITLE&action=raw&ctype=text/javascript URLs, and occasionally one with the same pattern at meta. A tcpdump shows no attempts to actually load these pages, though. I also see the error mentioned in comment #14, and confirm that disabling WikiLove causes the bug to not occur.

I've also found some other conditions:

  • There seems to be some sort of cache involvement: if I clear the browser's cache, the first attempted load of one of the affected userpages works fine while subsequent attempts fail. Also, the error from comment #14 doesn't occur in this case.
  • It only occurs when the page is loading in a background tab. A normal click works fine, as does opening in a foreground tab. OTOH, a normal click and then quickly switching tabs before the page starts to load *will* show the bug, as will hitting enter in the URL bar and then quickly switching tabs. And opening in a background tab and then quickly switching to that tab before the page starts to load does not show the bug.
  • HTTPS is *not* required here, it occurs for me when using HTTP as well.
  • Unfortunately, adding "?debug=true" to one of the affected userpage links makes the bug stop occurring.

But the most interesting bit is that it seems the document.write inside resources/mediawiki/mediawiki.js is somehow going awry: inspecting the DOM of the bugged page shows that the DOM contains only a script tag, rather than all the HTML of the page.

After some further debugging with a local proxy that rewrites the downloaded jQuery on the fly to inject console.log statements, it appears that this is more or less the same thing as bug 31895: somehow or other the document.write call is sneaking in between the "closing" of the document and the calling of the DOMContentLoaded callback that sets $.isReady.

I suspect that gerrit change Ib54a3e78 somehow or other caused this, but the real bug is that $.isReady (even after the fix for bug 31895) is still not reliable for whether document.write can be safely used. I don't know whether it's a correct fix, but I do note that adjusting the "if ( $.isReady || async )" on line 849 of resources/mediawiki/mediawiki.js to "if ( document.body || $.isReady || async )" makes it so I can no longer reproduce the bug.

(In reply to comment #25)

If i inline the first <script> tag
twords the bottom it has if (isCompatible()) { document.write(...); }.
Commenting out that document.write also stops the bug from occuring localy.

Doesn't that basically prevent jQuery from being loaded at all?

(In reply to comment #29)

(In reply to comment #25)

If i inline the first <script> tag
twords the bottom it has if (isCompatible()) { document.write(...); }.
Commenting out that document.write also stops the bug from occuring localy.

Doesn't that basically prevent jQuery from being loaded at all?

Yes that part loads jQuery. I hadn't looked close enough yet to see what exactly that was loading, was still trying to prune down the # of loaded files to get a better idea of which part is causing it. Really I didn't get very far at all once in-lining the script, so i backed up and posted the test case i was working with pre-inlining. Looks like your on the right path, good luck!

Perhaps the best solution is just to remove document.write from addScript alltogether ? I don't think there is a usecase for it anymore. Krinkle said something like that this afternoon.

(In reply to comment #31)

Perhaps the best solution is just to remove document.write from addScript
alltogether ? I don't think there is a usecase for it anymore. Krinkle said
something like that this afternoon.

That should certainly work, if there's no usecase.

  • Bug 47625 has been marked as a duplicate of this bug. ***

Retargeting again to MW/RL despite Comment 0

There's one use case for the blocking (document.write) code path. Namely the top loading queue.

Though I suppose it is harmless there as there is no way the document will be ready there.

My intention is to refactor mw.loader to default to async and only use the synchronous method when explicitly told so (no more magic switching based on document-ready).

This issue is quite disruptive. Should we disable WikiLove until this is fixed ? It doesn't fix the rootcause, but at least it makes it less annoying for most people and will avoid that half the editors disable WikiLove in their prefs and never re enable again.

Perhaps so, or disable ext.gadget.editDropdown. I'm not sure what editDropdown is used for though, or which other gadgets might use that module.

Related URL: https://gerrit.wikimedia.org/r/60947 (Gerrit Change Ie88ceefd5fd0d63de4d5767efeee8bbdc5a323de)

I'm going to go ahead and disable WikiLove on all projects except test.wikipedia for now. I'll also send out a note to wikitech-ambassadors about the outage. These are the projects that will be affected:
arwiki
commonswiki
enwiki
fawiki
fawiktionary
hewiki
hiwiki
huwiki
incubatorwiki
iswiki
itwikiquote
itwikivoyage
jawiki
mediawikiwiki
mkwiki
mlwiki
nowiki
officewiki
orwiki
ptwiki
sewikimedia
siwiki
srwiki
svwiki
svwikinews
trwiki
urwiki
zhwiki

https://gerrit.wikimedia.org/r/60947 (Gerrit Change Ie88ceefd5fd0d63de4d5767efeee8bbdc5a323de) | change APPROVED and MERGED [by Kaldari]

I've disabled it on the wikis listed above. It's still enabled on test.wikipedia.org for debugging purposes.

(In reply to comment #37)

disable ext.gadget.editDropdown

This is a non-default module which is at Commons only.

After WikiLove being disabled, this starts appearing on [[Special:Preferences]] on zhwiki.

Per report from other Wikipedia users, affected pages include more, for example, Special:UserLogin with useNew=1 on zhwiki.

(In reply to comment #1)

Example of a userpage that loads but then redirects to wyciwyg://000/url

(In reply to comment #14)

From the native FF web console (With wikiLove enabled):
[00:54:51.281] GET
https://bits.wikimedia.org/commons.wikimedia.org/load.
php?debug=false&lang=en&modules=ext.wikiLove.
local&skin=vector&version=20130419T023547Z&*
[HTTP/1.1 304 Not Modified 155ms]
[00:54:51.402] ReferenceError: mw is not defined @
https://bits.wikimedia.org/commons.wikimedia.org/load.
php?debug=false&lang=en&modules=ext.wikiLove.
local&skin=vector&version=20130419T023547Z&*:1

This also happened to some users for Special:LanguageStats https://www.wikidata.org/w/index.php?title=Wikidata:Contact_the_development_team&oldid=31233143#Problem_with_getting_results

Assigning to Brad right now as Timo is out today.

One thing to keep in mind: now that WikiLove is disabled, but it appears it either broke things worse or *at least* didn't fix things, let's try to figure out how to get it turned back on asap without causing more confusion in the debugging.

I don't think turning off WikiLove made it worse. The underlying problem wasn't in WikiLove, it's just that WikiLove surfaced the bug on a large number of pages. I imagine the bug can affect any extension or special page that uses mw.loader.

At this point I would suggest trying to find which recent revision triggered the bug and revert it for the time being.

Related URL: https://gerrit.wikimedia.org/r/61057 (Gerrit Change Ic3d0c937268d0943d2f770f3ca18bcf4e1eed346)

btw what about the idea to add a short snippet of JavaScript at the end of <body> flipping $.isReady (or any other similar flag) so it's done before document.write becomes unsafe but after every other script requiring sync execution has been executed?

https://gerrit.wikimedia.org/r/61057 (Gerrit Change Ic3d0c937268d0943d2f770f3ca18bcf4e1eed346) | change ABANDONED [by Anomie]

Krinkle -2ed my changeset, pointing to some unspecified "relevant" bug that is not this one. Fine, he can fix it.

(In reply to comment #52)

Krinkle -2ed my changeset, pointing to some unspecified "relevant" bug that
is not this one. Fine, he can fix it.

The bug title changed and there's over 50 comments. I didn't find the comment I was referring to so I presumed it was a different bug. However it looks like it was this bug indeed. Here's the comment I was referring to:

(In reply to comment #35)

There's one use case for the blocking (document.write) code path. Namely the
top loading queue.

Though I suppose it is harmless there as there is no way the document will be
ready there.

My intention is to refactor mw.loader to default to async and only use the
synchronous method when explicitly told so (no more magic switching based on
document-ready).

Earlier on April 25 TheDJ mentioned this bug to me and the above is how I recommended fixing it. Later James assigned the bug, but while I implemented the above locally already, testing locally in Firefox did not seem to fix the bug.

The change Ic3d0c937268d0943d2 implements a similar approach, though by using an additional global mw.loader state and <script> tag to set it instead of simply passing it as an argument (like it already does for the wgResourceLoaderExperimentalAsyncLoading feature flag).

Aside from it being a slightly less ideal approach, as I mentioned, it didn't fix it for me (though it is a race conditional, so it might fix it).

Also note that even though it does not fix it in for the masses because our html output is statically cached for 30 days.

I'' re-open the change and amend it to work the way I proposed it 4 days ago in the comment quoted above.

https://gerrit.wikimedia.org/r/61057 (Gerrit Change Ic3d0c937268d0943d2f770f3ca18bcf4e1eed346) | change RESTORED [by Krinkle]

The method mw.loader.load already supports an 'async' flag which is currently false by default (as used by the top queue) and turned on by the bottom queue.

So swapping the defaults or introducing a global state doesn't change anything except for calls not in the html output.

I believe that all cases where this bug happens are cases where a module is calling mw.loader (again) from inside the javascript execution and does so at initial run-time (not after an event but immediately at mw.loader.implement).

Adding the global state variable would not fix it for cached pages.

Flipping the default will since ResourceLoader will deploy the module in ~ 5 minutes globally.

(from internal email)

The bug not being fixed for users of statically cached html output (e.g. Squid cache at Wikimedia for anonymous users) isn't considered a priority at this point. So far we've only triggered this bug (calling mw.loader from an async script) in modules loaded for logged-in users. Though that is a pure coincidence it is current state.

However I just discovered another problem with this change. See also comments on Ic3d0c9372. I suspect we might break the top queue for all cached output as what makes the top queue blocking will now be in the html.

This means for Wikimedia that anonymous users will experience the top queue as non-blocking as well for the next 30 days. That could cause all kinds of unpredictable bugs.

Krinkle and Roan are discussing a fix for this one now in the #wikimedia-dev IRC channel.

(In reply to comment #48 from Kaldari)

At this point I would suggest trying to find which recent revision triggered
the bug and revert it for the time being.

Unfortunately, I believe that would mean reverting gerrit change Ib54a3e78, which is the attempted fix for bug 46401, which was also a problem for WikiLove.

Related URL: https://gerrit.wikimedia.org/r/61494 (Gerrit Change Ic3d0c937268d0943d2f770f3ca18bcf4e1eed346)

Related URL: https://gerrit.wikimedia.org/r/61495 (Gerrit Change Ic3d0c937268d0943d2f770f3ca18bcf4e1eed346)

Change is being deployed now, and then we'll re-enable WikiLove after a final verification. Should be finished in under an hour, assuming the scap works this time.

Related URL: https://gerrit.wikimedia.org/r/61504 (Gerrit Change Ifdf7442e814680dd356ca1db00796db3645d17fe)

https://gerrit.wikimedia.org/r/61504 (Gerrit Change Ifdf7442e814680dd356ca1db00796db3645d17fe) | change APPROVED and MERGED [by Anomie]

(In reply to comment #54)

https://gerrit.wikimedia.org/r/61057 (Gerrit Change
Ic3d0c937268d0943d2f770f3ca18bcf4e1eed346) | change RESTORED [by Krinkle]

This fix causes breakage of any user of site scripts which assume that they're loaded before $(document).ready(). Maybe someone can move jQuery.ready() call to a position just before </body> and after all other script references?

And it also breaks the old 'mwCustomEditButtons' interface to add custom buttons to the old edit toolbar, which apparently everyone and their dog are using across the wikis. (Since the user JS is loaded from the bottom queue, but the toolbar script either from the top or using inline <script> tags with $.ready inside.)

Reopening. Moving the jQuery.ready() call after the bottom queue seems like a good idea.

(In reply to comment #63 by Liangent)

This fix causes breakage of any user of site scripts which assume that
they're loaded before $(document).ready().

Liangent: Any idea how often that is the case?

(In reply to comment #63 by Liangent)

Maybe someone can move jQuery.ready() call to a position just before
</body> and after all other script references?

(In reply to comment #64 by matma.rex)

Moving the jQuery.ready() call after the bottom queue seems like a good idea.

anomie / Krinkle: ?

This should really be a new bug instead of reopening this one that _is_ actually fixed.

(In reply to comment #65)

(In reply to comment #63 by Liangent)

Maybe someone can move jQuery.ready() call to a position just before
</body> and after all other script references?

(In reply to comment #64 by matma.rex)

Moving the jQuery.ready() call after the bottom queue seems like a good idea.

anomie / Krinkle: ?

We tried that, see patchset 5 in https://gerrit.wikimedia.org/r/#/c/61057/5. It didn't work.

But unless the browsers were delaying the DOMContentLoaded event until after the async-loaded scripts in the bottom queue were loaded, I don't know how this wasn't already a problem. Krinkle, any ideas?

(In reply to comment #67)

See bug 47872.

Oh, good. Then we can re-close this incorrectly-reopened bug.