
Special:NewPagesFeed intermittently fails on beta cluster; causes test failure
Closed, Resolved (Public)

Description

http://en.wikipedia.beta.wmflabs.org/wiki/Special:NewPagesFeed shows the error "An error occurred while loading the interface from the API. Please try reloading the page." and the page never finishes loading.


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=50623

Details

Reference
bz50622

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:04 AM
bzimport set Reference to bz50622.
bzimport added a subscriber: Unknown Object (MLST).

NewPagesFeed seems to appear correctly for logged-in users, but anonymous users get the error noted above, and the browser console shows:

Exception thrown by ext.pageTriage.views.list: Cannot call method 'replace' of undefined load.php?debug=false&lang=en&modules=jquery%2Cmediawiki%2CSpinner%7Cjquery.…%7Cmw.MwEmbedSupport&only=scripts&skin=vector&version=20130629T142420Z:151
TypeError
get stack: function () { [native code] }
message: "Cannot call method 'replace' of undefined"
set stack: function () { [native code] }
proto: Error

aude added a comment. Jul 2 2013, 11:08 PM

I get this in Firefox (logged out), but in Chrome the feed is okay (logged in or out).

aude added a comment. Jul 2 2013, 11:09 PM

Logged in with Firefox, and then it works.

aude added a comment. Jul 2 2013, 11:11 PM

Logged back out in Firefox, and the feed works as an anon.

I managed to load the page http://en.wikipedia.beta.wmflabs.org/wiki/Special:NewPagesFeed

I found out last week that the ResourceLoader URL (load.php) was pointing to the text cache (en.wikipedia.beta.wmflabs.org/w/load.php) instead of bits (bits.beta.wmflabs.org/en.wikipedia.beta.wmflabs.org/w/load.php).

When the ResourceLoader cache is invalidated, no purge is sent to the text cache, so an old JavaScript version was being delivered. That most probably caused the issue reported here.

A few minutes ago I deployed a change on beta that points load.php to bits.beta.wmflabs.org: https://gerrit.wikimedia.org/r/#/c/70322/. I expect that solves it.
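One way to confirm which host actually serves ResourceLoader for the page is to scan the page HTML for load.php URLs. The following is a minimal Ruby sketch, assuming the URLs appear in src or href attributes either as absolute or protocol-relative URLs (pointing at bits) or as relative paths such as /w/load.php (served through the page's own text cache). The helper name load_php_hosts is hypothetical and not part of MediaWiki.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of MediaWiki): returns the distinct hosts that
# serve load.php in +html+. Relative URLs such as "/w/load.php?..." are
# attributed to +page_host+, i.e. they go through the same (text) cache as the page.
def load_php_hosts(html, page_host)
  links = html.scan(/(?:src|href)="([^"]*load\.php[^"]*)"/).map(&:first)
  links.map { |link|
    # Absolute ("http://host/...") or protocol-relative ("//host/...") URL?
    host = link[%r{\A(?:https?:)?//([^/]+)/}, 1]
    host || page_host
  }.uniq
end

# Usage (performs a live request against beta):
#   page = URI("http://en.wikipedia.beta.wmflabs.org/wiki/Special:NewPagesFeed")
#   hosts = load_php_hosts(Net::HTTP.get(page), page.host)
#   puts "load.php served from: #{hosts.join(', ')}"
#   warn "load.php is going through the text cache" if hosts.include?(page.host)
```

If the result still contains en.wikipedia.beta.wmflabs.org after the change, some load.php requests are presumably still bypassing bits.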

I am still seeing this behavior *most* of the time, along with Bug 50623.

Safari 6.0.5, empty cache: it works fine.

Firefox 18, empty cache: I get a JSON.parse error. An API POST is made with the parameters:

action pagetriagetemplate
format json
template listItem.html
view list

In ApiSandbox that is:

http://en.wikipedia.beta.wmflabs.org/wiki/Special:ApiSandbox#action=pagetriagetemplate&format=json&view=list&template=listItem.html

The related GET request is http://en.wikipedia.beta.wmflabs.org/w/api.php?mainmodule=1&modules=query&format=json&action=paraminfo

So maybe it is broken under Firefox when using a POST?

(08:12:23 AM) chrismcmahon: Reedy: speaking of silly issues, do you have any theory as to why a POST to api.php in beta labs would return just the API HTML doc and not actually do an API call? It's bugzilla 50622 and 50623

(08:12:41 AM) Reedy: Usually because the request is wrong

I'm stumped. I can't reproduce the bug locally or anywhere except beta labs, and the API request should be identical on all the different wikis. Chris suggested there might be a race condition in which the POST is sent without a payload, but I couldn't find anything in the code that would lead to such a condition. Unfortunately, I won't be able to spend any more time on this, but if anyone else wants to take a look, the code is in PageTriage/modules/ext.pageTriage.util/ext.pageTriage.viewUtil.js and PageTriage/api/ApiPageTriageTemplate.php.
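To separate a browser problem from a server problem, the same request can be replayed outside the browser and the reply checked for JSON versus the HTML help page. This is a minimal Ruby sketch: the api.php URL and the four parameters come from this thread, while the helper names classify_response and replay_post are hypothetical. Replaying with an empty parameter hash imitates the "POST without a payload" theory; it assumes api.php answers a parameterless request with its help page, as the IRC excerpt describes.

```ruby
require "json"
require "net/http"
require "uri"

# Endpoint and parameters taken from the report; everything else is a sketch.
API = URI("http://en.wikipedia.beta.wmflabs.org/w/api.php")

TEMPLATE_PARAMS = {
  "action"   => "pagetriagetemplate",
  "format"   => "json",
  "view"     => "list",
  "template" => "listItem.html"
}.freeze

# Returns :json if +body+ parses as JSON, :not_json otherwise (for example the
# HTML help page that api.php returned on beta instead of a result).
def classify_response(body)
  JSON.parse(body.to_s)
  :json
rescue JSON::ParserError
  :not_json
end

# Sends +params+ as a form-encoded POST and classifies the reply.
# Pass an empty hash to imitate a POST with no payload.
def replay_post(params = TEMPLATE_PARAMS, uri = API)
  classify_response(Net::HTTP.post_form(uri, params).body)
end

# Usage (performs live requests against beta):
#   p replay_post        # full payload
#   p replay_post({})    # empty payload, as in the race-condition theory
```

If the full-payload replay consistently returns :json while the browser intermittently gets HTML, that would point towards the request the browser sends rather than api.php itself; this is an interpretation, not something established in this thread.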

Created attachment 12887
details of POST to api


This remains an issue on beta labs; any suggestions for questions to ask, or whom to ask, are welcome. I attached details of the POST to the API above.

Chris and I paired on reproducing the error yesterday. We created this simple Ruby script, which visits NewPagesFeed a hundred times and reports whether the error occurred on each visit.

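require "watir-webdriver"

site = "http://en.wikipedia.beta.wmflabs.org/"
#site = "http://test2.wikipedia.org/"
url = "#{site}wiki/Special:NewPagesFeed"

puts url

(1..100).each do |i|
  # Start a fresh Firefox session for every visit.
  browser = Watir::Browser.start url, :firefox
  begin
    list = browser.div(id: "mwe-pt-list-view")
    errors = browser.div(id: "mwe-pt-list-errors")
    # Wait (up to 30 s) until the list has replaced its "Please wait..."
    # placeholder or an error message has appeared.
    Watir::Wait.until(30) { list.text != "Please wait..." || errors.text != "" }
    # Give a late error message a moment to appear.
    sleep 1 if errors.text == ""
    # An empty error text means the feed loaded correctly on this visit.
    puts "#{i}: #{errors.text}"
  rescue Watir::Wait::TimeoutError
    puts "#{i}: timed out waiting for the feed"
  ensure
    browser.close
  end
end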

NewPagesFeed returned no errors when http://test2.wikipedia.org/wiki/Special:NewPagesFeed was visited 100 times.

The script visited http://en.wikipedia.beta.wmflabs.org/wiki/Special:NewPagesFeed just 13 times and reported 6 errors, so roughly half of the visits failed (6/13 ≈ 46%). I will make the script more robust next week (including taking screenshots of every visit) and run it again.
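Turning a run's output into a failure rate can be automated with a small parser over the lines the script prints ("<visit>: <error text>", where an empty error text means the feed loaded). This is a minimal Ruby sketch; summarize is a hypothetical helper, and it assumes any non-empty text after the visit number counts as a failure.

```ruby
# Hypothetical helper: summarizes the lines printed by the Watir script
# ("<visit>: <error text>"; an empty error text means the feed loaded).
# Lines that do not start with a visit number (e.g. the URL) are ignored.
def summarize(lines)
  errors = lines.map { |line| line.chomp[/\A\d+: ?(.*)\z/, 1] }.compact
  failures = errors.count { |text| !text.strip.empty? }
  rate = errors.empty? ? 0.0 : failures.fdiv(errors.size)
  { visits: errors.size, failures: failures, rate: rate }
end

# Usage, assuming the script's output was saved to a (hypothetical) run.log:
#   p summarize(File.readlines("run.log"))
```

For the beta run described above (6 failures in 13 visits) this would report a rate of about 0.46, against 0 failures in 100 visits on test2.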

Change 75093 had a related patch set uploaded by Zfilipin:
WIP script that checks how often NewPagesFeed page breaks

https://gerrit.wikimedia.org/r/75093

I have an idea of how this is happening, but not why. Here's what I'm seeing:

I'm tailing the api.log file for beta like so:

@deployment-bastion:/data/project/logs$ tail -f api.log

Every time I incur the error on NewPagesFeed in beta labs, I immediately see one and only one entry in that log file, for ULS and not for PageTriage.

2013-07-23 20:53:50 deployment-apache32 enwiki: API GET 68.108.251.139 68.108.251.139 T=28ms action=ulslocalization language=en
2013-07-23 21:07:47 deployment-apache32 enwiki: API GET 68.108.251.139 68.108.251.139 T=16ms action=ulslocalization language=en

Is it possible that ULS is interfering with PageTriage's ability to call the API properly? And if so, why would we not see that in production on enwiki?

Note that ULS is configured on beta similarly to production, but differently on test2wiki with respect to the cog icon vs. a user-preferences control: http://www.mediawiki.org/wiki/Universal_Language_Selector/Deployment/Planning#Configuration
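Watching api.log by eye for which modules appear when the error occurs can be made more systematic by tallying entries by HTTP method and action. This is a minimal Ruby sketch, assuming every api.log line follows the format of the two entries quoted above; tally_actions is a hypothetical helper.

```ruby
# Hypothetical helper: counts API requests in api.log by HTTP method and
# action, assuming the line format shown in the log excerpt:
#   <date> <time> <host> <wiki>: API <METHOD> <ip> <ip> T=<n>ms action=<name> ...
def tally_actions(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    m = line.match(/: API (\S+) .*?\baction=(\S+)/)
    counts["#{m[1]} #{m[2]}"] += 1 if m
  end
end

# Usage on the bastion (log path from this report):
#   p tally_actions(File.foreach("/data/project/logs/api.log"))
```

If "POST pagetriagetemplate" never shows up while "GET ulslocalization" does, the PageTriage request is presumably not reaching api.php as a pagetriagetemplate call; whether ULS is the cause would still need to be checked separately.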

This seems to have miraculously fixed itself. I looked through the recent merges but didn't see any likely cause.

Change 75093 abandoned by Zfilipin:
WIP script that checks how often NewPagesFeed page breaks

Reason:
related bug is fixed

https://gerrit.wikimedia.org/r/75093