Missing watchlist stars, search suggestions and more on English Wikipedia mobile site
Closed, Resolved · Public

Description

Beginning October 1, I noticed multiple issues on the English Wikipedia mobile site. None of the issues were present the day before.

  • The text Search Wikipedia is gone from the search box. Also gone are the search suggestions.
  • On all the pages which can be added to my watchlist, the stars which can be clicked on to add those pages to the watchlist are gone.
  • The edit links are gone.

The day I noticed the issues (October 1) I cleared my browser's cache and cookies, but the issues still persist. I had not updated my browser or changed any of its settings between the time I last used English Wikipedia (September 30) and when I noticed the issues (October 1).

I edit English Wikipedia from my Samsung Galaxy S6, which runs version 5.1.1 of Android Lollipop. The web browser I use is Google Chrome, version 45.0.2454.94.

Additional information:
  • Report on BlackBerry: https://twitter.com/Comptinator/status/650742786917027840
  • Report on Android Chrome 42: https://twitter.com/Paul_012/status/649647096753078272
  • Possibly related: https://twitter.com/keithmulvaney/status/650405167154769920

Event Timeline

Jesant13 created this task. Oct 4 2015, 1:14 AM
Jesant13 updated the task description.
Jesant13 raised the priority of this task from to Needs Triage.
Jesant13 added a subscriber: Jesant13.
Restricted Application added a subscriber: Aklapper. Oct 4 2015, 1:14 AM

After submitting this report and mentioning I had done so at Wikipedia:Village pump (technical), I went to the MediaWiki site and discovered that the same search box issues are present. Yet when I was logged out of MediaWiki prior to creating this report, the issues weren't present. This means the issues are with MediaWiki, not just English Wikipedia, which confirms my suspicion that they originated with the most recent update.

Krenair set Security to None.

This sounds like a JavaScript issue of some kind. Are you able to look at Chrome's error console?

Korean Wikipedia's mobile version has the same problem.

Chrome (and other WebKit browsers such as Naked Browser), as well as Opera, are unaffected for me, but Firefox (version 41 on Android 5.1, Motorola Moto X) exhibits this behaviour.

Fnvedgnve added a comment. Edited Oct 4 2015, 12:42 PM

Microsoft Internet Explorer has the same problem. Chrome and Safari have the same problem too.

This sounds like a JavaScript issue of some kind. Are you able to look at Chrome's error console?

I don't know if that's possible to do through Google Chrome on Android. I looked around but didn't see an option.

Update: Chrome and Naked Browser are now broken too. I guess I had old styles and scripts cached or something.

This sounds like a JavaScript issue of some kind. Are you able to look at Chrome's error console?

I don't know if that's possible to do through Google Chrome on Android. I looked around but didn't see an option.

Firefox is affected and has remote debugging support.

Jdlrobson triaged this task as Unbreak Now! priority. Oct 5 2015, 5:14 PM
Jdlrobson added a subscriber: Jdlrobson.

I am still unable to replicate this :( :(. If anyone is tech savvy, could they debug this and check whether any errors are reported in the developer console? That would shed light on this issue.

The only thing I can think of is this is caused by T99096.
We recently pushed a fix for T113289 which removed a bunch of deprecated functions. Given this bug is touching all functionality, I can't help but think this must be the cause, and that we are serving older versions of JavaScript, causing this mass breakage.

@Krinkle is there some way we can rule this out? Would simply touching the static resources force a cache flush?

I'm not able to replicate either. Tested on Chrome and Firefox on Linux.

I think I've just replicated this and am debugging. Anyone who is experiencing this issue: can you answer a question for me?

Were you still able to interact with the menu icon in the top left corner? Did clicking that work without loading an entire new page?

mw.loader.getState('skins.minerva.scripts')

This is returning 'error' for me.
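For anyone else checking this from a browser console, a minimal sketch along these lines may help. It assumes the page exposes mw.loader with getModuleNames() and getState(); the helper name findModulesInState is made up for this example and is not part of MediaWiki.

```javascript
// Returns the names of all registered modules whose state equals `wanted`.
// `loader` is assumed to look like mw.loader (getModuleNames, getState).
function findModulesInState(loader, wanted) {
  return loader.getModuleNames().filter(function (name) {
    return loader.getState(name) === wanted;
  });
}

// Only runs where the MediaWiki `mw` global exists (i.e. on a wiki page).
if (typeof mw !== 'undefined' && mw.loader) {
  console.log(findModulesInState(mw.loader, 'error'));
}
```

If skins.minerva.scripts shows up in that list, the module failed while executing, which matches what the single getState call above reports.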

When I debugged a little more it turned out there was an error in QuickSurveys. This is not live on enwiki, so it is not the cause, but it seems that some code may be dying silently in skins.minerva.scripts.

When I wrapped the offending code in a try/catch (https://gerrit.wikimedia.org/r/244094), the bug went away and subsequent code ran.
(@Krinkle is that a known problem with ResourceLoader async? If not, do you want me to raise a bug?)
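To illustrate the general effect (this is not the code from the patch), here is a rough sketch of how one throwing setup step can stop everything after it, and how a try/catch lets later steps run. The step names are hypothetical.

```javascript
// Runs each setup step in order. With `guard` false, the first exception
// propagates and later steps never run. With `guard` true, the failure is
// recorded and the remaining steps still execute.
function runSteps(steps, guard) {
  const completed = [];
  const errors = [];
  for (const step of steps) {
    if (!guard) {
      step.run();
      completed.push(step.name);
      continue;
    }
    try {
      step.run();
      completed.push(step.name);
    } catch (e) {
      errors.push(step.name + ': ' + e.message);
    }
  }
  return { completed: completed, errors: errors };
}

// Hypothetical steps, named for illustration only.
const steps = [
  { name: 'menu', run: function () {} },
  { name: 'survey', run: function () { throw new Error('boom'); } },
  { name: 'watchstar', run: function () {} }
];

// Guarded: 'menu' and 'watchstar' complete, 'survey' is recorded as an error.
console.log(runSteps(steps, true));
```

Calling runSteps(steps, false) throws at 'survey', so 'watchstar' never runs. That is roughly the "everything after it silently stops" behaviour described above.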

So, my current theory: I looked at other uses of mw.track and it seems we do some sampling with it. That might explain why some users are seeing this bug and some are not.

@bmansurov and @phuedx can you see if there is anything we're sampling on enwiki (maybe that needs to be turned on to kick in) that might be breaking the experience for some of our users and is not easily reproducible?

When I debugged a little more it turned out there was an error in QuickSurveys.

Did you raise a bug? I'd love to see what error that was.

I left a comment on your patch. I think I'll be able to help debug the issue once I can replicate locally.

Were you still able to interact with the menu icon in the top left corner? Did clicking that work without loading an entire new page?

Yes, that works.

I replicated it! Yay!

Seems my theory was correct. Now I just need to work out why.

Error: Module not found: loggingSchemas/SchemaMobileWebSearch Error: Module not found: loggingSchemas/SchemaMobileWebSearch(…)

Change 244178 had a related patch set uploaded (by Jdlrobson):
Fix factory schema so that MobileWebSearch can be retrieved

https://gerrit.wikimedia.org/r/244178

So it turns out we caught this but didn't SWAT it. After the fix is live, I'm going to work out a way we can avoid this happening again (locally we shouldn't be sampling schemas).

Change 244178 merged by jenkins-bot:
Fix factory schema so that MobileWebSearch can be retrieved

https://gerrit.wikimedia.org/r/244178

@HairyDude I just confirmed it's fixed for me. Can you verify the same?

@Jdlrobson Wikipedia does indeed work again. Thanks! As a side effect, I'm also seeing usage hints and the first image repeated at the top of the page, and search suggestions are also showing Wikidata descriptions. I've seen some of these before, but none immediately before the breakage, iirc.

@HairyDude, you must be in the beta mode if you see wikidata descriptions in search results.

@Tfinc @Deskana @ FYI: I suspect this will have damaged traffic to the MobileWebSearch schema in the past week, in case you noticed any blips or unusual behaviour (I don't monitor this data).

To quickly summarise what happened here, why we had so much trouble debugging, and how we will not let this happen again:

  • The bug impacted a 0.1% sample of our users who were using the MobileWebSearch schema.
  • Anyone in that bucket of users who clicked a search result would have set up a search beacon to log the event. This code was broken, so any page visited after that search would no longer work.
  • This is what made the bug so difficult to debug.
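As a rough illustration of why a 0.1% sample is so hard to hit when reproducing, here is a small sketch. The bucketing rule and both function names are assumptions made for this example, not the actual sampling code.

```javascript
// Hypothetical bucketing: sample one user in every `oneIn`, keyed on a
// numeric session token. 1 in 1000 corresponds to the 0.1% figure above.
function isInSample(token, oneIn) {
  return token % oneIn === 0;
}

// Chance that `testers` independent users ALL fall outside the sample.
function chanceNobodySampled(oneIn, testers) {
  return Math.pow(1 - 1 / oneIn, testers);
}

// With a 1-in-1000 sample, five people trying to reproduce the bug will
// almost always all be outside the affected bucket.
console.log(chanceNobodySampled(1000, 5).toFixed(3)); // prints "0.995"
```

In other words, under these assumptions a handful of developers testing by hand would see nothing wrong about 99.5% of the time.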

Fix:

  • Now in dev environments/browser tests we will turn off sampling for all devs to make it easier for us to surface these bugs.
  • I've added a browser test which would have failed for this scenario https://gerrit.wikimedia.org/r/244196
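
The idea behind turning off sampling for devs can be sketched like this. It is only an illustration under assumed names (shouldLog, isDev); the real configuration lives in the patches linked above.

```javascript
// Hypothetical helper: decide whether a schema event is logged.
// `rate` is the configured sampling rate (0 to 1), `isDev` marks a dev or
// browser-test environment, `rand` is a number in [0, 1).
function shouldLog(rate, isDev, rand) {
  // In dev and test environments every user is treated as sampled, so the
  // logging code path always runs and its failures surface immediately.
  const effectiveRate = isDev ? 1 : rate;
  return rand < effectiveRate;
}

console.log(shouldLog(0.001, false, 0.5)); // false: outside the 0.1% sample
console.log(shouldLog(0.001, true, 0.5));  // true: sampling disabled in dev
```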
Jdlrobson closed this task as Resolved. Oct 7 2015, 5:45 PM
Jdlrobson claimed this task.

Have had confirmation in various places that this is now fixed. Will keep hanging around before sign off to make sure we covered everyone.

@Jdlrobson, section headers still aren't collapsing for me. Chrome 42.0.2311.111 on HTC Desire S running Android 4.0.4. It's an older device; I haven't been able to find a working driver for debugging.

@Paul_012 thanks for the update! Does the problem exist in incognito mode? Hopefully we'll get to the bottom of this! :)

@Jdlrobson: Yes; same behaviour as in normal mode:

@Paul_012... this is most strange. I did a little bit more debugging and can't see any reason why this would still be a problem. There was a deployment yesterday afternoon, after your post. Did that improve things for you?