Page MenuHomePhabricator

204 summary responses cause Android app to crash
Closed, ResolvedPublic

Description

I think we need to revisit the 204 behavior in the summary endpoint since those responses don't go too well with the Android app. It crashes when viewing a link preview which returns 204. Not sure how to iOS app behaves but the Android app behavior is probably a showstopper for the summary switchover since we cannot control the rollout of Android app updates.

Example pages

Any page not in main namespace.

Beta cluster

https://en.wikipedia.beta.wmflabs.org/wiki/Wikipedia:Community_portal
click on the department directory link. (https://en.wikipedia.beta.wmflabs.org/wiki/Wikipedia:Department_directory).

Alternatively: Local RB install pointing to prod wikis

Make sure the RB config.yaml has entries for the summary like this:

mobileapps:
  host: http://appservice.wmflabs.org
  #http://localhost:6927
summary:
  protocol: https
  implementation: mcs
  host: https://appservice.wmflabs.org

Pages: https://en.wikipedia.org/wiki/Help:Contents, https://en.wikipedia.org/wiki/User:BSitzmann_(WMF)/sandbox or anything else with a link to a non-main namespace page.

Android info

dev build with commit c67d784 (but that should not matter)
Dev Settings set to Beta Cluster or local RB install.

Stack trace

org.wikipedia.dev E/AndroidRuntime: FATAL EXCEPTION: main
Process: org.wikipedia.dev, PID: 26681
java.lang.NullPointerException: Attempt to invoke interface method 'boolean org.wikipedia.dataclient.page.PageSummary.hasError()' on a null object reference

at org.wikipedia.page.linkpreview.LinkPreviewDialog$2.onResponse(LinkPreviewDialog.java:292)
at retrofit2.ExecutorCallAdapterFactory$ExecutorCallbackCall$1$1.run(ExecutorCallAdapterFactory.java:70)
at android.os.Handler.handleCallback(Handler.java:739)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:158)
at android.app.ActivityThread.main(ActivityThread.java:7224)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:1230)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1120)

Event Timeline

bearND triaged this task as High priority.Feb 9 2018, 11:06 PM
bearND created this task.
bearND updated the task description. (Show Details)Feb 9 2018, 11:23 PM

Change 409477 had a related patch set uploaded (by Dbrant; owner: Dbrant):
[apps/android/wikipedia@master] Fix possible crash when receiving Summary content.

https://gerrit.wikimedia.org/r/409477

Dbrant added a subscriber: Dbrant.Feb 10 2018, 12:04 AM

Stupid one-liner... There was even a Lint warning about it :(

@Dbrant Nice! Maybe there is hope for the current spec after all. When do you think this could be released? Could this be back-ported to older Android versions which don't get the updates?

@Dbrant I think we want to not show a link preview in this case instead of the error I'm seeing with this patch:

The link preview window pops up before it starts fetching the content, so it has no way of knowing whether it will receive valid content or not. I think it would be too jarring if the link preview popped up, then immediately disappeared if it received a 204. So, we can either show an error, or a blank window (or some custom verbiage specifically tailored for 204 responses).

@JoeWalsh Heya!

@bearND mentioned a change to the summary endpoint will be going live Monday morning - will return 204 No Content in some cases: https://www.mediawiki.org/wiki/Page_Previews/API_Specification#For_a_page_outside_of_the_wiki's_content_namespaces

I overrode a link preview URL to go against beta cluster for a non-existent summary (which BearND recommended using to test):
http://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/summary/Wikipedia:Department_directory

The app didn't crash, but it did show an empty panel.

Sorry for late notice - I just found out. Is there anything on iOS we'd need to change before that endpoint change goes live?

Well, it's really up to @Fjalapeno and @phuedx to hash out if we do go ahead on Monday or later. One possible option would be to change the spec to return 200 but a special type for when not to show previews. That should be BWC for the apps. Or we allow more time for the apps to patch things. Or we say that this only affects a few pages and is rare enough that we can go ahead with this on Monday. (Although the Android crash is bad, of course.)

Mhurd added a comment.Feb 10 2018, 2:31 AM

@bearND Guessing it's probably fine and @JoeWalsh is already aware, but wanted to leave note just in case - he was already done for the day but starts super early east coast time - so he can double-check if needed before change goes live :)

Maybe the interaction of page previews and link previews is different enough to warrant more thought/discussion about the behavior differences. There's a delay before page previews appear on web but none before link previews in the Android app because if Link Previews are enabled that's currently the only way to follow most links. We don't want a delay here because that would slow down navigation in the Android app. (The iOS app is still another story since one has to hard press the link to get the link preview there.)

What would be a better model for the Android app link previews? I guess one possible solution would be that the Android app expands its knowledge of when a link would result in no link preview to include the definitions found in the page preview spec. Of the possible 204 responses in the spec only the non-main namespace can be reasonably predicted, though. The other ones (content type/model, redirects) would be hard to guess.

I think not using any 204 No Content responses would be the quickest solution. We could use 200s and provide a special value in the type field instead, e.g. something like none. If necessary we could revisit the 204s for a 2.0 release.

phuedx added a comment.EditedFeb 12 2018, 10:16 AM

I think not using any 204 No Content responses would be the quickest solution. We could use 200s and provide a special value in the type field instead, e.g. something like none. If necessary we could revisit the 204s for a 2.0 release.

Agreed. AIUI the current Page Previews implementation will handle HTTP 200s with empty or non-existent extract properties the same way. The HTTP 204 response was meant to formalise this at the server level.

Thanks! How about we change the spec to return 200s and empty extracts in the cases we were returning 204s?
Also let the apps still prepare for 204s in the future.

Fjalapeno added subscribers: Charlotte, Nirzar.EditedFeb 12 2018, 3:06 PM

Ok, I'll chat about this with Sam today, but I think we have some pretty clear things to do:

  1. Near term, we need to do "something' to the API to the Android app does not crash. (it seems like we have some ideas here, but want to chat with Sam - and follow up here)
  2. Moving forward, all clients need to handle all HTTP codes gracefully - specifically, clients should not crash or go into an inconsistent state. We need to push these fixes the ASAP so clients can get updated so future APIs and API versions can use proper http semantics. @Dbrant @JoeWalsh can you file tickets to this effect for your respective apps? @Dbrant since yours is a crasher, this is very high priority - I'll follow up with @Charlotte about timing on this
  3. Building on the previous point, after we (quickly) get clients handling all codes gracefully, we need to ensure that they are handling common codes "semantically correct" (We also need to make a list of common codes that clients should check based on possible codes that may be delivered by MCS, PCS, RL).
  4. For previews specifically, we need to standardize the behavior for showing previews when no preview is available. Whether this is fetching before showing, or showing a better message, dismissing or something else - this needs to be figured out. We need @Nirzar to weigh in here with some input from the specific client designers.
  5. Digging a bit further into design, it sounds like we should make sure we flag empty states to designers while implementing new features

Anything here sound unreasonable? Or anything missing?

@Fjalapeno Sounds reasonable. I moved Android's prio meeting to Wednesday to accommodate the meeting you set up today, but if @Dbrant will make a ticket then let's get it done asap. I accept that it's high priority because it's blocking switchover, but it's also just good practice to handle all HTTP codes gracefully, so it's worth doing anyway.

@Fjalapeno @Charlotte I've already made a patch (linked in this task) that fixes the crash. This task itself can also be used for Design input for the Android case, since this task is already Android-specific.

@Fjalapeno Sounds reasonable. I moved Android's prio meeting to Wednesday to accommodate the meeting you set up today, but if @Dbrant will make a ticket then let's get it done asap. I accept that it's high priority because it's blocking switchover, but it's also just good practice to handle all HTTP codes gracefully, so it's worth doing anyway.

@Charlotte Thanks, this is basically my thinking as well…

  1. Moving forward, all clients need to handle all HTTP codes gracefully - specifically, clients should not crash or go into an inconsistent state. We need to push these fixes the ASAP so clients can get updated so future APIs and API versions can use proper http semantics…

Filed T187081 for iOS

@Fjalapeno @Charlotte I've already made a patch (linked in this task) that fixes the crash. This task itself can also be used for Design input for the Android case, since this task is already Android-specific.

@Dbrant @JoeWalsh Thanks for getting these going. Echoing what @Charlotte said above: I want to make sure these tasks are wider than the scope of just the summary API and just this crash. We don't want to deal with crashes on other end points in the future for similar reasons.

To be specific, my proposal for these tasks is to "Ensure iOS/Android apps do not crash on any HTTP code from any API". Not to get into implementation, but I would think that in your base level networking code you check a white list of HTTP codes that you know how to handle, and if you receive a code out of that range, you should convert it to an error (or something to this effect).

@JoeWalsh for your ticket, basically I am asking to change it to "Ensure the iOS app gracefully handles non-200 responses from all API calls"

@Dbrant for you, I'm asking if you can file a ticket to the same effect as the above in addition to this crash: "Ensure the Android app gracefully handles non-200 responses from all API calls"

Sound ok?

I think not using any 204 No Content responses would be the quickest solution. We could use 200s and provide a special value in the type field instead, e.g. something like none. If necessary we could revisit the 204s for a 2.0 release.

Agreed. AIUI the current Page Previews implementation will handle HTTP 200s with empty or non-existent extract properties the same way. The HTTP 204 response was meant to formalise this at the server level.

Thanks! How about we change the spec to return 200s and empty extracts in the cases we were returning 204s?
Also let the apps still prepare for 204s in the future.

Just talked to @phuedx - we are on board with your suggestions. This should make things "work" for all clients until any app updates get to an acceptable level of penetration. If you can make these changes and still get these into the deployment window, go for it. Thanks!

Sounds good. I think we should be able to get that one in by today's deployment window if we all agree on what value we should use for the type field in the cases we were returning 204 before. I brought up none as a strawman proposal. Is that ok or should we use a different value?

Maybe no-extract? none could be read to suggest there is no type, which would be confusing.

Maybe no-extract? none could be read to suggest there is no type, which would be confusing.

no-extract WFM.

Change 409477 merged by jenkins-bot:
[apps/android/wikipedia@master] Fix possible crash when receiving Summary content.

https://gerrit.wikimedia.org/r/409477

Mholloway closed this task as Resolved.Feb 20 2018, 9:52 PM
Mholloway claimed this task.