
Investigate ANR issue in NotificationPollBroadcastReceiver
Closed, Resolved | Public | BUG REPORT

Description

In the Google Play developer console, the ANR reports related to NotificationPollBroadcastReceiver appear to have increased in the most recent app versions:
https://play.google.com/console/u/1/developers/6169333749249604352/app/4976363884102945010/vitals/crashes/7129964d/details?installedFrom=PLAY_STORE&days=30&versionCode=50387%2C50385

The code related to the ANR:
https://github.com/wikimedia/apps-android-wikipedia/blob/f108036633967ca6e9954699a13643df2a1c391e/app/src/main/java/org/wikipedia/notifications/NotificationPollBroadcastReceiver.kt#L51-L63

This may affect users when they receive notifications from the server. Because of the longer response time, the system shows an "Application is not responding" dialog, which may end in a crash.
ANR: https://developer.android.com/topic/performance/vitals/anr

Event Timeline

Some thoughts:

  1. Maybe it relates to the Notification class serialization changes we made recently.
  2. Some older devices may have longer response times, so we may need to optimize the notification polling logic.

@cooltey Where are you seeing that this ANR has increased in our most recent versions? Looking at the console over the last 60 days, it looks like this ANR has been flat, and maybe even trending down slightly:

image.png (326×1 px, 42 KB)

But you are correct -- we are likely doing too much work within the BroadcastReceiver itself. The correct thing to do is probably to make it so that the BroadcastReceiver spawns a separate background worker or service, and hands back control to the system as soon as possible. (The heavy lifting should be done in a service thread, not within a system callback.)
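
To illustrate the hand-off idea in isolation, here is a minimal plain-JVM sketch (no Android dependencies). `HandOffReceiver` and `runHandOffDemo` are hypothetical names invented for this example, and the 500 ms sleep only simulates a slow poll. It is a model of the pattern, not the app's actual code:

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for a system callback such as BroadcastReceiver.onReceive().
class HandOffReceiver(private val worker: ExecutorService) {
    // Queues the slow work and returns immediately; the work runs on the worker thread.
    fun onReceive(slowWork: () -> Unit) {
        worker.execute(slowWork)
    }
}

// Returns (milliseconds spent inside the callback, whether the background work finished).
fun runHandOffDemo(): Pair<Long, Boolean> {
    val worker = Executors.newSingleThreadExecutor()
    val done = CountDownLatch(1)
    val receiver = HandOffReceiver(worker)

    val start = System.nanoTime()
    receiver.onReceive {
        Thread.sleep(500) // simulated slow notification poll
        done.countDown()
    }
    val callbackMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)

    // Wait for the background work only so the demo can report on it.
    val finished = done.await(5, TimeUnit.SECONDS)
    worker.shutdown()
    return callbackMillis to finished
}

fun main() {
    val (callbackMillis, finished) = runHandOffDemo()
    println("callback returned after $callbackMillis ms, background work finished: $finished")
}
```

The callback's own duration stays far below the simulated 500 ms of work, which is the property that matters for ANRs. On Android itself a bare thread started from `onReceive()` is not enough, because the process may be killed once the receiver returns. That is why the work should go to a service or a scheduled worker.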

@Sharvaniharan Since you've picked this up, my suggestion would be to make the polling of notifications into a JobIntentService or whatever the modern equivalent is (just like we do with syncing reading lists -- ReadingListSyncAdapter), and trigger that service from the BroadcastReceiver.
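
For reference, a rough sketch of what this could look like, assuming WorkManager is the "modern equivalent" (JobIntentService is deprecated in its favor). `NotificationPollWorker`, `PollReceiverSketch`, `pollNotifications()`, and the `"notification_poll"` work name are hypothetical placeholders, not names taken from the app, and the sketch needs the `androidx.work` dependency in an Android project:

```kotlin
import android.content.BroadcastReceiver
import android.content.Context
import android.content.Intent
import androidx.work.Constraints
import androidx.work.CoroutineWorker
import androidx.work.ExistingWorkPolicy
import androidx.work.NetworkType
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.WorkerParameters
import java.io.IOException

// Hypothetical worker: does the actual polling off the main thread.
class NotificationPollWorker(appContext: Context, params: WorkerParameters) :
    CoroutineWorker(appContext, params) {

    override suspend fun doWork(): Result {
        return try {
            pollNotifications()
            Result.success()
        } catch (e: IOException) {
            // Assumption: network failures are transient, so ask WorkManager to retry.
            Result.retry()
        }
    }

    // Placeholder for the real network call and notification display logic.
    private suspend fun pollNotifications() {
    }
}

// Hypothetical receiver: only schedules the work, then returns to the system.
class PollReceiverSketch : BroadcastReceiver() {
    override fun onReceive(context: Context, intent: Intent) {
        val request = OneTimeWorkRequestBuilder<NotificationPollWorker>()
            .setConstraints(
                Constraints.Builder()
                    .setRequiredNetworkType(NetworkType.CONNECTED)
                    .build()
            )
            .build()

        // KEEP avoids stacking up duplicate polls if one is already queued or running.
        WorkManager.getInstance(context)
            .enqueueUniqueWork("notification_poll", ExistingWorkPolicy.KEEP, request)
    }
}
```

With this shape, `onReceive()` does nothing but enqueue a request, so the time spent inside the system callback no longer depends on network latency or device speed.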

Final note on this:
Although we have now properly decoupled our background worker from the BroadcastReceiver, the rate of ANRs doesn't appear to have changed. We can conclude that the ANR is just an unfortunate consequence of the capability (or lack thereof) of certain specific devices.