Investigate increase in pageviews with Android app v190
Closed, Resolved · Public


Coinciding with the release of version 190, the app saw a substantial increase in pageviews (around 15-20%, without a corresponding increase in daily active users), which has persisted to this day.

Possible reasons include:

  • A real increase in readers' use of Wikipedia, in which case we should find out what changes in the app caused it
  • A counting anomaly (requests being counted as pageviews that don't correspond to actual content consumption by the user), which should be fixed
  • Extraneous traffic caused by a bug (like in the case of a pageview spike on the iOS app spotted earlier this year: T154735, where the reason turned out to be an infinite loop condition)

Event Timeline

For the record: @Dbrant took a first look over the code changes in v190 some weeks ago and didn't spot anything that could conceivably have introduced spurious pageviews.

I ran a query to see if the rise was concentrated on certain pages (limited to English Wikipedia - per a quick check on Pivot, the rise appears to have occurred across all or most Wikipedias, including enwiki).

That wasn't the case, meaning that we can rule out something like the aforementioned iOS bug, although other variants of alternative 3 are still possible.

So it's a bit more likely now that v190 caused a real change in reader behavior (alternative 1).

(CC @mpopov )

-- Per-page change in views (in percent): first 28 days of March 2017 vs. February 2017
SELECT page_title, changeratio, viewsfeb FROM
  (SELECT page_title,
  ROUND( 100*SUM(IF(month = 3 AND day <= 28, view_count, null)) / SUM(IF(month = 2, view_count, null)) -100,1) AS changeratio,
  SUM(IF(month = 2, view_count, null)) AS viewsfeb
  FROM wmf.pageview_hourly
  WHERE year = 2017 AND (month = 2 OR month = 3)
    AND access_method = 'mobile app'
    AND user_agent_map['os_family'] = 'Android'
    AND agent_type = 'user'
    AND project = 'en.wikipedia'
    GROUP BY page_title)  AS pagelist
-- Only pages with a sizable February baseline
WHERE viewsfeb > 10000
GROUP BY page_title, changeratio, viewsfeb
-- Only pages whose views increased
HAVING changeratio > 0
ORDER BY ABS(changeratio) DESC LIMIT 50;

page_title	changeratio	viewsfeb
Logan_(film)	514.0	32688
WrestleMania_33	348.0	10162
Moana_(2016_film)	86.3	11255
The_Walking_Dead_(season_7)	54.5	11875
Fastlane_(2017)	48.0	13591
Doctor_Strange_(film)	41.0	11933
The_Walking_Dead_(TV_series)	39.9	10936
Get_Out_(film)	36.4	41568
Deaths_in_2017	28.8	87413
Portal:Current_events	28.0	14467
This_Is_Us_(TV_series)	18.7	11043
Riverdale_(2017_TV_series)	13.1	14601
India	11.4	10216
United_States	9.9	10011
Main_Page	7.1	2154919
Queen_Victoria	5.8	14351
Elizabeth_II	2.5	13721
17 rows selected (354.77 seconds)

@Tbayer: I'm trying to figure out the -100 in ROUND(100*SUM(IF(month = 3 AND day <= 28, view_count, null)) / SUM(IF(month = 2, view_count, null)) -100,1) and no success 😕

@mpopov Oh, that -100 is just to express the change as a relative difference (so that no change means 0% rather than 100%). Perhaps I should have named the result something like changepercentage instead of changeratio.
I.e. Logan_(film) had 514% more pageviews in (the first 28 days of) March 2017 than in February. v190 was rolled out on February 27.
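To make the arithmetic concrete, here is a minimal Python sketch of the same expression. The function name `change_percent` and the sample numbers are made up for illustration, and returning `None` for a missing or zero February total is my assumption about how to treat that case, not something taken from the query.

```python
from typing import Optional


def change_percent(views_march: Optional[float], views_feb: Optional[float]) -> Optional[float]:
    """Relative change in percent, mirroring ROUND(100 * march / feb - 100, 1)."""
    # A missing or zero February total gives no meaningful ratio, so return None
    # (assumption for this sketch).
    if views_march is None or not views_feb:
        return None
    return round(100 * views_march / views_feb - 100, 1)


# Hypothetical view counts, not taken from the query output.
print(change_percent(1000, 1000))  # no change -> 0.0
print(change_percent(1200, 1000))  # 20% more views -> 20.0
print(change_percent(500, 1000))   # half the views -> -50.0
```

Without the -100, the three cases would read 100, 120 and 50, i.e. March as a percentage of February rather than the change relative to February.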

Sorry for the opaque CC - I wanted to loop you in for awareness and in case you have other ideas how to identify the source of this massive pageview gain.

Tbayer added a comment. Edited Mar 9 2018, 6:30 PM

One other observation: it seems the increase only happened on devices running Android 5 (Lollipop) and later versions, not on KitKat and earlier versions (which still made up a substantial share of pageviews at that time).

(source: Pivot)

FWIW, I also looked at the distribution of the increase per namespace (conscious of the fact that the app does not yet send the correct namespace ID for all pageviews - that's a separate task that still needs to be filed). The increase was indeed higher for namespace -1 (Special pages), although that's not enough to explain the entire effect.

-- Per-namespace change in views (in percent): first 28 days of March 2017 vs. February 2017
SELECT namespace_id, changeratio, viewsfeb FROM
  (SELECT namespace_id,
  ROUND( 100*SUM(IF(month = 3 AND day <= 28, view_count, null)) / SUM(IF(month = 2, view_count, null)) -100,1) AS changeratio,
  SUM(IF(month = 2, view_count, null)) AS viewsfeb
  FROM wmf.pageview_hourly
  WHERE year = 2017 AND (month = 2 OR month = 3)
    AND access_method = 'mobile app'
    AND user_agent_map['os_family'] = 'Android'
    AND agent_type = 'user'
    AND project = 'en.wikipedia'
    GROUP BY namespace_id)  AS namespacelist
-- Only namespaces with a sizable February baseline
WHERE viewsfeb > 10000
GROUP BY namespace_id, changeratio, viewsfeb
ORDER BY ABS(changeratio) DESC LIMIT 50;

namespace_id	changeratio	viewsfeb
-1	26.2	5118205
NULL	19.2	50011122
2 rows selected (575.725 seconds)
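As a rough check of "not enough to explain the entire effect", the two result rows can be turned into absolute numbers of added views. This is only a back-of-the-envelope sketch: the rows are copied from the output above, the percentages are already rounded to one decimal, and the helper names (`added_views`, `decompose`) are made up for illustration.

```python
# Rows copied from the namespace query output:
# (namespace_id, changeratio in percent, viewsfeb).
rows = [
    ("-1", 26.2, 5118205),
    ("NULL", 19.2, 50011122),
]


def added_views(change_pct, views_feb):
    """Absolute number of extra views implied by a percentage change."""
    return views_feb * change_pct / 100


def decompose(rows):
    """Return the overall change in percent and each namespace's share of the added views."""
    total_feb = sum(views for _, _, views in rows)
    total_added = sum(added_views(pct, views) for _, pct, views in rows)
    overall_pct = 100 * total_added / total_feb
    shares = {ns: added_views(pct, views) / total_added for ns, pct, views in rows}
    return overall_pct, shares


overall_pct, shares = decompose(rows)
print(f"overall change: {overall_pct:.1f}%")
for ns, share in shares.items():
    print(f"namespace {ns}: {100 * share:.1f}% of the added views")
```

With these numbers, namespace -1 accounts for roughly 12% of the added views, and the overall change stays just under 20%, so the bulk of the increase comes from the rows with a NULL namespace.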

Hey all! @Tbayer and I were casually brainstorming today about why this spike may have occurred, and he encouraged me to share my thoughts. I'm positive the Android team has a handle on this, so really feel free to completely ignore this comment if it's not helpful.

Beyond the infinite loop bug mentioned in the ticket (which I think Android has experienced too on a couple specific devices in the past), I was wondering if this may have had to do with lifecycles or caching:

  • IIRC, there's a service or two that can run in the background to synchronize data, including populating the cache with pages, which could inflate pageviews if it runs more often than before. Early on in development, we had some issues with errors that caused the service to run too often. The characteristic of this in the data would probably be lots of repeat views.
  • The browsing screen is the most resource-heavy in the app and so the most likely to be reclaimed by the system when backgrounded. Depending on whether caching favors network or cached responses (I believe it's the former), a change in the resources consumed by the app could impact the number of pages requested from the server. For example, if the Activity consumes more resources and favors network responses, the Activity may not be persisted by the system for long, so a new page is requested from the server whenever the user returns to it. This can occur even when changing Activities within the same app. This would be like a user manually refreshing the current page.
  • The app may use RESTBase or MediaWiki backends which may impact the number of requests made.
  • As a client choosing to honor the server headers, the app doesn't have great options for how to handle caching. IIRC, we always preferred to ask the server for the page when the user is online.

Sorry for the noise.

MBinder_WMF renamed this task from Investigate increase in pageviews with v190 to Investigate increase in pageviews with Android app v190. May 3 2018, 8:19 PM
MBinder_WMF triaged this task as Medium priority.
MBinder_WMF moved this task from Triage to Doing on the Product-Analytics board.
MBinder_WMF moved this task from Doing to Tracking on the Product-Analytics board.
Dbrant added a comment. May 7 2018, 5:45 PM

OK! After much digging, it looks like the increase in pageviews is very likely caused by this patch, which effectively alters the local caching behavior that we were using in the app. Prior to that patch, when revisiting an article that was previously viewed in the app, the app could serve up a cached (stale) copy of the article without contacting the server to check for changes. After the patch, the app explicitly queries the server first and only falls back on the local cache. This would surely result in an increased number of pageviews as seen from the server's perspective.
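The effect of that change on server-side counts can be illustrated with a toy simulation. This is a simplified sketch and not the app's actual code: `CountingServer`, `load_cache_first` and `load_network_first` are hypothetical names, and the pre-patch loader is assumed to always serve a cached copy when one exists, whereas the description above only says it could.

```python
class CountingServer:
    """Stand-in for the server; counts every request it receives."""

    def __init__(self):
        self.requests = 0

    def get(self, title):
        self.requests += 1
        return f"content of {title}"


def load_cache_first(server, cache, title):
    """Pre-patch behavior (simplified): serve a cached copy without contacting the server."""
    if title not in cache:
        cache[title] = server.get(title)
    return cache[title]


def load_network_first(server, cache, title, online=True):
    """Post-patch behavior (simplified): query the server, fall back on the local cache."""
    if online:
        cache[title] = server.get(title)
    return cache.get(title)


def count_requests(loader, visits):
    """Replay a sequence of article visits and return how many reached the server."""
    server, cache = CountingServer(), {}
    for title in visits:
        loader(server, cache, title)
    return server.requests


# Hypothetical browsing session with revisits.
visits = ["India", "Main_Page", "India", "India", "Main_Page"]
print(count_requests(load_cache_first, visits))    # only first visits reach the server: 2
print(count_requests(load_network_first, visits))  # every visit reaches the server: 5
```

The same five article visits produce two server requests with the cache-first loader and five with the network-first one, which is the kind of jump a server-side pageview count would register even though the user's behavior is unchanged.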

In fact, that entire week of development saw a flurry of patches related to updating and reworking our caching behavior, and is therefore extremely likely to be the explanation for the difference in pageviews.

Since these patches have made our caching behavior more correct and in line with expected behavior, we can consider the post-v190 pageview numbers to be more accurate than they were previously.
@Tbayer please feel free to close this task if the explanation is a satisfactory one.

Tbayer closed this task as Resolved. May 7 2018, 10:42 PM
Tbayer claimed this task.

Congratulations, Sherlock! ;)
To add: From our conversation on Friday, I also understand that the new behavior is now a bit closer to how a web browser would handle it, i.e. the app views are now more comparable to the pageviews we are registering on the web.
For our core metrics reporting for Q3, I think the takeaway is that the year-over-year comparison will not be valid until next quarter, but perhaps we can limit it to March 2017 vs. March 2018 - CC @mpopov.