
API usage: break out internal vs external
Closed, Resolved, Public

Description

During our quarterly metrics prep, we found that the API handles nearly 200 million searches per day. We want to break that number down to determine how many of those requests come from a wiki project or site and how many come from external sources.
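One way to picture the internal vs. external split is to classify each request by the host in its Referer header. The sketch below is a minimal illustration under stated assumptions. The `referer_class` function and the `INTERNAL_SUFFIXES` list of project domains are hypothetical and simplified, and the code does not reproduce the classifier used in the analytics pipeline. An empty or `-` header is treated as "none" (a direct call).

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical, simplified list of Wikimedia project domains (an assumption,
# not the production list).
INTERNAL_SUFFIXES = (
    "wikipedia.org", "wikimedia.org", "wiktionary.org", "wikibooks.org",
    "wikinews.org", "wikiquote.org", "wikisource.org", "wikiversity.org",
    "wikivoyage.org", "wikidata.org", "mediawiki.org",
)


def referer_class(referer: Optional[str]) -> str:
    """Classify a raw Referer header as 'none', 'internal', 'external' or 'unknown'."""
    # Missing or placeholder header: a direct API call.
    if referer is None or referer.strip() in ("", "-"):
        return "none"
    host = urlparse(referer).hostname  # already lower-cased by urllib
    if not host:
        return "unknown"  # header present but not a parseable URL
    # Match the domain itself or any subdomain, e.g. en.m.wikipedia.org.
    if any(host == s or host.endswith("." + s) for s in INTERNAL_SUFFIXES):
        return "internal"
    return "external"
```

Counting requests per class (for example with `collections.Counter`) would then give the internal vs. external breakdown.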

[Screenshot: Screen Shot 2017-08-03 at 4.03.32 PM.png]

Event Timeline

Change 371980 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/golden@master] Breakdown search API requests by referer class and use GetSearchRequestTypeUDF

https://gerrit.wikimedia.org/r/371980

This patch cannot be merged yet because the new UDF hasn’t been released to production. Also remember to add the new column to the existing data before merging.
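The patch relies on GetSearchRequestTypeUDF to label each API request by search type. Its actual rules are not described in this task, so the sketch below is only a hypothetical, simplified classifier built on standard MediaWiki API query parameters (`action=opensearch`, `list=search` with `srsearch`, `generator=search` with `gsrsearch`, `list=prefixsearch`, `list=geosearch`). The function name and the returned labels are assumptions.

```python
from urllib.parse import parse_qs


def search_request_type(query_string: str) -> str:
    """Hypothetical simplified classifier; NOT the logic of GetSearchRequestTypeUDF."""
    # Keep the first value of each parameter, e.g. {'action': 'query', ...}.
    params = {k: v[0] for k, v in parse_qs(query_string.lstrip("?")).items()}
    action = params.get("action", "")
    if action == "opensearch":
        return "open"
    if action != "query":
        return "other"
    lists = params.get("list", "").split("|")
    # Full-text search, either as a list module or as a generator.
    if "search" in lists or params.get("generator") == "search":
        term = params.get("srsearch") or params.get("gsrsearch") or ""
        # The "morelike:" keyword marks a MoreLike (related pages) query.
        if term.lower().startswith("morelike:"):
            return "morelike"
        return "fulltext"
    if "prefixsearch" in lists:
        return "prefix"
    if "geosearch" in lists:
        return "geo"
    return "other"
```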

The UDF update will be done by the Analytics team but we're not sure of the exact timeframe.

I just checked the refinery commit log, and the UDF is available now :) "Add refinery-source jars for v0.0.51 to artifacts" https://github.com/wikimedia/analytics-refinery/commit/712bf13a8689fda40530c072384d355b1dd694d5

Change 371980 merged by Bearloga:
[wikimedia/discovery/golden@master] Breakdown search API requests by referer class and use GetSearchRequestTypeUDF

https://gerrit.wikimedia.org/r/371980

Change 374387 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Use new UDF and break api calls down by referer class

https://gerrit.wikimedia.org/r/374387

Marking search_api_usage for a rerun and then recomputing it with the new UDF so that we have a referrer breakdown for the past 60 days:

reportupdater/rerun_reports.py --report search_api_usage modules/metrics/search 2017-06-29 2017-08-15
nice ionice reportupdater/update_reports.py -l info "modules/metrics/search" "/srv/published-datasets/discovery/metrics/search"

Change 374387 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Use new UDF and break api calls down by referer class

https://gerrit.wikimedia.org/r/374387

Change 374442 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Add a tab to track morelike search usage

https://gerrit.wikimedia.org/r/374442

Change 374442 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Add a tab to track morelike search usage

https://gerrit.wikimedia.org/r/374442

@chelsyx There should also be a tab that shows total usage (across all APIs) broken down by referrer, with the option to switch between raw counts and percentages.
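Switching between raw counts and percentages amounts to dividing each referrer class's daily count by that day's total. A minimal sketch, assuming hypothetical input rows of `(date, referer_class, count)` with one row per date and class; the name `daily_shares` is illustrative and is not part of the dashboard code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (date, referer_class, count); a hypothetical schema, one row per date and class.
Row = Tuple[str, str, int]


def daily_shares(rows: Iterable[Row]) -> Dict[str, Dict[str, float]]:
    """Convert daily counts per referrer class into percentages of that day's total."""
    rows = list(rows)
    totals: Dict[str, int] = defaultdict(int)
    for date, _, n in rows:
        totals[date] += n
    shares: Dict[str, Dict[str, float]] = defaultdict(dict)
    for date, cls, n in rows:
        # Guard against days whose total is zero.
        shares[date][cls] = 100.0 * n / totals[date] if totals[date] else 0.0
    return dict(shares)
```

The raw-count view keeps the rows as they are; the percentage view plots the output of this function instead.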

Change 374669 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Breakdown API calls by referer class

https://gerrit.wikimedia.org/r/374669

The view looks good, but the legend covers part of the graph:

[Screenshot: Screen Shot 2017-08-30 at 2.57.12 PM.png]

Change 374669 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Breakdown API calls by referer class

https://gerrit.wikimedia.org/r/374669

Change 374924 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Order legends according to the last observed values

https://gerrit.wikimedia.org/r/374924

Change 374924 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Order legends according to the last observed values

https://gerrit.wikimedia.org/r/374924
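Ordering legend entries by the last observed values means sorting the series by their most recent non-missing data point, largest first. The snippet below is only a language-neutral illustration written in Python, not the dashboard's own code; the `legend_order` name and the handling of missing values (skipped, with all-missing series placed last) are assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence


def legend_order(series: Dict[str, Sequence[Optional[float]]]) -> List[str]:
    """Return series names sorted by their last non-missing value, largest first."""

    def last_value(values: Sequence[Optional[float]]) -> float:
        # Walk backwards to the most recent observed value.
        for v in reversed(values):
            if v is not None and not math.isnan(v):
                return v
        return float("-inf")  # no observations: put the series at the end

    return sorted(series, key=lambda name: last_value(series[name]), reverse=True)
```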

Change 375074 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Fix legend positions and rename type of API calls

https://gerrit.wikimedia.org/r/375074

Change 375074 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Fix legend positions and rename type of API calls

https://gerrit.wikimedia.org/r/375074

To do:

  1. Add interpretation of referrer class on dashboard
  2. Key findings

From the API usage by referrer dashboard, we can see that half of the API calls are referred by internal sites and the other half are direct API calls with an empty referrer string. This concerns us because we don't know who is sending half of the API calls directly. Further investigation shows that half of that direct traffic uses our MoreLike search feature through mobile domains, which accounts for 25% of all search traffic (~60 million API calls per day).
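The "half of the direct traffic" figure comes from filtering per-request records on three conditions at once: empty referrer, MoreLike search, and a mobile domain. A minimal sketch of that filter, assuming a hypothetical `ApiCall` record with illustrative field names, and assuming that mobile domains can be recognized by an `m` label in the host (as in `en.m.wikipedia.org`):

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ApiCall:
    """Hypothetical per-request record; field names are illustrative."""
    uri_host: str
    referer_class: str  # e.g. "none", "internal", "external"
    request_type: str   # e.g. "morelike", "fulltext"


def is_mobile_host(host: str) -> bool:
    # Assumption: mobile sites carry an "m" label, e.g. en.m.wikipedia.org.
    return "m" in host.split(".")


def direct_morelike_mobile_share(calls: Iterable[ApiCall]) -> float:
    """Fraction of all calls that are direct (no referrer) MoreLike calls on mobile domains."""
    total = matched = 0
    for c in calls:
        total += 1
        if c.referer_class == "none" and c.request_type == "morelike" and is_mobile_host(c.uri_host):
            matched += 1
    return matched / total if total else 0.0
```

With direct calls at about half of all calls and this subset at about half of the direct calls, the share works out to roughly 0.5 × 0.5 = 0.25.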

As far as we know, traffic that uses the MoreLike feature should all come from RelatedArticle, which is internal usage and should have a referrer. We have also ruled out the possibility that this traffic is generated by the apps. In fact, we checked several users' activity and it looked normal: they were referred by Google to an article and then saw the related articles when they scrolled down on their mobile devices, but the API calls that fetched those related articles had no referrer.

Therefore, we think there may be a problem on the mobile side (in the browser or elsewhere) that prevents the referrer string from being sent when RelatedArticle is used. @JKatzWMF and @ovasileva, could the mobile team confirm that?

For now, I will add a note to this dashboard that explains the definition of each referrer class and points out that some of the direct traffic may actually be misclassified internal traffic.

Change 378067 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Interpretation and general findings for API dashboards

https://gerrit.wikimedia.org/r/378067

Change 378067 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Interpretation and general findings for API dashboards

https://gerrit.wikimedia.org/r/378067

Regarding the problem mentioned in T172452#3593393, see T176433 and this write-up for more details.