During our quarterly metrics prep, we saw that the API receives nearly 200 million searches per day. We want to break that number down to determine how many of those requests come from a wiki project or site and how many come from external sources.
Event Timeline
Change 371980 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/golden@master] Breakdown search API requests by referer class and use GetSearchRequestTypeUDF
This patch cannot be merged yet because the new UDF hasn’t been released to production. Also remember to add the new column to the existing data before merging.
The UDF update will be done by the Analytics team but we're not sure of the exact timeframe.
I just checked the refinery commits log and the UDF is available now :) "Add refinery-source jars for v0.0.51 to artifacts" https://github.com/wikimedia/analytics-refinery/commit/712bf13a8689fda40530c072384d355b1dd694d5
Change 371980 merged by Bearloga:
[wikimedia/discovery/golden@master] Breakdown search API requests by referer class and use GetSearchRequestTypeUDF
Change 374387 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Use new UDF and break api calls down by referer class
Marking search_api_usage for a recount and then recounting using the new UDF so we have referrer breakdown for the past 60 days:
reportupdater/rerun_reports.py --report search_api_usage modules/metrics/search 2017-06-29 2017-08-15
nice ionice reportupdater/update_reports.py -l info "modules/metrics/search" "/srv/published-datasets/discovery/metrics/search"
Change 374387 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Use new UDF and break api calls down by referer class
Change 374442 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Add a tab to track morelike search usage
Change 374442 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Add a tab to track morelike search usage
@chelsyx there should also be a tab that shows the total usage (across all APIs) broken down by referrer with the option to switch between raw counts and %s
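To make the request above concrete, here is a minimal sketch of the aggregation such a tab would need: sum counts across all APIs per referrer class, then optionally convert the raw counts to percentages. The row layout `(api, referer_class, count)` and the function names are assumptions for illustration only, not the dashboard's actual data format or code.

```python
from collections import defaultdict


def total_by_referer(rows):
    """Sum counts across all APIs for each referrer class.

    `rows` is an iterable of (api, referer_class, count) tuples;
    this layout is hypothetical.
    """
    totals = defaultdict(int)
    for _api, referer_class, count in rows:
        totals[referer_class] += count
    return dict(totals)


def to_percentages(counts):
    """Convert {referer_class: count} into {referer_class: percent of total}."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when there is no traffic at all.
        return {key: 0.0 for key in counts}
    return {key: 100.0 * value / total for key, value in counts.items()}
```

Switching between raw counts and %s would then amount to plotting either the output of `total_by_referer` directly or that output passed through `to_percentages`.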
Change 374669 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Breakdown API calls by referer class
Change 374669 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Breakdown API calls by referer class
Change 374924 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Order legends according to the last observed values
Change 374924 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Order legends according to the last observed values
Up on beta: http://discovery-beta.wmflabs.org/metrics/#referer_breakdown
Good work, @chelsyx!
Change 375074 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Fix legend positions and rename type of API calls
Change 375074 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Fix legend positions and rename type of API calls
From the API usage by referrer dashboard, we can see that half of the API calls are referred by internal sites, and the other half are direct API calls, which have an empty referrer string. This concerns us because we don't know who sent that half of the API calls. Further investigation shows that half of that direct traffic uses our MoreLike search feature through mobile domains, which accounts for 25% of all search traffic (~60 million API calls per day).
As far as we know, traffic that uses the MoreLike feature should all come from RelatedArticle, which is internal usage and should have a referrer. We also ruled out the possibility that this traffic is generated by the apps. In fact, we checked several users' activities and they looked normal -- referred by Google to an article, then shown the related articles info when they scroll down on their mobile devices -- but the API calls that fetch the related articles don't have a referrer.
Therefore, we think there may be some problem on the mobile side (the browser or something else) that fails to send the referrer string when RelatedArticle is used. @JKatzWMF and @ovasileva, could the mobile team confirm that?
For now, I will add a note to this dashboard to explain the definition of each referrer class and point out that some of the direct traffic could possibly be misclassified internal traffic.
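As a rough illustration of why a dropped referrer leads to misclassification, here is a simplified sketch of a referrer classifier. It is not the production logic: the class names, the "-" placeholder for a missing value, and the short list of internal host suffixes are all assumptions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical and incomplete list of host suffixes treated as internal.
INTERNAL_SUFFIXES = ("wikipedia.org", "wikimedia.org")


def classify_referer(referer):
    """Return a coarse referrer class for one request (illustrative only)."""
    if not referer or referer == "-":
        # An empty referrer is counted as direct traffic, whoever sent it.
        return "none"
    host = urlparse(referer).hostname
    if host is None:
        # The header was present but could not be parsed as a URL.
        return "unknown"
    host = host.lower()
    if any(host == s or host.endswith("." + s) for s in INTERNAL_SUFFIXES):
        return "internal"
    return "external"
```

Under these assumptions, a RelatedArticle request made from a mobile article page would be classified as "internal" if the browser sent the page URL as the referrer, but as "none" if the referrer was dropped, which is the pattern described above.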
Change 378067 had a related patch set uploaded (by Chelsyx; owner: Chelsyx):
[wikimedia/discovery/rainbow@develop] Interpretation and general findings for API dashboards
Change 378067 merged by Bearloga:
[wikimedia/discovery/rainbow@develop] Interpretation and general findings for API dashboards
Regarding the problem mentioned in T172452#3593393, see T176433 and this write-up for more details.