During our quarterly metrics prep, we found that the API serves nearly 200MM searches per day. We want to break that figure down to determine how many of those requests come from a wiki project or site and how many come from external sources.
I just checked the refinery commit log and the UDF is available now :) "Add refinery-source jars for v0.0.51 to artifacts" https://github.com/wikimedia/analytics-refinery/commit/712bf13a8689fda40530c072384d355b1dd694d5
Marking search_api_usage for a recount, then recounting it with the new UDF so that we have a referrer breakdown for the past 60 days:
reportupdater/rerun_reports.py --report search_api_usage modules/metrics/search 2017-06-29 2017-08-15
nice ionice reportupdater/update_reports.py -l info "modules/metrics/search" "/srv/published-datasets/discovery/metrics/search"
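For anyone unfamiliar with what a referrer breakdown does, here is a minimal Python sketch of the idea: bucket each request's Referer header as none, internal, or external, then compute each bucket's share. This is not the refinery UDF's code. The domain list, the function names `classify_referer` and `referer_breakdown`, and the treatment of `-` as an absent referrer are all illustrative assumptions.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical, non-exhaustive list of Wikimedia project domains;
# the real UDF may use a different list.
WIKI_DOMAIN_SUFFIXES = (
    "wikipedia.org", "wikimedia.org", "wiktionary.org", "wikiquote.org",
    "wikibooks.org", "wikisource.org", "wikinews.org", "wikiversity.org",
    "wikivoyage.org", "wikidata.org", "mediawiki.org",
)


def classify_referer(referer: str) -> str:
    """Bucket a raw Referer value as 'none', 'internal', 'external' or 'unknown'."""
    # Assumption: a missing Referer header is logged as '' or '-'.
    if not referer or referer == "-":
        return "none"
    host = (urlparse(referer).hostname or "").lower()
    if not host:
        return "unknown"  # not a parseable URL
    for suffix in WIKI_DOMAIN_SUFFIXES:
        # Match the domain itself or any subdomain (e.g. en.m.wikipedia.org).
        if host == suffix or host.endswith("." + suffix):
            return "internal"
    return "external"


def referer_breakdown(referers):
    """Return each referer bucket's share of the total request count."""
    counts = Counter(classify_referer(r) for r in referers)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bucket: n / total for bucket, n in counts.items()}
```

For example, `referer_breakdown(["", "-", "https://en.wikipedia.org/", "https://www.google.com/"])` gives a 50% "none" share and 25% each for "internal" and "external".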
From the API usage by referrer dashboard, we can see that half of the API calls are referred by internal sites, and the other half are direct API calls with an empty referrer string. This concerns us because we don't know who is sending half of the API calls directly. Further investigation shows that half of this direct traffic uses our MoreLike search feature through mobile domains, which accounts for 25% of all search API traffic (~60 million API calls per day).
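The filter behind that further investigation can be sketched the same way. The Python predicate below flags a search API request as a direct mobile MoreLike call when it has no referrer, is served from a mobile host, and runs a `morelike:` query. The field names (`uri_host`, `uri_query`, `referer`), the `m` host-label convention, and the assumption that RelatedArticle sends its query through the `srsearch` or `gsrsearch` parameter are illustrative assumptions, not confirmed details of our pipeline.

```python
from urllib.parse import parse_qs

# Assumption: MoreLike queries arrive via one of these search parameters.
MORELIKE_PARAMS = ("srsearch", "gsrsearch")


def is_direct_mobile_morelike(uri_host: str, uri_query: str, referer: str) -> bool:
    """True for a referrer-less, mobile-domain search request running morelike:."""
    # Assumption: a missing Referer header is logged as '' or '-'.
    if referer not in ("", "-"):
        return False
    # Assumption: mobile hosts carry an 'm' label, e.g. 'en.m.wikipedia.org'.
    if "m" not in uri_host.lower().split("."):
        return False
    # parse_qs URL-decodes values, so 'morelike%3AFoo' becomes 'morelike:Foo'.
    params = parse_qs(uri_query.lstrip("?"))
    return any(
        "morelike:" in value.lower()
        for name in MORELIKE_PARAMS
        for value in params.get(name, [])
    )
```

Counting the requests for which this predicate is true and dividing by all search API requests would give the kind of share quoted above.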
As far as we know, all traffic that uses the MoreLike feature should come from RelatedArticle, which is internal usage and should carry a referrer. We have also ruled out the possibility that this traffic is generated by the apps. In fact, we checked several users' activity and it looked normal: they were referred by Google to an article and then saw the related articles when they scrolled down on their mobile devices, but the API calls that fetched those related articles had no referrer.
Therefore, we think there may be a problem on the mobile side (the browser or something else) that prevents the referrer string from being sent when RelatedArticle is used. @JKatzWMF and @ovasileva, could the mobile team confirm this?