Wed, Oct 18
Tue, Oct 17
Thanks @mpopov ! No rush!
Yep, I'm working on it.
- for the sister projects—do you know why there were more clickthroughs on about half the wikis for the test group than for the control group?
Which chart are you referring to? I didn't see a big difference in sister projects clicks between the two groups.
@EBernhardson @TJones For hewiki, I fetched several zero-result query strings from the ltr-1024 group on 9/20 and 9/21 (the first two days of this experiment, when ltr-1024 had very high zero result rates). I ran them on hewiki and most of them return some results. So I think there may have been some bugs in the test configuration for those days on hewiki, which resulted in event_hitsReturned being null.
I removed all sessions with more than 100 searches (previously, I removed sessions only when they had more than 100 searches AND only had SERP events). Now the dewiki distribution is not bimodal anymore. Yay!
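For anyone following along, here is a minimal sketch of that session filter. It is not the code from the repo; the event layout (`session_id`, `action`) and the `searchResultPage` action name are assumptions made for illustration.

```python
from collections import Counter

MAX_SEARCHES = 100  # threshold mentioned in the comment above


def drop_heavy_sessions(events, max_searches=MAX_SEARCHES):
    """Drop every event of any session that has more than max_searches searches.

    events: list of dicts with hypothetical keys "session_id" and "action".
    A search is counted as one event whose action is "searchResultPage".
    """
    searches_per_session = Counter(
        e["session_id"] for e in events if e["action"] == "searchResultPage"
    )
    return [
        e for e in events
        if searches_per_session[e["session_id"]] <= max_searches
    ]
```

Note that the filter no longer looks at whether a session has anything besides SERP events; the search count alone decides.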
Mon, Oct 16
Codebase and output: https://github.com/wikimedia-research/SDoC-Initial-Metrics/tree/master/T177354
I bootstrapped from the preprocessed data 1000 times and computed the distribution of the search-wise CTR. Then I changed the re-sample size from 1k to 10k and created joy plots for every wiki. Here is the most interesting one:
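For reference, the bootstrap step described above can be sketched roughly as follows. This is an illustration only, not the code in the linked repo. It assumes each search is reduced to a 0/1 flag (1 = at least one click) and that "re-sample size" means the number of searches drawn per resample; both are assumptions.

```python
import random


def bootstrap_ctr(clicked, n_resamples=1000, sample_size=None, seed=0):
    """Return a list of search-wise CTRs, one per bootstrap resample.

    clicked: list of 0/1 flags, one per search (hypothetical preprocessing).
    sample_size: searches drawn with replacement per resample
                 (defaults to len(clicked)).
    """
    rng = random.Random(seed)
    size = sample_size or len(clicked)
    ctrs = []
    for _ in range(n_resamples):
        resample = rng.choices(clicked, k=size)  # sampling with replacement
        ctrs.append(sum(resample) / size)
    return ctrs
```

Running this once per wiki and per test group gives the per-group CTR distributions that a joy plot would then stack.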
Fri, Oct 13
@mpopov yup, I will put my stuff in the repo.
According to T176464#3669451, this bug didn't cause the decrease in logged events on March 29th 2017.
@Niedzielski Looks ok to me. Thank you all very much for the help! :D
The following two graphs break down the number by month:
Updated: On Oct 12, 2017, the number of files uploaded by bots is 9,390,721 (22.03%), and the number of files uploaded by users is 33,241,541 (77.97%). The following table breaks down the counts by media type:
The auto-report is updated: https://analytics.wikimedia.org/datasets/discovery/reports/CirrusSearch_MLR_AB_test_on_18_wikis.html
Thu, Oct 12
Auto-generated report is up: https://analytics.wikimedia.org/datasets/discovery/reports/CirrusSearch_MLR_AB_test_on_18_wikis.html. There are still some bugs in the report I need to fix and I will update the report later.
Wed, Oct 11
@mpopov Looks like the file type categorization on commons is messier than we thought...
For example, File:Krazy_Kat_Bugolist_1916_silent.ogv is an ogv file, but its img_minor_mime is ogg, img_major_mime is application, and img_media_type is video. This is the same for other ogv files. For ogg files like File:Whitenoisesound.ogg, however, img_minor_mime is ogg, img_major_mime is application, and img_media_type is audio.
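To make the problem concrete, here is a small sketch showing that the MIME columns alone cannot tell these two files apart. The function name and the returned dict layout are made up for illustration; the column values are the ones quoted above.

```python
import os


def describe_file(img_name, img_major_mime, img_minor_mime, img_media_type):
    """Collect the three signals we have for one row of the image table."""
    extension = os.path.splitext(img_name)[1].lstrip(".").lower()
    return {
        "extension": extension,
        "mime": f"{img_major_mime}/{img_minor_mime}",
        "media_type": img_media_type,
    }


video = describe_file("Krazy_Kat_Bugolist_1916_silent.ogv", "application", "ogg", "video")
audio = describe_file("Whitenoisesound.ogg", "application", "ogg", "audio")

# Same MIME type for both files, so only the extension or
# img_media_type distinguishes them.
same_mime = video["mime"] == audio["mime"]
```

So any breakdown by media type probably needs img_media_type (or the extension) rather than the MIME columns.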
Hey @chelsyx - what time frame does this cover?
Jumping in to say this looks like it's from launch of Commons to now.
Thanks @mpopov ! Yes, these are the file counts as of Oct 10.
@mpopov Agree. That would be less confusing as well.
The number of files uploaded by bots is 9,390,408 (22.04%), and the number of files uploaded by users is 33,222,828 (77.96%). The following table breaks down the counts by media type:
Sat, Oct 7
Fri, Oct 6
Update: From the dashboard, we noticed that MobileWebSearch events increased drastically on Sep 29, back to the same level as before March 29. We will keep watching.
Now live on beta: http://discovery-beta.wmflabs.org/metrics/#mobile_events
Wed, Oct 4
We examined the query P5973 carefully and didn't find anything that would change the full-text search usage pattern on mobile web. More interestingly, when we focus on users who went through the "prefix -> full-text" funnel, we can see that while the number of users and the number of prefix searches are higher on weekends, this same group of users opens more full-text search result pages on weekdays:
Sorry I'm late for the party.
Mon, Oct 2
Fri, Sep 29
Thu, Sep 28
Wed, Sep 27
Breakdown by top 10 OS:
Here is the breakdown of direct morelike API calls from mobile web by top 10 browsers (on Aug 1st, excluding spider):
Tue, Sep 26
On September 21st, 2017, we had a meeting with @phuedx and discussed some issues related to mobile web search. The link to the etherpad is https://etherpad.wikimedia.org/p/MobileWebSearch_Sync.
Fri, Sep 22
In order to check whether these direct (no referrer) MoreLike API calls are from unidentified bots, I pulled the webrequest data for users (identified by IP+user agent) who sent direct morelike API calls on Aug 1 2017, and checked whether they had any pageviews, whether their landing event was a pageview, and whether their landing event was referred by a search engine.
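The three checks above can be sketched like this. It is a rough illustration over already-extracted rows, not the actual query. The keys `ip`, `user_agent`, `ts`, `is_pageview`, `is_direct_morelike` and `from_search_engine` are hypothetical names for this example, not necessarily the webrequest column names.

```python
from collections import defaultdict


def summarize_users(requests):
    """Per-user flags for users who sent at least one direct morelike call.

    requests: list of dicts with the hypothetical keys named above.
    A user is the pair (ip, user_agent); the landing event is the
    user's earliest request by timestamp.
    """
    by_user = defaultdict(list)
    for r in requests:
        by_user[(r["ip"], r["user_agent"])].append(r)

    summary = {}
    for user, reqs in by_user.items():
        if not any(r["is_direct_morelike"] for r in reqs):
            continue  # only users with direct morelike API calls
        landing = min(reqs, key=lambda r: r["ts"])
        summary[user] = {
            "any_pageview": any(r["is_pageview"] for r in reqs),
            "landing_is_pageview": landing["is_pageview"],
            "landing_from_search_engine": landing["from_search_engine"],
        }
    return summary
```

Users with no pageviews at all would be the most likely candidates for unidentified bots, though that interpretation is a guess until the numbers are in.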
Thu, Sep 21
Sep 20 2017
Great job @mpopov !
Sep 19 2017
Thank you very much @TJones !
Sep 14 2017
Sep 11 2017
Update: Got an email from @phuedx and I realized that I may have misunderstood how Schema:MobileWebSearch works. I will have a meeting with an RW engineer to go through the implementation of the MobileWebSearch instrumentation to probe for any other issues before proceeding.
Sep 9 2017
For now, I will add a note to this dashboard to explain the definition of each referrer class and point out that some of the direct traffic could possibly be misclassified internal traffic.
From the API usage by referrer dashboard, we can see that half of the API calls are referred by internal sites, and the other half are direct API calls, which have an empty referrer string. This concerns us because we don't know who sent that half of the API calls directly. Further investigation shows that half of that direct traffic uses our MoreLike search feature through mobile domains, which accounts for 25% of all search traffic (~60 million API calls per day).
Sep 7 2017
We did not find any errors on the dashboard side. In fact, we can see the weekend bumps on the dashboard from time to time (e.g. Jan - Feb 2017). That being said, there are several things we can do to improve this dashboard:
Sep 5 2017
- Add interpretation of referrer class on dashboard
- Key findings
Sep 1 2017
Aug 31 2017
Aug 30 2017
@debt Fixed in the new patch :)
According to the config.yaml files in golden, the following reports have a max_data_points limit, and we are going to create duplicate reports with max_data_points removed for tracking purposes: