Fri, Mar 17
@TJones I'm happy to help. :)
@TJones, Thank you for the analysis! I read through your write up and learned a lot! (I learned how messy my language is :P)
Just want to confirm that STConvert and ZhConversion are close enough. We can't do anything about the disagreements listed in this table. For example, "著" can be either traditional or simplified depending on its meaning, and mismatches like "馀 vs 餘 vs 余" and "钟 vs 锺 vs 鍾 vs 鐘" are messiness introduced when Chinese was simplified; some rules are still changing nowadays.
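To illustrate why some of these mismatches cannot be resolved character by character, here is a minimal sketch. The table `CANDIDATES` and the function `ambiguous_chars` are hypothetical names made up for this example; they are not part of STConvert or ZhConversion, and the table is only a tiny hand-picked excerpt of well-known one-to-many cases.

```python
# Hypothetical, hand-picked excerpt: simplified characters that have more
# than one valid traditional counterpart depending on meaning.
# This is NOT the STConvert or ZhConversion table.
CANDIDATES = {
    "余": ["余", "餘"],  # "I" / surname vs. "surplus"
    "钟": ["鐘", "鍾"],  # "bell, clock" vs. surname / "cup"
    "发": ["發", "髮"],  # "to send out" vs. "hair"
}


def ambiguous_chars(text, table=CANDIDATES):
    """Return the characters in `text` that map to more than one candidate."""
    return [ch for ch in text if len(table.get(ch, [])) > 1]


if __name__ == "__main__":
    # Characters missing from the table are treated as unambiguous here.
    print(ambiguous_chars("余钟一"))
```

Any converter that works one character at a time has to pick one candidate for each of these, so two reasonable tables can disagree without either being wrong.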
Wed, Mar 15
Sun, Mar 12
@mpopov Great job! I've sent comments by email.
Wed, Mar 8
Thu, Mar 2
First draft: http://wikimedia-research.github.io/Discovery-Research-Portal/swap2and3/
Still trying to figure out how to interpret the result...
Feb 16 2017
Hi @JKatzWMF, thanks for pointing that out! You are right, this issue does not seem to be limited to mobile Safari; it also shows up in desktop Safari (pivot), Chrome Mobile (pivot), Firefox (pivot), etc., while T148780#2890139 says this issue seems localizable to mobile browsers. I have no clue about the possible cause of this issue at this point... :(
Feb 15 2017
The following chart shows that we have around 500k internal referrers that use Safari every day -- they are not categorized as unknown.
Jan 26 2017
Results: (source = "web", and I didn't exclude automata)
Jan 19 2017
Jan 18 2017
Jan 7 2017
PDF version is uploaded: https://commons.wikimedia.org/wiki/File:Second_BM25_AB_Test_Analysis.pdf
Jan 6 2017
Thank you @mpopov !!!
Jan 4 2017
@dcausse, @TJones and @mpopov, thank you all for comments and suggestions! The second draft is up: https://wikimedia-research.github.io/Discovery-Search-2ndTest-BM25_jazhth/
Jan 3 2017
Dec 22 2016
The schema changed from TestSearchSatisfaction2_15700292 to TestSearchSatisfaction2_15922352 on Oct 25, so we ended up with 7896 sessions recorded in TestSearchSatisfaction2_15700292 and 1 session recorded in TestSearchSatisfaction2_15922352 that lasted longer than 10 seconds. Golden failed to pick up data points from both tables.
Dec 21 2016
Dec 19 2016
Dec 17 2016
Dec 15 2016
Dec 9 2016
Dec 8 2016
First draft of full write-up: https://wikimedia-research.github.io/Discovery-Search-2ndTest-BM25_jazhth/
Dec 1 2016
@Volker_E, I guess indentation + gray bar on the left is probably more common for Chinese block quote as shown below. People use the “graphical” blockquote you mentioned as well. I don't see a lot of discussion about this problem on Chinese websites, so I guess people are okay either way.
In some browsers, the block quoted text will be italicized automatically. Chinese users are not happy about that. Also, for traditional Chinese, I've never seen people use big "graphical" quotation mark like 「…」or 『…』for blockquote (not 100% sure though...).
Nov 29 2016
I updated the draft with query reformulation analysis: https://wikimedia-research.github.io/Discovery-Search-2ndTest-BM25_jazhth/#query_reformulation
Nov 22 2016
I added the dwell time of visit page and proportion of visit with scroll: https://wikimedia-research.github.io/Discovery-Search-2ndTest-BM25_jazhth/#dwell_time_per_visit_page
Nov 18 2016
@TJones , I just added the breakdown by wikis: https://wikimedia-research.github.io/Discovery-Search-2ndTest-BM25_jazhth/
Looks like zhwiki has the largest discrepancies between the control and test groups for all metrics... Maybe that has something to do with the tokenizer? At least from this example, the Chinese tokenizer works badly...
Nov 17 2016
@Deskana, oops sorry I made a mistake in the last comment: ZRR of test group is actually lower. Fixed it.
I replicated @mpopov 's analysis without query reformulation for this second test, and the results are here: https://wikimedia-research.github.io/Discovery-Search-2ndTest-BM25_jazhth/
For the test group, ZRR is lower, but Clickthrough rate is significantly lower and PaulScores are slightly lower.
Nov 16 2016
Thank you so much @EBernhardson ! Let me do the analysis first without query reformulation as @TJones suggested, in order to provide some information for the decision between Plan A and Plan B, then see what I can do with the tokenizer. :)
Nov 15 2016
Thanks @TJones ! I will finish the analysis as soon as possible.
In the analysis for the first BM25 test, @mpopov used Levenshtein (edit) distance adjusted by overlapping results to identify query reformulation. However, since most Chinese words contain only 1-3 characters, Levenshtein distance is not suitable for computing distance (Japanese has the same issue, not sure about Thai though).
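A minimal sketch of the problem, assuming plain character-level Levenshtein distance. This is not @mpopov's actual code and it leaves out the adjustment by overlapping results; the names `levenshtein` and `normalized_distance` are made up for this example.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


if __name__ == "__main__":
    # Two different cities, yet only one character apart.
    print(levenshtein("北京", "南京"), normalized_distance("北京", "南京"))
    # Classic English example for comparison.
    print(levenshtein("kitten", "sitting"), normalized_distance("kitten", "sitting"))
```

With queries this short, a raw distance of 1 can mean a completely different query, while the normalized distance jumps in steps of 1/2 or 1/3, so a single threshold is hard to pick.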
Nov 14 2016
@debt, here is the PDF version of the report. I will upload it to Commons if it looks good to you. :)
Nov 10 2016
You are very welcome @TJones ! :)
Nov 9 2016
A new version of the dashboard is up: http://discovery-experimental.wmflabs.org/poultry/
Nov 8 2016
Third draft is up: https://wikimedia-research.github.io/Discovery-Search-QueryFeatures-201610/
Nov 7 2016
@mpopov I cannot reproduce your results with the code above: for fit_total, I got drift = 0.003480768; for fit_google, I got drift = -0.003127302. From your result above, trend = 0 is within +1/-1 standard errors, which means the drift is not significant and there is no trend in either series.
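The check described above can be sketched as follows. The function name `zero_within_k_se` is made up for this example, and the numbers in the usage lines are hypothetical placeholders, not the estimates or standard errors from either fit.

```python
def zero_within_k_se(estimate, std_error, k=1.0):
    """Return True if 0 lies inside estimate +/- k * std_error."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    lower = estimate - k * std_error
    upper = estimate + k * std_error
    return lower <= 0.0 <= upper


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(zero_within_k_se(0.003, 0.005))  # 0 is inside the +/-1 SE band
    print(zero_within_k_se(0.020, 0.005))  # 0 is outside the +/-1 SE band
```

If 0 is already inside the +/-1 standard error band, the estimate is less than one standard error away from zero, which is well short of the roughly 2 standard errors usually required for significance at the 5% level.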
Nov 4 2016
@mpopov Sorry, I've upgraded to 3.3.2 on my laptop...
Nov 3 2016
@debt Do I need to create a pdf of this report and upload it to commons?
Second draft is up: https://wikimedia-research.github.io/Discovery-Search-QueryFeatures-201610/
Nov 2 2016
Nov 1 2016
@mpopov Got it! Thanks!
@mpopov @debt , first draft is here :) https://wikimedia-research.github.io/Discovery-Search-QueryFeatures-201610/
Oct 29 2016
Thank you @elukey !
Oct 20 2016
Oct 13 2016
Thank you for the suggestions! I will work on it after I finish other ongoing projects.
Oct 12 2016
I'm sorry I didn't realize the problem until yesterday. Our portal pageview data is problematic:
- Android pageviews account for 70-80% of the total pageviews, which is too much...
- We don't have any portal pageviews whose access method is mobile web, and there are only 141 pageviews whose access method is mobile app during the 60-day period
Oct 11 2016
Just sent an email to the requester with some instructions. :)
Oct 7 2016
Thank you @Volker_E for inviting me to the discussion. I'm happy to help! :)
Oct 5 2016
Oct 4 2016
Thank you @debt! :)
@mpopov Great report!!! I've sent comments by email. :)
Thanks @debt! Updated on Commons!
@debt Please let me know if there is anything else that needs to be changed.
Oct 3 2016
Thanks everyone! I've uploaded the report to Commons: https://commons.wikimedia.org/wiki/File:Exploration_on_the_Use_of_WDQS_-_Breakdown_by_Geography,_User_Agent_and_Referer_Class.pdf
Sep 30 2016
@Smalyshev what do you mean by "error responses"?
Here is an example of my query:
Sep 29 2016
@mpopov and @debt, please let me know what you think: http://discovery-experimental.wmflabs.org/poultry/