analysis of results from A/B/C test for displaying sister projects in search results
Closed, Resolved | Public | 6 Estimated Story Points

Description

This ticket is for the Discovery-Analysis team to check the test after it launches, to be sure data is flowing correctly and that there is enough of it. It also covers completing the analysis of the data once the test has finished.

Event Timeline

mpopov set the point value for this task to 6.

This test is expected to stop on Feb 22 (or shortly thereafter).

The test was turned off on the afternoon of Feb 21 during SWAT: T157942

First draft:

@debt @Deskana: lmk if I forgot anyone in the authors list. Did David work on this project?

@EBernhardson @Jdrewniak please let me know if I got any of the technical stuff wrong.

@chelsyx @TJones: would you be able to review it? Please and thank you!

I'm sorry if it's bad. I've been making and staring at it long enough that at this point I'm probably looking past obvious flaws, but that's what first drafts are for! :D

@mpopov: It looks good! It's hard to draw firm conclusions with so little data. I think we forgot to take into account the possible lack of results from sister projects when choosing languages to test on, and Figure 9 highlights well why that's a problem.

There are major numbering problems for figures and tables for some reason (many references in the text are off by +9), so readers should keep a running count until that's fixed.

Other comments sent by email.

Thanks, @mpopov - I've sent my comments via email! :)

@mpopov Great job! I've sent comments by email.

@mpopov : This is from a naive lay viewpoint, so it might not make sense:

I'm really curious how (or whether) the number of sister links offered affected the sister click rate. Is there a correlation between the number of sister links offered and the number of clicks performed?

I'm also curious whether the sister results cannibalized the main results. That is, take the case where there was just one click on some results, but it was on a sister project, rather than a same-wiki result. From an engagement perspective, that's net neutral, but from the standpoint of putting a value on this feature, it's a win. Maybe the user would have been content with the same-wiki result, but they were happier with the sister wiki result. That might be covered, but if it is, it's currently too scientific for me to digest.
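For readers who want to see what these two questions look like as computations, here is a minimal sketch. Everything in it is an assumption made for illustration: the record layout, the function names, and the toy data are hypothetical and are not taken from the report or its event logging. It groups searches by the number of sister links shown to get a sister click rate per group, and it computes the share of searches with clicks where the only click went to a sister result.

```python
from collections import defaultdict

# Hypothetical toy records, one per search:
# (sister links shown, clicked a sister result, clicked a same-wiki result)
searches = [
    (0, False, True),
    (1, False, True),
    (1, False, False),
    (2, True, False),
    (2, False, True),
    (3, True, False),
    (3, True, True),
    (3, False, False),
]


def sister_click_rate_by_links(records):
    """Share of searches with a sister click, grouped by links shown."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for n_links, sister_click, _ in records:
        shown[n_links] += 1
        clicked[n_links] += int(sister_click)
    return {n: clicked[n] / shown[n] for n in sorted(shown)}


def sister_only_share(records):
    """Among searches with any click, the share that clicked only a sister result."""
    with_click = [r for r in records if r[1] or r[2]]
    if not with_click:
        return 0.0
    sister_only = [r for r in with_click if r[1] and not r[2]]
    return len(sister_only) / len(with_click)


if __name__ == "__main__":
    print(sister_click_rate_by_links(searches))
    print(sister_only_share(searches))
```

A sister-only share on its own does not show cannibalization. It would need to be read alongside the same-wiki click rate in the control group, which never saw sister results.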

@debt, @TJones, @chelsyx, @ksmith thank you all for your feedback! Hopefully I've addressed it all in the second draft (web: https://wikimedia-research.github.io/Discovery-Search-Test-CrosswikiSidebar/, PDF: https://github.com/wikimedia-research/Discovery-Search-Test-CrosswikiSidebar/blob/master/docs/index.pdf) which now includes additional analysis per Kevin's curiosity :)

Hi @mpopov - looks good to me! Three comments/concerns:

  • Should we mention that a second test (T160004) has been queued up (and is actually going to be turned on today, March 16, 2017) with fixes for the two issues mentioned in this report that might have affected the outcome (links not shown in blue and not enough wikis tested on)?
  • Can you link the Wikipedia image that was used in the report to this file on Commons, please? The attribution is in the report, but not a link to it. :)
  • Also, is it worth mentioning that wikispecies was not part of the testing (T156254)? I'm not sure that it is statistically relevant...so, completely up to you.

Just FYI, here's an early link to the new test that will be turned on today and a screen capture as well:

cawiki-second_ab_test-sister_project_search_results-T160004.png (2×1 px, 1 MB)

Thanks @mpopov . I actually read the whole report this time, rather than skimming. It's looking good.

The paragraph starting with "Namely" seems like it needs some kind of intro or other transition from the previous paragraph.

In the following paragraph, it says "We suspect this is responsible", but that seems to contradict the executive summary, which primarily blames the UI. Perhaps "partly responsible" would be more accurate here?

Below table 3, should the phrase "how more likely each respective test group is" really be "how much more likely each respective test group is"?

Would it be possible to label the axes in table 4?

Table 5 is super confusing to a non-stats person until reading the second explanatory paragraph, but the last sentence there seems to tell me what I wanted to know.

Either in the conclusion, or near figure 6, do you want to mention the possibility that the user got all the info they needed from the preview, and therefore didn't click?

@ksmith I'll see if I can address some/all of those in a revision. Thanks!

Revision uploaded. Thanks again, @ksmith!

Thanks @mpopov. I just checked what you changed in response to my previous comment, and it looks good to me.