
Check that data coming from cross-wiki test is valid
Closed, Resolved · Public · 2 Estimated Story Points

Description

This should be done after 1-2 days of T149806 going live. Things to check for:

  • Group sizes are similar across wikis (e.g. we don't want to see that 80% of the users are in control group on Italian Wikibooks vs 20% on Polish Wikisource)
  • Events are of type: 'click' (click on one of the normal results), 'iwclick' (click on a textcat-powered result from another language), or 'ssclick' (click on one of the sister projects displayed in the sidebar)
  • The extraParams field should contain the URL of the clicked result (including "More from" links; see the note below for those) and the position of the box (wiki) in the sidebar.

For a click through to an article this would be, for example:

/wiki/s:History_of_Iowa_From_the_Earliest_Times_to_the_Beginning_of_the_Twentieth_Century/3/Counties/Clayton

That would indicate they clicked an individual result, and that the result was from Wikisource.

  • Merge TSS2 data with CirrusSearchRequestSet data to get the list of wikis that yielded results, and how many
  • "More from" links:

For a click through to the 'more from' button this would be:

https://en.wikibooks.org/wiki/Special:Search?search=guttenburg&fulltext=1

It looks like that will consistently be 'Special:Search' regardless of the wiki it's run on, but we should re-validate that after deployment. The 'ssclick' events for these will have the same position argument, indicating the position of the selected wiki in the ordering of the results.
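The URL check described above could be sketched roughly as follows. This is only an illustration, not the analysis code used for the test: `classify_destination` and `is_valid_click_event` are hypothetical helper names, and the sketch assumes that "More from" links always point at Special:Search, which is exactly the assumption that still needs re-validating after deployment.

```python
from urllib.parse import urlparse, parse_qs

# Click event types listed in the task description.
VALID_CLICK_EVENTS = {"click", "iwclick", "ssclick"}


def is_valid_click_event(event_type):
    """Return True if the event type is one of the three expected click types."""
    return event_type in VALID_CLICK_EVENTS


def classify_destination(url):
    """Label a clicked URL as a "More from" link or an individual result.

    Assumption: "More from" links always target Special:Search, either as
    /wiki/Special:Search or as index.php?title=Special:Search.
    """
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if "title" in query:
        title = query["title"][0]
    else:
        title = parsed.path.split("/wiki/", 1)[-1]
    return "more_results" if title.startswith("Special:Search") else "article"


article_url = (
    "/wiki/s:History_of_Iowa_From_the_Earliest_Times_to_the_Beginning"
    "_of_the_Twentieth_Century/3/Counties/Clayton"
)
more_url = "https://en.wikibooks.org/wiki/Special:Search?search=guttenburg&fulltext=1"

print(classify_destination(article_url))
print(classify_destination(more_url))
```

If the Special:Search title turns out to be localized on some wikis, the `startswith` check would need to be replaced with a per-wiki lookup.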

Event Timeline


Good stuff from T149806#2984135 for event logging:

Example urls to join a particular test. After visiting one of these urls your browser will stay in the test until

no_sidebar: http://sistersearch.wmflabs.org/w/index.php?forceSearchSatisfaction=no_sidebar&cirrusUserTesting=no_sidebar&title=Special:Search&fulltext=1&search=test
recall_sidebar_results: http://sistersearch.wmflabs.org/w/index.php?forceSearchSatisfaction=recall_sidebar_results&cirrusUserTesting=recall_sidebar_results&title=Special:Search&fulltext=1&search=test
random_sidebar_results: http://sistersearch.wmflabs.org/w/index.php?forceSearchSatisfaction=random_sidebar_results&cirrusUserTesting=random_sidebar_results&title=Special:Search&fulltext=1&search=test
Note that random still gives the same user the same order on every search, so for a single user it is consistent (to not confuse them), but different users will get sidebar results in a different order.

FYI, this test should go live on February 9, 2017.

This test is live - let's take a look to be sure we're collecting data! :)

@EBernhardson: could you please up the sampling rate to 1 in 50 across all 4? Here are the current numbers with the 1 in 200 rate:

| Wiki | Control | Test (Random) | Test (Recall) |
| --- | --- | --- | --- |
| Catalan Wikipedia | 36.364% (N=4) | 18.182% (N=2) | 45.455% (N=5) |
| Italian Wikipedia | 21.875% (N=14) | 37.500% (N=24) | 40.625% (N=26) |
| Persian Wikipedia | 30.769% (N=4) | 23.077% (N=3) | 46.154% (N=6) |
| Polish Wikipedia | 29.787% (N=14) | 44.681% (N=21) | 25.532% (N=12) |
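For reference, the percentages above are just each group's share of that wiki's sampled sessions. A minimal sketch of that calculation, using the N values from the table (the dictionary layout and the `group_shares` name are illustrative only):

```python
# Session counts per wiki and group, copied from the table above.
counts = {
    "Catalan Wikipedia": {"Control": 4, "Test (Random)": 2, "Test (Recall)": 5},
    "Italian Wikipedia": {"Control": 14, "Test (Random)": 24, "Test (Recall)": 26},
    "Persian Wikipedia": {"Control": 4, "Test (Random)": 3, "Test (Recall)": 6},
    "Polish Wikipedia": {"Control": 14, "Test (Random)": 21, "Test (Recall)": 12},
}


def group_shares(group_counts):
    """Return each group's share of the wiki's total, in percent."""
    total = sum(group_counts.values())
    return {group: 100 * n / total for group, n in group_counts.items()}


for wiki, group_counts in counts.items():
    shares = group_shares(group_counts)
    formatted = ", ".join(f"{g}: {s:.3f}%" for g, s in shares.items())
    print(f"{wiki} (N={sum(group_counts.values())}): {formatted}")
```

With totals this small (11 to 64 sessions per wiki), large swings between groups are expected from chance alone, which is why the sampling rate matters more than the exact split at this point.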

Will 1:50 even be enough? That's 4 days of traffic, so we are seeing roughly

days(tues- at 4x the rate will still only be ~20 per bucket in the smaller wikis, and 80 or so on the busier wikis. I think the numbers work out to the following (may not be correct, as the randomness only evens out with larger numbers, iiuc):

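| Wiki | Samples in 4 days | Per day @1:200 | Additional with 10 more days @1:50 | Approx. per bucket, final |
| --- | --- | --- | --- | --- |
| Catalan | 11 | 2.75 | 110 | 40.3 |
| Italian | 64 | 16 | 640 | 234.6 |
| Persian | 13 | 3.25 | 130 | 47.6 |
| Polish | 47 | 11.75 | 470 | 172.3 |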
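The arithmetic behind these estimates appears to be: daily samples at 1:200, times 4 for the 1:50 rate, times 10 more days, then the grand total split evenly over 3 buckets. A rough sketch under those assumptions (the `project` function and its parameter names are made up for illustration, and an even split across buckets is assumed):

```python
# Samples observed in the first 4 days at 1:200, from the table above.
samples_4_days = {"Catalan": 11, "Italian": 64, "Persian": 13, "Polish": 47}


def project(samples, days_observed=4, extra_days=10, rate_multiplier=4, buckets=3):
    """Project final per-bucket sample size after raising the sampling rate.

    Assumes traffic stays flat and that buckets end up evenly sized.
    """
    per_day = samples / days_observed
    additional = per_day * rate_multiplier * extra_days
    per_bucket = (samples + additional) / buckets
    return per_day, additional, per_bucket


for wiki, samples in samples_4_days.items():
    per_day, additional, per_bucket = project(samples)
    print(f"{wiki}: {per_day:.2f}/day, +{additional:.0f} additional, ~{per_bucket:.1f} per bucket")
```

The per-bucket figures agree with the table to within rounding of the last digit.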

My uninformed opinion is that Catalan and Persian may need to go higher to give a meaningful amount of data?

Either way I can push out the change to 1:50 at the 11am SF SWAT tomorrow, and we can go from there.

Change 337533 had a related patch set uploaded (by EBernhardson):
Increase sampling of sistersearch AB test

https://gerrit.wikimedia.org/r/337533

After discussing in IRC, we decided to increase sampling on the two low-volume wikis (Catalan and Persian) from 1:200 EL, 1:10 test (1:2000 overall into the test) to 1:10 EL, 1:2 test, for 1:20 overall (100x more sessions). The remaining two medium-volume wikis (Italian and Polish) increased to 1:50 EL, 1:2 test, for 1:100 overall (20x more sessions).
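The overall rates here are the product of the two sampling stages (EventLogging sampling, then test enrollment). A small sketch of that arithmetic, using exact fractions; the variable names are illustrative and do not correspond to actual configuration keys:

```python
from fractions import Fraction


def overall_rate(el_rate, test_rate):
    """Combined probability that a session is both logged and put in the test."""
    return el_rate * test_rate


old = overall_rate(Fraction(1, 200), Fraction(1, 10))
low_volume = overall_rate(Fraction(1, 10), Fraction(1, 2))     # Catalan, Persian
medium_volume = overall_rate(Fraction(1, 50), Fraction(1, 2))  # Italian, Polish

print(f"old: 1:{old.denominator}")
print(f"low volume: 1:{low_volume.denominator}, {low_volume / old}x more sessions")
print(f"medium volume: 1:{medium_volume.denominator}, {medium_volume / old}x more sessions")
```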

@EBernhardson: I have a single 'ssclick' event right now and it has a NULL event_position so I can't say there's definitively a problem but I do want to make this note for myself to check later tomorrow after the sampling rate patch deploys.

Change 337533 merged by jenkins-bot:
Increase sampling of sistersearch AB test

https://gerrit.wikimedia.org/r/337533

Change 337608 had a related patch set uploaded (by EBernhardson):
Increase sampling of sistersearch AB test

https://gerrit.wikimedia.org/r/337608

I forgot there is no 11am SWAT on Tuesdays, due to a new train going forward on Tuesdays. The update will have to go out at the 4pm (SF) SWAT.

Change 337608 merged by jenkins-bot:
Increase sampling of sistersearch AB test

https://gerrit.wikimedia.org/r/337608

Much better (read as "easier to learn something from") numbers post-patch ^_^

download.png (600×1 px, 27 KB)

| Group | Event | Sister project | Destination | Events |
| --- | --- | --- | --- | --- |
| Control | same-wiki click | NA | Article | 211 |
| Control | SERP | NA | Article | 908 |
| Test (Random) | same-wiki click | NA | Article | 211 |
| Test (Random) | SERP | NA | Article | 934 |
| Test (Random) | sister-project click | Commons | File | 4 |
| Test (Random) | sister-project click | Wikibooks | Article | 2 |
| Test (Random) | sister-project click | NA | Article | 1 |
| Test (Recall) | same-wiki click | NA | Article | 272 |
| Test (Recall) | SERP | NA | Article | 1103 |
| Test (Recall) | sister-project click | Commons | File | 6 |
| Test (Recall) | sister-project click | Wikibooks | Article | 1 |
| Test (Recall) | sister-project click | Wikinews | Article | 1 |
| Test (Recall) | sister-project click | Wikiquote | Article | 2 |
| Test (Recall) | sister-project click | Wikiquote | More Results | 1 |
| Test (Recall) | sister-project click | Wikivoyage | More Results | 1 |

@debt @EBernhardson @Jdrewniak I think I'm done with this ticket :)

Great! glad to hear we are collecting enough data now.

Probably doesn't matter, but the image says sampling was changed on the 15th at 4pm Pacific. The change went out at midnight UTC on the 15th, which is 4pm Pacific on the 14th.

Awesome!

Can the image be changed to reflect the correct date, @mpopov?

Let's hereby call Feb 15 the start of this section of the test. We'll run it for a week, with the goal of turning it off via configuration change on or shortly after Feb 22nd. Yay!