[A/B/C Test] Add cross-wiki search results in a right sidebar
Closed, ResolvedPublic

Description

Based on the finalized design from T146663 and T139310, this A/B/C test will have two test groups that will be shown a new right sidebar containing relevant search results found by cross-wiki searching. A control group will see the existing search results page. The two test groups differ only in how the sidebar results are ordered: one by recall, one at random.

This test is expected to last at least a week and will be run on Persian, Italian, Catalan and Polish Wikipedias (their selection was based on community input).

Test group users will see:

  • additional search results from sister wikis in a right sidebar
  • each result for the sister wiki(s) will display:
    • the top ranked result from any wiki that contains relevant search results
    • an icon that denotes which wiki the result is from
    • article name of the search result
    • description of the search result
    • typical bolding of the search result term(s)
  • link below the search result that is labeled 'more results'
    • this link will open a new browser tab and display a search results page for the original search term on that sister wiki
  • separate section for multimedia results above the other sister wiki results
    • up to 3 images will be displayed that are relevant to the original search term
    • display a link that will open a new browser tab and display a search results page of multimedia for the original search term from the native wikipedia that the user is on
      • for example, if a user searched for 'gutenberg' on English Wikipedia, and clicked on the more multimedia link, the user will be displayed search results for multimedia for 'gutenberg' on English Wikipedia in a new tab.

Order of projects will depend on the test group:

  • one group of users will see results based on recall - most to least number of articles returned from each project
  • one group of users will see results based on a random order
  • results from Commons will always be displayed first
  • Wikispecies will most likely not be included in this test cycle
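
The ordering rules above could be sketched roughly as follows. This is an illustration in Python, not the code that ships with the test; the function name `order_projects` and the project keys (`"commons"`, `"wikisource"`, ...) are made up for the example.

```python
import random


def order_projects(result_counts, group, rng=None):
    """Return project names in sidebar display order for one test group.

    result_counts: dict mapping a (hypothetical) project key to the number
                   of results that project returned for the query.
    group:         "recall_sidebar_results" or "random_sidebar_results".
    rng:           optional random.Random instance, used only for the
                   random group (defaults to the module-level generator).
    """
    others = [p for p in result_counts if p != "commons"]
    if group == "recall_sidebar_results":
        # Most to least number of results returned from each project.
        others = sorted(others, key=lambda p: result_counts[p], reverse=True)
    elif group == "random_sidebar_results":
        (rng or random).shuffle(others)
    else:
        raise ValueError("unknown test group: %r" % (group,))
    # Results from Commons are always displayed first.
    head = ["commons"] if "commons" in result_counts else []
    return head + others
```

For example, `order_projects({"wikibooks": 3, "commons": 50, "wikisource": 12}, "recall_sidebar_results")` puts Commons first, then Wikisource, then Wikibooks.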

Bucket testing logic generally is as follows:

  • 1 in 200 users are included in EventLogging
  • Of those 1 in 200 users, 1 in 10 are included in the test
  • Of those 1 in 10 users
    • 1/3 will go in a test group, labeled "recall_sidebar_results"
    • 1/3 will go in a test group, labeled "random_sidebar_results"
    • the remaining 1/3 of users will go in a control group, labeled "no_sidebar"
  • The remaining chunk of the original bucketed 200 users will get a NULL (the string null, or the MySQL null, we can detect either).
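
As a rough illustration of the bucketing logic above (a Python sketch under the stated sampling rates, not the actual client-side implementation; `assign_bucket` is a hypothetical name):

```python
import random

TEST_BUCKETS = ("recall_sidebar_results", "random_sidebar_results", "no_sidebar")


def assign_bucket(rng):
    """Return (in_eventlogging, bucket) for a single user.

    bucket is None (the NULL case) for everyone who is not in the test.
    rng is a random.Random instance so the sketch can be seeded.
    """
    # 1 in 200 users are included in EventLogging.
    if rng.randrange(200) != 0:
        return (False, None)
    # Of those, 1 in 10 are included in the test; the rest get NULL.
    if rng.randrange(10) != 0:
        return (True, None)
    # Test users are split evenly: two test groups and one control group.
    return (True, TEST_BUCKETS[rng.randrange(3)])
```

With these rates roughly 1 in 2000 users lands in the test at all, and roughly 1 in 6000 in each bucket, which is why the notes below ask whether the sampling rate needs to be increased.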

Eventlogging needs to capture:

  • if the user clicked on an individual result and what wiki project that result came from
  • what position in the list was the selected result
  • if the user clicked on the 'more from' on any wiki project result that was displayed
  • important to compare control group that has sister wiki results vs test group that also has sister wiki results

Eventlogging data will be joined against CirrusSearchRequestSet logging to capture:

  • if results were shown and from which wiki projects
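
A minimal sketch of that join, in Python with plain dicts. The field names `request_id` and `results_by_wiki` are assumptions for illustration; the real schemas and the actual linking field are not spelled out in this task.

```python
def join_clicks_to_requests(click_events, request_logs):
    """Attach per-wiki result counts from the request log to each click event.

    click_events: list of dicts, each with a hypothetical "request_id" key.
    request_logs: list of dicts with "request_id" and a hypothetical
                  "results_by_wiki" dict (wiki name -> number of results).
    Events with no matching request get results_by_wiki = None.
    """
    by_id = {req["request_id"]: req for req in request_logs}
    joined = []
    for event in click_events:
        req = by_id.get(event["request_id"])
        row = dict(event)
        row["results_by_wiki"] = req["results_by_wiki"] if req else None
        joined.append(row)
    return joined
```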

Notes to take into account:

  • for Italian wiki - note that users who aren't selected in the bucketing must continue to see the existing sister wiki search results.
  • a few days after the test starts, we'll need to take a look at the initial results:
    • do we need to increase the sampling rate?
    • do we need to increase the number of wikipedias the test is being run on?
      • list of languages that have all or nearly all projects that we might want to also test on (in no particular order):
        • Arabic
        • Czech
        • English
        • Finnish
        • French
        • German
        • Greek
        • Hebrew
        • Russian
        • Portuguese
        • Swedish
        • Chinese
        • Ukrainian

Draft sample image of what this test could look like:

a-b_test-serp_sidebar_cross-wiki_results.jpg (1×1 px, 876 KB)

Event Timeline

Change 334491 had a related patch set uploaded (by EBernhardson):
[WIP] Enable Sister project search AB test

https://gerrit.wikimedia.org/r/334491

Change 334673 had a related patch set uploaded (by DCausse):
[WIP] Configure A/B test for CrossProject search results sidebar

https://gerrit.wikimedia.org/r/334673

Change 334685 had a related patch set uploaded (by EBernhardson):
Convert InterwikiSearcher::MAX_RESULTS into variable

https://gerrit.wikimedia.org/r/334685

Another clarification, in the spec we state:

Eventlogging needs to capture:
* if results were shown and from which wiki projects
...

I think we decided that this information would be obtained by joining the EventLogging data against the CirrusSearchRequestSet data. The link between the two is already collected, and the CirrusSearchRequestSet logs already contain the list of wikis queried and the number of results returned for each query.

Yup, that's the plan.

Also, more clarifications. For "if the user clicked on an individual result and what wiki project that result came from" along with "if the user clicked on the 'more from' on any wiki project result that was displayed", we agreed to record the URL of the link the user is clicking.

For a click through to an article this would be, for example:

/wiki/s:History_of_Iowa_From_the_Earliest_Times_to_the_Beginning_of_the_Twentieth_Century/3/Counties/Clayton

That would indicate they clicked an individual result, and that result was for wikisource.

Since that doesn't follow the usual format we're used to dealing with (e.g 'enwiki' => 'english'+'wikipedia'), is there a table somewhere for us to see how '/wiki/s' => '[language of project the search was made on]'+'wikisource'?

For a click through to the 'more from' button this would be:

https://en.wikibooks.org/wiki/Special:Search?search=guttenburg&fulltext=1

It looks like that will consistently be 'Special:Search' regardless of the wiki it's run on, but we should re-validate that after deployment. The 'ssclick' events for these will have the same position argument, indicating the position of the selected wiki in the ordering of the results.
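
For the analysis side, spotting a 'more from' click in the recorded URL might look something like this. It is a Python sketch that assumes the page name really is always 'Special:Search', which is exactly the thing to re-validate; `parse_more_from_click` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs


def parse_more_from_click(url):
    """Return (host, search_term) if url points at Special:Search, else None.

    Assumes the special page is not localized in the recorded URL.
    """
    parts = urlsplit(url)
    if not parts.path.endswith("/Special:Search"):
        return None
    terms = parse_qs(parts.query).get("search")
    return (parts.hostname, terms[0] if terms else None)
```

Article click-throughs such as `/wiki/s:...` return `None` here, so the two click types can be told apart.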

Thanks for pointing that out! Yeah, that should definitely be one of the things we look for in the validation step.

Change 334721 had a related patch set uploaded (by EBernhardson):
Setup sister search prefix display types

https://gerrit.wikimedia.org/r/334721

Change 334685 merged by jenkins-bot:
Convert InterwikiSearcher::MAX_RESULTS into variable

https://gerrit.wikimedia.org/r/334685

I should add, it looks like adding the click types will require updating the search satisfaction schema revision number. As long as we are doing this, is there anything else we want to change about the recorded event?

Since that doesn't follow the usual format we're used to dealing with (e.g 'enwiki' => 'english'+'wikipedia'), is there a table somewhere for us to see how '/wiki/s' => '[language of project the search was made on]'+'wikisource'?

Sadly there is not a single definitive answer here, which is quite annoying :( The sister project prefixes are the same for all sites at least though:

wikt: wiktionary
b: wikibooks
n: wikinews
q: wikiquote
s: wikisource
v: wikiversity
voy: wikivoyage.

Generally it's safe to assume es.wikipedia.org/wiki/n:Something will send the user to es.wikinews.org/wiki/Something. I'm not aware of any exceptions here, but I wouldn't be surprised if there are a few. Translating that into a db name is a little harder, as there are exceptions (but I can't think of any off the top of my head). Generally es.wikinews.org should be eswikinews, and es.wikipedia.org should be eswiki, but there isn't a strong guarantee; that's just how it's typically done.

I think for the purposes of analysis though, since we can't collect metrics from the wiki on the other end anyways, just having the name (via the map above) should be good enough?
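
Something along these lines would turn a recorded click path into a project name using the map above. This is a Python sketch for the analysis side only; the helper names are made up, and `guess_dbname` encodes just the "typical" convention, so it can be wrong for the exceptions mentioned.

```python
# Sister project interwiki prefixes, as listed above.
SISTER_PREFIXES = {
    "wikt": "wiktionary",
    "b": "wikibooks",
    "n": "wikinews",
    "q": "wikiquote",
    "s": "wikisource",
    "v": "wikiversity",
    "voy": "wikivoyage",
}


def sister_project_for_path(path):
    """Return the sister project for a '/wiki/<prefix>:Title' path, or None."""
    if not path.startswith("/wiki/"):
        return None
    title = path[len("/wiki/"):]
    prefix, sep, _rest = title.partition(":")
    if not sep:
        return None  # no prefix at all, e.g. a local article
    # Unknown prefixes (namespaces such as 'Special') map to None.
    return SISTER_PREFIXES.get(prefix)


def guess_dbname(lang, project):
    """Guess the usual db name, e.g. ('es', 'wikinews') -> 'eswikinews'.

    Only the common convention; there is no strong guarantee it holds.
    """
    return lang + ("wiki" if project == "wikipedia" else project)
```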

The sistersearch site has been updated with all the latest code (including un-merged patches). The number of live hacks is now down to 1, and that won't be needed in production. So basically this should now be pretty close to what we expect to ship.

EventLogging debugging is enabled, so if you are added to one of the satisfaction tests a notification will pop up on the top right every time a new event is sent. Additionally, events are logged to the JavaScript console. The best way to see the events triggered by clicks is to open the JavaScript console and enable 'preserve logs' (for Chrome; other browsers likely have something similar). This will keep the events logged to the console from disappearing between page loads.

Example urls to join a particular test. After visiting one of these urls your browser will stay in the test until

Note that random still gives the same user the same order on every search, so for a single user it is consistent (to not confuse them), but different users will get sidebar results in a different order.
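
One way to get that "random but stable per user" behaviour is to seed the shuffle from something tied to the user. The sketch below is Python and only illustrates the idea; it is not how the deployed code does it, and `user_token` stands in for whatever per-user identifier is available.

```python
import hashlib
import random


def stable_random_order(projects, user_token):
    """Shuffle projects so that the same user_token always yields the same order."""
    digest = hashlib.sha256(user_token.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    # Start from a canonical order so the input order does not matter.
    ordered = sorted(projects)
    random.Random(seed).shuffle(ordered)
    return ordered
```

Calling it twice with the same token returns the same list, while different tokens will generally produce different orders.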

@mpopov with chelsy out, do you want to review the data collection here and ensure we are collecting everything necessary? Links above will opt you into the test on sistersearch and show eventlogging as described.

Also note that because we are in the process of updating relforge instances from trusty -> jessie it's running on half capacity right now and results will be slow to come back (unless the necessary data just happens to be in disk cache already).

yup! I'm actually so tired of software engineering with the reportupdate thing that I actually asked her for dibs on data verification and stuff :)

Just checked and everything looks good! I'm guessing the Multimedia box at the top is in place of Commons result?

Change 334673 merged by jenkins-bot:
Configure A/B test for CrossProject search results sidebar

https://gerrit.wikimedia.org/r/334673

I've pushed the configuration part of this live. It can be seen on beta, and once the train rolls forward this will be available on the production wikis to verify everything is working before shipping the JavaScript that turns on the test. Note that because this is only the trigger, and not the JavaScript, future searches will not be in the test. You have to manually add the 'cirrusUserTesting=recall_sidebar_results' query string to searches to get the new rendering.

https://en.wikipedia.beta.wmflabs.org/wiki/?search=~something&cirrusUserTesting=recall_sidebar_results
https://en.wikipedia.beta.wmflabs.org/wiki/?search=~something&cirrusUserTesting=random_sidebar_results
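
If you want to generate these opt-in URLs for several wikis or buckets, a small helper like the following would do. It is a Python sketch; `build_opt_in_url` is a made-up name and the URL shape is simply copied from the examples above.

```python
from urllib.parse import urlencode


def build_opt_in_url(host, term, bucket):
    """Build a search URL that forces a given cirrusUserTesting bucket."""
    query = urlencode({"search": term, "cirrusUserTesting": bucket})
    return "https://%s/wiki/?%s" % (host, query)
```

The bucket value is one of the labels from the task description, e.g. `recall_sidebar_results` or `random_sidebar_results`.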

It looks like there is an i18n message that might need to be switched from ->escaped() to ->parsed(). Small patch will be up shortly.

Change 336328 had a related patch set uploaded (by EBernhardson):
Switch search-interwiki-caption i18n to parsed

https://gerrit.wikimedia.org/r/336328

Change 336328 merged by jenkins-bot:
Switch search-interwiki-caption i18n to parsed

https://gerrit.wikimedia.org/r/336328

I'm not seeing it in the task description: exactly which wikis are we enabling the test on at first? Only enwiki?

list of languages that have all or nearly all projects that we might want to also test on (in no particular order):

  • Arabic
  • Czech
  • English
  • Finnish
  • French
  • German
  • Greek
  • Hebrew
  • Russian
  • Portuguese
  • Swedish
  • Chinese
  • Ukrainian

@EBernhardson enwiki for sure. How about ruwiki, arwiki, frwiki, dewiki, zhwiki, itwiki? I have no idea how trivial or non-trivial it is to enable this test on those and other wikis.

It's pretty easy to set a list of wikis to test on, i just need the list. I'll update the patch with your list. If we want different sampling on different wikis things get a bit more complicated, but mostly just in deciding the samplings to use.

I'm OK with the same sampling on those wikis and OK with the current sampling rates we have in the description.

This test is expected to go live on February 9, 2017.

@EBernhardson: nevermind what I wrote earlier :) here's the actual answer, from the task description: "This test is expected to last at least a week and will be run on Persian, Italian, Catalan and Polish Wikipedias (their selection was based on community input)."

Yes, please - just those four because we have not yet communicated to other wikipedias that a test might be run on them. :) We want to just run on Persian, Italian, Catalan and Polish Wikipedias at this time. Thanks!

Ok, test updated to run on fawiki, itwiki, cawiki and plwiki

cawiki is part of group1, so wmf.11 went live there today and is testable:

https://ca.wikipedia.org/wiki/?fulltext=1&search=cossolar&cirrusUserTesting=recall_sidebar_results

There is a problem with their pre-existing sidebar, we will probably want to remove it somehow during the test but I'm not sure (yet) where it comes from. Will investigate.

Oh nice - but, we definitely need to fix not showing the original sidebar. I thought we had already put code in for this case - maybe it's specific only to itwiki? @Jdrewniak can you help?

cawiki-first_sister-search_sidebar_results.jpg (1×1 px, 754 KB)

That particular sidebar is specific to cawiki (Catalan), but I might have seen it on other wikis. I'm having a bit of trouble tracking down where it comes from though ...

I found it, but I'm not sure what we can do about it that isn't a complete hack. The sidebar itself is a hack: they've embedded it into the i18n message displayed when your search string doesn't exist as a page on the wiki:

https://ca.wikipedia.org/wiki/MediaWiki:Searchmenu-new

About the only thing we can do is hide it with JavaScript, giving a bit of a FOUC (flash of unstyled content). We can't suppress this from being output, unless we hack up something in core to kill the message entirely on cawiki.

Change 336843 had a related patch set uploaded (by EBernhardson):
Temporary hax to hide cawiki's hacked in search sidebar

https://gerrit.wikimedia.org/r/336843

Change 336843 merged by jenkins-bot:
Temporary hax to hide cawiki's hacked in search sidebar

https://gerrit.wikimedia.org/r/336843

Change 336855 had a related patch set uploaded (by EBernhardson):
Temporary hax to hide cawiki's hacked in search sidebar

https://gerrit.wikimedia.org/r/336855

Change 334491 merged by jenkins-bot:
Enable Sister project search AB test

https://gerrit.wikimedia.org/r/334491

Change 336855 merged by jenkins-bot:
Temporary hax to hide cawiki's hacked in search sidebar

https://gerrit.wikimedia.org/r/336855

Mentioned in SAL (#wikimedia-operations) [2017-02-09T20:05:13Z] <thcipriani@tin> Synchronized php-1.29.0-wmf.11/resources/src/mediawiki.special/mediawiki.special.search.interwikiwidget.styles.less: SWAT: [[gerrit:336855|Temporary hax to hide cawiki hacked in search sidebar]] T149806 (duration: 00m 40s)

Change 336896 had a related patch set uploaded (by EBernhardson):
Enable Sister project search AB test

https://gerrit.wikimedia.org/r/336896

cawiki looks good now. group2 hasn't rolled forward yet so unable to test, but the patch is cherry picked for deploy in ~3 hours. If group2 hasn't rolled forward by then will need to delay the test until monday (but not expecting any problem).

Change 336896 merged by jenkins-bot:
Enable Sister project search AB test

https://gerrit.wikimedia.org/r/336896

Mentioned in SAL (#wikimedia-operations) [2017-02-10T00:32:11Z] <thcipriani@tin> Synchronized php-1.29.0-wmf.11/extensions/WikimediaEvents: SWAT: [[gerrit:336896|Enable Sister project search AB test]] T149806 (duration: 00m 45s)

Change 334721 merged by jenkins-bot:
Setup sister search prefix display types

https://gerrit.wikimedia.org/r/334721

Mentioned in SAL (#wikimedia-operations) [2017-02-10T01:43:11Z] <ebernhardson@tin> Synchronized wmf-config/CirrusSearch-common.php: Setup sister search prefix display types T149806 (duration: 00m 48s)

Mentioned in SAL (#wikimedia-operations) [2017-02-10T01:48:11Z] <ebernhardson@tin> Synchronized wmf-config/CirrusSearch-common.php: Setup sister search prefix display types T149806 (duration: 00m 40s)

Test is running. The only minor glitch is that the search-interwiki-caption message has never been translated into Catalan. Someone will need to do that to get simple names instead of the "Resultats de ca.wikisource.org:" messages.

Thanks so much for your work on this, @EBernhardson!

I did some looking around other wikis using the generalized link and found several other languages that don't have the project names translated.

Is there a better way to display the project name when it hasn't been translated?

@Jdrewniak - we should also refine the text to be as similar as possible - Wiktionary has "Word definitions from Wiktionary", but Wikibooks has "In Wikibooks" and Wikiversity has "From Wikiversity".

The other available option is the project-localized-name-cawikiquote message, which in English would be "Catalan Wikiquote". The annoying part is I don't know that the renderer here has access to a map from interwiki prefix to the dbname used in these message names. There don't seem to be individual messages that just translate the project names without the language as well.

I'd like to keep this ticket open and in the In Progress column until the test is completed, sometime next week if all the data needed is received.

Love it.
1 emphatic request: Please make the main links use the normal Blue color, as per almost all standard links on our sites.

  • Context: I first did a mouse-over of (more) because it was the only blue link (but then I saw it was just another local search). After I realized that "Irlanda" was the direct link, I tried clicking on ALL the bold black text (and icon), because I was suddenly unsure which parts were links, and which were just bold text...

cws.png (948×529 px, 83 KB)

Looks nice so far! However the multimedia results seem to be frequently irrelevant, perhaps a result of searching based primarily on title... Titles of images are often not descriptive, are in other languages, or contain irrelevant details that are sorted/scored higher than relevant images that appear in articles.

It's also quite odd that the links look like non-linked section headers; recommend using standard link colors here.

There's also a "flashing" behavior where the sidebar moves down a fraction of a second after loading, when the multimedia bits are loaded; this is a bit distracting.

The display of the metadata from Wikisource is kind of ugly in that preview: it's doing both the title and the file name, and some other stuff.

Screenshot 2017-02-14 at 4.31.04 PM.png (883×596 px, 255 KB)

The three surfaced images appear to be shown backwards from the top 3 results, resulting in the 3rd result being shown largest and the top result being shown smallest:

Screen Shot 2017-02-14 at 9.38.12 PM.png (944×2 px, 705 KB)

vs
Screen Shot 2017-02-14 at 9.38.21 PM.png (794×1 px, 464 KB)

Change 336843 had a related patch set uploaded (by EBernhardson):
Temporary hax to hide cawiki's hacked in search sidebar

https://gerrit.wikimedia.org/r/336843

/* Evil temporary hax for cawiki */
#sisterproject {
  display: none;
}

For once, I suppose this was fine. But it's quite far out of scope for MediaWiki core master (WikimediaEvents or a wmf-only patch would be better).

Assuming there will likely be more conflicts with future rollouts, it would be better to use a class on a common ancestor. Then this could be placed in the local wiki's Common.css, e.g.:

.mw-searchresults-has-iw #sisterproject {
  display: none;
}

I suppose such a class would be useful in general as well, to make other minor layout adjustments where needed to accommodate the sidebar. It seems interwikiwidget.styles.less already makes such an adjustment, but without a scope class, under the assumption that the styles won't be loaded unless the mode is activated, which is a pattern better avoided.

Change 339333 had a related patch set uploaded (by Krinkle):
Revert "Temporary hax to hide cawiki's hacked in search sidebar"

https://gerrit.wikimedia.org/r/339333

Change 339333 merged by jenkins-bot:
Revert "Temporary hax to hide cawiki's hacked in search sidebar"

https://gerrit.wikimedia.org/r/339333

Change 340111 had a related patch set uploaded (by Jdrewniak):
Fixing search results percentage width

https://gerrit.wikimedia.org/r/340111

This is now closed out - T160004 is the second running of this test.