
Investigate the full-text desktop search CTR decline on Wikimedia Commons
Closed, Resolved · Public

Description

In T187827, @MNeisler replicated what I (@chelsyx) did in T177534 to compute some search metrics on Wikimedia Commons, in order to get familiar with the data analysis tools we are using (see the epic ticket T185363 for more details). She found that the desktop full-text search-wise CTR on Commons was only 3.17% in February 2018 (see her report for more details), a sharp drop from the 10.42% found in the first analysis in November 2017.

We'd like to find out what caused the decline. I (@chelsyx) will help @MNeisler in this process.

Potential directions for the investigation:

  • Get the search-wise CTR from Nov to now. Is it a gradual decline or a sudden drop?
  • Check the number of search events and sessions, and then compute other versions of CTR (session-wise CTR, total clicks / total impressions). Do we see the same pattern?
  • Before and after the drop, does the distribution of the users (in terms of browsers and operating systems) change a lot?
  • Check the logs of several sessions with no clicks. Is there any pattern?
  • Is this happening on other wikis as well?
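
The three versions of CTR mentioned above can be sketched as follows. This is only a minimal illustration under stated assumptions: the `(session_id, search_id, action)` record layout and the action names `impression` and `click` are hypothetical and do not reflect the real TestSearchSatisfaction2 schema, and the sample events are made up.

```python
# Hypothetical, simplified event records: (session_id, search_id, action).
# "impression" stands for a search result page being shown and "click" for a
# clickthrough from it. Field and action names are illustrative assumptions.
EVENTS = [
    ("s1", "q1", "impression"),
    ("s1", "q1", "click"),
    ("s1", "q2", "impression"),
    ("s2", "q3", "impression"),
    ("s2", "q3", "click"),
    ("s2", "q3", "click"),
    ("s3", "q4", "impression"),
]


def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when there is nothing to divide by."""
    return numerator / denominator if denominator else 0.0


def ctr_metrics(events):
    """Compute search-wise CTR, session-wise CTR, and clicks per impression."""
    searches, clicked_searches = set(), set()
    sessions, clicked_sessions = set(), set()
    impressions = clicks = 0
    for session_id, search_id, action in events:
        key = (session_id, search_id)
        if action == "impression":
            impressions += 1
            searches.add(key)
            sessions.add(session_id)
        elif action == "click":
            clicks += 1
            clicked_searches.add(key)
            clicked_sessions.add(session_id)
    return {
        # Share of searches with at least one click.
        "search_wise": ratio(len(clicked_searches & searches), len(searches)),
        # Share of sessions with at least one click.
        "session_wise": ratio(len(clicked_sessions & sessions), len(sessions)),
        # Total clicks divided by total impressions.
        "clicks_per_impression": ratio(clicks, impressions),
    }


if __name__ == "__main__":
    for name, value in ctr_metrics(EVENTS).items():
        print(f"{name}: {value:.2%}")
```

Computing all three from the same events makes it easier to tell whether a drop shows up in every definition or only in one of them.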

Event Timeline

Restricted Application added a subscriber: Aklapper. · Feb 27 2018, 6:27 PM
Ramsey-WMF added subscribers: Abit, Cparle.

One change made in that time period was to include multimedia files in the default search on Commons. Before that, the default search only covered "content", and files had to be searched for by clicking the 'multimedia' tab on the search results page. This was deployed on December 14th: https://gerrit.wikimedia.org/r/#/c/398394/

chelsyx updated the task description. · Feb 27 2018, 11:34 PM

Thanks @EBernhardson ! We will see if the drop happened around that date.

Thanks @chelsyx and congrats to @MNeisler on your first discovery!

Here is a plot of the search-wise and session-wise CTR from Nov 2017 to now on Commons. It shows a sudden drop on Dec 14th, which is in line with @EBernhardson's comment about the deploy date of the change to include multimedia files in the default search on Commons. @EBernhardson Let me know if you have any additional thoughts. I'll work with @chelsyx to investigate further.

chelsyx added a comment. Edited · Mar 7 2018, 2:09 AM

After discussing with @MNeisler today, we think it is very possible that after the change on Dec 14, users are more likely to find the multimedia files they are looking for and to view them directly on the search result page in Media Viewer by clicking on the thumbnails (credit to @MNeisler for this idea). As a result, clickthroughs to the file page drop. To verify this assumption, we will:

  • Confirm with @EBernhardson that opening the media viewer by clicking on the thumbnails doesn't get logged by TestSearchSatisfaction2
  • Join TestSearchSatisfaction2 and webrequest to see how many users who didn't click through to the file page opened the media viewer, and how many of them went to the file page from the media viewer
  • Compare the proportion of logged-in users who have media viewer enabled between users who clicked through on the search result page and users who didn't

Any thoughts or suggestions? Any other dataset regarding the media viewer usage we can look into? Thanks!

Abit added a subscriber: atgo. · Mar 7 2018, 2:45 AM

"opening the media viewer by clicking on the thumbnails"

Does this happen? When I click on the media thumbnail, I'm taken to the file page.

Another reason that the CTR dropped could be that people were previously opening multiple tabs or clicking and then coming back, looking for what they needed. (Credit to @atgo for this idea.) Dunno how we might test that, though.

chelsyx added a comment. Edited · Mar 7 2018, 4:14 AM

"opening the media viewer by clicking on the thumbnails"

"Does this happen? When I click on the media thumbnail, I'm taken to the file page."

@Abit If you go to your "Preferences", click on the "Appearance" tab, and look under the "Files" section, you can enable media viewer. Also, if you log out and do the search again, clicking on the thumbnails will open the media viewer. I think that's because media viewer is enabled by default on all Wikimedia sites (I tested on English Wikipedia too).[1] But my experience was the same as yours at first, and I had to change my preference setting too. Maybe it's not the default setting for logged-in users?

"Another reason that the CTR dropped could be that people were previously opening multiple tabs or clicking and then coming back, looking for what they needed. (Credit to @atgo for this idea.) Dunno how we might test that, though."

Right-clicking to open a result in another tab (instead of left-clicking to open it in the same tab) should be counted as a clickthrough as well in TestSearchSatisfaction2. Need to confirm with @EBernhardson.

[1] https://www.mediawiki.org/wiki/Help:Extension:Media_Viewer#How_can_I_use_Media_Viewer?

If you click on the thumbnails, media viewer will open; if you click on the text, then the file page will open:

What seems to be directly related is the HTML generated for file results vs text page results.
Text page results will have a div with a class mw-search-result-heading.
Image results don't have this one.
According to https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/WikimediaEvents/+/master/modules/all/ext.wikimediaEvents.searchSatisfaction.js#670
we rely on this div to identify SRP clicks.
At a glance I'd say that we ignore all image clicks (not only the clicks on the thumbnails).
Thanks for catching the problem!
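
To illustrate the effect described above, here is a minimal sketch of a click logger that only recognizes clicks inside an element carrying the `mw-search-result-heading` class. It is not the actual WikimediaEvents JavaScript; the click records and the `hypothetical-image-result` class name are assumptions made up for the example.

```python
HEADING_CLASS = "mw-search-result-heading"

# Hypothetical clicks on a search result page. Each click lists the CSS
# classes found on the ancestors of the clicked element. The class name used
# for image results is invented for this illustration.
CLICKS = [
    {"target": "text result title", "ancestor_classes": [HEADING_CLASS]},
    {"target": "image result thumbnail", "ancestor_classes": ["hypothetical-image-result"]},
    {"target": "image result title", "ancestor_classes": ["hypothetical-image-result"]},
]


def is_logged_as_clickthrough(click):
    """A click is only recognized when it happens inside the heading div."""
    return HEADING_CLASS in click["ancestor_classes"]


def logged_clicks(clicks):
    """Return the targets of the clicks that the logger would record."""
    return [c["target"] for c in clicks if is_logged_as_clickthrough(c)]


if __name__ == "__main__":
    recorded = logged_clicks(CLICKS)
    print(f"recorded {len(recorded)} of {len(CLICKS)} clicks: {recorded}")
```

Under these assumptions only the text result click is recorded, so a results page dominated by image results would show a much lower measured CTR than the real one.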

Thank you very much @dcausse !

I created a ticket for this bug: T189242

According to David in the email:

It's likely that we see a huge increase in CTR after we fix the problem, CTR will certainly be way higher than what it was prior to Erik's patch.

@MNeisler and I will estimate the real CTR using backend data (webrequest and CirrusSearchRequestSet), and measure the impact of the change on Dec 14.

debt awarded a token. · Mar 9 2018, 5:59 PM
MNeisler added a comment. Edited · Apr 5 2018, 4:31 AM

Here’s an estimate of the CTR on Commons compared to English Wikipedia, found by joining webrequest and CirrusSearchRequestSet (thanks @chelsyx for all the help and suggestions!). This CTR includes both clicks to open pages and clicks on thumbnails to open media viewer.

Due to the data available at the time of the query, we are unable to get the CTR prior to the Dec 14th patch. However, we found a much higher desktop full-text CTR on Commons (27.89%) than the search-wise CTR computed from event logging data in November 2017 (10.42%) or February 2018 (3.17%).

The CTR on Commons by namespace shows a higher CTR to media file pages than to main articles or category pages. This would explain the low CTR found in T187827, since clicks on image files were ignored by searchSatisfaction.

Github repo: https://github.com/MeganNeisler/CTR_Commons_Investigation/tree/master/ctr_webrequest

Good job @MNeisler !
We will re-do the analysis to see how things change using eventlogging after T189242 is fixed.

mpopov moved this task from Triage to Doing on the Product-Analytics board. · Apr 23 2018, 10:58 PM
Abit awarded a token. · Apr 23 2018, 11:02 PM
MNeisler moved this task from Doing to Epics on the Product-Analytics board. · Jun 28 2018, 8:29 PM
MNeisler moved this task from Epics to Doing on the Product-Analytics board. · Jan 15 2019, 7:45 PM
MNeisler added a comment. Edited · Jan 21 2019, 7:23 PM

T189242 was fixed in late September 2018. To see how metrics have changed, I re-computed several desktop search metrics on Wikimedia Commons with available eventlogging data from October 18, 2018 to January 16, 2019. Metrics were compared to English Wikipedia desktop searches.

Results show an increase in the full-text search clickthrough rate on Commons from 3.17% in February 2018 (pre-bug fix) to 22.89% in data reviewed between October 2018 and January 2019 (post-bug fix). There was a slight decrease in the zero results rate from 7.05% to 6%. The proportion of searches with clicks to other search result pages stayed roughly the same, with only a very slight increase from 13.2% to 13.48%.

Summary of Changes

| Search metric | February 2018 (pre-bug fix) | Oct 2018 - Jan 2019 (post-bug fix) |
| --- | --- | --- |
| Zero results rate | 7.05% | 6% |
| Clickthrough rate | 3.17% | 22.89% |
| Searches with clicks to see other search result pages | 13.20% | 13.48% |
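
The size of each change in the summary can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the figures from the summary and reports the difference in percentage points, and the function and variable names are made up for the example.

```python
# (before, after) values in percent, copied from the summary of changes.
METRICS = {
    "Zero results rate": (7.05, 6.00),
    "Clickthrough rate": (3.17, 22.89),
    "Searches with clicks to see other search result pages": (13.20, 13.48),
}


def point_change(before, after):
    """Difference between two percentages, in percentage points."""
    return round(after - before, 2)


def summarize(metrics):
    """Map each metric name to its change in percentage points."""
    return {name: point_change(before, after) for name, (before, after) in metrics.items()}


if __name__ == "__main__":
    for name, change in summarize(METRICS).items():
        print(f"{name}: {change:+.2f} percentage points")
```

With these figures the clickthrough rate rises by 19.72 percentage points, the zero results rate falls by 1.05, and the share of searches with clicks to other result pages rises by 0.28.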

The full-text search clickthrough rate increased to 22.89% on Commons. On English Wikipedia, the overall clickthrough rate decreased only slightly, from 35.74% in February 2018 to 34.07% between October 2018 and January 2019.

6.0% of full-text searches on desktop did not yield any results on Wikimedia Commons, which is lower than the 9.85% zero results rate found on English Wikipedia during the same time period.

As also found in February 2018, users on Commons are much more likely to click to see other pages of search results (13.48%) than users on English Wikipedia (0.28%).

Let me know if you have any questions or comments.

Codebase

Thanks @MNeisler ! @Abit @Ramsey-WMF please let us know if you have any questions.

chelsyx closed this task as Resolved. · Feb 5 2019, 11:05 PM

Closing this ticket since the work is done. Feel free to re-open it if you have any questions.