Begin to quantify why people use Search Engines instead of Wikimedia search
Closed, Resolved · Public · 3 Estimated Story Points

Authored By

	TJones
	Sep 8 2015, 8:20 PM

Description

The initial hypothesis is that there are two big reasons why people use Google (or other search engines) instead of Wikimedia search: (1) external search engines give better results than Wikimedia search, and (2) habit, convenience, or not knowing about our search capabilities.

We could begin to quantify this by looking at queries that come from search engines and lead to Wikimedia pages (at least those with referrers) and testing whether those same queries (possibly minus "wiki", "wikipedia", and similar search terms) return the destination page in our own search results (say, within the top 5).

If they don't, then it gives credence to the idea that people are using external search engines because they give better results.

If they do, then the cause is more likely habit, the convenience of an external engine, or not knowing about our search capabilities.

In the former case, we have examples of what we need to work on in our search engine. In the latter case, we need to work on our advertising!
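The check itself could be sketched as below, assuming we already have the ranked result titles that our search returned for a query. The function names and the two labels are made up for illustration, and the exact string match on titles is a simplification (it ignores redirects and case differences).

```python
from urllib.parse import unquote


def title_from_destination(destination):
    """Turn 'en.wikipedia.org/wiki/George_Clooney' into 'George Clooney'."""
    _, sep, page = destination.partition("/wiki/")
    if not sep or not page:
        raise ValueError(f"not an article URL: {destination!r}")
    # Page names in URLs use underscores and percent-encoding.
    return unquote(page).replace("_", " ")


def classify(destination, result_titles, top_n=5):
    """Label one query by whether our search finds the destination page.

    result_titles is the ranked list of titles from our own search.
    The labels are hypothetical names for the two cases in the hypothesis.
    """
    target = title_from_destination(destination)
    if target in result_titles[:top_n]:
        return "habit_or_convenience"
    return "better_external_results"
```

Queries labeled "better_external_results" would be the examples of what to work on in our search engine.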

Initial scope: we could limit the first investigation to a specific set of search engines (Google, DuckDuckGo, Bing) and/or a specific set of wikis (the large wikis, or ones we have in labs) to see if there's a big effect from the biggest search engines to the biggest wikis.

Results should probably be broken down by search engine and wiki or at least by language—maybe Google users are using difficult queries to get into the Hungarian Wiktionary, but not into German Wikipedia, while DuckDuckGo users create difficult queries to get content out of French Wikipedia, but not Finnish Wikipedia. These are jokey examples, but there's some insight to be gained from this breakdown, especially by language. Maybe our support of English is great, but our support of Finnish is much weaker than Bing's, for example.
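A rough sketch of that breakdown, assuming each tested query has been reduced to an (engine, wiki, found) record, where found says whether our search returned the destination page. The record layout and the function name are assumptions for illustration, not anything that exists yet.

```python
from collections import defaultdict


def hit_rates(records):
    """Fraction of queries our search got right, per (engine, wiki) pair.

    records: iterable of (engine, wiki, found) tuples, found being a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # (engine, wiki) -> [hits, total]
    for engine, wiki, found in records:
        bucket = counts[(engine, wiki)]
        bucket[0] += int(found)
        bucket[1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}
```

A low rate for one pair, next to a high rate for the same engine on another wiki, would point at a language-specific weakness rather than a general one.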

Caveat: There are other reasons why someone might use an external search engine—like searching multiple wikis at once—that we won't necessarily be able to detect here. However, we should be able to find obvious shortcomings, such as language support, typo correction, and magical inference of user intent.

Step 1: extract queries ("gorge clooney wiki"), sources ("Google"), and destination ("en.wikipedia.org/wiki/George_Clooney")
Step 2?: normalize queries (should we drop "wiki" when it's not the only query term, since it seems to be a hint for Google, or run each query both with and without "wiki"?)
Step 3: run the referring queries against the relevant wiki
Step 4: analysis (profit?!)
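
Steps 1 through 3 might look something like the sketch below. It assumes the referrer still carries the query in a `q` parameter, picks the "drop wiki unless it's the only term" option for Step 2, and only builds the search request URL rather than sending it. The helper names, the engine table, and the noise-term list are all made up for illustration. The URL uses the standard MediaWiki API search parameters (`list=search`, `srsearch`, `srlimit`).

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical lookup tables for this sketch.
ENGINES = {"google": "Google", "bing": "Bing", "duckduckgo": "DuckDuckGo"}
NOISE_TERMS = {"wiki", "wikipedia"}


def extract_query(referrer):
    """Step 1: return (engine, query) from a referrer URL, or None."""
    parsed = urlparse(referrer)
    host_parts = (parsed.hostname or "").split(".")
    engine = next((name for key, name in ENGINES.items() if key in host_parts), None)
    terms = parse_qs(parsed.query).get("q")
    if engine is None or not terms:
        return None
    return engine, terms[0]


def normalize(query):
    """Step 2: drop "wiki"-like terms unless nothing else would remain."""
    kept = [w for w in query.split() if w.lower() not in NOISE_TERMS]
    return " ".join(kept) if kept else query


def search_url(wiki_host, query, limit=5):
    """Step 3: build the API request for the top `limit` results."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"
```

Running each query both with and without "wiki" would just mean calling search_url on the raw and the normalized query.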

Event Timeline

TJones created this task. Sep 8 2015, 8:20 PM

TJones raised the priority of this task from to Needs Triage.

TJones updated the task description.

TJones added a project: CirrusSearch.

TJones added subscribers: TJones, • Deskana, Ironholds.

Restricted Application added a project: Discovery-ARCHIVED. Sep 8 2015, 8:20 PM

Restricted Application added a subscriber: Aklapper.

Ironholds moved this task from Needs triage to Analysis on the Discovery-ARCHIVED board. Sep 8 2015, 8:21 PM

Ironholds added a project: Discovery-Analysis (Current work). Sep 8 2015, 8:25 PM

Ironholds set Security to None.

Ironholds edited a custom field.

• Tfinc subscribed. Sep 8 2015, 9:25 PM

Ironholds claimed this task. Sep 9 2015, 6:09 PM

Ironholds moved this task from Backlog to In progress on the Discovery-Analysis (Current work) board.

main.R (1 KB)

Done! Code here; @TJones the file is in stat1002 at /home/ironholds/matched_google_searches.tsv

Ironholds moved this task from In progress to Done on the Discovery-Analysis (Current work) board. Sep 10 2015, 2:04 PM

• ksmith moved this task from Analysis to On Sprint Board on the Discovery-ARCHIVED board. Sep 10 2015, 8:12 PM

Nemo_bis subscribed. Sep 16 2015, 7:16 AM

• Deskana closed this task as Resolved. Sep 24 2015, 4:06 AM

• Deskana moved this task from Done to Resolved on the Discovery-Analysis (Current work) board.

My write up is here:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Why_People_Use_Search_Engines

• Deskana moved this task from Inbox to Resolved/Invalid/Declined/Legacy on the CirrusSearch board. Dec 31 2015, 5:07 AM

Restricted Application added a subscriber: StudiesWorld. Dec 31 2015, 5:07 AM
