
relevant search result excerpts: prefer first sentence of article
Closed, Resolved. Public.

Description

Author: sumanah

Description:
I'm on English Wikipedia and I've turned on "new search" in the beta features.

I used the search box to search for "mackinnon":

https://en.wikipedia.org/w/index.php?search=mackinnon&title=Special%3ASearch&fulltext=1

The excerpt that shows up on the search results page now, with CirrusSearch, comes from citations 5 and 6 in the article on Catharine MacKinnon:

"Highly Cited Author - Catharine A. MacKinnon ^ Catharine MacKinnon 2005 Fellow of Stanford's Center for"

This feels like a less useful excerpt. I think we should prefer the first sentence of an article, perhaps especially for biographies. For instance, the body of the article on Catharine MacKinnon starts:

"Catharine Alice MacKinnon (born October 7, 1946) is an American feminist, scholar, lawyer, teacher and activist."

Under the old search, that's what shows up on the search results page, albeit with some extraneous spaces around the commas. This is more relevant to a user who is trying to find a particular person's biography. In general, I suspect that the first sentence of a wiki page is more likely to help a searcher gauge relevance than is an excerpt from footnotes.


Version: unspecified
Severity: normal
Whiteboard: Experimental_Highlighter
URL: https://en.wikipedia.org/wiki/Catharine_MacKinnon
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=63729
https://bugzilla.wikimedia.org/show_bug.cgi?id=66045

Details

Reference
bz61669

Event Timeline

bzimport raised the priority of this task to High. (Nov 22 2014, 2:55 AM)
bzimport added a project: CirrusSearch.
bzimport set Reference to bz61669.

This is a good idea. The old search boosted the lead section of an article. We should do this too in Cirrus. It'll lead to better results and better snippets.

Swapping out upstream/Elasticsearch_Open_Bug with Experimental_Highlighter because it supports boosting early parts of the article when picking the snippet.
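
To illustrate what "boosting early parts of the article when picking the snippet" could mean, here is a minimal sketch in Python. It is not the Experimental_Highlighter's actual algorithm or API. The function `pick_snippet`, its parameters `early_boost` and `early_chars`, the naive sentence splitter, and the filler text are all hypothetical and exist only for this example. The idea shown: score each candidate sentence by the number of query hits, then multiply the score when the sentence starts near the beginning of the page.

```python
import re


def pick_snippet(text, query, early_boost=3.0, early_chars=150):
    """Return the best-matching sentence, favouring sentences near the start.

    Hypothetical illustration only; not the real highlighter logic.
    """
    term = query.lower()
    best_score, best_sentence = 0.0, None
    offset = 0  # approximate character offset of the current sentence
    # Naive split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        score = float(sentence.lower().count(term))
        if offset < early_chars:
            score *= early_boost  # assumed boost for the opening of the page
        if score > best_score:  # strict '>' keeps the earliest sentence on ties
            best_score, best_sentence = score, sentence
        offset += len(sentence) + 1
    return best_sentence


# Sample page built from the excerpts quoted in this report, plus made-up filler.
lead = ("Catharine Alice MacKinnon (born October 7, 1946) is an American "
        "feminist, scholar, lawyer, teacher and activist.")
filler = " ".join(["This filler sentence stands in for the article body."] * 5)
footnotes = ("Highly Cited Author - Catharine A. MacKinnon ^ Catharine "
             "MacKinnon 2005 Fellow of Stanford's Center for")
page = " ".join([lead, filler, footnotes])

without_boost = pick_snippet(page, "mackinnon", early_boost=1.0)
with_boost = pick_snippet(page, "mackinnon")
```

With no boost (`early_boost=1.0`), the footnote fragment wins because it contains the query term twice. With the boost applied, the lead sentence is chosen instead, which is the behaviour this report asks for.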

Change 137521 merged by jenkins-bot:
Boost results that contain hits in the opening

https://gerrit.wikimedia.org/r/137521