
Look into producing a list of frequent 'zero result' search terms on Wikimedia projects
Closed, Declined, Public

Description

From a community member: "One of the easy wins in search is to publish lists of popular search terms that don't currently have an obvious wikipedia article. In some case people will be able to create redirects to resolve them."

I believe the completion suggester already helps with this to some extent.

However, a list of common search queries that return zero results could help editors and search engineers understand why certain queries find no matches and identify ways to improve search results.

This task seeks to discern the technical, legal, and privacy concerns related to creating such a list.

A few initial questions:

  • Is it feasible to create a useful list of top queries with zero results for a wiki?
  • What are the technology, privacy, and security concerns?
  • How difficult would it be to automate something like this?

Goals:

One possible outcome is "No": the concerns and technical implementation challenges are insurmountable.

The other possible outcome will be a "Yes", with clear understanding of what it would take in resources to accomplish. If it is something the Discovery team wishes to take on, a plan for implementation would be pursued.

Concerns:
Privacy: we don't want to reveal any private information by accident. People can accidentally copy and paste sensitive information into the search box and have it included in any resulting list or index.

Benefits:
I've heard this asked by a few folks in the community as a way of identifying opportunities to create new articles or reword/redirect popular terms to wiki articles.

Event Timeline

CKoerner_WMF raised the priority of this task to Needs Triage.
CKoerner_WMF updated the task description.
CKoerner_WMF added a project: Discovery.
CKoerner_WMF added a subscriber: CKoerner_WMF.

Related tasks:

These tasks are actually about slightly different things; however, the same privacy principles apply. See T8373#1856036 for more information.

CKoerner_WMF renamed this task from Look into producing a list of frequent search terms on Wikimedia projects to Look into producing a list of frequent 'zero result' search terms on Wikimedia projects. (Jul 5 2016, 3:00 PM)
CKoerner_WMF updated the task description.
CKoerner_WMF set Security to None.

@TJones investigated the top unsuccessful search queries. The results show that generating such a list would be not only difficult but also of low value.

"I think the problem with all of these strategies is that so many high-frequency queries would be eliminated by any of them that any useful mining would be down to slogging through the low-impact long tail."

https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Top_Unsuccessful_Search_Queries

debt added a subscriber: debt.

Thanks for the analysis, @TJones! I'll go ahead and close this ticket as declined for @CKoerner_WMF.