Full review of small sample (~1K) of full text queries to categorize them all
Closed, ResolvedPublic

Assigned To

Authored By

	TJones
	Aug 5 2015, 4:11 PM

Description

In addition to looking for big patterns, we also need to identify categories that can't be readily detected other than by manual inspection (e.g., typos and gibberish) to gauge their extent.

This also gives us a sample of typos sent through the API to see how many would get suggestions if suggestions were enabled.

Event Timeline

TJones created this task. Aug 5 2015, 4:11 PM

TJones claimed this task.

TJones raised the priority of this task to High.

TJones updated the task description.

TJones added a project: Discovery-Search (Current work).

TJones added subscribers: EBernhardson, dcausse.

Restricted Application added a subscriber: Aklapper. Aug 5 2015, 4:11 PM

TJones moved this task from Incoming to not in use - please delete on the Discovery-Search (Current work) board. Aug 5 2015, 4:21 PM

Done:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Survey_of_Zero-Results_Queries#Full_manual_review_of_a_1K_enwiki_sample

Highlights:
About 14.6% typos, plus another 8.9% that look like incomplete strings (almost all from API, so they are probably apps trying to do prefix searches, but I don't know for sure).

Of the typos, 13.1% had mistakes in the first two characters of a search term, so a reverse index might be helpful!
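
To illustrate why a reverse index could help here, below is a minimal sketch. It assumes the suggester only considers candidates that share the first two characters with the query term (a common speed optimization for fuzzy matching; whether the production suggester is configured this way is an assumption, not something this review confirms). The names `edit_distance` and `suggest`, and the toy vocabulary, are hypothetical and for illustration only.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def suggest(term, vocabulary, prefix_length=2, max_edits=1, reverse=False):
    """Return vocabulary words within max_edits of term.

    Candidates must match the first prefix_length characters exactly.
    With reverse=True, both the term and the vocabulary are reversed
    first, so the exact-match requirement applies to the END of the word.
    """
    if reverse:
        term = term[::-1]
        vocabulary = [w[::-1] for w in vocabulary]
    prefix = term[:prefix_length]
    hits = [w for w in vocabulary
            if w.startswith(prefix) and edit_distance(term, w) <= max_edits]
    return [w[::-1] for w in hits] if reverse else hits


# Hypothetical toy vocabulary, not taken from the sample.
vocabulary = ["encyclopedia", "search", "vegas"]

# Typo in the first character: the forward lookup misses, the reversed one hits.
forward_miss = suggest("rncyclopedia", vocabulary)
reverse_hit = suggest("rncyclopedia", vocabulary, reverse=True)

# Typo near the end: the forward lookup hits, the reversed one misses.
forward_hit = suggest("encyclopedoa", vocabulary)
reverse_miss = suggest("encyclopedoa", vocabulary, reverse=True)
```

Under these assumptions the two lookups are complementary: a typo in the first two characters defeats the forward prefix filter but leaves the end of the word intact, so the reversed lookup can still find the intended term.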

Of the typos, 50% got the obviously correct results with autosuggestions. 25% got something, and 25% got nothing.

7.2% of queries were in or mostly in a foreign language.

28.0% were not encyclopedic in my estimation.

5.0% were junk.

And about 1% was someone searching for addresses in Las Vegas.

TJones moved this task from not in use - please delete to Needs Reporting on the Discovery-Search (Current work) board. Aug 6 2015, 6:45 PM

TJones closed this task as Resolved. Aug 10 2015, 1:31 PM

TJones set Security to None.

Deskana moved this task from Needs Reporting to Resolved on the Discovery-Search (Current work) board. Sep 9 2015, 2:25 AM
