
Analysis of Method 2 Suggestion results
Closed, Resolved · Public

Description

Gather suggestion output from Elastic-based suggestions and Method 2 (CJK) suggestions for a collection of data, and analyze the results.

Analysis will include counting how often Elastic-based suggestions are made, how often Method 2 suggestions are made, how often both are made, and a manual review of a sample when both are made to see which does better—which is the same as what we did for M0.
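
As a rough illustration of the counting step, here's a minimal Python sketch. The TSV filename and the columns (query, elastic_suggestion, m2_suggestion) are hypothetical stand-ins, not the actual data layout or analysis scripts.

```python
# Sketch only: file name and column names are hypothetical stand-ins.
import csv
import random

def summarize(rows, sample_size=100):
    """Count which suggester fires for each query, and sample the overlap."""
    counts = {"elastic_only": 0, "m2_only": 0, "both": 0, "neither": 0}
    overlap = []
    for row in rows:
        has_elastic = bool(row.get("elastic_suggestion"))
        has_m2 = bool(row.get("m2_suggestion"))
        if has_elastic and has_m2:
            counts["both"] += 1
            overlap.append(row)
        elif has_elastic:
            counts["elastic_only"] += 1
        elif has_m2:
            counts["m2_only"] += 1
        else:
            counts["neither"] += 1
    # Random sample of queries where both suggesters fired, for manual
    # side-by-side review by speakers.
    review_sample = random.sample(overlap, min(sample_size, len(overlap)))
    return counts, review_sample

with open("suggestions.tsv", encoding="utf-8", newline="") as f:
    counts, review_sample = summarize(list(csv.DictReader(f, delimiter="\t")))
print(counts)
```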

I'll be getting help from speakers of Chinese, Japanese, and/or Korean to review the sample where both make suggestions, and any other review that seems to be necessary. (M1 had extra stuff that needed review, for example.)

Event Timeline

TJones triaged this task as Medium priority. Feb 10 2020, 10:43 PM
TJones created this task.

Data has been wrangled and prepped for review. I have a Japanese reviewer, a likely Korean reviewer, and I'm waiting to hear back on Chinese. Because of a technical glitch, I only have older Japanese data (from Feb), but it should be fine.

Completed analysis of Japanese and Korean suggestions, reviewed by speakers—thanks, Jerry & Lisa!

Korean and Japanese follow a similar pattern:

  • ~70–80% of queries are in the expected writing system(s) (see the writing-system classification sketch after this list).
  • ~10–20% of queries are in Latin (and the rest are a mixed bag).
  • ~8–12% of queries get suggestions from the current production DYM, and those suggestions are generally mediocre (~⅓–½ are rated as good).
    • ~⅓–½ of the suggestions made are for Latin queries, and they are generally poor (~¼–⅓ are rated as good).
    • Suggestions in the expected writing system(s) are generally mediocre (up to ½ are rated as good).
  • M2 has a small impact (~¾–2½%), but with a non-trivial increase in coverage (~8¾%–22%).
  • However, M2 suggestions are generally poor (~30% are rated as good).
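
To make the writing-system buckets concrete, here is a minimal, illustrative classification sketch based on Unicode code-point ranges. The ranges and labels are simplifying assumptions for illustration; the actual analysis used its own classification.

```python
# Sketch only: rough Unicode-range bucketing of a query's writing system(s).
import unicodedata

def classify_script(query):
    """Bucket a query as Latin, Hangul, Kana, Han, other, mixed, or empty."""
    scripts = set()
    for ch in query:
        # Skip whitespace, punctuation, and digits.
        if ch.isspace() or unicodedata.category(ch)[0] in ("P", "N"):
            continue
        cp = ord(ch)
        if cp <= 0x024F:                                         # Basic Latin + extensions
            scripts.add("Latin")
        elif 0x1100 <= cp <= 0x11FF or 0xAC00 <= cp <= 0xD7A3:   # Jamo + syllables
            scripts.add("Hangul")
        elif 0x3040 <= cp <= 0x30FF:                             # Hiragana + Katakana
            scripts.add("Kana")
        elif 0x4E00 <= cp <= 0x9FFF:                             # CJK unified ideographs
            scripts.add("Han")
        else:
            scripts.add("other")
    if not scripts:
        return "empty"
    return scripts.pop() if len(scripts) == 1 else "mixed"

print(classify_script("위키백과"))        # Hangul
print(classify_script("ウィキペディア"))  # Kana
print(classify_script("wiki 百科"))       # mixed
```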

The results aren't great, but the new M2 suggestions are largely orthogonal to the existing prod/phrase suggester suggestions, and of roughly similar quality. We should run an A/B test and then decide whether the additional effort to implement M2 is worth whatever increase in clickthrough we see.

Full write-up (for Korean and Japanese, so far) is on MediaWiki.

Chinese to come, once speaker review is done.

Under Korean Stats (and Japanese stats). Should identical be unique?

There are 312,698 queries in our test corpus. 276,745 (88.502%) of them are identical after (basic) normalization.

As to the report and its recommendations, overall this seems reasonable. As stated, the results aren't looking particularly promising, but the existing suggestions are also similarly bad. Since the two methods seem to provide suggestions for largely disjoint sets of queries, deploying it could still be an overall improvement (although continuing the trend of slightly embarrassing suggestions in some cases).

Under Korean Stats (and Japanese stats). Should identical be unique?

No, it's identical, as in the normalization had no effect. Not too surprising for CJK, since lowercasing often does nothing. Normalizing whitespace could have an effect, though. It's something I put in there in the early days to see how much normalization matters.
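
For concreteness, a minimal sketch of that check, assuming "basic" normalization is roughly lowercasing plus whitespace trimming and collapsing (the actual normalization in the analysis may include more steps). The 88.502% figure quoted above counts queries that come out of that step unchanged.

```python
# Sketch only: "basic" normalization assumed to be lowercasing plus
# whitespace trimming/collapsing; the analysis's actual steps may differ.
import re

def normalize(query):
    """Lowercase and trim/collapse whitespace."""
    return re.sub(r"\s+", " ", query.strip()).lower()

def identical_fraction(queries):
    """Fraction of queries unchanged by normalization (the 'identical' stat)."""
    unchanged = sum(1 for q in queries if normalize(q) == q)
    return unchanged / len(queries)

# Lowercasing is a no-op for CJK text; only whitespace is affected here.
print(normalize("위키  백과 사전"))    # '위키 백과 사전'
print(normalize("  Wiki  Search "))    # 'wiki search'
```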

Ahh. I was reading this as 88.5% of the queries being identical to each other, as in post-normalization 88% of queries are for the same terms.