Author: mizuno.jun
Description:
A patch for CJKFilter.java and its test.
With the language=ja setting,
CJKFilter wrongly tokenizes a CJK string
if the string starts with non-CJK characters.
Example:
A string "abC1C2C3", where C1, C2 and C3 are CJK characters, is tokenized into
the token stream (abC1, C1C2, C2C3).
This should be (ab, C1C2, C2C3).
This behavior causes an odd snippet in the search results:
the token stream (abC1, C1C2, C2C3) is recombined into the word "abC1C1C2C3", with C1 duplicated.
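
To make the intended behavior concrete, here is a minimal standalone sketch. It is not the attached patch and does not use the real CJKFilter API. The class name CjkBigramSketch and its methods are hypothetical, and treating Han, Hiragana, Katakana and Hangul as "CJK" is an assumption made for illustration. It emits a leading non-CJK run as its own token and splits each CJK run into overlapping bigrams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the real CJKFilter.
final class CjkBigramSketch {

    // Assumption: "CJK" here means Han, Hiragana, Katakana or Hangul.
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA
            || s == Character.UnicodeScript.HANGUL;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (isCjk(cps[i])) {
                // Collect the whole CJK run, then emit overlapping bigrams.
                int start = i;
                while (i < cps.length && isCjk(cps[i])) {
                    i++;
                }
                if (i - start == 1) {
                    // A lone CJK character becomes a single-character token.
                    tokens.add(new String(cps, start, 1));
                } else {
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                // A non-CJK run ends where the CJK run begins, so "ab" stays separate.
                int start = i;
                while (i < cps.length && !isCjk(cps[i])
                        && Character.isLetterOrDigit(cps[i])) {
                    i++;
                }
                tokens.add(new String(cps, start, i - start));
            } else {
                i++; // skip whitespace and punctuation
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "ab" followed by three CJK characters (U+65E5, U+672C, U+8A9E).
        System.out.println(tokenize("ab\u65e5\u672c\u8a9e"));
    }
}
```

With the input above, the sketch yields three tokens: "ab", then the bigram of the first and second CJK characters, then the bigram of the second and third. No token mixes "ab" with a CJK character, so recombining the tokens for a snippet would not duplicate C1.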
Version: unspecified
Severity: normal
attachment cjkfilter.patch ignored as obsolete