Japanese text almost never uses spaces, so VisualEditor falls back on its general word separation rules. However, those rules don't work well for Japanese. Take, for example, the word «大学», which means "university" and is written with two Chinese characters that individually mean "big" («大») and "study" («学») in Japanese. When you place your cursor at the end of this word and click the link button, only the second character is selected as the link text. Japanese users feel [[https://ja.wikipedia.org/w/index.php?title=Wikipedia%3A%E3%83%93%E3%82%B8%E3%83%A5%E3%82%A2%E3%83%AB%E3%82%A8%E3%83%87%E3%82%A3%E3%82%BF%E3%83%BC%2F%E3%83%95%E3%82%A3%E3%83%BC%E3%83%89%E3%83%90%E3%83%83%E3%82%AF&type=revision&diff=55724736&oldid=55721366|both should be selected]].
I talked to @HaithamS about this, and he said he finds this behavior annoying as well. His proposed solution is that, when looking for a word in Japanese, the word should be the longest continuous run of characters from a single character set (hiragana, katakana, or Chinese characters). This is essentially the extra word separation rule that [[http://unicode.org/cldr/trac/browser/trunk/common/segments/ja.xml#L29|CLDR adds for Japanese]].
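To make the proposed rule concrete, here is a minimal sketch in Python. It is not VisualEditor code: the function names (`script_of`, `word_range_at`) are made up for illustration, and the character sets are approximated by Unicode block ranges, which is an assumption rather than what CLDR actually specifies.

```python
from typing import Optional, Tuple

# Assumption: each character set is approximated by its main Unicode block(s).
_SCRIPT_RANGES = (
    ("hiragana", 0x3040, 0x309F),
    ("katakana", 0x30A0, 0x30FF),  # includes the prolonged sound mark (U+30FC)
    ("han", 0x3400, 0x4DBF),       # CJK Unified Ideographs Extension A
    ("han", 0x4E00, 0x9FFF),       # CJK Unified Ideographs
)


def script_of(ch: str) -> Optional[str]:
    """Return 'hiragana', 'katakana' or 'han', or None for anything else."""
    cp = ord(ch)
    for name, lo, hi in _SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return None


def word_range_at(text: str, offset: int) -> Tuple[int, int]:
    """Return (start, end) of the same-script run touching the cursor.

    The character before the cursor is preferred, which matches the
    "cursor at the end of the word" case described above. If neither
    neighbour is in a known character set, an empty range is returned.
    """
    if offset > 0 and script_of(text[offset - 1]) is not None:
        anchor = offset - 1
    elif offset < len(text) and script_of(text[offset]) is not None:
        anchor = offset
    else:
        return (offset, offset)

    script = script_of(text[anchor])
    start = anchor
    while start > 0 and script_of(text[start - 1]) == script:
        start -= 1
    end = anchor + 1
    while end < len(text) and script_of(text[end]) == script:
        end += 1
    return (start, end)


sentence = "私は大学に行く"
start, end = word_range_at(sentence, 4)  # cursor right after «学»
selected = sentence[start:end]           # both characters of «大学»
```

With the cursor placed right after «学», the run expands left over «大» and stops at the hiragana «は», so both characters are selected.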
Is there a way we can address this without resorting to project-specific or language-specific word separation rules? Possibly not; for example, this rule for dealing with Chinese characters probably doesn't apply to Mandarin or Cantonese, where an entire sentence can be one unbroken run of Chinese characters. If not, would it be crazy to have language-specific rules, with the language of a piece of text determined by its language annotation or, failing that, the project language?
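As a rough sketch of what that fallback could look like (again hypothetical: the rule-set names, the `RULES_BY_LANGUAGE` table, and both functions are invented here and are not existing VisualEditor configuration):

```python
from typing import Optional

# Hypothetical rule-set identifiers; only Japanese gets the extra rule.
DEFAULT_RULES = "general"
RULES_BY_LANGUAGE = {
    "ja": "general+same-script-runs",
}


def resolve_language(annotation_lang: Optional[str], project_lang: str) -> str:
    """Prefer the text's language annotation, else the project language.

    Only the primary subtag is kept, so 'ja-JP' is treated as 'ja'.
    """
    lang = annotation_lang or project_lang
    return lang.lower().split("-")[0]


def rules_for(annotation_lang: Optional[str], project_lang: str) -> str:
    """Pick the word separation rule set for a piece of text."""
    lang = resolve_language(annotation_lang, project_lang)
    return RULES_BY_LANGUAGE.get(lang, DEFAULT_RULES)
```

Under this sketch, unannotated text on a Japanese project gets the Japanese rule, while a passage annotated as `zh-Hant` on the same project falls back to the general rules.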