Test and analyze Kuromoji Japanese language analyzer
Open, Medium · Public · 13 Estimated Story Points

Description

User Story: As a user of a Japanese-language wiki, I'd like better language processing than overlapping bigrams. The Kuromoji analyzer might well be up to the task now.

It's been a bit more than five years since we last looked at Kuromoji (T166731). In that time, it has probably gotten better, and I expect my ability to deal with shortcomings in analyzers has also gotten better.*

────────
   * Experience is something you don't get until right after you need it.
 

Acceptance Criteria:

  • A write-up of findings on the Kuromoji analyzer
  • Either...
    • ...reasons, included in the write-up, why Kuromoji is unacceptable, or
    • ...a patch implementing the Kuromoji analyzer

Event Timeline

TJones set the point value for this task to 13. Sep 26 2022, 4:01 PM

I'm on the fence between 8 & 13 story points (can I say 10?), so I'm going with the bigger number until we talk about it at a later meeting.

Moving this back to the backlog to focus on more straightforward unpacking. CJK analyzer unpacking for Japanese (T326822) is still underway.

TJones triaged this task as High priority. Jan 13 2023, 9:28 PM
TJones lowered the priority of this task from High to Medium. Mar 6 2023, 6:26 PM