
Create Utility For Dictionary/Non-Dictionary Words Check
Closed, Resolved · Public · 13 Estimated Story Points

Description

Once we have a set of word tokens identified in the wikitext, we want to classify these word tokens into dictionary words or non-dictionary words. This will allow us to build the credibility signals related to dictionary and non-dictionary words.

Needs a bit of research to find a suitable library for the job.


Implementation details:
[1] This utility will live under structured-data/packages.
[2] is_dictionary_word(word_token) -> True/False
If a word token exists in the dictionary for any language, we return true. In revscoring, language-specific dictionary checks are implemented using enchant/aspell/myspell; you will need to find similar libraries for dictionary checks in Go. Please refer to the check for English here. Similarly, browse through the directory for other languages.
[3] If [2] returns false, the token is a non-dictionary word.
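
The rule in [2] and [3] can be sketched in Go as below. This is only a minimal illustration, not the actual utility. The names Dictionary, NewDictionary, IsDictionaryWord and IsNonDictionaryWord are hypothetical. The in-memory word sets stand in for whatever spell-checking library is eventually chosen, and the case-insensitive lookup is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Dictionary is a hypothetical in-memory word set for a single language.
// A real implementation would wrap a spell-checking library instead.
type Dictionary map[string]struct{}

// NewDictionary builds a Dictionary from a word list, lowercasing each entry.
func NewDictionary(words ...string) Dictionary {
	d := make(Dictionary, len(words))
	for _, w := range words {
		d[strings.ToLower(w)] = struct{}{}
	}
	return d
}

// Contains reports whether the token is in this dictionary.
// Case-insensitive matching is an assumption of this sketch.
func (d Dictionary) Contains(token string) bool {
	_, ok := d[strings.ToLower(token)]
	return ok
}

// IsDictionaryWord returns true if the token exists in the dictionary
// of any language, mirroring item [2] of the task description.
func IsDictionaryWord(token string, dicts map[string]Dictionary) bool {
	for _, d := range dicts {
		if d.Contains(token) {
			return true
		}
	}
	return false
}

// IsNonDictionaryWord is the complement of IsDictionaryWord (item [3]).
func IsNonDictionaryWord(token string, dicts map[string]Dictionary) bool {
	return !IsDictionaryWord(token, dicts)
}

func main() {
	// Tiny illustrative word lists keyed by language code.
	dicts := map[string]Dictionary{
		"en": NewDictionary("house", "water"),
		"de": NewDictionary("haus", "wasser"),
	}
	for _, tok := range []string{"House", "wasser", "xqzt"} {
		fmt.Printf("%q dictionary=%t\n", tok, IsDictionaryWord(tok, dicts))
	}
}
```

Keeping the per-language lookup behind a small Contains method means a library-backed checker could later replace the map without changing the any-language rule.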

Event Timeline

prabhat triaged this task as High priority. (Jan 18 2022, 5:36 PM)
prabhat updated the task description.
prabhat renamed this task from Create Utility For Dictionary Words, Informal And Bad Word Check to Create Utility For Dictionary/Non-Dictionary Words Check. (Jan 19 2022, 8:23 PM)
prabhat updated the task description.
Lena.Milenko changed the task status from Open to In Progress. (Jan 22 2022, 4:26 AM)