Once we have identified a set of word tokens in the wikitext, we want to classify each token as either a dictionary word or a non-dictionary word. This will allow us to build the credibility signals related to dictionary and non-dictionary words.
Needs a bit of research to find a suitable library for the job.
Implementation details:
[1] This utility will live under structured-data/packages.
[2] is_dictionary_word(word_token) -> True/False
If a word token exists in the dictionary for any language, we return true. In revscoring, the language-specific dictionary checks are implemented using enchant/aspell/myspell. You will need to find similar libraries for dictionary checks in Go. Please refer to the check for English here, and browse through the directory for the other languages.
[3] If #2 returns false, the token is a non-dictionary word.
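
The logic in [2] and [3] could be sketched in Go roughly as follows. This is only an illustration of the "any language matches" rule, not a proposed final design. The names Dictionary, SetDictionary, Checker, IsDictionaryWord and IsNonDictionaryWord are all hypothetical. The in-memory word set stands in for whichever spell-check library is eventually chosen, and the case-insensitive lookup is an assumption that should be checked against the revscoring behaviour.

```go
package main

import "strings"

// Dictionary is a hypothetical per-language lookup interface. A real
// implementation would wrap the spell-check library picked for this task.
type Dictionary interface {
	Contains(word string) bool
}

// SetDictionary is an in-memory stand-in for a real dictionary backend.
// It exists for illustration and testing only.
type SetDictionary map[string]struct{}

// NewSetDictionary builds a SetDictionary from a word list.
// Entries are lowercased (assumption: lookups are case-insensitive).
func NewSetDictionary(words ...string) SetDictionary {
	d := make(SetDictionary, len(words))
	for _, w := range words {
		d[strings.ToLower(w)] = struct{}{}
	}
	return d
}

// Contains reports whether the lowercased word is in the set.
func (d SetDictionary) Contains(word string) bool {
	_, ok := d[strings.ToLower(word)]
	return ok
}

// Checker holds one Dictionary per language code (e.g. "en", "es").
type Checker struct {
	dicts map[string]Dictionary
}

// NewChecker wraps the given language-to-dictionary map.
func NewChecker(dicts map[string]Dictionary) *Checker {
	return &Checker{dicts: dicts}
}

// IsDictionaryWord returns true if the token is found in the dictionary
// of at least one language. An empty token is never a dictionary word.
func (c *Checker) IsDictionaryWord(token string) bool {
	if token == "" {
		return false
	}
	for _, d := range c.dicts {
		if d.Contains(token) {
			return true
		}
	}
	return false
}

// IsNonDictionaryWord is the complement of IsDictionaryWord, as in [3].
func (c *Checker) IsNonDictionaryWord(token string) bool {
	return !c.IsDictionaryWord(token)
}
```

A caller would build the Checker once with one Dictionary per supported language and then call IsDictionaryWord for each word token. Keeping the backend behind a small interface means the library chosen after the research step can be swapped in without touching the classification logic.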