- Compile all non-whitespace languages as a single cluster
- Train sentencepiece on the non-whitespace languages cluster
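The two steps above could be sketched roughly as follows. This is a minimal illustration only, not the task's actual implementation: the helper names `build_cluster_corpus` and `train_cluster_model`, the per-language file layout, and the `vocab_size`, `character_coverage`, and `model_type` values are all assumptions chosen for the example. The training step relies on the `sentencepiece` Python package, imported lazily so that the corpus step runs without it.

```python
import tempfile
from pathlib import Path


def build_cluster_corpus(lang_files, out_path):
    """Merge per-language text files into one corpus file, one sentence per line.

    lang_files: hypothetical mapping of language code -> path to a text file.
    Returns the number of non-empty lines written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # Sort by language code so the merged corpus is reproducible.
        for lang in sorted(lang_files):
            with open(lang_files[lang], encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if line:  # skip blank lines
                        out.write(line + "\n")
                        count += 1
    return count


def train_cluster_model(corpus_path, model_prefix, vocab_size=8000):
    """Train one SentencePiece model on the merged cluster corpus.

    All hyperparameters here are illustrative assumptions, not values from the task.
    """
    import sentencepiece as spm  # third-party; only needed for training

    spm.SentencePieceTrainer.train(
        input=str(corpus_path),
        model_prefix=str(model_prefix),
        vocab_size=vocab_size,
        character_coverage=0.9995,  # assumed; high coverage suits large character sets
        model_type="unigram",
    )
```

A call such as `build_cluster_corpus({"ja": "ja.txt", "th": "th.txt"}, "cluster.txt")` followed by `train_cluster_model("cluster.txt", "nonwhitespace")` would produce a single model for the whole cluster; the file names and language codes are placeholders.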
Description
Status | Assigned | Task
---|---|---
Resolved | Appledora | T316941 NLP Tools for Content Gaps
Resolved | Appledora | T328264 NLP Tools: Word Tokenization
Declined | Appledora | T328267 Word Tokenization: Non-whitespace languages
Declined | Appledora | T328270 Sentencepiece: all non-whitespace languages