Such a built-in list loader could reuse existing open-licence lexicon projects:
* https://db.panlex.org / panlex_swadesh.zip
* https://github.com/lingua-libre/unilex-extended/tree/main/frequency-sorted-hash
Given a folder of per-language word lists, with variations of:
```
LICENCE.md
aaa.txt
....txt
cmn.txt
eng.txt
fra.txt
hin.txt
```
{F56355693}
{F56355707}
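A minimal sketch of how a loader might read such a folder, assuming one UTF-8 `.txt` file per language whose name is the language code and whose content is one entry per line. The function names `list_available_languages` and `load_word_list` are hypothetical, not part of any existing tool.

```python
from pathlib import Path


def list_available_languages(folder):
    """Map each language code (the file stem) to its word-list path.

    Assumption: every `*.txt` file is a word list; other files such as
    LICENCE.md are ignored.
    """
    return {path.stem: path for path in sorted(Path(folder).glob("*.txt"))}


def load_word_list(folder, lang_code):
    """Return the non-empty lines of `<folder>/<lang_code>.txt`."""
    path = Path(folder) / f"{lang_code}.txt"
    with path.open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

A user interface could then offer the keys of `list_available_languages(folder)` as selectable languages and call `load_word_list` only for the one chosen.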
==== Relevant projects
- Panlex > Swadesh: most common words, under CC0 licence:
  - Swadesh 110: available in 2,111 languages
  - Swadesh 207: available in 776 languages
- Unicode > Unilex: frequency lists, under GNU licence:
  - Various lengths: available in 1,001 languages
Together, these would provide convenient starting lists for 2,000+ languages.
==== Notes
* Some files have several words on one row (Swadesh), which should be split using the known separator.
* Some file formats include prefixes such as `# word` or suffixes such as `word 38890550`.
* The open source projects concerned could be pre-processed, normalized, and uploaded to GitHub (or an alternative), with the raw files then served via GitHub Pages. This would remove the first two concerns.
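The clean-up described in the first two notes could be sketched as below. This is only an illustration: `normalize_line` is a hypothetical helper, the separator is passed in by the caller because it depends on the source file, and treating a trailing whitespace-separated integer as a frequency count is an assumption based on the `word 38890550` example.

```python
import re

# Assumption: a trailing integer preceded by whitespace is a frequency count.
_TRAILING_COUNT = re.compile(r"\s+\d+$")


def normalize_line(line, separator=None):
    """Return the list of words found on one raw line.

    - strips a leading `#` prefix (as in `# word`)
    - strips a trailing count (as in `word 38890550`)
    - splits on `separator` when the row holds several words
    """
    text = line.strip()
    if text.startswith("#"):
        text = text.lstrip("#").strip()
    text = _TRAILING_COUNT.sub("", text)
    parts = [text] if separator is None else text.split(separator)
    return [part.strip() for part in parts if part.strip()]
```

For example, `normalize_line("# word")` and `normalize_line("word 38890550")` both yield `["word"]`, while `normalize_line("eau, flotte", separator=",")` yields `["eau", "flotte"]`.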
==== Sources
- https://db.panlex.org / panlex_swadesh.zip
- https://github.com/lingua-libre/unilex-extended/tree/main/frequency-sorted-hash