Such a built-in list loader could reuse open-licence lexicon projects:
* https://db.panlex.org/panlex_swadesh.zip
* https://github.com/lingua-libre/unilex-extended/tree/main/frequency-sorted-hash
Given a folder with variations of:
```
LICENCE.md
aaa.txt
....txt
cmn.txt
eng.txt
fra.txt
hin.txt
```
==== Data
The following provides a convenient starting pack of 2,000+ languages via ~3,900 lists:
{F56355693}
{F56355707}
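As a rough illustration of how a loader might consume the folder layout described earlier (one `.txt` file per language code, plus `LICENCE.md`), here is a minimal Python sketch. The function name `index_word_lists` and the folder name in the usage line are hypothetical. It only assumes that the file stem is the language code.

```python
from pathlib import Path


def index_word_lists(folder):
    """Map each language code (the file stem) to the path of its word list.

    Assumption: every list is named `<code>.txt`; other files such as
    LICENCE.md are ignored because only `*.txt` is matched.
    """
    return {path.stem: path for path in sorted(Path(folder).glob("*.txt"))}
```

For example, `index_word_lists("lists").get("fra")` would return the path to `fra.txt` if that file exists in a folder named `lists`, and `None` otherwise.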
==== Relevant projects
- [[ https://db.panlex.org | Panlex ]] > [[ https://db.panlex.org/panlex_swadesh.zip | Swadesh.zip ]] (CC0 licence), top common words:
-- `Swadesh 110` for 2,111 languages
-- `Swadesh 207` for 776 languages
- [[ https://github.com/lingua-libre/unilex-extended/tree/main/frequency-sorted-hash | Unicode > Unilex > frequency lists ]] (GNU licence):
-- Various lengths, for 1,001 languages
Together, these projects provide convenient starting lists for 2,000+ languages.
==== Notes
==== Pre-formatting
* Some files have several words on a row (Swadesh), which should be split using the known separator.
* Some file formats include prefixes (`# word`) or suffixes (`word 38890550`).
* The open source projects concerned could be pre-processed, normalized and uploaded to GitHub (or an alternative), with the raw files then served via GitHub Pages. This would remove the first two concerns.
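The clean-up steps above could be sketched roughly as follows in Python. This is a minimal sketch under stated assumptions, not the projects' actual tooling. The function names are hypothetical, the default separator `,` is a guess (the real Swadesh separator must be checked against the files), and a trailing run of digits is assumed to be a frequency count.

```python
import re

# Assumed raw formats, taken from the notes above:
#   "# word"         -> leading "#" prefix to strip
#   "word 38890550"  -> trailing numeric count to strip
#   "word1, word2"   -> several words on one row (separator is an assumption)
COUNT_SUFFIX = re.compile(r"\s+\d+$")


def normalize_line(line, separator=","):
    """Return the list of clean words found on one raw line."""
    text = line.strip()
    if text.startswith("#"):
        text = text.lstrip("#").strip()
    text = COUNT_SUFFIX.sub("", text)
    return [word.strip() for word in text.split(separator) if word.strip()]


def normalize_lines(lines, separator=","):
    """Flatten raw lines into a list with one word per entry."""
    words = []
    for line in lines:
        words.extend(normalize_line(line, separator))
    return words
```

Writing the result back with one word per line would give every list the same shape, so a loader would not need to know which project a file came from.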
==== Bots for Commons
One way would be to replicate the current approach exactly, but on Commons. If so, a bot needs to be run to upload those lists to Commons. This is pretty straightforward, and the lists remain collaboratively editable, watchable, etc.

The following bots are usable:
- [[ https://github.com/hugolpz/Dragons_Bot | Dragons_Bot ]]
- [[ https://github.com/hugolpz/WikiapiJS-Eggs | WikiapiJS-Eggs ]]