==== Data
The following sources provide a convenient starter pack covering 2,000+ languages through ~3,900 word lists:
- [[ https://db.panlex.org | PanLex ]] > [[ https://db.panlex.org/panlex_swadesh.zip | Swadesh.zip ]] (CC0 licence):
  - `Swadesh 110` for 2,111 languages
  - `Swadesh 207` for 776 languages
- [[ https://github.com/lingua-libre/unilex-extended/tree/main/frequency-sorted-hash | Unicode > Unilex > frequency lists ]] (GNU licence):
  - Lists of various lengths for 1,001 languages
{F56355693}
{F56355707}
==== Pre-formatting
* Some files (Swadesh) have several words on one row, which should be split on the known separator.
* Some file formats include a prefix (`# word`) or a suffix (`word 38890550`) that must be stripped.
* The source projects concerned could be pre-processed, normalized, and uploaded to GitHub (or an alternative), with the raw files then served via GitHub Pages. This would remove the first two concerns.
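
The first two points can be sketched as a small normalization step. This is only a minimal sketch: the function names and the `separator` parameter are hypothetical, the actual separator used in the Swadesh files has to be checked in the data, and the prefix and suffix patterns are inferred solely from the two examples above (`# word` and `word 38890550`).

```python
import re

# Assumed shapes, inferred from the two examples above only:
# a leading "# " marker and a trailing whitespace-separated integer.
_PREFIX = re.compile(r"^#\s*")
_COUNT_SUFFIX = re.compile(r"\s+\d+$")


def normalize_line(line, separator=None):
    """Return the clean words found on one raw line.

    `separator` is a hypothetical parameter: pass the separator actually
    used by the source file, or None when each row holds a single word.
    """
    text = line.strip()
    text = _PREFIX.sub("", text)        # "# word" -> "word"
    text = _COUNT_SUFFIX.sub("", text)  # "word 38890550" -> "word"
    if not text:
        return []
    parts = text.split(separator) if separator else [text]
    # Drop empty fragments left by doubled or trailing separators.
    return [part.strip() for part in parts if part.strip()]


def normalize_lines(lines, separator=None):
    """Flatten an iterable of raw lines into one list of clean words."""
    words = []
    for line in lines:
        words.extend(normalize_line(line, separator))
    return words
```

For instance, `normalize_lines(open(path, encoding="utf-8"), separator=",")` would yield one flat word list per file, assuming a comma-separated source; the output could then be committed as the normalized raw file.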
==== Bots for Commons
The following bots can be used:
- [[ https://github.com/hugolpz/Dragons_Bot | Dragons_Bot ]]
- [[ https://github.com/hugolpz/WikiapiJS-Eggs | WikiapiJS-Eggs ]]