Given an ISO 639-3 code:
- get the content from the relevant raw GitHub file (warning: GitHub could block the JS query?)
- slice it into chunks of 5,000 items (via JS, or beforehand on GitHub?, see )
- create the lists, up to 20,000 items (where the items exist):
  - create List:{iso3}/words-by-frequency-00001-to-05000 : append relevant items
  - create List:{iso3}/words-by-frequency-05001-to-10000 : append relevant items
  - create List:{iso3}/words-by-frequency-10001-to-15000 : append relevant items
  - create List:{iso3}/words-by-frequency-15001-to-20000 : append relevant items
- create the matching list_talk pages, up to 20,000 items (where the lists exist):
  - create List_talk:{iso3}/words-by-frequency-00001-to-05000 : append {UNILEX License}
  - create List_talk:{iso3}/words-by-frequency-05001-to-10000 : append {UNILEX License}
  - create List_talk:{iso3}/words-by-frequency-10001-to-15000 : append {UNILEX License}
  - create List_talk:{iso3}/words-by-frequency-15001-to-20000 : append {UNILEX License}
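The page titles above follow a regular pattern, so they can be generated rather than typed by hand. A minimal sketch, assuming the number of available words is known in advance; `list_titles` is a hypothetical helper name, and it only prints the titles, it does not create any page:

```shell
# Hypothetical helper: print the List: and List_talk: titles for one language.
# $1 = ISO 639-3 code, $2 = number of words available for that language.
list_titles() (
  iso3=$1 total=$2
  size=5000 max=20000 start=1
  # Stop at 20,000 words, or earlier if the language has fewer words.
  while [ "$start" -le "$max" ] && [ "$start" -le "$total" ]; do
    end=$(( start + size - 1 ))
    # Zero-pad both bounds to five digits, e.g. 00001-to-05000.
    range=$(printf '%05d-to-%05d' "$start" "$end")
    printf 'List:%s/words-by-frequency-%s\n' "$iso3" "$range"
    printf 'List_talk:%s/words-by-frequency-%s\n' "$iso3" "$range"
    start=$(( start + size ))
  done
)
```

For example, `list_titles fra 7000` prints two List: titles and two List_talk: titles; the second range keeps its full 05001-to-10000 name even though it would only hold 2,000 items.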
Server-side split
split -d -l 5000 --additional-suffix=".txt" ./clean/${iso}-all.txt ./clean/${iso}-words-by-frequency-
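With `-d`, GNU split numbers the chunks 00, 01, 02, ... rather than naming them by word range. A possible follow-up step, sketched under the assumption that the suffixes stay two digits long and that the range-based names should mirror the list titles; `rename_chunks` is a hypothetical helper name:

```shell
# Hypothetical helper: rename split's numeric chunks to range-based names.
# $1 = directory holding the chunks, $2 = language code used in the file names.
rename_chunks() (
  dir=$1 iso=$2 size=5000
  for f in "$dir/$iso"-words-by-frequency-[0-9][0-9].txt; do
    [ -e "$f" ] || continue   # the pattern matched no file
    n=${f##*-}                # "00.txt", "01.txt", ...
    n=${n%.txt}               # "00", "01", ...
    n=${n#0}                  # drop one leading zero so 08 and 09 are not read as octal
    start=$(( n * size + 1 ))
    end=$(( (n + 1) * size ))
    mv -- "$f" "$(printf '%s/%s-words-by-frequency-%05d-to-%05d.txt' "$dir" "$iso" "$start" "$end")"
  done
)
```

Run after the split, e.g. `rename_chunks ./clean "$iso"`, chunk 00 becomes `...-00001-to-05000.txt`, chunk 01 becomes `...-05001-to-10000.txt`, and so on.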
ISO names
- The largest languages use two-letter (ISO 639-1) codes instead of ISO 639-3 codes, so their files may need renaming on GitHub.
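If the renaming is scripted, it could look roughly like the sketch below. Both function names are hypothetical, the mapping covers only four illustrative codes, and the `<code>-all.txt` file naming is assumed from the split command above; the real files on GitHub may be named differently.

```shell
# Hypothetical helper: map a two-letter code to its ISO 639-3 equivalent.
# Only a few illustrative entries; any other code is returned unchanged.
to_iso3() {
  case $1 in
    en) echo eng ;;
    fr) echo fra ;;
    de) echo deu ;;
    es) echo spa ;;
    *)  echo "$1" ;;
  esac
}

# Hypothetical helper: print the ISO 639-3 file name for "<code>-<rest>".
iso3_name() (
  code=${1%%-*}   # text before the first hyphen
  rest=${1#*-}    # text after the first hyphen
  echo "$(to_iso3 "$code")-$rest"
)
```

For example, `mv "./clean/en-all.txt" "./clean/$(iso3_name en-all.txt)"` would rename the file to `eng-all.txt`, while a file that already uses a three-letter code keeps its name.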
Other commands