
Lists loader: integrate UNILEX data and license.md in code ?
Open, Low, Public

Description

I found out today that, although the UNILEX data is copyrighted, it is distributed under a license.md which is a variation of the GNU license.

The UNILEX project maintains 999 word-frequency lists.

curl 'https://raw.githubusercontent.com/lingua-libre/unilex/master/data/frequency/ig.txt' | tail -n +5 | sort -k 2,2 -n -r | cut -d$'\t' -f1 | sed -E 's/^/# /g' | head

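Comment: skip the header lines (tail -n +5 starts output at line 5, so the first 4 lines are dropped), sort numerically by the 2nd column in descending order, keep only the first field, prefix each line with "# " to make a list, and print only the first 10 lines (the default for head). This creates a Lingualibre-compatible wordlist showing the top 10 items; use head -n 20 to get the top 20.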
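For integration into a loader script, the one-liner above could be wrapped in a small reusable function. This is only a sketch: the function name unilex_to_wordlist and its two parameters (number of header lines to skip, number of entries to keep) are hypothetical and not part of any existing Lingualibre code. It assumes the input is tab-separated with the word in column 1 and the frequency in column 2, as the command above does.

```shell
#!/bin/sh
# Hypothetical helper (name and parameters are assumptions for illustration).
# Reads a tab-separated "word<TAB>frequency" table on stdin and prints a
# "# word" list on stdout, most frequent first.
#   $1  header lines to skip (default 4, equivalent to `tail -n +5`)
#   $2  number of entries to keep (default 10, same as a bare `head`)
unilex_to_wordlist() {
  skip="${1:-4}"
  top="${2:-10}"
  tab="$(printf '\t')"
  tail -n "+$((skip + 1))" \
    | sort -t "$tab" -k 2,2nr \
    | cut -f 1 \
    | sed 's/^/# /' \
    | head -n "$top"
}

# Example call (same URL as above), keeping the top 20 entries:
#   curl -s 'https://raw.githubusercontent.com/lingua-libre/unilex/master/data/frequency/ig.txt' \
#     | unilex_to_wordlist 4 20
```

Setting the field separator explicitly to a tab keeps sort and cut in agreement if an entry ever contains a space; the header length should be checked against the actual files before relying on the default of 4.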

Links:

Event Timeline

Yug updated the task description.
Yug triaged this task as Low priority. Jul 6 2022, 10:40 AM
Yug renamed this task from "RecordWizard: Integrate UNILEX data and license.md in code ?" to "Lists loader: integrate UNILEX data and license.md in code ?". Jul 7 2022, 11:03 AM