ListImporter Gadget: import UNILEX lists when available.
Open, High, Public, Feature

Assigned To

Authored By

	Yug
	Feb 25 2021, 9:21 AM

Description

Given an ISO 639-3 code:

get the content from the relevant raw GitHub file (warning: GitHub could block the JS query?)
slice it into chunks of 5,000 items (via JS, or beforehand on GitHub?, see )
create lists up to rank 20,000 (where the items exist):
- create List:{iso3}/words-by-frequency-00001-to-05000 : append relevant items
- create List:{iso3}/words-by-frequency-05001-to-10000 : append relevant items
- create List:{iso3}/words-by-frequency-10001-to-15000 : append relevant items
- create List:{iso3}/words-by-frequency-15001-to-20000 : append relevant items
create the matching list_talk pages up to rank 20,000 (where the lists exist):
- create List_talk:{iso3}/words-by-frequency-00001-to-05000 : append {UNILEX License}
- create List_talk:{iso3}/words-by-frequency-05001-to-10000 : append {UNILEX License}
- create List_talk:{iso3}/words-by-frequency-10001-to-15000 : append {UNILEX License}
- create List_talk:{iso3}/words-by-frequency-15001-to-20000 : append {UNILEX License}
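
The steps above can be sketched roughly as follows. This is only an illustration in Python, not the gadget's or the bot's actual code: the names `plan_pages`, `CHUNK_SIZE` and `MAX_ITEMS` are made up for the example, the license text is passed through as a plain placeholder string, and fetching the raw file and saving pages to the wiki are deliberately left out.

```python
CHUNK_SIZE = 5000   # items per list page, as in the task description
MAX_ITEMS = 20000   # lists are only created up to rank 20,000


def plan_pages(iso3, words, license_text="{UNILEX License}"):
    """Return a dict mapping page title -> page content.

    `words` is the frequency-ordered word list for one language.
    Only chunks that actually contain items are planned.
    (Hypothetical helper, for illustration only.)
    """
    pages = {}
    limit = min(len(words), MAX_ITEMS)
    for start in range(0, limit, CHUNK_SIZE):
        end = min(start + CHUNK_SIZE, limit)
        # Titles name the full 5000-wide range, zero-padded to 5 digits,
        # even when the last chunk is only partly filled.
        suffix = f"words-by-frequency-{start + 1:05d}-to-{start + CHUNK_SIZE:05d}"
        pages[f"List:{iso3}/{suffix}"] = "\n".join(words[start:end])
        pages[f"List_talk:{iso3}/{suffix}"] = license_text
    return pages
```

For a language with 12,000 items this plans three List pages and three List_talk pages, and skips the 15001-to-20000 range entirely.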

Server side split

split -d -l 5000  --additional-suffix=".txt" ./clean/${iso}-all.txt ./clean/${iso}-words-by-frequency-
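
Note that with `-d` and the default suffix length, GNU `split` numbers its output files `-00.txt`, `-01.txt`, and so on, rather than by rank range. If range-named files are wanted directly, a rough Python equivalent could look like the sketch below. The function name `split_by_rank` and the output naming are assumptions chosen to mirror the list titles in the description, not an existing script.

```python
from pathlib import Path


def split_by_rank(src, out_dir, iso, chunk_size=5000):
    """Split `src` (one item per line) into range-named chunk files.

    Hypothetical helper: output names follow the pattern
    {iso}-words-by-frequency-00001-to-05000.txt
    """
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(lines), chunk_size):
        chunk = lines[start:start + chunk_size]
        name = (f"{iso}-words-by-frequency-"
                f"{start + 1:05d}-to-{start + chunk_size:05d}.txt")
        path = out_dir / name
        path.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Unlike the `split` one-liner, this reads the whole file into memory, which should be acceptable for word lists of this size.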

Iso names

The largest languages use two-letter (ISO 639-1) codes in their file names. These may need renaming on GitHub.
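
A possible way to normalise the file names is sketched below. The mapping table is a tiny illustrative subset, and both `to_iso3` and `renamed` are hypothetical helpers; the sketch also assumes the language code is everything before the first hyphen in the file name, which may not hold for codes that carry a script or region tag.

```python
# Partial table for illustration only; a real run would need the full
# ISO 639-1 to ISO 639-3 mapping.
ISO2_TO_ISO3 = {"en": "eng", "fr": "fra", "de": "deu", "es": "spa", "ru": "rus"}


def to_iso3(code):
    """Return the three-letter code for a two- or three-letter code."""
    code = code.lower()
    if len(code) == 3:
        return code  # assumed to be ISO 639-3 already
    if code in ISO2_TO_ISO3:
        return ISO2_TO_ISO3[code]
    raise KeyError(f"no ISO 639-3 mapping known for {code!r}")


def renamed(filename):
    """'fr-all.txt' -> 'fra-all.txt'; three-letter names are unchanged."""
    code, sep, rest = filename.partition("-")
    return to_iso3(code) + sep + rest
```

Raising on unknown codes, rather than passing them through, keeps a missing mapping from silently producing a wrongly named list.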

Other commands

Help:How_to_create_a_frequency_list ?

Event Timeline

Yug created this task. Feb 25 2021, 9:21 AM

Yug renamed this task from "LanguagesImporter Gadget: import UNILEX lists when available." to "LanguaImporter Gadget: import UNILEX lists when available." Feb 25 2021, 12:27 PM

Yug changed the subtype of this task from "Task" to "Feature Request".

Yug updated the task description. Feb 25 2021, 12:39 PM

Yug updated the task description. Feb 25 2021, 12:41 PM

@Yug, could you elaborate a bit more? From the title I understand that you would like the LanguaImporter gadget to be able to import such lists? If so, I disagree: I think it should be done by another gadget, or the lists should be imported by hand and/or by bot. We should not add features to LanguaImporter beyond creating an item for a language. So could you retitle?

My idea was to both create the language Qid and add the reference lists on the go.
Could be a separate gadget, true, to separate concerns.
The two should closely co-occur, though.

Yug renamed this task from "LanguaImporter Gadget: import UNILEX lists when available." to "ListImporter Gadget: import UNILEX lists when available." Feb 25 2021, 9:39 PM

Note: I developed a bot to import all UNILEX lists into this format. The bot ran for the first 500 languages, as a test. It still needs to run for the next 500 languages.

Yug claimed this task. Jul 6 2022, 1:21 PM

Yug triaged this task as Unbreak Now! priority.

I don't think this is "unbreak now". See https://www.mediawiki.org/wiki/Phabricator/Project_management#Priority_levels for information about setting task priority.
