Maniphest T193645

[Epic] querying for lexicographical data
Closed, Resolved · Public

Description

We want to be able to query lexicographical data on query.wikidata.org to find out all the things.
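
As a rough illustration of the kind of query this epic is meant to enable, the sketch below builds a SPARQL request for the public endpoint at query.wikidata.org. It assumes the Lexeme RDF mapping exposes lexemes as `ontolex:LexicalEntry` with `wikibase:lemma` and `dct:language`; the helper names `build_lexeme_query` and `build_request_url`, and the example language item `Q1860` (English), are illustrative choices and not part of this task.

```python
from urllib.parse import urlencode

# Public WDQS SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_lexeme_query(language_qid: str, limit: int = 10) -> str:
    """Return a SPARQL query listing lexemes and their lemmas for one language.

    Assumption: lexemes are typed ontolex:LexicalEntry and carry
    wikibase:lemma and dct:language, as in the Lexeme RDF mapping.
    """
    return f"""
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?lexeme ?lemma WHERE {{
  ?lexeme a ontolex:LexicalEntry ;
          dct:language wd:{language_qid} ;
          wikibase:lemma ?lemma .
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Hypothetical example: lexemes whose language is English (Q1860).
    print(build_request_url(build_lexeme_query("Q1860")))
```

The resulting URL can be fetched with any HTTP client; the sketch stops short of sending the request so that it stays independent of network access.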

Open questions:

Do we add Lexemes to the main Wikidata dump or make a separate dump (or both)?
- We may need a separate dump, at least for the initial data load.

Open TODOs (that are not in subtasks)

~~https://www.wikidata.org/wiki/Special:EntityData/L42.ttl produces "RDF export is disabled for this type of entity: lexeme."~~
~~Form statements are still not included in the RDF output.~~
~~WDQS Updater and Munge need to support lexemes & forms.~~
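
The per-entity RDF export named in the first item follows the `Special:EntityData/<id>.<format>` pattern visible in that URL. A minimal sketch for building such URLs is shown here; the helper name `entity_data_url` is hypothetical, and the ID check (`L` followed by digits) is an assumption based on the `L42` example rather than a documented rule.

```python
import re

# Base of the EntityData URL quoted in the TODO list above.
ENTITY_DATA_BASE = "https://www.wikidata.org/wiki/Special:EntityData"

# Assumption: lexeme IDs look like "L" followed by a positive integer (e.g. L42).
LEXEME_ID = re.compile(r"L[1-9][0-9]*")


def entity_data_url(lexeme_id: str, fmt: str = "ttl") -> str:
    """Build the EntityData URL for a lexeme in the given serialization format."""
    if not LEXEME_ID.fullmatch(lexeme_id):
        raise ValueError(f"not a lexeme ID: {lexeme_id!r}")
    return f"{ENTITY_DATA_BASE}/{lexeme_id}.{fmt}"


if __name__ == "__main__":
    print(entity_data_url("L42"))
```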

Related Objects

Status	Assigned	Task
Resolved	None	T193645 [Epic] querying for lexicographical data
Resolved	Smalyshev	T160259 [Story] RDF for Lexemes, Forms and Senses
Resolved	Ladsgroup	T157791 Build basic LexemeRdfBuilder
Resolved	Lydia_Pintscher	T160260 [Task] Spec for Lexeme Rdf mapping
Resolved	Smalyshev	T195043 [Task] Implement RDF serialization for lexemes and forms
Resolved	Smalyshev	T201885 Lexeme RDF export has labels repeated several times
Resolved	Tpt	T200901 [Task] Implement RDF serialization for senses
Resolved	Lucas_Werkmeister_WMDE	T201153 re-enable RDF export for Lexemes
Resolved	Lucas_Werkmeister_WMDE	T201841 Remove feature to disable RDF export per entity type
Resolved	hoo	T202452 dumpRdf is unable to dump lexemes (or any extension-defined type)
Resolved	Smalyshev	T202459 Implement Lexeme data model for WDQS
Resolved	Smalyshev	T202830 Separate dumps for Lexemes

Event Timeline

Lydia_Pintscher created this task.May 2 2018, 2:40 PM

Restricted Application added projects: Wikidata, Discovery-ARCHIVED.May 2 2018, 2:40 PM

Restricted Application added a subscriber: Aklapper.

Lydia_Pintscher added a project: Epic.May 2 2018, 2:40 PM

Lydia_Pintscher added a subtask: T160259: [Story] RDF for Lexemes, Forms and Senses.

Lucas_Werkmeister_WMDE subscribed.May 2 2018, 2:47 PM

Lydia_Pintscher triaged this task as High priority.May 27 2018, 3:27 PM

Lydia_Pintscher mentioned this in T197145: Create special pages for lexemes.Jun 30 2018, 5:34 PM

• Vvjjkkii renamed this task from [Epic] querying for lexicographical data to asdaaaaaaa.Jul 1 2018, 1:12 AM

• Vvjjkkii added projects: CheckUser, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), Tamil-Sites, Gamepress, Hashtags, Jade, KartoEditor, Language-2018-Apr-June, New-Editor-Experiences, Mail, TCB-Team (now WMDE-TechWish).

• Vvjjkkii updated the task description.

• Vvjjkkii removed a subscriber: Aklapper.

CommunityTechBot renamed this task from asdaaaaaaa to [Epic] querying for lexicographical data.Jul 1 2018, 9:38 PM

CommunityTechBot updated the task description.

CommunityTechBot removed projects: TCB-Team (now WMDE-TechWish), Mail, New-Editor-Experiences, Language-2018-Apr-June, KartoEditor, Jade, Hashtags, Gamepress, Tamil-Sites, Connected-Open-Heritage-Batch-uploads (RAÄ-KMB_1_2017-02), CheckUser.

CommunityTechBot added a subscriber: Aklapper.

Smalyshev subscribed.Jul 30 2018, 11:55 PM

Lydia_Pintscher updated the task description.Aug 2 2018, 5:16 PM

Smalyshev updated the task description.Aug 2 2018, 6:31 PM

Lea_Lacroix_WMDE subscribed.Aug 5 2018, 9:32 AM

Lucas_Werkmeister_WMDE closed subtask T201153: re-enable RDF export for Lexemes as Resolved.Aug 13 2018, 2:42 PM

Smalyshev added a subtask: T202452: dumpRdf is unable to dump lexemes (or any extension-defined type).Aug 21 2018, 8:30 PM

Smalyshev added a subtask: T201885: Lexeme RDF export has labels repeated several times.Aug 21 2018, 9:16 PM

I think lex data dumps should be available independently of the other Wikidata data. For example, https://sklonenie-slov.ru/ shows all Russian noun declensions (30,000+), and I think such sites can greatly benefit from the community work.
P.S. I have begun a discussion with the site authors, trying to get them to donate their database to Wikidata.

@Yurik Yes, we're going in that direction, also because having items+Lexemes in one dump would be waaay too big :)

Thanks for your work! If you need any support in your discussion with this organization, feel free to contact my colleague @johl (jens.ohlig@wikimedia.de), who's an expert in partnerships and data donations.

Lydia_Pintscher removed a subtask: T201885: Lexeme RDF export has labels repeated several times.Aug 26 2018, 11:11 AM

Lydia_Pintscher moved this task from features/bugs for later releases to features/bugs for next release (querying and searching) on the Wikidata Lexicographical data board.Sep 1 2018, 6:48 PM

Lydia_Pintscher closed subtask T202452: dumpRdf is unable to dump lexemes (or any extension-defined type) as Resolved.Sep 16 2018, 10:33 AM

Smalyshev closed subtask T202459: Implement Lexeme data model for WDQS as Resolved.Sep 20 2018, 11:38 PM

Right now the full lexeme dump is just 2.1M compressed, so adding it to the main dump would not be a big deal for dump size. However, without a separate dump, you'd always have to download the huge one, of course, which is why I still support the separate dump route.

Smalyshev closed subtask T202830: Separate dumps for Lexemes as Resolved.Oct 14 2018, 6:49 PM

Smalyshev updated the task description.Oct 15 2018, 6:02 PM

Lydia_Pintscher closed this task as Resolved.Nov 2 2018, 3:08 PM

Smalyshev closed subtask T160259: [Story] RDF for Lexemes, Forms and Senses as Resolved.Feb 5 2019, 8:57 PM
