Phase 1: Represent Wiktionary lexicon using structured data
Open, Low, Public

Description

https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals

Phase 1: Lexicons

This phase requires parsing the existing pages for some basic information. There appears to be clear consensus on the desired data model, but extracting the data will take some work.

The existing Wiktionary structure has language as the next level of hierarchy under representation, with lexemes determined by a split on etymology and then on lexical category. For the most part, forms and grammatical categories are differentiated only by sense or by conjugation tables.
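As a rough illustration of the split described above, here is a minimal Python sketch that walks the headings of a page and emits one (language, etymology, lexical category) triple per candidate lexeme. It assumes the heading layout commonly seen on English Wiktionary (a level-2 language heading, optional "Etymology N" headings, then part-of-speech headings). The function name `split_lexemes`, the `LEXICAL_CATEGORIES` set, and the sample text are all hypothetical and incomplete; real pages have many more heading types and exceptions.

```python
import re
from typing import Iterator, Optional, Tuple

# Matches wikitext headings such as "==English==" or "====Noun====".
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")

# Hypothetical, deliberately incomplete set of lexical-category headings.
LEXICAL_CATEGORIES = {"Noun", "Verb", "Adjective", "Adverb"}


def split_lexemes(wikitext: str) -> Iterator[Tuple[str, Optional[str], str]]:
    """Yield (language, etymology, lexical category) for each candidate lexeme."""
    language: Optional[str] = None
    etymology: Optional[str] = None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if not match:
            continue  # body text, templates, definitions, and so on
        level, title = len(match.group(1)), match.group(2)
        if level == 2:
            # A new language section resets the current etymology.
            language, etymology = title, None
        elif title.startswith("Etymology"):
            etymology = title
        elif title in LEXICAL_CATEGORIES and language is not None:
            yield (language, etymology, title)


# Invented sample page; the sense lines are placeholders.
SAMPLE = """\
==English==
===Etymology 1===
====Noun====
# placeholder sense
===Etymology 2===
====Noun====
# placeholder sense
====Verb====
# placeholder sense
==French==
===Noun===
# placeholder sense
"""

for triple in split_lexemes(SAMPLE):
    print(triple)
```

Forms and grammatical categories are not handled here, since, as noted, they are mostly only recoverable from senses or conjugation tables.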

Data model
  • lexeme (L)
    • language
    • lexical category
    • form (F)
      • grammatical category
      • representation (R)
        • script
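
The hierarchy above could be sketched in Python roughly as follows. This is only an illustration of the nesting, not a proposed schema: the class and field names are assumptions derived from the list, and the example values (including the use of "en" and "Latn" as codes) are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Representation:
    """A written rendering of a form (R), tagged with its script."""
    text: str
    script: str  # assumed convention: a script code such as "Latn"


@dataclass
class Form:
    """One form (F) of a lexeme, identified by its grammatical category."""
    grammatical_category: str
    representations: List[Representation] = field(default_factory=list)


@dataclass
class Lexeme:
    """A lexeme (L): a language plus a lexical category, with its forms."""
    language: str
    lexical_category: str
    forms: List[Form] = field(default_factory=list)


def representations_of(lexeme: Lexeme) -> List[str]:
    """Collect the text of every representation across all forms."""
    return [r.text for f in lexeme.forms for r in f.representations]


# Invented example data.
dog = Lexeme(
    language="en",
    lexical_category="noun",
    forms=[
        Form("singular", [Representation("dog", "Latn")]),
        Form("plural", [Representation("dogs", "Latn")]),
    ],
)
print(representations_of(dog))
```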

GPHemsley updated the task description. (Show Details)
GPHemsley raised the priority of this task from to Needs Triage.
GPHemsley added a project: SDC General.
GPHemsley changed Security from none to None.
GPHemsley added a subscriber: GPHemsley.
Gilles triaged this task as Normal priority. Nov 24 2014, 1:43 PM
Gilles added a subscriber: Gilles.
Lydia_Pintscher lowered the priority of this task from Normal to Low. Nov 27 2014, 10:40 AM
Lydia_Pintscher added a subscriber: Lydia_Pintscher.

Before we settle on all this I'd like to have more discussions please. Wiktionary is huge and I don't feel we've spent enough time thinking it through yet to commit to one way of doing it.

Lydia_Pintscher renamed this task from "Implement Phase 1: Lexicons" to "use structured data to represent the lexical data in Wiktionary". Nov 27 2014, 10:42 AM
GPHemsley updated the task description. Dec 1 2014, 3:05 AM
GPHemsley added a comment. Edited Dec 1 2014, 3:14 AM

> Before we settle on all this I'd like to have more discussions please. Wiktionary is huge and I don't feel we've spent enough time thinking it through yet to commit to one way of doing it.

While I agree that Wiktionary has a lot of users, and the number of people involved in these discussions has been small relative to the Wiktionary userbase, we've actually spent a great deal of time discussing numerous proposals:

https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals

Those of us who were involved in the discussions have reached consensus on certain portions of my proposal (the latest, dated 2014-10), and it is along those lines that I have broken up the phases.

As a side note, renaming this task the way you have obscures those distinctions.

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
epriestley added a commit: Unknown Object (Diffusion Commit). Mar 4 2015, 8:14 AM
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit). Mar 4 2015, 8:21 AM
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit). Mar 4 2015, 8:23 AM
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
GPHemsley renamed this task from "use structured data to represent the lexical data in Wiktionary" to "Phase 1: Represent Wiktionary lexicon using structured data". Mar 8 2015, 3:21 PM
jberkel added a comment. Edited Mar 31 2015, 9:26 PM

Is there an easy way to set up one (or even several) Wikidata / Wiktionary integration sandboxes where interested parties could just try things out and experiment? I think prototyping an integration with a small subset of the data could be very beneficial. It would allow us to get some quick feedback on which kinds of ideas could work (and where the problem areas are). Planning everything upfront is almost impossible, given the ambition of this project.

Restricted Application added a subscriber: Aklapper. Sep 10 2015, 4:28 PM
Gilles removed a subscriber: Gilles. Dec 15 2015, 8:23 PM
Noe added a subscriber: Noe. Jul 22 2016, 12:16 AM
JAnD added a subscriber: JAnD. Sep 14 2016, 9:13 AM
daniel added a subscriber: daniel. Oct 1 2016, 11:57 AM

@Lydia_Pintscher I can't find the ticket we made for the baseline implementation of the Lexeme entity type. Shouldn't it be a subtask of this ticket here?

Izno removed a subscriber: Izno. Oct 1 2016, 12:43 PM
He7d3r added a subscriber: He7d3r. Feb 6 2018, 3:27 PM