Phase 1: Represent Wiktionary lexicon using structured data
Closed, ResolvedPublic

Description

https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals

Phase 1: Lexicons

This phase requires parsing the existing pages for some basic information. There appears to be clear consensus on the desired data model, but extracting the data will take some work.

The existing Wiktionary structure places language at the next level of the hierarchy under representation. Within a language, a lexeme is determined by splitting first on etymology and then on lexical category. For the most part, forms and grammatical categories are differentiated only by sense or by conjugation tables.
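
To make the split concrete, here is a rough sketch of how the lexemes on one page could be enumerated from its section headings. It assumes a simplified heading layout (a level-2 heading per language, an optional "Etymology N" heading, then a lexical-category heading); real pages vary considerably between Wiktionary editions. `split_lexemes`, `LexemeKey`, `LEXICAL_CATEGORIES`, and the sample page are all made up for illustration.

```python
import re
from typing import Iterator, NamedTuple, Optional

# Hypothetical and deliberately incomplete set of lexical-category headings.
LEXICAL_CATEGORIES = {"Noun", "Verb", "Adjective", "Adverb"}

# Matches wikitext headings such as "==English==" or "===Etymology 1===".
HEADING = re.compile(r"^(=+)\s*(.+?)\s*(=+)\s*$")


class LexemeKey(NamedTuple):
    representation: str        # the page title
    language: str
    etymology: Optional[int]   # None when the entry has no numbered etymology
    lexical_category: str


def split_lexemes(title: str, wikitext: str) -> Iterator[LexemeKey]:
    """Yield one key per (language, etymology, lexical category) section."""
    language = None
    etymology = None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if not match or len(match.group(1)) != len(match.group(3)):
            continue  # not a well-formed heading line
        level, text = len(match.group(1)), match.group(2)
        if level == 2:
            # Assumption: every level-2 heading names a language.
            language, etymology = text, None
        elif text.startswith("Etymology"):
            number = text[len("Etymology"):].strip()
            etymology = int(number) if number.isdigit() else None
        elif text in LEXICAL_CATEGORIES and language is not None:
            yield LexemeKey(title, language, etymology, text)


# Invented sample page, not copied from any Wiktionary edition.
SAMPLE = """\
==English==
===Etymology 1===
====Noun====
# (sense text)
====Verb====
# (sense text)
===Etymology 2===
====Noun====
# (sense text)
==Dutch==
===Noun===
# (sense text)
"""

for key in split_lexemes("bank", SAMPLE):
    print(key)
```

In this sketch the sample page yields four lexeme keys, which shows why one representation can map to several lexemes. Telling forms and grammatical categories apart would need further parsing of senses and conjugation tables, which is not attempted here.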

Data model
  • lexeme (L)
    • language
    • lexical category
    • form (F)
      • grammatical category
      • representation (R)
        • script
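
For discussion purposes, the nesting above could be written down as plain data classes. This is only a sketch: the hierarchy (lexeme, form, representation) follows the outline, while the field names, the `text` field, the use of lists, and the example values are assumptions rather than part of the proposal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Representation:
    text: str    # assumed field: the written string itself
    script: str  # assumed to be an ISO 15924 code such as "Latn"


@dataclass
class Form:
    grammatical_category: str
    representations: List[Representation] = field(default_factory=list)


@dataclass
class Lexeme:
    language: str
    lexical_category: str
    forms: List[Form] = field(default_factory=list)


def all_representations(lexeme: Lexeme) -> List[str]:
    """Flatten every written representation of every form of a lexeme."""
    return [r.text for f in lexeme.forms for r in f.representations]


# Illustrative values only; not taken from any existing dataset.
colour = Lexeme(
    language="en",
    lexical_category="noun",
    forms=[
        Form("singular", [Representation("colour", "Latn"),
                          Representation("color", "Latn")]),
        Form("plural", [Representation("colours", "Latn"),
                        Representation("colors", "Latn")]),
    ],
)

print(all_representations(colour))
```

Keeping script on the representation rather than on the form lets a single form carry spellings in more than one script, which appears to be the intent of the outline.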

Event Timeline

GPHemsley raised the priority of this task from to Needs Triage.
GPHemsley updated the task description. (Show Details)
GPHemsley added a project: SDC General.
GPHemsley changed Security from none to None.
GPHemsley subscribed.
Gilles triaged this task as Medium priority. Nov 24 2014, 1:43 PM
Gilles subscribed.
Lydia_Pintscher lowered the priority of this task from Medium to Low. Nov 27 2014, 10:40 AM
Lydia_Pintscher subscribed.

Before we settle on all this I'd like to have more discussions please. Wiktionary is huge and I don't feel we've spent enough time thinking it through yet to commit to one way of doing it.

Lydia_Pintscher renamed this task from "Implement Phase 1: Lexicons" to "use structured data to represent the lexical data in Wiktionary". Nov 27 2014, 10:42 AM

> Before we settle on all this I'd like to have more discussions please. Wiktionary is huge and I don't feel we've spent enough time thinking it through yet to commit to one way of doing it.

While I agree that Wiktionary has a lot of users, and the number of people involved in these discussions has been small relative to the Wiktionary userbase, we've actually spent a great deal of time discussing numerous proposals:

https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals

Those of us who were involved in the discussions have reached consensus on certain portions of my proposal (the latest, dated 2014-10), and it is along those lines that I have broken up the phases.

As a side note, renaming this task the way you have obscures those distinctions.

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
GPHemsley renamed this task from "use structured data to represent the lexical data in Wiktionary" to "Phase 1: Represent Wiktionary lexicon using structured data". Mar 8 2015, 3:21 PM

Is there an easy way to set up one (or even several) Wikidata / Wiktionary integration sandboxes where interested parties could try things out and experiment? I think prototyping an integration with a small subset of the data could be very beneficial. It would give us quick feedback on which ideas could work and where the problem areas are. Planning everything up front is almost impossible, given the ambition of this project.

@Lydia_Pintscher I can't find the ticket we made for the baseline implementation of the Lexeme entity type. Shouldn't it be a subtask of this ticket here?