Lexicography: Create pages for words in Lingua Libre
Open, Needs Triage, Public

Description

This is a proposal for how to group all pronunciations of a word on a single page, including declensions, conjugations, alternative spellings, pronouns, etc. In essence, I believe that we should have unified templates that can be used on both Wiktionary and Lingua Libre.

Web URL
interface_language.lingualibre.org/wiki/word_language/word

interface_language = the language that Lingua Libre will be displayed in. It probably makes the most sense to match the codes used by other Wikimedia projects, e.g. “fr” or “en”.
word_language = the language that the word is in, using ISO 639-3 codes, e.g. “fra”.

For example,
en.lingualibre.org/wiki/fra/chien – will display all pronunciations of the French word chien, with the interface in English.
fr.lingualibre.org/wiki/fra/chien – will display all pronunciations of the French word chien, with the interface in French.
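
A minimal sketch of how this URL scheme could be built and parsed, assuming the proposed layout is adopted as written. The `https` scheme, the `WordPage` type, and both function names are assumptions made for illustration; none of them exist in Lingua Libre today.

```python
from typing import NamedTuple
from urllib.parse import quote, unquote, urlsplit


class WordPage(NamedTuple):
    """Hypothetical record for one proposed word page."""
    interface_language: str  # Wikimedia-style code, e.g. "en" or "fr"
    word_language: str       # ISO 639-3 code, e.g. "fra"
    word: str                # the headword, e.g. "chien"


def build_url(page: WordPage) -> str:
    """Build interface_language.lingualibre.org/wiki/word_language/word."""
    # The https scheme is an assumption; the proposal only gives the host and path.
    return (
        f"https://{page.interface_language}.lingualibre.org"
        f"/wiki/{page.word_language}/{quote(page.word, safe='')}"
    )


def parse_url(url: str) -> WordPage:
    """Split a proposed word-page URL back into its three parts."""
    parts = urlsplit(url if "://" in url else "https://" + url)
    labels = (parts.hostname or "").split(".")
    if len(labels) != 3 or labels[1:] != ["lingualibre", "org"]:
        raise ValueError(f"not a Lingua Libre word URL: {url!r}")
    segments = parts.path.split("/")  # e.g. ['', 'wiki', 'fra', 'chien']
    if len(segments) < 4 or segments[1] != "wiki":
        raise ValueError(f"unexpected path: {parts.path!r}")
    word_language = segments[2]
    if len(word_language) != 3 or not word_language.isalpha():
        raise ValueError(f"expected an ISO 639-3 code, got {word_language!r}")
    word = unquote("/".join(segments[3:]))
    return WordPage(labels[0], word_language, word)
```

Keeping the word language in the path rather than in the subdomain is what lets the same headword exist once per language without sharing a page.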

Rationale: We should make the web URLs as similar as possible to those of other Wikimedia projects so that the site is easier to navigate. Unlike Wiktionary, I do not think that we should group multiple languages on one page; that would make pages too hard to navigate.

Pages for words
In essence, we should group pages by headword and then display all the possible forms. We should base the templates for these pages on those from Wiktionary. Instead of writing new templates for displaying words, we should take the existing ones and modify them to incorporate audio. This will make it easy to reimport them into Wiktionary. I think that there will be the following major subtasks:

Subtask 1: Create a standard template for every language

  1. Import all templates for entries from Wiktionary in every language
  2. Discuss the pros and cons of each one
  3. Discuss how to integrate sound into them
  4. Create standard templates
  5. Write a parser to automatically transform all templates into this standard one

For instance, French Wiktionary uses {{S|nom|fr}} and {{m}} to tag a noun as masculine, while English Wiktionary uses {{fr-noun|m}}. We need to reconcile these:

  1. Import both templates into Lingua Libre (and all others from across the globe)
  2. Make a sample page and display all the possible variations
  3. Discuss each one
  4. Create and implement a standard template. Make these templates multilingual.
  5. Write a parser to automatically transform all templates into this standard one
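
A rough sketch of what the parser in step 5 could look like for the two examples above. The `StandardEntry` record, the function names, and the tiny `FR_POS` mapping are hypothetical and cover only the templates quoted in this task; real Wiktionary templates have many more parameters and can be nested, which this regular expression does not handle.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches non-nested templates such as {{S|nom|fr}} or {{m}}.
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")

GENDERS = {"m", "f"}
FR_POS = {"nom": "noun"}  # hypothetical mapping, limited to the example above


@dataclass(frozen=True)
class StandardEntry:
    """Hypothetical standard form shared by all source wikis."""
    language: str            # Wiktionary language code, e.g. "fr"
    part_of_speech: str      # e.g. "noun"
    gender: Optional[str]    # "m", "f", or None when unknown


def parse_templates(wikitext: str) -> List[List[str]]:
    """Return each template as [name, arg1, arg2, ...]."""
    return [
        [part.strip() for part in match.group(1).split("|")]
        for match in TEMPLATE_RE.finditer(wikitext)
    ]


def from_french_wiktionary(wikitext: str) -> Optional[StandardEntry]:
    """Read the {{S|nom|fr}} + {{m}} convention."""
    language = pos = gender = None
    for name, *args in parse_templates(wikitext):
        if name == "S" and len(args) >= 2 and args[0] in FR_POS:
            pos, language = FR_POS[args[0]], args[1]
        elif name in GENDERS and not args:
            gender = name
    return StandardEntry(language, pos, gender) if pos else None


def from_english_wiktionary(wikitext: str) -> Optional[StandardEntry]:
    """Read the {{fr-noun|m}} convention."""
    for name, *args in parse_templates(wikitext):
        language, sep, kind = name.partition("-")
        if sep and kind == "noun":
            gender = args[0] if args and args[0] in GENDERS else None
            return StandardEntry(language, "noun", gender)
    return None
```

With this shape, `from_french_wiktionary("{{S|nom|fr}} {{m}}")` and `from_english_wiktionary("{{fr-noun|m}}")` produce the same `StandardEntry`, which is the reconciliation the subtask asks for.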

Personally, I prefer a table style like the one French Wiktionary uses for adjectives, or the one German Wiktionary uses. For verbs, I prefer https://fr.wiktionary.org/wiki/Annexe:Conjugaison_en_fran%C3%A7ais/croire

Each cell in the table can have the following format: ▶ {{word}} ▽, where ▶ plays the default audio and ▽ expands the cell to show the other pronunciations, grouped by accent/region.
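
One possible data model behind such a cell, as a sketch only. The `Recording` type, its fields, and `build_cell` are assumptions for illustration; how a default recording gets chosen is still open, so this version simply falls back to the first recording when none is flagged.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Recording:
    """Hypothetical record for one audio file of a word form."""
    audio_file: str
    region: str              # accent/region label used for grouping
    is_default: bool = False


def build_cell(word: str, recordings: List[Recording]) -> Dict[str, object]:
    """Return what a table cell needs: label, default audio, grouped extras."""
    if not recordings:
        raise ValueError("a cell needs at least one recording")
    # Use the flagged default, or fall back to the first recording.
    default = next((r for r in recordings if r.is_default), recordings[0])
    by_region: Dict[str, List[str]] = defaultdict(list)
    for recording in recordings:
        if recording is not default:
            by_region[recording.region].append(recording.audio_file)
    return {
        "label": f"▶ {word} ▽",
        "default_audio": default.audio_file,       # played by ▶
        "expanded": dict(sorted(by_region.items())),  # shown by ▽
    }
```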

Subtask 2: Import from Wiktionary and Automatically Fill Word Pages

  1. Import dumps of Wiktionary for each language
  2. Split each page into its per-language sections
  3. Remove all information except for templates and pronunciation information
  4. Use the parser to convert the templates into the standard template
  5. Attempt to merge the results automatically and flag errors
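
Steps 2 and 3 could be sketched as follows, assuming the English Wiktionary layout in which each language is a level-2 heading such as `==French==`. Other Wiktionaries mark languages differently, so each would need its own splitter. The sketch also assumes that pronunciation information is carried by templates, and both function names are hypothetical.

```python
import re
from typing import Dict, List

# Level-2 headings only: "==French==" matches, "===Noun===" does not.
LANGUAGE_HEADING_RE = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)
# Non-nested templates such as {{fr-noun|m}}.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")


def split_by_language(wikitext: str) -> Dict[str, str]:
    """Map each language heading to the wikitext of its section."""
    matches = list(LANGUAGE_HEADING_RE.finditer(wikitext))
    sections: Dict[str, str] = {}
    for index, match in enumerate(matches):
        # A section runs until the next language heading or the end of the page.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(wikitext)
        sections[match.group(1).strip()] = wikitext[match.end():end]
    return sections


def keep_templates(section: str) -> List[str]:
    """Drop definitions and prose, keeping only the templates."""
    return TEMPLATE_RE.findall(section)
```

The templates returned for each language would then be passed to the parser from Subtask 1.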

Subtask 3
Make sure that each page has all the possible forms for pronunciations. Allow a default pronunciation to be selected.

I think that Lingua Libre should contain only pronunciation information (IPA/Pinyin/Tokyo/etc.) and sound files.

Subtask 4
We should write a bot that will enable the importation of templates from Lingua Libre into the Wiktionaries. In the future, pages on Wiktionary will automatically fetch pronunciations and sounds from Lingua Libre in the same way that images are fetched from Commons now.

Thank you to Rugops, Yugs, and Pamputt for ideas and inspiration.

Event Timeline

Lepticed7 renamed this task from Create pages for words in Linga Libre to Create pages for words in Lingua Libre. (Sep 14 2021, 7:21 PM)
Lepticed7 updated the task description.
Yug renamed this task from Create pages for words in Lingua Libre to Lexicography: Create pages for words in Lingua Libre. (Jul 7 2022, 11:30 AM)