Store translations of frequent section titles
Open, Medium, Public

Assigned To

None

Authored By

	Amire80
	Jul 11 2015, 10:26 AM

Description

There are section titles that appear in a great number of Wikipedia articles: "Biography", "Early life", "Bibliography", "External links", "References", "History", "Legacy", etc.

When machine translation is not available (and even if it is) these section titles can be auto-translated by the CX software. The translations can be stored in the usual i18n JSON files.
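A minimal sketch of how such a lookup might work, assuming the translations live in per-language JSON message files. The message keys (`cx-section-title-...`), the sample data, and the function name are hypothetical and are not taken from the actual ContentTranslation code:

```python
import json

# Hypothetical per-language message files, inlined here as JSON strings.
# The key names are invented for illustration only.
MESSAGES = {
    "en": json.loads(
        '{"cx-section-title-references": "References",'
        ' "cx-section-title-external-links": "External links"}'
    ),
    "es": json.loads(
        '{"cx-section-title-references": "Referencias",'
        ' "cx-section-title-external-links": "Enlaces externos"}'
    ),
}


def translate_section_title(title, source_lang, target_lang, messages=MESSAGES):
    """Return the stored translation of a section title, or None if unknown."""
    source = messages.get(source_lang, {})
    target = messages.get(target_lang, {})
    normalized = title.strip().casefold()
    # Find the message key whose source-language text matches the title,
    # then look up the same key in the target language.
    for key, text in source.items():
        if text.casefold() == normalized:
            return target.get(key)
    return None
```

A caller would fall back to machine translation or to an empty heading when the function returns `None`, and the translator could still edit the suggested title like any other text.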

This should save translators a few seconds on each of these sections.

(This feature was suggested by User:Chimel31 at https://meta.wikimedia.org/wiki/Research:Increasing_article_coverage .)

Related Objects

Mentioned In: T111869: CX various problems

Event Timeline

Amire80 created this task. Jul 11 2015, 10:26 AM

Amire80 raised the priority of this task from to Medium.

Amire80 updated the task description.

Amire80 added projects: ContentTranslation, ContentTranslation-Release6, I18n.

Amire80 subscribed.

Restricted Application added a subscriber: Aklapper. Jul 11 2015, 10:26 AM

Amire80 updated the task description. Jul 11 2015, 10:28 AM

Amire80 set Security to None.

Amire80 moved this task from Needs Triage to CX6 on the ContentTranslation board. Jul 11 2015, 10:32 AM

Amire80 moved this task from Backlog to Bugs on the ContentTranslation-Release6 board.

This is also useful for Wikivoyage.

However, I don't think it's a good idea to hardcode a list of frequent section titles.

What do you mean by "hardcode"?

The way I see it, the headings will be added by default if any text is written in the following paragraph, and the translator will be able to edit them just like any other text.

How can we get the list of frequent section titles?

For a precise number we could analyze dumps, although it would probably take a very long time given that they are so huge in some languages.

Or maybe @ssastry and @cscott have a way to query Parsoid data quickly?

Of course, we could start from some intuitive ones: "Biography", "History", "Early life", "Personal life", "Awards", "Bibliography", "External links", "References", "Legacy", "Death", "Geography", "In popular culture", etc.

Is this the sort of thing you're looking for? It takes a few minutes to generate with something like:

	find /public/dumps/public -path '*/201506*/*pages-articles.xml.bz2' -print0 | xargs -0 -n 1 -I '{}' jsub -cwd LC_COLLATE=C bzgrep -HE '^==' {} | sort | uniq -c | sort -nr | head -1000 >> frequent-headers.txt

The output is attached as F191888.

frequent-headers-sorted.txt (525 KB)
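The same counting idea can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, and unlike the `bzgrep` command above, which keeps every line starting with `==`, the regular expression below keeps only lines with balanced heading markers (level 2 to 6) and strips the markers before counting:

```python
import bz2
import re
from collections import Counter

# Matches wikitext headings such as "== History ==" or "===Early life===".
# Group 1 is the run of "=" signs, group 2 is the heading text.
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def count_headings(lines):
    """Count how often each heading title appears in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = HEADING_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def count_headings_in_dump(path):
    """Count headings in a bz2-compressed dump file (path is an assumption)."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        return count_headings(dump)
```

Calling `count_headings_in_dump(path).most_common(1000)` on a dump file would give a list comparable to the top 1000 entries produced by the shell pipeline, with per-language results kept separate if the function is run once per dump.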

Pginer-WMF moved this task from CX6 to CX7 on the ContentTranslation board. Jul 21 2015, 4:43 PM

Pginer-WMF removed a project: ContentTranslation-Release6. Jul 21 2015, 4:50 PM

Amire80 mentioned this in T111869: CX various problems. Sep 9 2015, 11:31 AM

Amire80 moved this task from CX7 to CX8 candidates on the ContentTranslation board. Oct 19 2015, 9:07 AM

Amire80 added a project: OKR-Work. Oct 23 2015, 8:12 AM

Nemo_bis unsubscribed. Jan 18 2016, 2:29 PM

Amire80 moved this task from CX8 candidates to CX9 candidates on the ContentTranslation board. Jan 20 2016, 10:42 PM

Amire80 moved this task from CX9 candidates to Bugs on the ContentTranslation board. Apr 20 2016, 12:40 PM

Arrbee moved this task from Bugs to Enhancements on the ContentTranslation board. Jun 22 2018, 1:29 PM

As part of the upcoming work on Section translation, we may have access to section mapping data, which may enable us to translate section titles in this way.

Pginer-WMF added a project: Language-Team (Language-2020-January-March). Dec 12 2019, 11:41 AM

	F191888: frequent-headers.txt.7z
	Jul 13 2015, 11:15 PM
