Create an easy to maintain glossary to facilitate documentation translation (help pages and technical documentation)
Closed, ResolvedPublic

Description

Involving translators requires giving them an easy way to find how to translate a particular term. Based on a discussion with @Johan, we should consider this: having a global glossary would be helpful.

That kind of tool needs to be thought through carefully on several points:

  • updates: a glossary needs updates:
    • when there is a new product/feature/idea/practice. That kind of inclusion must be automated. A new product is translated on TranslateWiki and we have to get these newly translated terms to create and maintain the glossary. Concerning the Movement's philosophy, there are plenty of pages on wikis that can be linked from Wikidata.
    • when a term is updated, it must also be automatically picked up from somewhere and updated on the page (transclusion?)
  • search: endless pages with thousands of terms are hard to read. It is worse when there are dozens of languages. That kind of page must be easily searchable.
  • internationalization: some languages don't have active translators. We should encourage people who want to start on a language to add items to the glossary first, to build a global resource which will allow other people to get involved.

See also:

Existing glossaries:

Last update

Event Timeline

Qgil added a project: I18n.
Qgil added subscribers: Arrbee, Qgil.

It is worth sharing the intentions of this task with the Language team. They might have ideas that could save us some work. CCing @Arrbee just in case.

Trizek-WMF triaged this task as High priority.
Trizek-WMF unsubscribed.

From the description it is not clear whether this glossary is text that anyone can edit in a wiki page or an actual software feature. If it is the latter, then things get even more complicated (who will develop it?) and the description needs more explanation.

This comment was removed by Trizek-WMF.

Actually, that's not clear. I've thought about it, and it's more complicated than I expected. I need to check what features may be useful and dig a little bit more.

At the moment, roughly, that glossary would look like this for one defined term:

| Term | Term translation |
| Definition | Translated definition |

"Term" and "Definition" should be locked, because they are the source.
Sometimes, "Term translation" shouldn't be editable, for example when it is a title or an action already translated on TranslateWiki.

Also, it would be interesting to have filters, in order to see only the terms related to a specific feature. That will need, I think, specific development work.
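To make the structure above more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the class names, the `feature` field and the sample data are hypothetical, and nothing here corresponds to an existing MediaWiki or Translate extension API. It only models a locked source term and definition, a per-language translation that may itself be locked, and a filter by feature.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceEntry:
    # frozen=True models the "locked" source term and definition.
    term: str
    definition: str
    feature: str  # hypothetical field used for filtering


@dataclass
class Translation:
    term: str
    definition: str
    # True when the translated term comes from an already translated
    # interface message and should not be edited in the glossary.
    term_locked: bool = False


@dataclass
class GlossaryEntry:
    source: SourceEntry
    translations: dict[str, Translation] = field(default_factory=dict)

    def set_translated_term(self, lang: str, new_term: str) -> None:
        """Change a translated term unless it is locked for that language."""
        translation = self.translations[lang]
        if translation.term_locked:
            raise PermissionError(
                f"{lang} term for {self.source.term!r} is locked"
            )
        translation.term = new_term


def filter_by_feature(
    entries: list[GlossaryEntry], feature: str
) -> list[GlossaryEntry]:
    """Keep only the entries that belong to one feature."""
    return [e for e in entries if e.source.feature == feature]


# Made-up sample data, for illustration only.
entries = [
    GlossaryEntry(
        SourceEntry("Topic", "placeholder definition", "Flow"),
        {"fr": Translation("Sujet", "placeholder", term_locked=True)},
    ),
    GlossaryEntry(
        SourceEntry("SiteNotice", "placeholder definition", "Core"),
        {"fr": Translation("placeholder", "placeholder")},
    ),
]
flow_terms = [e.source.term for e in filter_by_feature(entries, "Flow")]
```

With this sample data, `flow_terms` contains only "Topic", and trying to change the locked French term raises `PermissionError`.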

One problem of this proposal is that it is mixing translatable strings in software (the scope of translatewiki.net) with translatable documentation (content translated through Translate extension, not connected to translatewiki.net, but capable of integrating automated translation engines).

What is the situation of Content Translator in multilingual wikis? That might be a more interesting direction for the problem that we are trying to solve here.

If this is the correct direction, those glossaries should be fed to the translation engines, I guess.

> One problem of this proposal is that it is mixing translatable strings in software (the scope of translatewiki.net) with translatable documentation (content translated through Translate extension, not connected to translatewiki.net, but capable of integrating automated translation engines).

In fact, there is no real connection between TranslateWiki and our documentation. Having two places is sometimes complicated because you need to understand what the purpose of a particular message (on TW) is in a particular context (you will have that context on MW). I feel like I'm reinventing a new Wiktionary with multiple inclusions...

> What is the situation of Content Translator in multilingual wikis? That might be a more interesting direction for the problem that we are trying to solve here.

You mean the Content Translation extension? I'm afraid that would not be enough, because that extension doesn't deal with updates.

> If this is the correct direction, those glossaries should be fed to the translation engines, I guess.

I hope so, that would be very interesting and helpful. Two questions then: how to structure that glossary to have useful and easy-to-use data, and is it still in our scope if that needs software development?

There's an ongoing discussion on wikitech-l regarding translation tags that touches upon the differences between Content Translation and the Translate extension and why they work differently.

I mentioned TranslateWiki because it is mentioned in the description of this task. If it is clear that it is not related to the problem of user documentation, then good.

It is also probably better to leave Content Translation out of the equation, for several reasons.

So we are left with the Translate extension, the possibility to add translation engines to it in order to make the work of translators simpler, and the question mark about how to feed or complement these engines with our glossaries.

On the latter, I wonder whether the Collaborative spelling dictionary project (extension page) could be part of the solution. Could it be used to create new dictionaries and improve existing ones?

Some more thoughts: could you provide more information about the problem to be solved?

As an occasional volunteer translator of documentation pages to Catalan, I know I would welcome an automated translation engine from English to my language (which is available for Content Translation from en.wiki to ca.wiki, but not in Translate at mediawiki.org). The glossary part is more relevant in the context of translating strings in translatewiki.net, and there translators already receive suggestions based on the use of that string in their language and similar ones.

There are two possibilities: work with and/or improve the Translate extension, or build something new. Good fixes may be to have a reliable translation suggester (not the half-working one that we have now on the MediaWiki wiki) and a way to add real suggestions or context to that extension. That would require improving that product and having a way to pick context or definitions from somewhere. That will need some structured data that we don't have at the moment.

I didn't know about the Collaborative spelling dictionary project. That may be useful if I manage to understand what it is about. :/ (Side rant: like many other extensions, it lacks examples.)

Examples of what could be solved with the glossary

Case 1:
You volunteer to translate Tech News. A product has some features which are improved, changing some actions. You are not expected to know all products, and having a glossary which gathers all functions or actions and what they are used for may be helpful (example).

Case 2:
You want to translate messages on TranslateWiki to your language. Sentences are all out of context. Having a centralized glossary with a definition would help to understand that context.

For example, as someone who has worked most of my adult life in tech, I'm fairly well acquainted with technical concepts. Writing Tech News is fairly easy in that sense. Then I translate it into Swedish. Occasionally I have to spend five, ten minutes looking up what something is actually called in Swedish, because I'm unfamiliar with the term. This is when translating a text I wrote myself, where I know exactly what I mean, to my native language, either looking for a term in the field where I've spent years working or something specific to the projects I've been editing since 2004.

I think just a plain list of words and how to translate them, as a place to go to, would be a very helpful first step if we want to encourage translations by newcomers. I would certainly occasionally find it useful myself.

Then my advice is that you focus on a manual glossary in a wiki page for now while you agree with the Language team how that could be integrated into the existing tools via software. This way current translators will have something to start with that is not blocked on anybody's development. This exercise would also show how big the need for this glossary is, and how good the chances are of building a successful collaborative effort to meet that need.

On the other hand, why not enable translation engines on mediawiki.org? That would save translators real time, and would make evident which terms are already covered by existing translation libraries, and which ones are missing and would benefit from this glossary.

Okay about creating a first glossary. I'm just reluctant to create a bunch of unstructured data and then have difficulty re-injecting it into a new tool later.

Concerning the translation engines, we should consider that possibility. However, that will not solve the problem for the specific terms we use.

Trizek-WMF lowered the priority of this task from High to Medium. May 10 2016, 2:52 PM

You can write the glossary as a table, for limited structure. Also, making a little database and a web front end on Tool Labs shouldn't take too much time, should it?

It would also be possible to start with a dump from komputeko.

https://eo.wikibooks.org/wiki/Katalogo_de_Esperanta_retenhavo/Vortaroj_kaj_terminaroj may be of some interest for this thread. Also I think I already consulted some general glossary on Wikibooks but I can't find it again right now.

I don't understand that service. What is relevant here?

I discussed that glossary with @Noe (PhD in linguistics). He suggests:

  • List all functions and all terms that need to be defined.
    • example: what does "SiteNotice" mean, how can I translate it, is there an official translation?
    • example: what does "implementing" mean?
  • The glossary should list incorrect terms to facilitate search. For instance, "Topic" for Flow should list "thread": if someone searches for "thread", "topic" should pop up as the right result.
  • Descriptors should be used. Elements listed under a descriptor should be self-sufficient and be defined by each other (no external input), and can include terms used only in that glossary.
  • At the moment, glossaries are descriptive: each extension has its own glossary. They will be normative in the future, with definitions shared by tools/doc/products. That fact can define the next steps.
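
The point about listing incorrect terms can be sketched as a small lookup index. This is only an illustration under stated assumptions: the function names and the shape of the data (a preferred term mapped to a list of known incorrect terms) are hypothetical, and the only entry comes from the "Topic"/"thread" example above.

```python
from __future__ import annotations


def build_index(glossary: dict[str, list[str]]) -> dict[str, str]:
    """Map every preferred term and every alias (lower-cased) to the preferred term."""
    index: dict[str, str] = {}
    for preferred, aliases in glossary.items():
        # A preferred term always points to itself.
        index[preferred.lower()] = preferred
    for preferred, aliases in glossary.items():
        for alias in aliases:
            # setdefault: an alias never overrides a preferred term
            # or an alias that was registered earlier.
            index.setdefault(alias.lower(), preferred)
    return index


def lookup(index: dict[str, str], query: str) -> str | None:
    """Return the preferred term for a query, or None when it is unknown."""
    return index.get(query.strip().lower())


# Hypothetical data: preferred term -> incorrect or outdated terms.
GLOSSARY = {"Topic": ["thread"]}
INDEX = build_index(GLOSSARY)

print(lookup(INDEX, "thread"))  # prints "Topic"
```

Someone searching for "thread" is sent to "Topic", while the glossary still displays only the preferred term.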

Plus, Noé was searching for a central page to work on translations, which does not exist (yet: T131581: Centralize information about translations within the movement).

When you have multiple terms defined by one definition and you are using definition lists, the design can be hard to understand because you see a pile of bolded terms with no definition rather than, say, three terms with a common definition. That should be fixed. (feedback)

Potpourri of my latest thoughts and research about that glossary.

Definition lists

HTML definition lists are implemented in MediaWiki.

A structure like

;Defined term
:Definition

is parsed as

<dl>
<dt>Defined term</dt>
<dd>Definition</dd>
</dl>

That's cool and accessible.
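
To illustrate the mapping, here is a simplified Python sketch of the conversion. It is not MediaWiki's parser: the function name is hypothetical, and it deliberately ignores nesting, inline markup and the one-line `;term : definition` form. It only handles lines that start with `;` or `:`.

```python
from html import escape


def definition_list_to_html(wikitext: str) -> str:
    """Convert simple ';term' / ':definition' lines into a <dl> block."""
    out = []
    in_list = False
    for line in wikitext.splitlines():
        if line.startswith(";"):
            tag = "dt"
        elif line.startswith(":"):
            tag = "dd"
        else:
            tag = None

        if tag is None:
            # Any other line closes the current list, if one is open.
            if in_list:
                out.append("</dl>")
                in_list = False
            if line.strip():
                out.append(escape(line))
            continue

        if not in_list:
            out.append("<dl>")
            in_list = True
        out.append(f"<{tag}>{escape(line[1:].strip())}</{tag}>")

    if in_list:
        out.append("</dl>")
    return "\n".join(out)


print(definition_list_to_html(";Defined term\n:Definition"))
```

Two consecutive `;` lines followed by one `:` line produce two `<dt>` elements sharing a single `<dd>`, which is the markup behind the synonym question discussed next.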

Synonyms

All documents say the same thing:

;Defined term
;Synonym at same level
:Definition

But there is no visual distinction between a preferred term and a synonym.
The most we can do today is to add a note (example), but that's not ideal. Plus, we don't know whether it is accessible.

The English Wikipedia Manual of Style advises adding an "also" line in the definition:

;Defined term
:also known as Synonym
:Definition

The "also known as" line would have a different styling that will suit most of our readers, and the "also known as" wording would be a bonus for accessibility. I think that's the best we can do.

Language specific synonyms

I think the translations of synonyms should be left open. Translating some terms from English to another language is not always accurate, plus some communities may have created a specific term for a specific feature. That's good because it shows that the product is adopted by communities.

The idea would be to keep the translation of the synonyms open. That way, people will not have to translate the English terms but will have a space where they can add the terms used in their language. This will systematically be documented in the qqq documentation language.

Template or not template?

That's the question.
That would help, but it may complicate translation marking. Plus, templates do not always handle RTL and LTR well. Based on that, the only template we could have is one that styles the synonyms.

A glossary with context indication would be of far greater interest. One English entry can have several counterparts in other languages. It would be even better if we could link that to a validated example, and "look for more occurrences on translatewiki".

I forgot to mention the latest updates. I'm currently working on T150261: Publish best practices on MediaWiki.org concerning glossaries.

> A glossary with context indication would be of far greater interest. One English entry can have several counterparts in other languages. It would be even better if we could link that to a validated example, and "look for more occurrences on translatewiki".

How would you see that context?
What do you mean by "a validated example"?

Well, to begin with, we may use a module to store the data; this way the display would be far easier to evolve, both in terms of UX and storage structure.

By validated example, I mean an actual translation in which the term appears and which has been reviewed by at least one other contributor.

> Well, to begin with, we may use a module to store the data; this way the display would be far easier to evolve, both in terms of UX and storage structure.

Does that match T150263: Create a searchable and shared glossary?

> By validated example, I mean an actual translation in which the term appears and which has been reviewed by at least one other contributor.

Interesting, but don't you think that will increase the workload?

> Does that match T150263: Create a searchable and shared glossary?

As far as I understand it, yes, it may fit.

> Interesting, but don't you think that will increase the workload?

Well, a less time-consuming solution might be to offer a way to see the list of translations on translatewiki containing the term. That might include false positives, but it may be better than nothing.
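
As a rough illustration of that idea and of the false positives it brings, here is a small Python sketch. It assumes the messages are already available as a plain list of strings. How they would be fetched from translatewiki is left open, and the function name and sample strings are made up.

```python
from __future__ import annotations

import re


def find_occurrences(
    term: str, messages: list[str], whole_word: bool = False
) -> list[str]:
    """Return the messages that contain the term (case-insensitive)."""
    if whole_word:
        # \b keeps "topic" from matching inside longer words.
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        return [m for m in messages if pattern.search(m)]
    needle = term.lower()
    return [m for m in messages if needle in m.lower()]


# Made-up sample messages, for illustration only.
messages = ["Start a new topic", "Topics you follow", "Isotopic data"]

loose = find_occurrences("topic", messages)
strict = find_occurrences("topic", messages, whole_word=True)
```

Here `loose` returns all three messages, including the unrelated "Isotopic data", while `strict` returns only "Start a new topic" and so also misses the plural. Neither is perfect, which matches the "better than nothing" expectation above.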

Closing. All sub-tasks are resolved (especially T150261) and the next big step, T150263: Create a searchable and shared glossary, is out of my scope (plus I'm not a product manager).