Author: millosh
Description:
(I am writing this here because, AFAIK, implementing this idea requires changes inside Edit.php. Some DB changes are needed, too. However, feel free to move it wherever you think it belongs.)
The present state of the conversion engine, designed by Zhengzhu, may be described as follows:
- Only one form is stored in the DB.
- Contributors have to know both scripts (or all of them, if there are more than two) if they want to edit pages.
- The scripts generally have to correspond 1:1 in their elements.
Such an approach may work in the classical cases, like the Chinese or Serbian engines. Every educated Serbian knows both the Cyrillic and Latin alphabets (Cyrillic is taught from the 1st year of primary school, Latin from the 2nd year). AFAIK, it is not so hard for a Chinese reader to work out the meaning of a character from a non-native script. Also, in both examples the scripts correspond almost 100% 1:1 (there are some exceptions, but it is not so hard to mark them with the exception markup: -{ ... }-). (There are maybe up to 10 implementations of this principle across all of the MediaWiki languages.)
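To make the 1:1 case concrete, here is a minimal sketch in Python (not the engine's actual PHP code) of a letter-for-letter Serbian Cyrillic to Latin conversion that leaves anything wrapped in -{ ... }- untouched. The names `CYR_TO_LAT`, `convert_char` and `to_latin` are made up for this illustration, and treating -{ ... }- as "copy the inner text verbatim" is a simplifying assumption about the exception markup.

```python
import re

# Serbian Cyrillic to Latin, lowercase letters only; capitals are derived below.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

# Assumption: text inside -{ ... }- is an exception and is not converted.
EXCEPTION = re.compile(r"-\{(.*?)\}-", re.DOTALL)


def convert_char(ch):
    """Convert one character; capitalize only the first Latin letter of a digraph."""
    if ch in CYR_TO_LAT:
        return CYR_TO_LAT[ch]
    lower = ch.lower()
    if lower != ch and lower in CYR_TO_LAT:
        latin = CYR_TO_LAT[lower]
        return latin[0].upper() + latin[1:]
    return ch  # spaces, punctuation, digits, other scripts


def to_latin(text):
    """Convert everything outside the exception markup, copy the inside as-is."""
    out = []
    pos = 0
    for match in EXCEPTION.finditer(text):
        out.append("".join(convert_char(c) for c in text[pos:match.start()]))
        out.append(match.group(1).strip())
        pos = match.end()
    out.append("".join(convert_char(c) for c in text[pos:]))
    return "".join(out)


print(to_latin("Србија"))
print(to_latin("-{ћирилица}- и латиница"))
```

This only works because almost every Cyrillic letter has exactly one Latin counterpart; the reverse direction already needs care with the digraphs lj, nj and dž.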
However, there are a number of very different situations in the world. Some scripts differ from each other a lot, and education issues may be significant. For example, while Tajik and Persian are structurally the same language system, it is not so common to find a Tajik who is able to read the Perso-Arabic script or a Persian who is able to read the Cyrillic script. Also, there are complex issues related to the "interpunctional behavior" of letters: the rules for using small and capital letters in the Cyrillic (Latin, Greek) scripts are somewhat different from those in the Arabic script.
So, the goals of the generalized conversion engine for MediaWiki are:
- Allow contributors to see and edit pages in their preferred script.
- Make an open set of rules which may be applied easily for different cases.
- Solve different kinds of "interpunctional behavior" problems in a generalized manner.
- Introduce dictionary-based conversions. (This was initially introduced into the Serbian engine for the Ekavian-Iyekavian paradigm, but it was abandoned because no work was done on it after the initial implementation.)
- A future goal, completely possible if this engine is implemented: turn the conversion engine into a user-side feature. When script differences are great, it is easier for some users to try to read the content in their preferred script (for example, it is easier for a European to read Chinese transcribed into Latin).
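As an illustration of the dictionary-based goal, here is a minimal Python sketch of a word-level conversion from Ekavian to Iyekavian forms. The three-entry dictionary and the names `EKAVIAN_TO_IYEKAVIAN`, `match_case` and `convert_words` are assumptions made for this example only; a real dictionary would be far larger and would have to handle inflected forms.

```python
import re

# Tiny illustrative dictionary; a real one would be maintained by linguists.
EKAVIAN_TO_IYEKAVIAN = {
    "mleko": "mlijeko",
    "reka": "rijeka",
    "lepo": "lijepo",
}

WORD = re.compile(r"\w+")


def match_case(source, replacement):
    """Carry the capitalization of the source word over to the replacement."""
    if len(source) > 1 and source.isupper():
        return replacement.upper()
    if source[0].isupper():
        return replacement[0].upper() + replacement[1:]
    return replacement


def convert_words(text, dictionary):
    """Replace whole words found in the dictionary; leave everything else alone."""
    def swap(match):
        word = match.group(0)
        replacement = dictionary.get(word.lower())
        return match_case(word, replacement) if replacement else word
    return WORD.sub(swap, text)


print(convert_words("Mleko je lepo.", EKAVIAN_TO_IYEKAVIAN))
```

A lookup is needed here because no letter-level rule decides when "e" changes: "mleko" becomes "mlijeko", while a word like "selo" stays the same in both forms.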
I have been thinking about some approaches to this issue, and I can guarantee that there are better ones :) However, I'll list some of them:
- There should be fields in the database for the different versions of the article. Or, it should be possible to separate the different versions inside one field. Here is an example of the second idea:
- There are a lot of situations where forms are exceptional. A classic example comes from the relation between the Latin and Arabic scripts. The Arabic script doesn't have capital letters (or has different rules for them).
- So, if a sentence begins with "Llll" in Latin, which is transcribed as "aaaa" in Arabic, the form in the database should be something like -{ Latin: Llll; Arabic: aaaa }-. However, such markup shouldn't be visible to the editor. An editor of the Latin text should see just "Llll" and an editor of the Arabic text should see just "aaaa".
- In this case, if an editor of the Latin text changes it, the general rules should be applied. If an editor of the Arabic text changes it, some specific rules should be applied (like: if the previous word has a dot at the end, the letter should be capital in Latin; if not, it should stay small). But if the result is not correct in Latin (for example, the word is a personal name in the middle of a sentence), then when an editor of the Latin text fixes it from "llll" (which corresponds to "aaaa") to "Llll" (which also corresponds to "aaaa"), the stored text should be replaced with -{ Latin: Llll; Arabic: aaaa }-.
- Of course, both editors should be able to switch to a "meta mode", which would show them all of the markup and allow them to do fine tuning.
- When everything is changed (a major edit), the general and specific rules should be followed, but editors should also be allowed to fix errors.
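The stored form and the per-script views described in the list above could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the function names (`parse_variants`, `render_for`, `capitalize_after_dot`, `pin_variant`) are hypothetical, the markup is assumed to be exactly "-{ Name: text; Name: text }-", and variant texts are assumed not to contain ";" themselves.

```python
import re

VARIANT = re.compile(r"-\{(.*?)\}-", re.DOTALL)


def parse_variants(body):
    """Turn 'Latin: Llll; Arabic: aaaa' into {'Latin': 'Llll', 'Arabic': 'aaaa'}."""
    variants = {}
    for part in body.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            variants[name.strip()] = value.strip()
    return variants


def render_for(stored, script):
    """What an editor of one script sees: only that script's variant, no markup."""
    def pick(match):
        variants = parse_variants(match.group(1))
        # If the script is missing, keep the markup so "meta mode" can fix it.
        return variants.get(script, match.group(0))
    return VARIANT.sub(pick, stored)


def capitalize_after_dot(words):
    """Specific rule for Latin output: capital at the start and after a dot."""
    out = []
    for index, word in enumerate(words):
        if index == 0 or out[-1].endswith("."):
            word = word[:1].upper() + word[1:]
        out.append(word)
    return out


def pin_variant(latin, arabic):
    """Store an explicit pair when the rules give the wrong Latin form."""
    return "-{ Latin: %s; Arabic: %s }-" % (latin, arabic)


stored = pin_variant("Llll", "aaaa") + " bbbb"
print(render_for(stored, "Latin"))
print(render_for(stored, "Arabic"))
print(capitalize_after_dot(["llll", "mmmm.", "nnnn"]))
```

"Meta mode" would then simply show `stored` itself instead of the output of `render_for`.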
The main reason I am filing this as a bug is that I am not a PHP programmer (although I am able to program in PHP :) ), which means that I am not able to solve all of the complex programming issues this requires in MediaWiki. However, as a [formal] linguist, I am willing to participate actively in working on this issue. I am willing to cover all of the linguistic work needed (including finding the relevant people for problems related to different scripts).
Version: unspecified
Severity: enhancement