kwami (talk) 08:24, 2 October 2010 (UTC) wrote:
I'm making corrections to 800 IPA transcriptions of Burmese. Some of these involve changing a diacritic. Many of the vowel+diacritic combinations have no precomposed Unicode character, so I made the rules generic combining-diacritic → combining-diacritic swaps. However, some of the combos do have preexisting precomposed glyphs, and when saved, WP converts the combos into those glyphs. AWB cannot save a page containing a combo that gets converted this way. Although it displays the proper corrections in the edit box, when I hit save, it just restarts (in X seconds).
I can copy the edit box and paste it into any of the articles manually (I've done about 70), and that works fine. That gets to be quite tedious, however.
I reported a similar problem some time ago, here, and that was chalked up to a server problem. However, it's been two days now with the Burmese stuff, and it's still happening. It's also always the same subset of articles.
One example replaces a combining under-breve ( ̯ ) with a combining under-tilde ( ̰ ). I made the manual change, cut & pasted from the AWB edit window, here. Note that although I copied and pasted from the edit window in AWB, where the diacritics were separate combining glyphs, and that's still the case in the saved version in the page history with a̰, the page history has a precomposed ṵ, which was not produced by AWB. I reverted that change, ran it again, and deleted the under-tildes, and it saved fine here. It also saves fine if I delete only the tilde under the u, http://en.wikipedia.org/w/index.php?title=Thukha&action=historysubmit&diff=388246382&oldid=388246257 but not if I delete only the tilde under the a.
That particular rule is regex under advanced rules, \{\{IPA-my\|([^|}]*)̯ to {{IPA-my|$1̰ (regex, case sensitive, apply 3 times, inside templates, no 'if' conditions). However, the same thing happens with the same diacritic in a 'regular settings' rule that changes ṵ (precomposed u-under.tilde) to ṵ (u + combining diacritic--I'm telling you in case the latter gets saved as the precomposed character when I hit 'save page' on this post, which I believe it will) under the 'normal settings' rules, and also the same thing with i instead of u. No boxes in the regular rules window are checked apart from 'enabled', but same problem as the regex rule.
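For anyone trying to reproduce this outside AWB, here is a rough Python equivalent of that regex rule. It is only a sketch: it assumes the under-breve is U+032F (COMBINING INVERTED BREVE BELOW) and the under-tilde is U+0330 (COMBINING TILDE BELOW), and the function name `fix_ipa_my` is made up for illustration. It says nothing about how AWB applies rules internally.

```python
import re

# Assumed code points for the two diacritics in the rule.
UNDER_BREVE = "\u032f"  # COMBINING INVERTED BREVE BELOW
UNDER_TILDE = "\u0330"  # COMBINING TILDE BELOW

# Same shape as the AWB rule: \{\{IPA-my\|([^|}]*) followed by the under-breve.
PATTERN = re.compile(r"\{\{IPA-my\|([^|}]*)" + UNDER_BREVE)


def fix_ipa_my(text, passes=3):
    """Swap under-breve for under-tilde inside {{IPA-my|...}}.

    Each pass fixes one under-breve per template (the last one still
    present, because the group is greedy), hence the repeated
    application, mirroring 'apply 3 times'.
    """
    for _ in range(passes):
        text = PATTERN.sub(
            lambda m: "{{IPA-my|" + m.group(1) + UNDER_TILDE, text
        )
    return text


if __name__ == "__main__":
    sample = "{{IPA-my|u" + UNDER_BREVE + "a" + UNDER_BREVE + "}}"
    fixed = fix_ipa_my(sample)
    # Print code points so the combining marks are visible.
    print([hex(ord(ch)) for ch in fixed])
```

The output of this is still a letter + combining-diacritic sequence, which is exactly the kind of text that fails to save.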
A different diacritic I manually overrode was here. The problematic part was correcting taʊ̀ɴ to tàuɴ. (That accent probably will be fused to the a when I save this posting, but in the edit window it is a separate combining glyph which I can delete by hitting the backspace key.) It only saves if I delete the grave accent over the a (that is, save to tauɴ http://en.wikipedia.org/w/index.php?title=Three_Pagodas_Pass&diff=next&oldid=388241313). That's a 'normal settings' rule that finds aʊ̀ and replaces with àu. There is another rule that replaces unaccented ʊ with u in certain environments, and that doesn't cause problems, so it's not the ʊ → u part. Also, when I replaced the combining accent à (which is easier for me to type inside AWB) with precomposed à inside the replace rule, the problem disappeared, as here.
So I figure it's the combining diacritic. I replaced the problematic ṵ in the first problem listed above with ʊ̰ (same diacritic on a letter which has no precomposed Unicode character for it) and it saved just fine, here, just as it did when I cut & pasted the precomposed letter into the AWB edit box. http://en.wikipedia.org/w/index.php?title=Supayalat&diff=prev&oldid=388244621
So it would seem to be specifically (1) trying to save a page with a letter plus combining diacritic sequence, when that combination would normally be converted into a precomposed character when saved in WP, but not (2) when saving the precomposed character itself, or (3) when saving the same diacritic on a letter for which Unicode (or at least WP) does not have a precomposed version.
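The conversion on save looks like Unicode NFC normalization, so the three cases can be checked with Python's standard `unicodedata` module. This is a sketch of that assumption only; the helper name `composes_under_nfc` is made up, and nothing here is taken from AWB's or MediaWiki's code.

```python
import unicodedata

TILDE_BELOW = "\u0330"  # COMBINING TILDE BELOW
GRAVE = "\u0300"        # COMBINING GRAVE ACCENT
UPSILON = "\u028a"      # LATIN SMALL LETTER UPSILON (IPA)


def composes_under_nfc(base, mark):
    """True if base + mark collapses to one precomposed code point under NFC."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1


if __name__ == "__main__":
    cases = [
        ("u", TILDE_BELOW),      # case (1): a precomposed form exists
        ("i", TILDE_BELOW),      # case (1)
        ("a", GRAVE),            # case (1)
        ("a", TILDE_BELOW),      # case (3): no precomposed form
        (UPSILON, TILDE_BELOW),  # case (3)
        (UPSILON, GRAVE),        # case (3)
    ]
    for base, mark in cases:
        label = "U+%04X + U+%04X" % (ord(base), ord(mark))
        print(label, "composes" if composes_under_nfc(base, mark) else "stays decomposed")
```

If that assumption is right, the sequences that compose are the ones that fail to save, and the ones that stay decomposed are the ones that save fine.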
OS: Win7
.NET: 2.0.50727.4952
Version: 5.0.3.0
Workaround: Create separate rules for every precomposed letter-diacritic combination, or cut and paste from the edit window
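The first workaround is tedious by hand, so here is a sketch of a script that could generate the per-letter find/replace pairs, with the replacement already in precomposed form wherever one exists. It assumes the on-save conversion is NFC normalization; the function name `precomposed_rules` and the vowel list are hypothetical, and the pairs would still have to be entered into AWB manually.

```python
import unicodedata


def precomposed_rules(bases, old_mark, new_mark):
    """Build one (find, replace) pair per base letter.

    Both sides are NFC-normalized, so the replacement uses the
    precomposed character when Unicode has one and stays as
    letter + combining mark otherwise.
    """
    rules = []
    for base in bases:
        find = unicodedata.normalize("NFC", base + old_mark)
        replace = unicodedata.normalize("NFC", base + new_mark)
        rules.append((find, replace))
    return rules


if __name__ == "__main__":
    # Assumed code points: U+032F under-breve, U+0330 under-tilde.
    for find, replace in precomposed_rules("aiu", "\u032f", "\u0330"):
        print(
            " ".join("U+%04X" % ord(ch) for ch in find),
            "->",
            " ".join("U+%04X" % ord(ch) for ch in replace),
        )
```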