Fri, Jan 10
Yes, using explicit HTML tags for list containers can be a solution too... if it works (for now it does not because, as you say, the wiki syntax opens a new embedded container).
The following case still does not work as expected:
` * item 1 * item 2 <ul> * item 2.1 * item 2.2 </ul> * item 3 `
Styling ol, ol ol, ol ol ol, etc. does not work for the general case. It's much preferable to be able to give attributes (notably id and class) for list items and list containers (style can then be made on classes, id's can be used as anchors).
<div style="list-style-type: upper-roman"> # roman # numerals! </div>
Also note the special case for numbered lists: it is frequently necessary to split a numbered list into several parts, or to show only an extract, or to number the items differently (e.g. with letters, or with Roman numerals). This cannot be specified on list items, only as attributes of the list container. MediaWiki provides no syntax for the container of the list itself, only a syntax for individual list items: it infers the presence of containers (ul, ol, dl) from the sequential generation of individual items (li, dt, dd, encoded with the wiki syntax), and then it groups them "magically".
:: <<< any == wiki text == markup you like * including embedded lists <div style="clear: both; color: red">including html-ish tags</div> * list continued here >>>
Another possibility would be to define a magic keyword like __TIDY__ that could be used as a "meta-element" inside template to indicate that what is generated by the template will be tidied only in the scope of this template, which would then generate some replaceable hidden element.
The clear intent in this example is to insert a floating element inside a list item, not outside of it.
Dec 17 2019
Even though I translated it in French as "Mio" (which is the approved symbol for mebibytes, i.e. binary), this was reverted to "Mo", which is decimal in French, and then I was blocked for that!
The reason given was that if "MB" is used in English, "Mo" must be used in French, as they assume that "MB" unambiguously indicates decimal in English, which of course it does not.
IEC units are standard in the UI of file explorers even if they do not always use the correct unit symbol, or use shorter symbols (like M instead of MB or MiB) for more compact presentations in table views (e.g. in MacOS) or on consoles (e.g. in Linux with "df -h", intended to be "human readable").
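To make the difference concrete, here is a minimal sketch (the function name `format_size` is hypothetical, used only for illustration) that formats the same byte count with binary (IEC) and decimal (SI) prefixes. It shows why "MB" alone is ambiguous: the same file is 5.00 MiB but 5.24 MB.

```python
def format_size(num_bytes: int, binary: bool = True) -> str:
    """Format a byte count with IEC (binary) or SI (decimal) prefixes."""
    base = 1024 if binary else 1000
    # IEC symbols carry an "i" (KiB, MiB, ...); SI symbols do not (kB, MB, ...).
    symbols = ["KiB", "MiB", "GiB", "TiB"] if binary else ["kB", "MB", "GB", "TB"]
    value = float(num_bytes)
    unit = "B"
    for symbol in symbols:
        if value < base:
            break
        value /= base
        unit = symbol
    return f"{value:.2f} {unit}"


size = 5 * 1024 * 1024  # exactly 5 mebibytes
print(format_size(size, binary=True))   # 5.00 MiB
print(format_size(size, binary=False))  # 5.24 MB
```

A French localisation would map "MiB" to "Mio" and "MB" to "Mo", which is exactly the distinction that was lost in the revert.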
Nov 6 2019
Note that when closing T194125 (charset issue), this new task still does not address the backend conformance or emulation level we'd need to check (possibly with additional extensions to load in the backend SQL server). Each backend must still either pass a conformance test or fail it, or enable an emulation layer to allow compatibility or background migration of schemas (without a server upgrade causing an immediate shutdown for hours... or days).
Oct 22 2019
@Nikerabbit I don't know why you affirm in T235188 that I was "spamming" people. As far as I know, I posted in a thread which is still relevant to an unsolved issue which continues to have effects and whose origin is still not known. Also that previous bug is still open and people were instructed to report every quirk they saw that may be related to mysterious caching effects and incorrect displays, and you filed this new bug T236011 only very recently.
Oct 21 2019
Oct 20 2019
The decimal form cannot be correct if it's not qualified with the address type (IPv4 or IPv6).
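A short sketch of the ambiguity, using Python's standard `ipaddress` module. The integer below is an arbitrary value chosen for illustration; the same decimal number denotes two different addresses depending on the address type.

```python
import ipaddress

decimal_form = 3232235777  # hypothetical value chosen for illustration

# The same integer is a valid IPv4 address and a valid IPv6 address.
as_v4 = ipaddress.IPv4Address(decimal_form)
as_v6 = ipaddress.IPv6Address(decimal_form)

print(as_v4)  # 192.168.1.1
print(as_v6)  # ::c0a8:101
```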
I don't know if this is related, but there's now a very strange behavior with lost sessions somewhere in the wiki.
Oct 15 2019
Note that despite the (selective?) cache purge in Translatewiki.net, the incorrect cached version (including "Le champ est obsolète. $1", but not only this text) is still recurring (look at some of my edit history on TWN for comments with "T235188").
Oct 14 2019
Adding a space at the end of the translation unit does not actually always work: yes, it will be stripped, but if the translation was incorrectly marked fuzzy, this does not work.
You need to add a character that is NOT stripped automatically and that forces an update in the SQL database, even if you then remove it in a subsequent change.
And to make sure this effectively works, you need to force refreshing the page.
And this only works if you do that in the wikitext editor on a separate specific page for each message, it does not work in the Translation UI which really seems to have its own caching and to delay pending updates in Memcached. With the Translation UI, there's a strange effect caused by a subsequent automated edit made by Fuzzybot (at unpredictable time, and visibly this job is delayed and can rerun multiple times, by either canceling the pending change(s) and then doing nothing, or incorrectly changing the pending changes when readding the incorrect FUZZY mark to the old message).
There's something to search around "TranslationsUpdateJob::getRenderJobs( $page );", probably not selective enough in its query.
I think that this is caused by very old background jobs being relaunched, using outdated data from its own internal cache and resubmitting the incorrect message with its old incorrect FUZZY mark prepended, and sending that to Memcache, even if no SQL request is performed.
When checking if changes must be made to the database, there's some "break" that blocks the update, but the update of the key in Memcache is still not blocked, and it uses whatever message was left in some temporary variable that was not cleared after use when processing other messages in a loop.
The bug is to be searched in this background job that continues to corrupt the Memcached data.
For now all that can be done is to force an SQL update (which will also update the Memcached data) by inserting some space (and then removing it), which also changes the version number and timestamp.
Some old jobs in job archives are incorrectly reprocessed if this was about a resource previously containing the prepended FUZZY mark (even if that mark was buggy).
Oct 13 2019
Yes, the bug is effectively reported also in Russian, Turkish, Arabic, i.e. the languages where there are more active reviewers. Initial translations where there are not very frequent users working on it are less affected.
And clearly there's a bug in the logic for marking updated messages "fuzzy", in Fuzzybot, with different assumptions and checks made by this bot, and those made by the wiki editor, or by the translate UI, each one using its distinct strategies to manage caches.
On servers there's not a single cache but several layers (internal and external), and they are not properly synchronized. This causes havoc in the database and all other dependent services (such as data exports). The bug in Memcached is probably caused by insufficient keys to detect concurrent modifications of pages. And I think there are also various buffer overflow bugs or uncaught exceptions with incorrect/unsafe restart that are also not logged properly. The overall design of the TWN architecture is very fragile and depends too much on unchecked assumptions, and it highly depends on external components updated independently without measuring the impact of the change (notably the order of events, because of the existing assumptions).
Oct 11 2019
Looks OK now, the last glitches (for the remaining fake "FUZZY" marks) can be successfully solved by dummy-editing (and then reverting instantly) the affected messages in the wiki editor, as indicated above.
Note: I could successfully update and fix some dirty reads, but not when using the Translate UI: instead I select each message and use the wikitext editor, add a dummy space in the middle of the correct string (just removing the leading !!FUZZY!! mark has no effect), then remove it. In that case both actions are now visible in the history.
To make sure, I click on the "clock" in the top right to see if it makes a dirty read again.
Then I refresh the Translate UI page: this message is gone from the list of fuzzy messages, and in the "all messages" list it is now correct.
This is definitely now a bug in the Translate UI interface itself, rather than the wikieditor, because it performs a batch of 100 dirty reads at once and seems to make unsafe async requests when updating anything. It seems to use its own internal cache (in its extension in PHP), independent of Memcached and the SQL backend, to decide what to do.
I can confirm that this still does not work: submitted modified data are not always visible in the history and when looking at them individually in the wikitext editor (no indication at all that there's a new version). Translations continue to be marked fuzzy (with the older content) when they were already fixed (or even changed by some dummy extra space or punctuation, then modified again to remove this dummy character: nothing appears in the history, the old incorrect value is still there).
I wonder if the logic used for computing storage keys into Memcached is correct: why do the keys not have any version number or version timestamp?
It seems to be a race issue with reordered requests that are overriding each other or canceling the effect of a prior store by using data from dirty reads.
In Translatewiki.net this may be the effect of concurrent modifications: those made by the actual editor, those made by a SemanticWiki bot, or by FuzzyBot, some of which do not have consistent reads and then rapidly overwrite what was just stored with their older view.
Is there something (garbage, truncated or unparsable XML, incorrect encoding, incorrect security token, deprecated security algorithm, incorrect data compression...) in my own user data (or in user data from other users) on the wiki that could cause this bug?
Was the Memcached software updated with new dependencies not satisfied in the current installation (e.g. a security fix or new requirement for some shared ZLib component), causing its own local storage to be garbled (possible buffer overflow)?
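To illustrate what a version check on cached entries could prevent, here is a minimal sketch. It is not how the Translate extension or Memcached actually work: `VersionedCache`, its methods and the page key are hypothetical, and the store is a plain in-memory dictionary. The idea is only that a writer holding an older revision (a "dirty read") cannot overwrite a newer entry.

```python
from typing import Dict, Optional, Tuple


class VersionedCache:
    """Toy in-memory stand-in for a shared cache (hypothetical API)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[int, str]] = {}

    def get(self, key: str) -> Optional[Tuple[int, str]]:
        """Return (revision, text) for the key, or None if absent."""
        return self._store.get(key)

    def put(self, key: str, revision: int, text: str) -> bool:
        """Store text only if its revision is newer than the cached one."""
        current = self._store.get(key)
        if current is not None and current[0] >= revision:
            return False  # stale write from an outdated view: rejected
        self._store[key] = (revision, text)
        return True


cache = VersionedCache()
key = "MediaWiki:Example/fr"  # hypothetical message key

# The human editor saves revision 2 with the corrected text.
accepted = cache.put(key, 2, "texte corrigé")
# A delayed background job then resubmits what it read at revision 1.
rejected = cache.put(key, 1, "!!FUZZY!!ancien texte")

print(accepted, rejected)  # True False
print(cache.get(key))      # (2, 'texte corrigé')
```

Without the revision comparison in `put`, the second write would silently win, which matches the symptom of old fuzzy text reappearing after a fix.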
That page still shows the incorrect message "Le champ est obsolète. $1" in a dozen of units:
Is there any chance that the Memcache server has incorrect clock setting (or NTP failing its updates) ?
I still see other occurrences of "Le champ est obsolète. $1"
e.g. in "MediaWiki:Logeventslist-tag-log/fr"
I see these duplicates in normal accesses, not if I link to oldversion.
Another message duplicated:
"<strong>$1</strong> a été effacé."
There are other mix-ups with French messages containing
"Lien vers les conditions d’utilisation. Requis si ce partenaire exige que les utilisateurs acceptent les conditions d’utilisation pour recevoir l’accès ; sinon facultatif."
they are also duplicated.
This time I was not among the authors or reviewers
There's something strange in your Memcache query: "rev_user_text:" is empty for many of them. Aren't revisions supposed to be associated with a user?
In that case a SQL join failed to load one user, and some bad outer join made that sort of mixup.
And for those, I think I was the initial author, but then why am I not listed like others? Did my own user data fail to load?
Oct 10 2019
So this is related to a migration of schema, with some changes in a limited period not correctly migrated (duplicate or missing ids in the database, causing unresolved outer joins, or joins with the wrong row of another table or with a previous row in a subselection dataset)?
Then the bug should be isolated within a well defined period: when the migration started (probably the migration was made when the wiki was restarted and already live, but with event handlers still not running, or missing constraints that were delayed during the construction of some index, while data was already being added or updated by the live service).
In that case, an integrity check on changes made 2 days ago should detect the missing data in some helper tables (notably if the schema is partly denormalized for performance reasons, with some data items duplicated in different tables).
Was there a migration of collation rules by upgrading the SQL engine for a new version of Unicode and its DUCET, or in CLDR support libraries if collation is performed by computing and storing collation keys in the schema?
Was there a change in the filesystem for the datastore for large blobs (e.g. a modified mounting point, or incorrectly migrated access rights if the datastore was moved, or an incorrect setting for some symbolic links)? I can't tell if this is what happened as I don't see the internal installation (only a site admin can inspect what happened and knows when a new installation was performed, and can inspect the migration logs or the internal filesystems and storage settings).
Could that be caused by the volume of my own contributions on that site (if there's some extension trying to process ALL my history, and there's some internal limit which gets exhausted by some limited buffer sizes)?
Anyway, even another user trying to fix the broken messages for me does not succeed (possibly this applies to any resource in which I was a committer: all past contributors in some period have their history processed?).
I'm convinced now this is a problem of resource limits, causing a silent error (not tested or not logged, causing an exception which is caught but at a higher level where the debugging info has been lost, e.g. an exception caught, not logged but rethrown in a simpler way where they are managed in an oversimplistic way without really identifying the initial cause, possibly silently blocked and filtered by strict privacy rules to avoid exposing private data).
Some cache problem, with the wrong message "Le champ est obsolète. $1" displayed (in French) for a dozen messages in this recent module (it was completed several weeks ago, was correct, and no change was made since, but its messages are now replaced by this one, with leading FUZZY marks).
And if you look at the recent update in Translatewiki, it uses a new set of messages to translate, with multiple ones displayed corrupted even if they were already translated correctly:
All these are showing the unexpected message (in French): "Le champ est obsolète. $1" (17 occurrences), which does not reflect at all what was really translated (or that I tried to fix again without success).
Bad caching is still an issue, because this is what users will see when reviewing translations, so they will try to fix them multiple times, with multiple variants, and validation becomes futile, progress is impossible, users constantly see the same items coming back again and again corrupted, statistics are not properly handled, and a lot of work falls far behind in their to-do lists.
And this causes many unnecessary updates, and there are even people revalidating as OK things that were corrected with unexplained reverts to old versions (which are themselves still invisible in the history). When people finally see that in the histories, it will be too late: users will not understand why all was OK at one time and then bad later.
And it makes any form of cooperation futile between users trying to find an agreement and to apply it consistently. It then becomes hard to adopt a consistent terminology and a coherent interface in the target wikis where they are imported. And we cannot discuss issues based on page histories when one sees something that others don't see at all or see differently.
One reason this may affect the French translations is its high level of messages that are already translated and reviewed: finding the remaining messages that are left to translate or review may require more extensive queries. This may be a problem of schema (missing selective indexes, full scans forced over large counts of blobs that will be parsed massively, even if the parsing is very basic).
Another page affected:
(once again the edit that displayed "Le champ est obsolète. $1" was not what I submitted, it has appeared in multiple unrelated messages)
In the page history, nothing has changed for over 2 weeks. But it still loads a different message, and I don't know if what is visible in the basic wiki editor is really what is used and displayed elsewhere, or if it's the basic wiki editor that "lies" and the effective data is what we see in the Translate UI.
Note that I reported this bug to Translatewiki.net support, and on its IRC support channel.
I suggest alerting the admins channels so they investigate seriously and do not trust immediately what they see in simplified reports. In case of problems or doubts, they should come to this bug. If there are too many problems caused by this bug, then we should go to maintenance and revert the new version affected.
I don't know when Translatewiki.net was updated with a new version or extension causing this.
I'm not sure, but the "tone" is getting high in some wikis, with people complaining that they did not do what they are accused of.
The last edit you see is from me; I made it before that message was reported here.
I used the suggestion to make pseudo-dummy edits, just by seeing this message was in the list of messages to review.
But then once I did it, another unrelated message is impacted.
We are now seeing a loop of human corrections that does not seem to end (and most of them are missing from the user's history and the page histories; all is mixed up).
And maybe there's a hidden hack that was introduced in the source code (possibly in a library) which is now exploited to attack wikis and create these conditions.
Anyway on the affected wikis, some users start getting insulted or banned even if they did not make the breaking changes.
Havoc starts spreading to random places, and the user's history is now definitely wrong for what they really did.
This is not just a problem of display:
the unexpected text that appeared "Le champ est obsolète. $1" was coming from an unrelated change in another translation unit, and it was propagated to multiple different pages.
And then each time we try to fix it, another translation unit is unexpectedly impacted as well.
Looks like a loop with multiple retries, reusing the content of some local variables due to improper initialisation or invalidation of internal caches.
Sep 5 2019
For now this old bug is still present in various wikis (and more serious on wikis running on smaller servers with more limited CPU/memory resources): the HTTP error 500 occurs even when we ask to purge the list completely (and nothing is purged at all), even on those with the most recent versions of MediaWiki.
Aug 7 2019
These two scripts are completely incompatible, so please don't make any fallbacks. If fallback is required according to the MediaWiki rules, then fall back to English.
Jul 22 2019
You affirmed "I don't know why" but I explained the reason to you. It's a fact that I got notified by Phabricator just a few minutes ago (maybe Phabricator was very late in delivering its notification email; in that case you should know that notifications are not delivered in due time and some can take months before being sent).
And your comment about "how to report a bug" is NOT relevant: I'm not submitting a new bug, just commenting on the topic covered by the bug, which is getting a consistent view of all scripts using Noto Fonts. Time has passed and this goal is still valid today; the Unicode coverage has constantly been improved (and it continues: Noto is a very active and well supported open project).
I got a recent update today from this channel. It was sent by "Maintenance_bot removed a project: Patch-For-Review" (https://phabricator.wikimedia.org/T184664) which just got closed now. And I was notified a few minutes ago about it by Phabricator, which just sent me an email for it.
Jul 21 2019
Note that ALL ISO 15924 scripts marked as encoded in Unicode up to version 9.0 (including historic scripts) have a suitable Noto Font (most of them a "Noto Sans <abbreviatedScriptName>", but a few ones are in Serif style only). This includes all script variants and script mixes, provided you select the correct fallback for these scripts (e.g. use the default "Latn" script for "Latf" or "Latg", but for "Aran" there's a Nastaliq variant defined, and as well for the "Zsye" variant).
For CJK Fonts, it's best to use the "script-mixes" codes to map them: "Jpan", "Kore", and for "Hans" and "Hant" you should add the Bopomofo to the list.
In all cases, for CSS "font-family:" styles, the default font "Noto Sans" for Latin must be added at end of lists.
For symbols, there are three fonts to add in that order: "Noto Sans Symbols", "Noto Sans Symbols2", "Noto Sans Mono" (the last one, needed for box drawing characters, should be listed *after* the default font "Noto Sans" for Latin/Greek/Cyrillic).
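The ordering rules above can be sketched as a small helper that builds a CSS "font-family:" declaration. This is one possible reading, stated as an assumption: script-specific fonts first, then the default "Noto Sans", then the three symbol fonts in the order given (so "Noto Sans Mono" lands after "Noto Sans"). The function name `font_family` and the example script fonts are only illustrative.

```python
from typing import List

DEFAULT_FONT = "Noto Sans"  # default for Latin/Greek/Cyrillic
SYMBOL_FONTS = ["Noto Sans Symbols", "Noto Sans Symbols2", "Noto Sans Mono"]


def font_family(script_fonts: List[str]) -> str:
    """Build a font-family declaration (assumed order, see the note above)."""
    stack: List[str] = []
    for name in script_fonts + [DEFAULT_FONT] + SYMBOL_FONTS:
        if name not in stack:  # avoid listing the same family twice
            stack.append(name)
    quoted = ", ".join(f'"{name}"' for name in stack)
    return f"font-family: {quoted};"


# Illustrative call for the "Aran" case: Nastaliq first, then a plain Arabic font.
print(font_family(["Noto Nastaliq Urdu", "Noto Sans Arabic"]))
```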
Jul 20 2019
There's not just MySQL. Other organizations may use MSSQL, Sybase, Oracle, Informix, all of them having their own charset support (and in all of them, installing additional charsets to support the full UTF-8 is costly as it also requires installing and maintaining collation data). In frequent cases, collation cannot be updated at each Unicode version, because it requires costly reindexing (but partial UTF-8 is possible, and I think this is the reason why MySQL defined the UTF-8(mb3), even if it also requires updating the collations when there are Unicode or CLDR updates for characters encoded in the BMP).
Jul 15 2019
If you don't have it, then my comment is a feature request that would allow Translatewiki.net to be more useful for other projects (and would also avoid polluting Translatewiki.net with broken links and missing categories, because some programming or markup language uses "custom" placeholder syntax).
Your suggestion does not apply: it's not viable to convert all tables on an existing database that has other uses.
And I do not necessarily "want to support full plane UTF-8" in a "utf8(mb3)" config. MediaWiki should still run without problem with that config, without causing major issues because of some unsupported characters that MediaWiki never checks.
Reread what I asked: I just want MediaWiki to check the character sets (a simple insert or update in the database at startup, followed by a read, can immediately detect if non-BMP characters are safe or not, and it is enough to set a flag and then allow MediaWiki to make a correct "preview" that will warn the user that his edit cannot be saved "as is").
But the fact that MediaWiki continues working as if there was no problem (and no problem visible or reported even when previewing the edited page) is unsafe.
Is it so complicated to make such a check, which has a near-zero cost on a UTF8(mb4) config, but will force the code to use text validation prior to saving or previewing, only if this "non-UTF8" config is detected? What is the performance impact really?
Now you suggest that I develop a patch, but that requires me to develop MediaWiki itself (and I don't like programming in PHP). My initial bug was to ask some developer to consider this as a request for improvement and fixing. This old bug has been valid for years and is still valid today; it is just not solved for now, and the current developers only seem to consider the needs of Wikimedia for its own wikis, but forget the needs of other wikis that have different goals (MediaWiki is not just made for Wikimedia, which has a lot of WM-specific features not portable to other places that don't have the large farm of servers and the complex storage configuration). Most wikis outside Wikimedia run on a single host which runs its own local database engine (and cannot support multiple engines, due to resource constraints). That's why MediaWiki has many optional plugins they don't have to support, and why MediaWiki also supports several DB engines (and I don't see why it could not support an existing "mb3" config, even if this means that users won't be able to post non-BMP characters). But in this config MediaWiki should still be safe to use (and for now it is not).
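The startup check I am asking for can be sketched as follows. This is not MediaWiki code: all names (`backend_supports_non_bmp`, `validate_edit`, `TruncatingBackend`) are hypothetical, and the "database" is simulated by an object that truncates text at the first non-BMP character, which is the failure mode described here.

```python
from typing import Callable, List

# Probe containing one character outside the BMP (U+1F600).
PROBE = "BMP \u00e9 then non-BMP \U0001F600 tail"


def backend_supports_non_bmp(store: Callable[[str], None],
                             load: Callable[[], str]) -> bool:
    """Write the probe and read it back; True only if it survives intact."""
    try:
        store(PROBE)
    except Exception:
        return False  # a strict backend may reject the write outright
    return load() == PROBE


def contains_non_bmp(text: str) -> bool:
    return any(ord(ch) > 0xFFFF for ch in text)


def validate_edit(text: str, non_bmp_safe: bool) -> List[str]:
    """Return warnings to show on preview/save; empty list means OK."""
    if not non_bmp_safe and contains_non_bmp(text):
        return ["This database cannot store characters outside the BMP; "
                "the edit cannot be saved as is."]
    return []


class TruncatingBackend:
    """Simulated storage that silently drops everything from the first non-BMP character."""

    def __init__(self) -> None:
        self.value = ""

    def store(self, text: str) -> None:
        for i, ch in enumerate(text):
            if ord(ch) > 0xFFFF:
                text = text[:i]
                break
        self.value = text

    def load(self) -> str:
        return self.value


backend = TruncatingBackend()
safe = backend_supports_non_bmp(backend.store, backend.load)  # run once at startup
print(safe)                                         # False
print(validate_edit("plain text", safe))            # []
print(len(validate_edit("emoji \U0001F600", safe)))  # 1
```

The probe runs once; the per-edit validation is only active when the flag says the backend is unsafe, so a fully capable configuration pays almost nothing.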
But even if the parser is disabled on Translatewiki.net, the message will be imported in a wiki where it will be incorrect as it will categorize the page displaying the message, but will not display the link and its text, leaving an incomplete non-sense sentence.
Unless the "intuition" plugin uses its own message parser and not the Wiki parser: in that case you should avoid the wiki syntax in the message for placeholders (in that case no fix is needed).
That message has no comment in "qqq" saying that it does not use the wiki syntax, and it was imported in Translatewiki.net with the default flags saying it was using the wiki syntax.
You still then need to fix the source message so that Translatewiki.net can infer the correct thing (this is what is done for messages intended for C/C++, or other programming languages or other markup languages).
So can you state clearly that the wiki parser will not be used on wikis where the "Intuition:Catdown" extension will be used ? You need to check the source code and test it. If effectively the wiki parser will not be used, then add the flag in the import that says to Translatewiki.net that it uses another parser, and don't forget to document it in the "/qqq".
Even if we use the translate interface, the message will be stored in wiki format, and it creates nonexistent target links in the Translatewiki.net tracking categories. You should avoid that!
Jul 12 2019
This T135969 is an old bug (opened many years ago) but it was recently "closed" abusively (by some MediaWiki developers who are only interested in cleaning up the backlogs and don't want to solve the reported problems, even this one, which is a severe one), even though it was reported many years ago (and then considered perfectly valid, because it really affected several Wikimedia wikis). I cannot find the initial bug that existed long before Phabricator (the history of bugs/RFEs that were closed before Phabricator was opened was not imported). But this was accepted as valid at that time (and it concerned multiple wikis used by Wikimedia and many others, not all of which were migrated or reloaded).
No, this is still installed as it was always documented. The basic test I request is also on topic for "MediaWiki database". This is a real bug in that part of MediaWiki, which never asserts but only assumes this is configured as you expect. Wikimedia itself has changed multiple times the way the encodings were used in the DB, and changed the SQL adapters appropriately, but it forgot this case, which is very simple to test (at least assert at startup). If you made an assertion and stopped the engine, you would receive tons of complaints that MediaWiki now refuses to run.
It can still be easily corrected by implementing (when required) an encoding converter (using NCRs for example, or saving with pairs of surrogates, if supported by the engine).
That's wrong. Being "capable" is just assumed, it is never checked, and there are existing wikis using SQL backends that silently drop non-BMP characters (and all that follows them), one of them being the OpenStreetMap wiki. Maybe it's misconfigured, but MediaWiki completely forgets to check that, and this causes silent drops of data when editing.
Jul 4 2019
I know this is old, but the idea of using GENDER (from the viewing user) to change the title of a namespace where every user is neutral! Or maybe this will just apply when the "User:" prefix is used before an existing registered user (whose gender is known). But frequently we can't refer to ''any'' user just by name (in some parameter) to guess which gender should apply. But maybe MediaWiki, when it sees a "User:Name", may itself change the gender in the namespace found in the link or when viewing the user page (or one of its subpages) according to the user's preference.
In all cases, the gender forms for the "User:" namespace must be aliases on the target wiki, and this can cause problems on multilingual wikis if all gender forms in all languages must be used (examples of multilingual wikis in Wikimedia are Commons and Meta, but these also have a "default language", which is English and does not need any gender form, so there is no need to create aliases).
Jul 3 2019
It's not a yes/no question, but multiple questions packed into one, for which it is impossible to reply by yes/no.
@Aklapper. You still did not ask any "yes/no" question. You just mentioned me with the goal of getting explanations, and that's what I gave (in a structured way, even if that's not what you expected, but then what you expected is not what you asked for).
Jul 1 2019
Finally another reason is that the 4/4 icon is also very ugly: it should be dropped in case of completion of the translation (i.e. the green 4 squares), making it visible only for incomplete ones, just to signal to users that the link goes to an incomplete page.
We would then get cleaner lists once translations are completed, only separated by tiny bold middle dots, and not with the bold bullet. The icons and the "big bullet" are both undesirable for standard navigation.
Also I don't see any interest of using the very bold "standard bullet": there's also a thick separator introduced by the 4/4 colored icon.
Also the "big bullet" is actually much too bold in horizontal lists, it obscures the text.
These big bullets are only suitable for vertical lists. The "default" value is then bad. Note that the vertical line used in some other lists of the interface (notably in categories) is also bad (the vertical line is confused with actual letters of some scripts).
The middle dot is correct ONLY if it is surrounded by spaces and distinguished from some other dots used in some scripts. Note that the middle dot may now be used in French in the middle of words (notably for the "inclusive orthography" noting masculine + feminine); making it a bit bolder (but still not the ugly very bold "bullet") and surrounding it with spaces avoids all confusion and still has a good interpretation as a punctuation separator.
No it was made to be consistent with other lists in many places of the interface.
Jun 29 2019
I just wanted MediaWiki to perform a basic check of whether it is installed on a compliant base. This test can be extremely fast at startup, to see if the database supports non-BMP characters or if they cause text to be truncated: in that case, some global boolean flag is set and will activate a check on any data submission containing such non-BMP characters, so that the user is informed that these characters are not supported; submitted text containing them will then be rejected, and no unexpected truncation will silently occur. This is a basic security feature, as many wikis cannot be reinstalled on another database without a long offline migration period, and possibly the underlying database will not support it.
Jun 26 2019
You did not ask any question that I would reply yes or no. You just pinged
me with "may be I would know best" so you wanted some explains. That's
exactly what I did.
Jun 24 2019
There's NOTHING that Lua cannot handle. But modules have to do that
themselves (and most of them don't!). We have already various helper
modules that allow parsing the wikitext in parameters (in the appropriate
frame context) and convert them to stripped wiki text, or to perform the
full conversion to HTML, cleaning up safe HTML, normalizing, trimming,
compressing, detecting other equivalent values (including normalizing input
numbers), performing case folding.
However I'm not sure that all these steps can be implemented by Mediawiki
(before using the Scribunto hook) or by Scribunto itself:
In fact it's up to each Lua module to determine which parameter values they
consider as equivalent, so that they will first canonicalize them.
Unfortunately I was cited for a single edit made 5 years ago (not invalid
at that time and not conflicting with any one as it had no prior history).
I cannot remember exactly the reason why I used " " in that case when
other languages used (most probably later) " " (which is also
May 6 2019
I'm not convinced this is needed to parse the page, only to generate its content. But is this related to conditional code like #if and #switch and to transclusion of Lua-generated contents (that would then need to generate all linguistic versions until a language filter is applied at the end to purge the excluded sections)?
And we still lack the possibility of marking a specific page (with a magic syntax generating metadata, not content, like "[[Category:...]]") as being primarily in a specific language (independent of the user language). But that should NOT be inserted in pages marked for translation with the translation tools, which are marked automatically by the Translation tool and use a specific page naming convention using "/langcode" suffixes/subpages, or some "langcode:" prefix or namespace, like on the OpenStreetMap wiki.
Mar 12 2019
Your suspicion is wrong: I used standard jamos, not compatibility ones, and standard Hangul syllables.
I gave the sample code that just uses
#( mw.ustring.toNF[K][C/D] ( teststring ) )
to test the length of the result (3 bytes per Korean character in UTF-8).
I've not been able to get any NFD decomposition from an NFC-encoded standard Korean string from Lua (in the currently deployed "mw.ustring" package) where there are precomposed Hangul LVT or LV syllables (which are used everywhere in the NFC form in almost all Korean texts). So you just tested that you got correct NFC from an NFC string, but NFD is still not working, and canonical equivalence is still not working across all forms (NFC, NFD, or other non-normalized forms).
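For comparison, the same length test can be run against a different normalizer. The sketch below uses Python's standard `unicodedata` module, not `mw.ustring`, and the helper name `utf8_length` is made up for this example. It shows what a working NFD is expected to return for two precomposed LVT syllables.

```python
import unicodedata


def utf8_length(form, text):
    """Length in UTF-8 bytes of `text` after normalization to `form`."""
    return len(unicodedata.normalize(form, text).encode("utf-8"))


# Two precomposed LVT syllables: U+D55C and U+AE00 (3 bytes each in UTF-8).
sample = "\uD55C\uAE00"

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    # Composed forms stay at 6 bytes; decomposed forms give 3 jamos
    # per syllable, i.e. 18 bytes in total.
    print(form, utf8_length(form, sample))
```

If the decomposed forms report the same length as the composed ones, the syllables were not decomposed at all.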
Mar 10 2019
Feb 11 2019
Note that the current implementation found in Commons does not work!
mw.ustring.toNFD(str) and mw.ustring.toNFKD(str) do not work as expected for all modern Korean Hangul syllables:
- the algorithmically composed syllables (LVT or LV forms using basic jamos) in range U+AC00..U+D7AF are still not decomposed at all
- the current implementation only uses the simple decomposition mapping pairs found in the UCD
- most decomposition mappings are found in the UCD for Korean, except those for Hangul LVT and LV syllables using "modern simple jamos".
- only the decompositions of "legacy jamos" (some of them are of type VV; there are also some LVT or LV forms, but using legacy jamos that are not part of the Hangul precomposed syllable ranges) are in the UCD!
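The missing piece is the arithmetic decomposition that the Unicode Standard defines for precomposed Hangul syllables. A minimal Python sketch of that step follows; it covers only the Hangul part, not the table-driven mappings or canonical reordering that a full NFD also needs, and the function names are chosen for this example.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588
S_COUNT = L_COUNT * N_COUNT   # 11172


def decompose_hangul(ch):
    """Decompose one precomposed syllable into L, V and optional T jamos."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable: leave unchanged
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""  # LV forms have no T
    return lead + vowel + trail


def decompose_hangul_text(text):
    """Apply the syllable decomposition to every character of `text`."""
    return "".join(decompose_hangul(ch) for ch in text)
```

For every syllable in the precomposed range, the result can be checked against a reference normalizer such as `unicodedata.normalize("NFD", ch)`.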
Oct 2 2018
Was there also a patch in the code that supports querying external services via any web API (independently of the external service) so that it won't honor any further redirect from HTTPS to HTTP? This is still needed for security, and may affect other MediaWiki extensions: such redirects should not be followed at all by default (except with a specific authorisation in the module/extension using that external HTTPS API, via an optional parameter to the support library). The global setting specific to $wgULSGeoService is not enough, as the problem is more general, and we shouldn't need to multiply such global settings when each extension should have its own settings.
Sep 19 2018
No: it's not "confusing", as the purpose was exactly to imitate the syntax of files, but with another prefix, and notably (as stated in the proposal) to allow simple basic conversion by replacing "File:name" with "Mapframe:x/y/z" while keeping all the additional parameters used in Files, including notably sizing, positioning, framing, alignment and description, and also supporting the same for links to files (using ":" before "File:" to go to a separate full page): a link such as "[[:File:name|text]]" just becomes "[[:Mapframe:x/y/z|text]]" with the same basic replacement.
Sep 7 2018
Anyway, this bug may reoccur at any time: I really suggest that any script or extension that makes HTTPS requests to any third-party site does NOT honor any 403 redirect to HTTP, but instead logs a warning to disable it, or treats that redirect as a server-side error (as if it were HTTP 500, and not HTTP 403). This will then allow scripts to behave correctly and not bypass the security.
I wonder if there's a way to handle that in a generic way in a common library used by all extensions.
This will make them safer: the status can be kept as 403, instead of being replaced by the status of the redirected page. The library loading the resource should still flag the resource as being in error. The HTTP status text may be kept, with " (error: redirect from HTTPS to HTTP forbidden)" appended. Maybe some scripts/extensions need to avoid this, and a flag could bypass this check and still honor the redirect, but then the HTTP status text should have " (warning: deprecated redirect from HTTPS to HTTP)" appended.
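A generic check of this kind could look like the following sketch, written in Python with the standard `urllib` modules rather than in MediaWiki's PHP. The class name `NoDowngradeRedirectHandler`, the helper `is_downgrade` and the `allow_downgrade` flag are hypothetical. Note that `urllib` only calls this hook for redirect responses; the original status code is kept and the status text is extended as proposed.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlsplit


def is_downgrade(origin_url, location):
    """True if following `location` from `origin_url` goes from HTTPS to HTTP."""
    target = urljoin(origin_url, location)  # resolves relative Location values
    return (urlsplit(origin_url).scheme.lower() == "https"
            and urlsplit(target).scheme.lower() == "http")


class NoDowngradeRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse HTTPS-to-HTTP redirects unless explicitly allowed (hypothetical flag)."""

    def __init__(self, allow_downgrade=False):
        super().__init__()
        self.allow_downgrade = allow_downgrade
        self.warnings = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if is_downgrade(req.full_url, newurl):
            if not self.allow_downgrade:
                # Keep the original status code, flag the resource as in error.
                raise urllib.error.HTTPError(
                    req.full_url, code,
                    msg + " (error: redirect from HTTPS to HTTP forbidden)",
                    headers, fp)
            self.warnings.append(
                msg + " (warning: deprecated redirect from HTTPS to HTTP)")
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# A shared opener that every caller of the common library would use.
opener = urllib.request.build_opener(NoDowngradeRedirectHandler())
```

Putting the check in one shared opener means each extension gets the safe default, and only a caller that passes `allow_downgrade=True` can opt out.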
Aug 17 2018
OK, this looks good when actually testing it on the MediaWiki wiki. The bug can be closed (until the new version is fully deployed on other wikis).