
Explore bot-replacing all WikiHiero uses with Unicode glyphs, and switching off the extension
Open, Needs Triage, Public

Assigned To
Authored By
Jdforrester-WMF
Aug 19 2023, 1:38 AM
Referenced Files
F37690113: grafik.png
Sep 11 2023, 11:26 AM
F37689971: image.png
Sep 11 2023, 10:56 AM
F37689953: image.png
Sep 11 2023, 10:55 AM
F37689950: image.png
Sep 11 2023, 10:55 AM
Tokens
"Love" token, awarded by Aklapper.

Description

WikiHiero is a very old and very under-resourced extension that takes a sequence of Latin characters and replaces it with PNGs representing it as hieroglyphics. In the twenty years since it was first developed, Unicode support for hieroglyphics has become good enough that we probably don't need this legacy code, and could instead switch off the extension.

If we were to do this, we'd need to:

  • Confirm this approach will work, and announce to various stakeholders.
  • Have a bot replace all uses on Wikimedia wikis.
  • Disable the extension from Wikimedia production.
  • Sunset the extension, and advise third-party users to use the above bot script.

Event Timeline

First thought that comes to mind is that Hiero tends to output images that are a pretty solid size, whereas the Unicode basically fits into the size of a normal text character. So some sort of template is probably called for. https://en.wikipedia.org/wiki/Egyptian_hieroglyphs#Phonetic_reading seems exemplary for sizing. IDK if that's necessarily desirable in all places....

Judging by that section, the extension has some support for combining/modifying glyphs (see the group <hiero>z:G38-A-A47-D54</hiero>, where the z: modifies G38). It's not obvious to me whether that's trivial to do with Unicode; it probably can't be done without at least some chunk of CSS in the Template:Fraction / <ruby> direction.

(I am not a domain expert.)

This may also require bundling a supported webfont with UniversalLanguageSelector.

> Have a bot replace all uses on Wikimedia wikis.
> and advise third party users to use the above script.

Instead of using such a bot, we could consider writing a Lua module to convert WikiHiero syntax to Unicode characters. The Unicode characters for hieroglyphics are not easy to type on a keyboard.
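To give a rough idea of what the core lookup in such a converter might involve, here is a minimal sketch in Python rather than Lua. It relies on the fact that Unicode character names for the basic hieroglyph block follow the pattern "EGYPTIAN HIEROGLYPH A001", so a Gardiner code such as G38 can be zero-padded and looked up by name. The function name gardiner_to_unicode is hypothetical, and the sketch deliberately does not handle WikiHiero's phonetic shorthands (such as z or A in the example above), which would need a separate lookup table.

```python
import re
import unicodedata
from typing import Optional

# Gardiner code: one or two series letters, a number, and an optional
# variant letter, e.g. "G38", "Aa1", "A5A".
_GARDINER = re.compile(r"^([A-Za-z]{1,2})(\d{1,3})([A-Za-z]?)$")


def gardiner_to_unicode(code: str) -> Optional[str]:
    """Return the Unicode character for a Gardiner code, or None if unknown.

    Hypothetical helper: phonetic shorthands and signs that are not
    encoded in Unicode simply yield None.
    """
    match = _GARDINER.match(code)
    if match is None:
        return None
    series, number, variant = match.groups()
    # Unicode names zero-pad the number to three digits, e.g. "G038".
    name = f"EGYPTIAN HIEROGLYPH {series.upper()}{int(number):03d}{variant.upper()}"
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


if __name__ == "__main__":
    for code in ("A1", "G38", "Aa1", "z"):
        print(code, repr(gardiner_to_unicode(code)))
```

Returning None for anything unrecognised, instead of guessing, would let a caller leave those uses untouched for manual review.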

Peachey88 subscribed.

The CSR tag should be on a proper CSR task (not that they are actually getting reviewed).

You sure about removing the Code Stewardship Review tag? Discussing the possible undeployment of an extension seems to me like a good fit for that tag.

As for inserting the characters from the perspective of medium and small wikis, it would be best to add them to "special characters" in the editors, preferably all three (WikiEditor, VisualEditor, 2017 wikitext editor).

CSR is for the review process itself including its heavy template. But "Code Stewardship Review process is on pause until further notice" anyway.

My suggestion is to write a script (based on pywikibot) that makes the change, create a category for pages using the legacy hieroglyph tag, then announce it and ask communities to migrate as they see fit (e.g. if they want to use a template, they can), and finally run the bot separately for small wikis.
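As a sketch of the text-processing half of such a script (the pywikibot part that fetches and saves pages is left out), the following shows one way to find <hiero> tags in wikitext and replace only those a converter can handle. The names HIERO_TAG and replace_hiero_tags, and the toy converter in the usage lines, are hypothetical. A plain regex like this would also match tags inside <nowiki> or comments, so a real bot would need more care.

```python
import re
from typing import Callable, Optional, Tuple

# Matches <hiero>...</hiero>, case-insensitively and across line breaks.
HIERO_TAG = re.compile(r"<hiero>(.*?)</hiero>", re.IGNORECASE | re.DOTALL)


def replace_hiero_tags(
    wikitext: str, convert: Callable[[str], Optional[str]]
) -> Tuple[str, int]:
    """Replace each <hiero> tag whose content `convert` can handle.

    `convert` receives the tag content and returns replacement text, or
    None to leave the tag untouched (so it stays in the tracking
    category). Returns the new wikitext and the number of tags replaced.
    """
    replaced = 0

    def _substitute(match: "re.Match[str]") -> str:
        nonlocal replaced
        result = convert(match.group(1).strip())
        if result is None:
            return match.group(0)  # keep the original tag as-is
        replaced += 1
        return result

    return HIERO_TAG.sub(_substitute, wikitext), replaced


if __name__ == "__main__":
    # Toy converter that only knows a single sign.
    toy = {"A1": "\U00013000"}.get
    text = "Sign: <hiero>A1</hiero> and <hiero>z:G38</hiero>."
    print(replace_hiero_tags(text, toy))
```

Leaving unconvertible tags in place means the bot can be run repeatedly as the converter improves, with the category showing what remains.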

> > Have a bot replace all uses on Wikimedia wikis.
> > and advise third party users to use the above script.
>
> Instead of using such a bot we can consider writing a Lua module to convert WikiHiero syntax to Unicode characters. The Unicode characters for hieroglyphics are not easy to type in the keyboard.

That means writing some script and maintaining it basically forever while supporting two competing systems at the same time (hiero tag + Unicode). This is sub-optimal.

Change 952525 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/wikihiero@master] Add tracking category for pages that use <hiero>

https://gerrit.wikimedia.org/r/952525

Forgive me if I'm missing something in the technical ins and outs, but would the Unicode replacement be able to arrange hieroglyphs in vertical groups the way WikiHiero can? Trying to look it up via Google turns up this discussion on Stack Exchange from two years ago, which seems to indicate that at that time it was technically possible to do it using Hieroglyph Format Controls, but the respondents weren't sure that there was any system able to implement them.

I haven't tried it but it should be doable via CSS to rotate the text. You could add it to TemplateStyles of Template:Hiero and use that instead.

Unfortunately I don't know anything about Unicode either; I'm entirely ignorant. Googling tells me that format control characters exist in Unicode for vertical stacking and horizontal grouping (based on the Manuel de Codage formatting that <hiero> and JSesh use, with : for vertical stacking and * for horizontal grouping), but like the link A. Parrot found, the only discussions I can find about it (here from a year and a half ago and here 2 months ago) seem to say the fonts didn't exist at the time. To my mind, a replacement for WikiHiero needs the same amount of functionality as the existing system.
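For illustration, here is a minimal sketch of how the : and * operators might map onto those format controls, using U+13430 (EGYPTIAN HIEROGLYPH VERTICAL JOINER) and U+13431 (EGYPTIAN HIEROGLYPH HORIZONTAL JOINER). The function names are hypothetical, the input is assumed to use zero-padded Gardiner codes only, and I am assuming the joiners nest the same way as in Manuel de Codage (* binding more tightly than :), which should be checked against the Unicode Standard. Whether the result displays as a stacked group still depends entirely on font support.

```python
import unicodedata

VERTICAL_JOINER = "\U00013430"    # EGYPTIAN HIEROGLYPH VERTICAL JOINER
HORIZONTAL_JOINER = "\U00013431"  # EGYPTIAN HIEROGLYPH HORIZONTAL JOINER


def sign(code: str) -> str:
    """Look up a zero-padded Gardiner code such as 'N035' by character name."""
    return unicodedata.lookup(f"EGYPTIAN HIEROGLYPH {code}")


def encode_group(group: str) -> str:
    """Encode one group such as 'D021:X001*N035'.

    ':' separates vertically stacked rows; '*' separates signs that sit
    side by side within a row.
    """
    rows = []
    for row in group.split(":"):
        rows.append(HORIZONTAL_JOINER.join(sign(code) for code in row.split("*")))
    return VERTICAL_JOINER.join(rows)


def encode_line(mdc: str) -> str:
    """Encode a '-'-separated sequence of groups."""
    return "".join(encode_group(group) for group in mdc.split("-"))


if __name__ == "__main__":
    encoded = encode_line("A001-D021:X001*N035")
    print([f"U+{ord(ch):04X}" for ch in encoded])
```

A font without support for the format controls will typically show the signs in a plain row, possibly with placeholder boxes for the joiners.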

Currently, Unicode hieros mostly appear in the {{font}} template. I don't know if this contributes to how they appear or if it is just that they don't display properly because I don't have the necessary font installed. I'm assuming the blank boxes are supposed to be control characters. One issue with Unicode hieros that may not be easy to fix is that the default position for a horizontal sign is flat along the bottom of the line, whereas in WikiHiero (and Ancient Egyptian writing in general) the default position is in the middle of the line: compare the two at Transliteration of Ancient Egyptian.

(It's probably just me but I find WikiHiero way easier to see with its thicker lines than Unicode of the same size)

Your sandbox renders like this for me. Chrome 116, Windows 10

image.png (838×1 px, 160 KB)

image.png (277×814 px, 14 KB)

Mine does have thicker lines for the bold one:

grafik.png (2×1 px, 329 KB)

Change 952525 merged by jenkins-bot:

[mediawiki/extensions/wikihiero@master] Add tracking category for pages that use <hiero>

https://gerrit.wikimedia.org/r/952525

Shouldn't the category be named wikihiero-usage-tracking-category?

> Shouldn't the category be named wikihiero-usage-tracking-category?

Thanks for catching that. It doesn't matter that much, since we will hopefully undeploy this extension and this will be temporary.

> > Shouldn't the category be named wikihiero-usage-tracking-category?
>
> Thanks for catching that, it doesn't matter that much as we hopefully undeploy this extension and this will be temporary.

It's better to just fix it now. There are many "temporary" changes that end up staying with us for many years. The tracking category is likely to take many months to populate, given that T157670 has not been addressed by developers.

+1 that this should be fixed, got here by translating untranslated category names over at ruwiki and was surprised that this message is named this way (ideally it should be made wikihiero-tracking-category, like others).

I also don’t think that WikiHiero should be undeployed without providing an alternative solution first. Just converting the text to Unicode is not a solution; developing a Lua module/template that would insert approximately the same markup is a better one.

> I also don’t think that WikiHiero should be undeployed without providing an alternative solution first. Just converting the text to Unicode is not a solution, developing a Lua module/template that would insert the approximately same markup is a better one.

Adding a template or module to every wiki could create a lot of technical debt, unless this extension is used on a low number of wikis and/or there's some kind of system for mass updating templates/modules.

Exactly, I understand the need for some specific cases and covering all edge cases the extension covers currently but I also want to mention we have more than 180 extensions deployed to production and each class of +1M lines of code needs maintenance.

For example:

We really can't maintain that much code, even if we triple our engineering size and do just maintenance and not create any new features. Every class we have in production takes volunteer and staff time just to keep it there. If we undeploy or remove code in general, we will have more capacity to do better maintenance in other areas or build new features (or a combination of both).

In other words, I understand there are marginal benefits to the extension compared to Unicode, but have you considered the cost of the extension itself? By spending engineering resources on maintaining this extension, we are missing improvements in other areas.

> Adding a template or module to every wiki could create a lot of technical debt. Unless this extension is used on a low number of wikis and/or there's some kind of system for mass updating templates/modules.

There were some, see the description at https://www.mediawiki.org/wiki/Global_templates

But I was speaking more so to the fact that the replacement should be equivalent, and given the problems with Unicode hieroglyphics, I don’t think just pointing to them is a solution. Yes, WikiHiero is bad (I can personally point to the accessibility problems in its implementation that make it bad), no, that does not mean that we should throw the baby out with the bathwater.