
Add short-to-type aliases for <syntaxhighlight> and <syntaxhighlight inline>
Open, Needs Triage, Public, Feature

Assigned To
None
Authored By
Novem_Linguae
Mar 29 2024, 8:54 PM
Referenced Files
F44481425: image.png
Apr 4 2024, 11:34 AM
F44481380: image.png
Apr 4 2024, 11:34 AM
F44063342: syntaxhighlight inline.gif
Apr 1 2024, 7:54 PM
F43796542: image.png
Mar 29 2024, 9:10 PM
F43796534: image.png
Mar 29 2024, 9:10 PM

Description

Feature summary (what you would like to be able to do and where):

  • add <sh> for <syntaxhighlight> (multi-line code)
  • add <shi> for <syntaxhighlight inline> (single line code)

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

Benefits (why should this be implemented?):

  • <syntaxhighlight>, <syntaxhighlight inline>, and <code><nowiki> are laborious to type, providing a poor experience for technical contributors trying to type code on wiki pages
  • There's folks that go around patrolling the category https://en.wikipedia.org/wiki/Category:Pages_with_syntax_highlighting_errors and putting pressure on folks to always add lang="abc" to <syntaxhighlight> tags, making the typical tag length even longer. So for example I find myself typing very long things like <syntaxhighlight inline lang="wikitext">.

Other

Event Timeline

Change #1015469 had a related patch set uploaded (by Novem Linguae; author: Novem Linguae):

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Add <sh> and <shi> parser tags

https://gerrit.wikimedia.org/r/1015469

Would also be great to get help with why my patch's 8 new parser tests pass locally when running the command docker compose exec mediawiki php tests/parser/parserTests.php --file=extensions/SyntaxHighlight_GeSHi/tests/parser/parserTests.txt, but 3 of the 8 tests fail in Jenkins.

image.png (259×1 px, 38 KB)

image.png (449×1 px, 74 KB)

https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php74-noselenium-docker/165698/consoleFull

Personally i was a fan of <source>....

putting pressure on folks to always add lang="abc" to <syntaxhighlight> tags

Without a language you are using <pre> which is pretty close to your desired short code tag length.

Personally i was a fan of <source>....

T39042: Remove <source> syntax from SyntaxHighlight (GeSHi)

I would not oppose declining this in favour of also declining that.

more industry standard

I don't disagree with the sentiment that there are widely used pseudo-standards for code documentation and these are often based on Markdown or borrow some Markdown syntax, but "industry standard" is a boring appeal to authority argument without even stating the authority you would wish to defer to. To my knowledge there is no well defined industry for the production and use of FOSS wiki software.

Anecdotally I have spent more time than I care to recall over the course of my software development career pushing back on the idea that the number of characters in an identifier is a meaningful barrier to adoption of human readable labels. To my mind, the slight inconvenience of the number of keystrokes needed in this case is outweighed by the clarity of the tag's name being exactly analogous to the operation performed by the tag. The shorter "sh" and "shi" proposed aliases carry no semantic meaning without the longer "syntaxhighlight" label being presented in the same context.

Perhaps "highlight" could be seen as a meet in the middle compromise? I'm still not sure how to determine the actual need for such an alias however. It certainly would be easier to engage in a discussion of the topic if it was presented in the form of a Phabricator feature request task, a mailing list post, or a wiki page rather than a Gerrit patch.

but "industry standard" is a boring appeal to authority argument without even stating the authority you would wish to defer to

A bit harsh. I was just trying to reference the other two tickets where backticks were suggested. Anyway, I'll edit my post to say "syntax that some people are already familiar with such as Markdown".

Without a language you are using <pre> which is pretty close to your desired short code tag length.

Oh nice. I didn't know <pre> had nowiki built into it. I'll start using that, although I don't think it covers use cases where we want to specify a language such as <syntaxhighlight lang="js">. And I don't think it covers the one line/inline case.

<pre> is indeed processed in wikitext similar to many Markdown implementations, with "nowiki"-like treatment applied.

Another unofficial, but fairly common, feature in Markdown implementations is support for <pre lang="js">. This works on GitHub, Gitiles, Doxygen, npmjs.org, and probably others. It'd be tricky to somehow work that into the SyntaxHighlight extension as it'd require replacing a core parser tag (or hooking into its callback), but not impossible.

Personally i was a fan of <source>....

T39042: Remove <source> syntax from SyntaxHighlight (GeSHi)

I would not oppose declining this in favour of also declining that.

T39042#7084286 merits engagement if you want to go that path.

I actually think providing a short form of <syntaxhighlight> is great, but would personally rather not support the <shi> form. (And would prefer to see the triple back ticks supported for the block form, but I get why this task exists.)

I'm afraid I have to oppose as well. The syntax is 15 years old. Why change it? Why now? How often do people actually face the situation that they have to type it? How often will people have to decipher a cryptic <sh> in the wikitext? Isn't there always a way to add it with one or two mouse clicks? If not, why not? Won't it be copy-pasted anyway most of the time? Doesn't "sh" mean "shell (script)"? Or dozens of other things? In which other context does it mean "syntax highlight"?

Seems like there's lots of opposition to this.

I think I'll give up on the <sh> part of this ticket and just use <pre>.

But the inline stuff that <shi> would solve might be worth continuing to explore. Here's my use case (video):

syntaxhighlight inline.gif (1×2 px, 2 MB)

Any ideas for what we can name <shi> that would achieve consensus and still be short?

<shi> combines a tag name (syntaxhighlight) with a boolean parameter (inline). That is even more confusing than the short tag name and should be avoided.

Somewhat of an opinion that <pre lang=""> and <code lang=""> should just function as synonyms to <syntaxhighlight> if the extension is installed. Or something similar to them (language or hlang to differ from HTML lang?). Yes, it is not ‘correct HTML’, but so what? As long as wikitext doesn’t insert the attribute into HTML, we’re fine, I’d say. And <syntaxhighlight> without any lang should be considered an error. (<source> to me is bad since it doesn’t say anything about what the tag does, even if it is less verbose.)

Change #1015469 abandoned by Novem Linguae:

[mediawiki/extensions/SyntaxHighlight_GeSHi@master] Add <sh> and <shi> parser tags

Reason:

objections in phab ticket

https://gerrit.wikimedia.org/r/1015469

Looks like the <pre> tag doesn't indent, making it much less useful on talk pages than <syntaxhighlight>.

image.png (801×1 px, 46 KB)

image.png (856×1 px, 33 KB)

Readability trumps writability, IMO. If you find it annoying to type <syntaxhighlight inline> or <code><nowiki> (me too), some ideas:

  • Better editor support (a keyboard shortcut, or opening up a syntaxhighlight dialog when you type ```).
  • Introduce <syntax> or <highlight>, which isn't all that shorter, but easy to understand and doesn't conflict with HTML.
  • Introduce a pseudo-namespace, like <mw:code>.
  • Probably not feasible to do retroactively, but I wonder whether making <syntaxhighlight>text</syntaxhighlight> automatically inline iff text doesn't contain any newline would have made sense.
  • We could create a parser tag for <code> or <source>, but have it just output the HTML tag unless some distinctive property is used. lang is also a HTML property though, so not the best candidate for that. Maybe something like format or type?
  • Maybe the pre-save transform could convert <sh> and <shi> into <syntaxhighlight> / <syntaxhighlight inline>? That feels very hacky... also, not easy to do I think.
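To make the pre-save transform idea from the last bullet concrete, here is a minimal Python sketch of the rewrite it would perform. It is only a model of the text substitution, not MediaWiki code: the `ALIASES` table and the `expand_aliases` function are hypothetical names invented for this illustration.

```python
import re

# Hypothetical alias table: short tag name -> (canonical tag name, extra attributes).
ALIASES = {
    "sh": ("syntaxhighlight", ""),
    "shi": ("syntaxhighlight", " inline"),
}

# Matches an opening or closing tag whose name is exactly one of the aliases.
# Group 1: optional "/", group 2: alias name, group 3: attributes (may be empty).
_ALIAS_TAG = re.compile(r"<(/?)(shi|sh)\b([^<>]*)>", re.IGNORECASE)


def expand_aliases(wikitext):
    """Rewrite <sh>/<shi> tags into their long <syntaxhighlight> forms."""

    def _replace(match):
        slash, name, attrs = match.groups()
        canonical, extra = ALIASES[name.lower()]
        if slash:
            # Closing tags never carry attributes.
            return f"</{canonical}>"
        return f"<{canonical}{extra}{attrs}>"

    return _ALIAS_TAG.sub(_replace, wikitext)


print(expand_aliases('Use <shi lang="js">x = 1</shi> here.'))
```

The sketch also hints at why this feels hacky: a plain substitution like this would rewrite alias tags that appear inside <nowiki> or inside other code samples, so a real transform would need to be aware of the parser's state.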

T311518: Add ``` as shorthand for <syntaxhighlight> would be cool if not very hard to do. I assume Linter could be used to weed out the few false positives?
(On the other hand, I doubt T217944: Support backticks (`) for code samples (as in Markdown) is feasible.)

<pre> is indeed processed in wikitext similar to many Markdown implementations, with "nowiki"-like treatment applied.

Another unofficial, but fairly common, feature in Markdown implementations is support for <pre lang="js">. This works on GitHub, Gitiles, Doxygen, npmjs.org, and probably others. It'd be tricky to somehow work that into the SyntaxHighlight extension as it'd require replacing a core parser tag (or hooking into its callback), but not impossible.

Parser::setHook() just overrides the previous hook (and even gives you the old one as a callable), and ParserFirstCallInit runs after the core hooks are set, so that seems very easy to do. Confusing for editors though, because of the formatting/positioning differences between core <pre> and SyntaxHighlight blocks, unless plain (non-language-specific) <pre> is also taken over, which could cause issues with a lot of existing / historic code.
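The override-and-delegate pattern described here can be sketched in Python. This is a toy model under stated assumptions, not MediaWiki's API: `ToyParser`, `core_pre`, and `install_highlighting_pre` are hypothetical names, and the only behaviour borrowed from the comment above is that setting a hook replaces the previous one and hands it back as a callable.

```python
import html


class ToyParser:
    """Toy stand-in for a tag-hook registry (not MediaWiki code)."""

    def __init__(self):
        self._hooks = {}

    def set_hook(self, tag, hook):
        """Register `hook` for `tag` and return the hook it replaced, if any."""
        old = self._hooks.get(tag)
        self._hooks[tag] = hook
        return old

    def render(self, tag, body, attrs):
        return self._hooks[tag](body, attrs)


def core_pre(body, attrs):
    """Stand-in for the core <pre> handler: escape the body, ignore attributes."""
    return f"<pre>{html.escape(body)}</pre>"


def install_highlighting_pre(parser):
    """Take over <pre>, but fall back to the old hook when no lang is given."""

    def highlighting_pre(body, attrs):
        if "lang" in attrs and previous is not None:
            # Placeholder for real highlighting: just tag the block with the language.
            lang = html.escape(attrs["lang"], quote=True)
            return f'<pre class="highlight lang-{lang}">{html.escape(body)}</pre>'
        if previous is not None:
            return previous(body, attrs)
        return f"<pre>{html.escape(body)}</pre>"

    # set_hook returns the hook being replaced, which the wrapper delegates to.
    previous = parser.set_hook("pre", highlighting_pre)


parser = ToyParser()
parser.set_hook("pre", core_pre)      # core hooks are registered first
install_highlighting_pre(parser)      # the extension then overrides <pre>
print(parser.render("pre", "a < b", {}))
print(parser.render("pre", "x", {"lang": "js"}))
```

Delegating to the old hook for plain <pre> keeps existing pages rendering as before, at the cost of the formatting difference between the two code paths that the comment above warns about.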

Code completion T95100 should alleviate this when we finally get it.

Oh no, please do not change course every decade.

  • German Wikipedia successfully phased out <source> and migrated to <syntaxhighlight> in all namespaces.
  • Do not make things more complicated and introduce new aliases.
  • Every alias makes things more complicated, more things humans need to keep in mind, patterns to be searched for in pages, patterns to be obeyed by bots and replacing scripts.

Who is typing syntax keystroke by keystroke today? That’s last-century style. Nowadays we have copy and paste (c&p), source text insertion tools, forms, and cheatsheets (cribs) with c&p patterns as individually needed.

  • A shortcut saves five seconds once for a single keystroke-by-keystroke author, but costs everyone who follows minutes over decades. Very bad balance.

<source> was bad not because it is a shorter name or an alias but because it conflicts with an HTML tag. Most people are currently typing out things keystroke by keystroke on many wikis if MediaWiki:Edittools or similar doesn’t contain all the needed syntax combinations. While I don’t think <sh> / <shi> / <shit> are the way to go here, having syntax highlighting with a shorter syntax that wouldn’t conflict with any HTML is a normal idea. Half of the solutions you describe aren’t.

Most people are currently typing out things keystroke by keystroke on many wikis

[Citation needed] Please provide a proof.

  • People in German Wikipedia acted this way about 2010, but meanwhile our users learnt to use c&p.

Only a very limited number of page authors require <syntaxhighlight>, e.g. those describing new examples in new programming languages etc. Those are technical experts, and if desired they have a wiki user page or text editor page open with frequently needed patterns ready to copy.

  • For template documentation we use <syntaxhighlight> on a regular basis, but we provide it for new documentation pages via a preload page.
  • Hint: Once you have an opening <syntaxhighlight>, you can get the closing </syntaxhighlight> by copying and pasting the opener and adding a slash. I did this five seconds ago. To be honest, in this discussion I never typed syntaxhighlight but always copied it from the task title, Add short-to-type aliases for <syntaxhighlight> and <syntaxhighlight inline>, or from previous contributions.

It is still a one-time saving of five seconds for the first keystroke-by-keystroke author, paid for over decades by many others who lose minutes every time they encounter an unknown tag, a source text search fails to match, or bots and scripts fail to replace it.

Who is typing syntax keystroke by keystroke today?

I am.

I had an idea today. Maybe we could solve one of the root problems of this ticket (too many keystrokes to type <code><nowiki></nowiki></code> and <syntaxhighlight inline></syntaxhighlight>) by coding the wikiparser to always treat <code> tags as nowiki on the inside. I've filed T402727: <code> tags should treat content inside them as <nowiki> to discuss.

Another unofficial, but fairly common, feature in Markdown implementations is support for <pre lang="js">. This works on GitHub, Gitiles, Doxygen, npmjs.org, and probably others. It'd be tricky to somehow work that into the SyntaxHighlight extension as it'd require replacing a core parser tag (or hooking into its callback), but not impossible.

Somewhat of an opinion that <pre lang=""> and <code lang=""> should just function as synonyms to <syntaxhighlight> if the extension is installed.

I like this idea a lot. It's unfortunate that the raw <pre> and <pre lang=xxx> will have different indentation semantics, but that seems like a relatively minor problem.

How often do people actually face the situation that they have to type it? [...] Won't it be copy-pasted anyway most of the time?

Copy-pasting is supposed to be a convenience, not the canonical way to insert text. There is a reason we have wikilink syntax like [[this]] and not like <wikilink target="foo">that</wikilink>.

It'd be tricky to somehow work that into the SyntaxHighlight extension as it'd require replacing a core parser tag (or hooking into its callback), but not impossible.

Implementation-wise, looks like all we have to do is to redefine <pre> tag in SyntaxHighlight. It will take precedence over core's definition. The catch being the regular <pre> (with no lang= attribute) is also now controlled by SyntaxHighlight, which now that I see it, is perhaps not a bad idea? Per CTT, wikitext is not html (T25932#8434339). So <pre> in wikitext emitting something other than <pre> in html should not be a problem. This will be an "upgrade" for the tag as it would now work in indented blocks (T361407#9688086).

I am pretty strongly against causing impedance (per my discussion in T25932) with <pre lang>/<code lang> expecting a code language name and not a BCP47 language code (wikitext remains a fairly straightforward translation in many ways to HTML and we should minimize conflicts like that one so teaching people who know one is less painful - especially regarding technical topics like this one where they will know that there is an expectation of the other flavor).

(Yes, there are reasons to use particularly <pre> that aren't code e.g. T194651, so I think I do lean to "is perhaps not a bad idea" probably being a bad idea, so maybe some further searching on the front is a good idea.)

I would probably have little to no heartburn about picking some other reasonable attribute name: syntax, perhaps, to echo the extension's name. (c.f. <math> as a particular example where we overrule element naming in what I guess is a happy accident these days, since no one should ever have to write Math HTML even by copy-pasting from something that does it for you.)

(Alternatively, someone can ask the WHATWG to bless having code language names in lang if they really want to wait a long time for a change in this regard. That might be interesting and would reflect some markdown as well as syntaxhighlight using lang= for this purpose and they like it when there are prior use cases.)

(Alternatively, someone can ask the WHATWG to bless having code language names in lang if they really want to wait a long time for a change in this regard. That might be interesting and would reflect some markdown as well as syntaxhighlight using lang= for this purpose and they like it when there are prior use cases.)

Would lead to all kinds of fun conflicts. E.g. ada and awk are valid language codes and programming language names at the same time.
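The conflict can be shown with a small Python sketch. Both sets below are tiny illustrative samples assumed for this example, not real registries: the language codes are ones named in this thread, and the highlighter names are a hypothetical subset.

```python
# Illustrative sample of natural-language codes (assumption: a tiny subset only).
NATURAL_LANGUAGE_CODES = {"ada", "awk", "lua", "de", "fr", "en"}

# Illustrative sample of highlighter names (assumption: a tiny subset only).
HIGHLIGHTER_NAMES = {"ada", "awk", "lua", "js", "python", "wikitext"}


def ambiguous_lang_values(language_codes, highlighter_names):
    """Return lang values that are both a language code and a highlighter name."""
    return sorted(language_codes & highlighter_names)


print(ambiguous_lang_values(NATURAL_LANGUAGE_CODES, HIGHLIGHTER_NAMES))
```

For any value in the returned list, a parser seeing lang="..." on <pre> or <code> cannot tell from the attribute alone whether the author meant a human language or a programming language.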

I am pretty strongly against causing impedance (per my discussion in T25932) with <pre lang>/<code lang> expecting a code language name and not a BCP47 language code (wikitext remains a fairly straightforward translation in many ways to HTML and we should minimize conflicts like that one so teaching people who know one is less painful - especially regarding technical topics like this one where they will know that there is an expectation of the other flavor).

I'd have been happy if you won the debate at T25932 and we considered wikitext a superset of html. But since you didn't, I don't see why should we be stuck in the worst of both worlds. Using the lang html attribute in wikitext is niche, and <pre> in wikitext for non-code stuff is even more so. The combination making it harder to teach wikitext seems a poor reason to be "strongly against". <pre lang has exactly 6 uses on enwiki – 1 with a bogus value ("tid") and 5 already have a code language passed to them, so people actually think that should work.

I am pretty strongly against […] <pre lang>/<code lang> expecting a code language name and not a BCP47 language code […]
(Alternatively, someone can ask the WHATWG to bless having code language names in lang […])

See also https://github.com/whatwg/html/issues/7869. TLDR: The conversation there seems to lean toward "No", and for exactly the reasons you state. It would conflict with a global attribute that is already specified, inherited, and used in its place.

While <pre lang may have low adoption in MediaWiki context, it's part of the HTML standard. We could make a breaking change and make <pre lang invoke the syntaxhighlight tag. But would that not inevitably lead to a feature request for how to output <pre lang HTML from wikitext? Setting the language for a piece of content seems a likely use case.

Personally i was a fan of <source>....

T39042: Remove <source> syntax from SyntaxHighlight (GeSHi)

I would not oppose declining this in favour of also declining that.

I suggest:

  • Decline T39042 on the grounds that there is clearly interest in having a shortcut, and <source> is a useful and established shortcut that does precisely that. Creating a different shortcut does not seem worth the effort. Over 13 years ago we posited at T39042 that <source> should go because it exists in the HTML5 spec. In that time, no concrete application has emerged. Given how media uploads and external content are managed in MediaWiki, this seems unlikely to change. Note that we already support videos (via Extension:TimedMediaHandler) and we even output <source> today, without conflict. If and when a concrete use case does emerge, we can weigh the pros/cons of adding a MW-specific tag vs re-purposing <source>.
  • Close this task (T361407) as resolved, given <source> exists. Those that prefer it can continue to type and submit it that way in wikitext. (How we save it and how VisualEditor/Parsoid format their content is a different matter. See also: Parsoid default read views, and long-term ideas about storing Parsoid HTML in revision blobs, with wikitext as input method.)

While <pre lang may have low adoption in MediaWiki context, it's part of the HTML standard. We could make a breaking change and make <pre lang invoke the syntaxhighlight tag. But would that not inevitably lead to a feature request for how to output <pre lang HTML from wikitext? Setting the language for a piece of content seems a likely use case.

BTW, please note that

  • syntaxhighlight will expose &nbsp; or &lt;
  • Both pre and nowiki render the entity character

and mixing all together will break existing pages. Indeed, a “breaking change”.
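The difference described in the two bullets can be modelled in a few lines of Python. This is a sketch of the claim as stated in this comment, not of the extension's actual code; the function names are hypothetical, and "visible text" means what a reader sees on the rendered page.

```python
import html


def visible_in_pre_like(source):
    """Model of <pre>/<nowiki> as described above: entities render as characters."""
    return html.unescape(source)


def visible_in_syntaxhighlight_like(source):
    """Model of <syntaxhighlight> as described above: entities are shown verbatim."""
    return source


def changes_when_switched(source):
    """True if routing this content through the other tag would alter what readers see."""
    return visible_in_pre_like(source) != visible_in_syntaxhighlight_like(source)


sample = "a&nbsp;&lt;&nbsp;b"
print(repr(visible_in_pre_like(sample)))
print(repr(visible_in_syntaxhighlight_like(sample)))
print(changes_when_switched(sample))
```

Under this model, any existing <pre> content that contains entities would display differently once it is handled like <syntaxhighlight>, which is the breakage this comment warns about.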

Again – we have a consistent set of sufficient capabilities, with self-explaining tags.

  • Anything added to an optimum makes things worse: more cryptic, and harder to learn and understand with all the redundant aliases.
  • Copy and paste has been invented by now; personally I do not type these on the keyboard, but insert the words with the mouse.

No new syntax without new functionality.

I'd have been happy if you won the debate at T25932 and we considered wikitext a superset of html. But since you didn't, [...]

You should go say that you support my points on that task then. (NB I also said it should *not* be a superset of HTML, already acknowledging ways in which it clearly should not be and is not today.) The other person departed the discussion. I don't see that as a "win" on their part either, and certainly doesn't make my points go away or not useful in other discussions. (I haven't pushed on literally anything over there because it's exhausting having people not read or respond to what you've written, potentially multiple times though I try not to repeat things, and because I don't want to make a bunch of subtasks that all fundamentally say "see parent for why this exists". But hey, it's been multiple years at this point so maybe that's the solution to getting what I've requested over there.)

I don't see why should we be stuck in the worst of both worlds. Using the lang html attribute in wikitext is niche, and <pre> in wikitext for non-code stuff is even more so.

Another attribute is probably a reasonable improvement to wikitext for me (modulo upstream risk c.f. mentioned <math>) and certainly a reasonable compromise, and is something that stjn in his original "here's an idea" himself suggested, though you may have glanced over it as I originally did :^). And since you *haven't* argued against my points from 25932?...

Particularly of interest here given the intersection of lang as programming language and natural language attribute name is that English is the king of programming and we have a lot more languages than just English, so the likelihood of clashing increases when you depart English using wikis (and it might be relevant on wikis like Wikisource even if you can't find examples today on English or elsewhere [I checked Russian]).

The combination making it harder to teach wikitext seems a poor reason to be "strongly against".

It's not fun to say "there's edge cases" (and it's not fun to program for them). Learnability/teachability should always be a key factor in deciding whether syntax should be one way or another in a designed language. We had two other persons here pipe up about my offhand "change HTML" comment that I think pretty solidly supports "no, using the same attribute is just a bad idea". (I still think there's something that could be done there, there's a lot of space to ask people who want to use the same attribute to do something like p-python since IIRC there are no one-letter language codes. But I wouldn't have heartburn with the upstream direction either.)

<pre lang has exactly 6 uses on enwiki – 1 with a bogus value ("tid") and 5 already have a code language passed to them, so people actually think that should work.

lang not being linted is something that can/should be fixed upstream ( :^), then maybe we can get someone to appreciate the extreme efforts that {{lang}} goes to to emit correct lang attributes and actually help support more languages than those currently supported.

Attribute lang= should be reserved to human languages, and must not be introduced with another meaning in any task any more.

  • Defining <source lang="lua"> was a sin: both the tag name and the attribute conflict with HTML.
  • The HTML tags and the MW tags form a single wikisyntax space, so attributes as well as tag names should be consistent.
  • lua is an ISO 639 code, currently in the WMF Incubator.

There might be cases where lang= in <pre> will be used by regular text authors for human language, e.g. in multilingual descriptions:

The library record is defined as:
<pre>
item type: book | booklet
</pre>
or in French
<pre lang="fr">
reliure: livre | cahier
</pre>
or in German
<pre lang="de">
Produktformat: Buch | Heft
</pre>

And the example above is precisely HTML: MW merely intercepts the HTML, applies nowiki treatment to the content, and passes it through. It is not obvious to authors that they are using a MW element. The HTML spec must not be violated.