Regex lists should have distinct content model, including syntax highlighting and line numbers like CSS/Javascript pages.
Open, Needs Triage · Public · Feature

Assigned To
None
Authored By
Bugreporter2
Apr 19 2025, 7:46 AM
Referenced Files
F59385455: image.png
Apr 24 2025, 11:52 PM
F59298147: image.png
Apr 19 2025, 7:46 AM
F59298074: image.png
Apr 19 2025, 7:46 AM
F59298046: image.png
Apr 19 2025, 7:46 AM
F59298014: image.png
Apr 19 2025, 7:46 AM

Description

Background:

Regex lists exist in MediaWiki installations, including:

Plus

https://meta.wikimedia.org/wiki/Title_blacklist

And possibly others that I've missed.

Observations:

There is currently no syntax highlighting on the Regex pages on Wikipedia. Presumably, nobody has thought to do it yet.

Syntax highlighting is possible, however, using <syntaxhighlight lang=python>, <syntaxhighlight lang=perl>, <syntaxhighlight lang=bash>, etc. Unfortunately, there is no <syntaxhighlight lang=regex>. The colour schemes vary a little depending on the language, but you end up with something like:

image.png (577×1 px, 197 KB)

But where the code block begins...

# <syntaxhighlight lang=python>

The formatting ends up a little bit weird, because the parser initially treats it as WikiText:

image.png (195×1 px, 41 KB)

or

image.png (187×1 px, 40 KB)

It might be possible to start with a <syntaxhighlight> tag that is not in a comment, but the tag would then be treated as a regex and might have unintended consequences. Since page titles can't include < or >, though, it may have no impact.
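
To make the difference between the two placements concrete, here is a minimal sketch of how a regex list might be read. It assumes that `#` starts a comment running to the end of the line and that blank lines are skipped; the function name `parse_regex_list` and the sample entry `foo.*bar` are hypothetical, and the real list parsers may accept extra per-line options that this ignores.

```python
import re


def parse_regex_list(page_text):
    """Return compiled patterns from a regex-list page.

    Assumption: '#' starts a comment that runs to the end of the line,
    and blank lines are ignored.
    """
    patterns = []
    for raw_line in page_text.splitlines():
        # Drop everything from the first '#' onwards, then trim whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        patterns.append(re.compile(line))
    return patterns


# Tag hidden inside comments: only the real entry survives.
commented = "# <syntaxhighlight lang=python>\nfoo.*bar  # hypothetical entry\n# </syntaxhighlight>\n"

# Tag outside comments: both tag lines are read as patterns too.
uncommented = "<syntaxhighlight lang=python>\nfoo.*bar\n</syntaxhighlight>\n"

commented_patterns = [p.pattern for p in parse_regex_list(commented)]
uncommented_patterns = [p.pattern for p in parse_regex_list(uncommented)]
```

Under these assumptions the commented form yields one pattern, while the uncommented form yields three, two of which are the tag lines themselves.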

OK, so there is a quick fix, but let's dig a little deeper...

Which content model is best?:

Two content models are possible for this (both work the same, functionally):

  • Plain text, which permits *no* syntax highlighting. After T202424 at least this looks like plain text.
  • Wikitext, which allows syntax highlighting, but this clearly isn't really wikitext. It doesn't function as wikitext, and treating it as such produces some formatting issues.

And btw, it clearly isn't CSS or JavaScript, but those two existing content models serve as examples of how to use content models on different types of pages, including appropriate syntax highlighting and line numbers.

New content model is needed:

I think what's needed here, therefore, is an additional content model called Regex (or similar name). This would wrap the whole page in a <syntaxhighlight lang=python> (or whatever tag is preferred), and add line numbers, but otherwise function as Wikitext, and allow [[double-square bracket]] links and urls, e.g.

image.png (140×1 px, 40 KB)
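
As a rough illustration of the rendering step described above, the following sketch wraps a page's raw text in a `<syntaxhighlight>` tag with the `line` attribute and, separately, produces a plain-text preview of the numbering. Both function names are hypothetical, the default `lang="python"` is just one of the options mentioned earlier, and the sketch does not attempt the link handling the proposal also asks for.

```python
def wrap_regex_page(page_text, lang="python"):
    """Wrap raw page text in a syntaxhighlight tag with line numbers enabled."""
    body = page_text.rstrip("\n")
    return f'<syntaxhighlight lang="{lang}" line>\n{body}\n</syntaxhighlight>'


def number_lines(page_text):
    """Return a plain-text preview of the page with right-aligned line numbers."""
    lines = page_text.splitlines()
    width = len(str(len(lines)))
    return [f"{n:>{width}} {line}" for n, line in enumerate(lines, start=1)]


wrapped = wrap_regex_page("a\nb\n")
numbered = number_lines("a\nb")
```

A real content model would do this at render time, so that the stored page contains only the regexes and comments.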

Thank you for your consideration. Sorry, this one is a bit long-winded.

Notes:

Add further notes here.

Event Timeline

Bugreporter2 updated the task description.
taavi raised the priority of this task from Low to Needs Triage. Apr 19 2025, 1:03 PM

I would suggest forking the spam related content to a new task to be a child of T337431: Rework MediaWiki:SpamBlacklist and this task.

Thanks @Izno

I note the plan with T337431 is to completely deprecate the MediaWiki:Spamblacklist.

At present, that would still leave the other lists as Regex lists unless they also got similar treatment.

I honestly think this should actually be declined. Most regexes should be moved out into simple cases (as outlined in T337431), and what's left will be small enough that it wouldn't require a dedicated content model or syntax highlighting. In most cases in software engineering, using a regex means something has gone wrong.

And using a JSON content model as in the new BlockedDomains would likely highlight regexes appropriately anyway. So yes, I more or less agree with that suggestion regarding the context of the spam blacklist.

It does not, however, account for reworking the other items listed in the original task, which would probably deserve their own tasks à la T337431's efforts.

Wrapping everything in <syntaxhighlight lang=json> makes a completely incoherent neon-coloured headache-inducing mess. There is no lang=regex option, but there are, at least, other language options that aren't quite as bad.

image.png (837×1 px, 270 KB)

You misunderstand. The context of interest would be

[
  "a? reg(ular )?ex(pression)?",
  "really\\?",
  "It's more likely than you think"
]
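
One detail worth keeping in mind when regexes are stored as JSON strings: JSON only allows a fixed set of backslash escapes, so a regex escape such as `\?` has to be written with a doubled backslash, and trailing commas are not allowed. The sketch below checks this with Python's standard `json` module; the helper names are hypothetical and the sample strings are illustrative only.

```python
import json
import re


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def load_regex_json(text):
    """Parse a JSON array of regex strings and compile each one."""
    return [re.compile(entry) for entry in json.loads(text)]


# A single backslash before '?' is not a legal JSON escape.
single_backslash = r'["a? reg(ular )?ex(pression)?", "really\?"]'

# Doubling the backslash gives the JSON string value  really\?
double_backslash = r'["a? reg(ular )?ex(pression)?", "really\\?"]'

# Trailing commas are also rejected by a strict JSON parser.
trailing_comma = '["really", ]'

patterns = load_regex_json(double_backslash)
```

Here `patterns[1]` matches the literal text `really?`, since the doubled backslash in the JSON source decodes to a single regex escape.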

If there is an actual highlighter out there, it's probably over in the direction of regex101, and is perhaps not supported by the editor (or pair) anyway.