
Create wikitext tokenizer with rules identical to Parser of MW Core
Open, Low, Public

Description

The CodeMirror extension tokenizes wikitext differently from the MW Core Parser.
This is not a problem for plain wikitext, but complex wikitext needs a different approach (see T108455 and T108450 for examples).
The main difference is that CodeMirror scans for tokens sequentially through the text, while the Parser searches for tokens across the whole text.

The problem is that a string that looks like a token at its beginning may turn out not to be one.
Incorrect syntax highlighting complicates visual perception, but backtracking to correct it reduces performance.

Perhaps the best way is a combined method: while an editor is writing an article the end of the text is not yet known, but it is probably more comfortable if the wikitext is highlighted anyway.
Alternatively, closing tokens could be added automatically: for example, if the editor types '{{', insert '}}' after the cursor.
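The auto-closing idea could look something like the sketch below. This is a hypothetical helper, not the extension's actual code; the `closerFor` name and the token table are assumptions made for illustration. The point is that inserting the matching closer keeps the text balanced, so the tokenizer never sees a dangling opener.

```javascript
// Hypothetical sketch of auto-closing tokens (not the real CodeMirror API):
// when the editor types an opening token, insert its closer after the cursor.
const CLOSERS = { '{{{': '}}}', '{{': '}}', '[[': ']]' };

// Given the text before the cursor, pick the closer to insert, if any.
function closerFor(textBeforeCursor) {
  // Check the longest openers first, so '{{{' wins over '{{'.
  const openers = Object.keys(CLOSERS).sort((a, b) => b.length - a.length);
  for (const open of openers) {
    if (textBeforeCursor.endsWith(open)) {
      return CLOSERS[open];
    }
  }
  return null;
}

console.log(closerFor('Some text {{')); // '}}'
console.log(closerFor('{{{'));          // '}}}'
console.log(closerFor('plain text'));   // null
```

Checking the longest opener first matters: after typing the third `{` of `{{{`, the editor should upgrade to a `}}}` closer rather than treat it as `{{` again.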

Event Timeline

Pastakhov raised the priority of this task from to Needs Triage.
Pastakhov updated the task description. (Show Details)
Pastakhov subscribed.

Not sure, but I think those tasks are different.
I'll look at how Parsoid works, thanks.

Pastakhov renamed this task from Create JS parser of wikitext similar Parser of MW Core to Create wikitext tokenizer with rules identical to Parser of MW Core.Aug 24 2015, 5:19 AM
Pastakhov updated the task description. (Show Details)
Pastakhov set Security to None.
Pastakhov added a subscriber: Florian.

The Parser's WikiText tokenizing is pretty complex (using multiple passes on the whole text). Fully matching it in real-time isn't likely to be possible. I think we should decline this task and instead try to address specific cases that are broken (some of which may not be fixable without degrading performance unacceptably).

If I remember correctly, I meant the order of the parser passes.
And performance should only increase. Currently the tokenizer sometimes (maybe always) goes back and parses the same text again, because when you look at the beginning of a string you cannot be sure which token it is until you find the end of the token.
For example, if you meet {{{{{ it can be:

{{{{{1}}}}} - a parameter inside a template transclusion
{{{{{ hello world - just five literal '{' characters
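The ambiguity above can be made concrete with a small sketch. The function below is a hypothetical illustration (not code from the extension): it shows that a run of five braces can only be classified by looking ahead for closers, which a strictly left-to-right tokenizer cannot do without backtracking.

```javascript
// Illustrative sketch: classify a run of five '{' by looking ahead.
// A sequential tokenizer would have to guess here and backtrack on failure.
function classifyFiveBraces(text, pos) {
  // pos points at the first '{' of a '{{{{{' run.
  const rest = text.slice(pos + 5);
  if (rest.includes('}}}}}')) {
    // e.g. '{{{{{1}}}}}': a '{{' transclusion wrapping a '{{{1}}}' parameter.
    return 'parameter inside template transclusion';
  }
  // e.g. '{{{{{ hello world': no closers, so just five literal braces.
  return 'plain text';
}

console.log(classifyFiveBraces('{{{{{1}}}}}', 0));
console.log(classifyFiveBraces('{{{{{ hello world', 0));
```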

The Parser's WikiText tokenizer works differently: it searches for tokens across the whole string.
For example, it first finds parameters, then templates, etc. This should be faster and more correct, but it is not suitable for cases where the string has not been written completely yet.
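That pass ordering can be sketched roughly as below. This is an assumed simplification of what the MW Core Parser does (the real Parser runs multiple, far more complex passes over the full text); the function name and the placeholder scheme are made up for illustration. Matching triple-brace parameters before double-brace templates is what lets `{{{{{1}}}}}` resolve correctly.

```javascript
// Rough sketch of multi-pass tokenizing over the whole text (an assumed
// simplification of the MW Core Parser's approach, not its real code).
function tokenizePasses(text) {
  const tokens = [];
  // Replace each match with a NUL-delimited placeholder so later passes
  // can still match around it.
  const placeholder = () => '\x00' + (tokens.length - 1) + '\x00';

  // Pass 1: parameters '{{{name}}}' (innermost braces first).
  text = text.replace(/\{\{\{([^{}]*)\}\}\}/g, (m, name) => {
    tokens.push({ type: 'parameter', name });
    return placeholder();
  });
  // Pass 2: templates '{{...}}', whose body may now contain placeholders.
  text = text.replace(/\{\{([^{}]*)\}\}/g, (m, body) => {
    tokens.push({ type: 'template', body });
    return placeholder();
  });
  return tokens;
}

// '{{{{{1}}}}}' yields a parameter token first, then the template around it.
console.log(tokenizePasses('{{{{{1}}}}}'));
```

Because each pass scans the whole string, the inner `{{{1}}}` is found before the surrounding `{{ ... }}`, with no backtracking. The trade-off, as noted above, is that this requires the complete text, which an editor has not finished typing yet.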

You could also have a look at my parser at https://de.wikipedia.org/wiki/Benutzer:Schnark/js/syntaxhighlight.js; for the template mess, search especially for "Multiple braces". My approach there isn't perfect, but I have never actually found a real-life instance where it broke.