Page MenuHomePhabricator

Double braces syntax highlighting crash.
Open, Needs TriagePublicBUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • Write long paragraph in an editor with syntax highlighting.
  • Type {{ before the paragraph.
  • See how the entire thing gets highlighted
  • Crash if paragraph is long enough.

What happens?: When {{ is typed in front of a large paragraph in the source code editor of Wikipedia. The browser tab resource usage causes either a crash or unresonable slowness. This is probably due to the purple highlighting of the subsequent text, or due to a frontend search of the corresponding template.

This is probably also a less visible issue when pages are parsed on the backend.

What should have happened instead?: Possible fixes include limiting the length of valid templates, or prohibiting newline characters, as is done with [[ links.

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.: whatever en.wikipedia.org is using. Browser is Firefox 88.0 . Bug probably present in all browsers. Can't trigger the crash with chrome.

Event Timeline

@TZubiri any chance you could link to an example paragraph which causes a crash for you?

Yes, the most prolific generator of walls of texts, the talk page of Wikipedia's Manual of Style https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Manual_of_Style&action=edit&section=32

Note how there is a block of bold faced votes starting with "I think 2 years is too soon." . These are not separated by a double newline until the end of the section, so a double brace immediately before that line is enough to crash my browser.

See https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Double_braces_bug_reproduction.

I have never had this crash my system, but I will confirm that Firefox (up to date version) chugs for a bit when the opening braces get opened and it re-renders the edit box.

I commented there about why limiting how far the {{ highlighting can extend is a bad idea (infoboxes as a first example).

A valid workaround is to type the }} first rather than after other changes, including the {{, as well as not using syntax highlighting when you're on a long page and you need to modify or add many templates.

To some extent, this performance issue is a side effect of the solution to T174480. In usual use cases of CodeMirror, the browser will only render a finite viewport instead of hundreds of lines of code.

This comment was removed by Bhsd.

To some extent, this performance issue is a side effect of the solution to T174480. In usual use cases of CodeMirror, the browser will only render a finite viewport instead of hundreds of lines of code.

Absolutely. We are doing something that CodeMirror advises against, probably because of potential problems like this, because we want to use the native find functionality of the browser. This seems like one of those "You can't have your cake and eat it"-situations.

BTW, I experience no crash on Safari or or FF, but my Mac is pretty new and has a lot of memory.

Maybe adding two-brace autocompletion would solve this, since you probably will also add two closing brackets anyway.

I like the idea, but the problem wall still occur if the user deletes the autocompleted braces, albeit less frequently. Additionally, this would impact some users negatively by changing the UI.

To some extent, this performance issue is a side effect of the solution to T174480. In usual use cases of CodeMirror, the browser will only render a finite viewport instead of hundreds of lines of code.

I see. A naïve solution I can think of is to keep the notion of a viewport for highlighting, and make everything outside the viewport searchable only. But it sounds non trivial to implement, and would increase the complexity of the code.

Maybe there's a solution that would decrease complexity rather than increase it. I bet the solution is not in the highlighting but in the parsing itself. The highlighter only makes the issue visible, but it would probably be an issue for any wiki markup parsing. Suppose a long page with a broken template at the start, this issue would occur as well, admittedly it would happen way less often, and the resulting html is probably cached, but still.

One naïve solution in the parsing would be to fail early if the first argument (name of the template) doesn't look like the name of a known template, surely there's a limit on the number of characters allowed in a template, maybe inherited from the length limit on articles? (minus the template namespace+1) But this would just push the crash to the moment he user writes the second argument. Maybe the highlightable viewport and non highlightable but searchable full text isn't such a bad idea.

One naïve solution in the parsing would be to fail early if the first argument (name of the template) doesn't look like the name of a known template, ...

This may also be hard to implement. Below is an example where I modified the code a bit to recognize wrong template names. The left brackets are marked as errors in red, because the character is prohibited in page names. However, the next problem is that we do not know whether the curly braces at the beginning of this line are mistyped, or the square brackets are mistyped.

Screen Shot 2022-03-10 at 3.17.33 PM.png (51×1 px, 25 KB)

It can be even more complicated if there are more wikitext syntax in the template name. For instance, enwiki recommends something like {{ {{#if:{{{test|}}}|template1|template2}} }} in order to reduce the post-expand include size.

What a hard problem.

I have been thinking, even in a 100K character article, the parsing would only cost 100K cycles, since modern processors have a frequency of GHz, this would mean that parsing should take less than a thousandth of a second. Perhaps there needs to be no behaviour change, just a plain optimization without any impact on behaviour.

Should we try step debugging on an mvce to measure where processor cycles are going? Or do you have any idea why typing two braces would cost more than 1 or 2 cycles per character? I'm thinking literally about searching for '{{' or '}}' occurences, and, ending the search if }} is encountered, or recursing if {{ is encountered. But I guess that you still need to look for other syntax like '[[' occurences and all of wiki syntax.

I realize this is a hard problem, but it may be solved already in an analogous codebsae, on server side processing of wiki pages. Is the syntax parser when running a preview ( or posting an article) the same as when highlighting syntax? Perhaps this code is duplicated and the problem is already solved in the server side.

Based on my understanding of this extension, the real-time syntax highlighting, or CodeMirror, works very different from the backend PHP parser. Furthermore, it can be costly to "look ahead" in CodeMirror, which simply means getting information from the next few lines (reference).

I think this is basically the same as T197502. I am tempted to merge that here since this task is more focused on the general problem rather than performance on mobile devices, specifically.

T303664: CodeMirror option "viewportMargin: Infinity" rethinking has a nice write-up of the current situation and available options. My inclination is to revive https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CodeMirror/+/574855/. It should help, even without the further improvements Ed Sanders was proposing on the patch. What do you think @TheDJ ?

@MusikAnimal it will most likely help with this yes. If I understand Ed's proposed further change, he is arguing that we could split handling of input (by user), from the input coming in via textSelection api. For the user input, we know the changes are inside the viewport (probably, I have some concerns about what would happen if you scroll down, bringing the cursor out of view and then pasting some text for instance, but that could be tested), so we don't actually have to re-render the entire contents in that particular case.