Page MenuHomePhabricator

Test performance impacts of highlighting brackets for a large section of content
Closed, ResolvedPublic2 Estimated Story PointsSpike

Description

Use test instance to understand the performance impacts of highlighting brackets from the middle or the beginning or end of a large section (and large section with nested brackets). From the initial investigation T254976: Investigation: Bracket matching , it was estimated that this feature may be too slow on large chunks of text.

Event Timeline

Lena_WMDE renamed this task from Test performance impacts of highlighting brackets from center of a section to Test performance impacts of highlighting brackets for a large section of content.Aug 12 2020, 2:37 PM
Lena_WMDE updated the task description. (Show Details)
Lena_WMDE set the point value for this task to 2.

Note: need a follow up ticket to investigate the same impacts for tag matching. This has the potential to be even slower.

Here is what I have seen so far when reviewing the code of the existing matchbrackets add-on:

  1. It starts by looking at the cursor position. Is the cursor next to a bracket? If not, nothing happens.
  2. Is it an open or closing bracket? Depending on that the search direction is either forward or backwards. Never both.
  3. Search starts, character by character.
  4. Search might encounter a bogus bracket, e.g. (…}. It stops and highlights this error.
  5. Search might encounter nested brackets, e.g. (…{…}…). It uses a stack to keep track of these and skips them.
  6. Search stops when the matching bracket on the same nesting level is found, or reaches one of the rate-limits (maxScanLines and maxScanLineLength).

Note that while most of the code looks like it works with single characters only, there are also calls to cm.getTokenTypeAt(). This does, as far as I can see, work on larger chunks, i.e. "tokens". And because our "mediawiki" mode is already able to handle {{, {{{ and such as single tokens, there are situations where the character-based matchbrackets add-on gets confused.

Option A

My idea for "in the middle" support includes reusing all the steps above exactly as they are. All we do is to add a few more steps before the existing ones:

  1. When the cursor is not next to a bracket, we start searching for one. Let's use this example, where _ is the cursor: ( { } _ { } ).
  2. It might be a good idea to search in both directions the same time for performance reasons. But that's optional. Let's assume we search forward first.
  3. Going forward, we find {. That's a nesting level we need to skip. As before, create a stack and skip these. This will make sure the } is skipped as well.
  4. The moment we find the ) we found one of the two brackets we need. Now we use the existing algorithm from above to find the other one.

The complexity of this algorithm is, on average, 150% of the original one (assuming we start exactly in the middle on average).

Option B
  1. Similar to A, but we search in both directions (doesn't matter if done the same time or one after the other). As above, nested brackets are skipped.
  2. When we found a brackets in each direction, we highlight them.
  3. When they don't match (e.g. in { _ )), we additionally highlight them as errors.
  4. We never call the original algorithm. However, we must make sure we respect the same rate limits.

The complexity of this algorithm should be the same as the original one.

However, in both cases the new search algorithm is executed on all cursor positions, not only when the cursor is next to a bracket. This might significantly impact how responsive it feels when navigating the wikitext. It will probably feel the same as navigating a page that is entirely made of brackets.

There are many ways to improve this:

  • Start the search only when the cursor is on the same position for – for example – 500ms. This makes sure rapidly changing the cursor position is not affected. But it might be a visible delay, which might create the impression the editor is slow – the opposite of what we want.
  • Make sure the search is interrupted and restarts the moment the cursor position changes.
  • We could cache the structure of the text somehow and scan this tree instead scanning the text character by character over and over again. The cache is only updated when the text changes.
  • The syntax highlighter already marks pairs of brackets and everything they contain with increasingly opaque color shades. This might be a super-cheap way to find the pair of brackets we want to highlight. It doesn't look like the necessary information is there. There is only a stream of colored tokens, but no nesting.

Change 620653 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/CodeMirror@master] [POC] Always highlight brackets when cursor is inside a pair

https://gerrit.wikimedia.org/r/620653

to be honest I'm not sure how to reliably benchmark this. I believe this needs to be done on a slower machine. On my current dev machine navigating the text feels all fine and good with the POC patch https://gerrit.wikimedia.org/r/620653 in place. I did a quick benchmark with the Chromium dev tools and my new method findSurroundingBrackets() shows up with 0.4% "self time". That's not nothing, but very reasonable. I believe we can improve this a lot, if we need to. I already marked some ideas in the patch.

to be honest I'm not sure how to reliably benchmark this.

IMHO, perception is the most important measure and you've already checked this. In T260249#6388060 I measured microseconds elapsed for iterations through the add-on hooks, but it's not a very useful number other than to show the magnitude and give an upper bound on some specific machine.

Change 620653 abandoned by Thiemo Kreuz (WMDE):
[mediawiki/extensions/CodeMirror@master] [POC] Always highlight brackets when cursor is inside a pair

Reason:
Now part of Ib01d991.

https://gerrit.wikimedia.org/r/620653