
IDEA: Move parallel tag parsing logic from Math to core
Open, Lowest, Public

Description

The Math extension has a mechanism that parses math tags on wiki pages in micro-batches rather than sequentially. With the current batch size of 150, this significantly speeds up the rendering of pages with multiple math expressions. (For math-heavy pages, e.g., on Wikiversity, rendering time drops from about 10 minutes to about 1 minute.)

The downside of the approach is that it is currently a hack:

  1. It uses a static variable to collect all math tags,
  2. then it replaces the math tags with strip markers,
  3. then, depending on the implementation, it either passes all tags at once to a mathoid command-line client or posts the requests in micro-batches to a RESTBase server using an HTTP multi-client implementation (currently Guzzle),
  4. and finally it uses the parserAfterTidy hook to insert the actual rendering. (See also T103269)
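
To make the four steps above concrete, here is a minimal Python sketch of the collect, mark, batch, and insert pattern. It is an illustration only, not the Math extension's actual code: the marker format, the function names, and the `render_batch` callback are assumptions made for this example, and the collected tags live in a dict passed by the caller rather than in a static variable.

```python
import re
from typing import Callable, Dict, List

BATCH_SIZE = 150  # batch size quoted in the description

# Simplified tag pattern; real math tags can also carry attributes.
MATH_TAG = re.compile(r"<math>(.*?)</math>", re.DOTALL)


def collect_tags(wikitext: str, pending: Dict[str, str]) -> str:
    """Steps 1 and 2: record each math tag and leave a marker in its place."""
    def to_marker(match: re.Match) -> str:
        # Hypothetical marker format, not MediaWiki's real strip markers.
        marker = f"\x7fUNIQ-math-{len(pending)}-QINU\x7f"
        pending[marker] = match.group(1)
        return marker
    return MATH_TAG.sub(to_marker, wikitext)


def render_in_batches(
    pending: Dict[str, str],
    render_batch: Callable[[List[str]], List[str]],
    batch_size: int = BATCH_SIZE,
) -> Dict[str, str]:
    """Step 3: send the collected expressions to the renderer in micro-batches."""
    markers = list(pending)
    rendered: Dict[str, str] = {}
    for start in range(0, len(markers), batch_size):
        chunk = markers[start:start + batch_size]
        outputs = render_batch([pending[m] for m in chunk])
        rendered.update(zip(chunk, outputs))
    return rendered


def insert_renderings(text: str, rendered: Dict[str, str]) -> str:
    """Step 4: replace every marker with its rendering."""
    for marker, html in rendered.items():
        text = text.replace(marker, html)
    return text
```

A caller would run `collect_tags` while parsing, then call `render_in_batches` with a function that wraps whatever backend is in use, and finally `insert_renderings` on the parser output. The point of the sketch is that none of these steps is specific to math, which is the argument for moving them out of the extension.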

As we are updating the Math extension code in preparation for the update to MathJax 3 (see T237516), I was wondering whether this might be an opportunity to find a better solution that does not have to live in the Math extension and can be integrated into core, since the current mechanism cannot be maintained by the math community in the long run.

Any ideas on how to proceed here?

Event Timeline

To me, it would be important to figure out as soon as possible

  • whether nobody cares about this idea,
  • or whether the people who care are not aware that this ticket exists.

I am asking myself:

  • Is the best use of my time to implement a patch and upload it to Gerrit?
  • Or is it better to promote this ticket on mailing lists, IRC, etc.?

Since we plan to replace the old parser with Parsoid in the not-too-distant future, it would perhaps be best to think about how this kind of thing could work in that new world. I personally know very little about the inner workings of Parsoid. Perhaps @SubrahamanyamVarma has some thoughts.

@SubrahamanyamVarma, long time no see. It would be really interesting to hear what you think. I would expect that this is exactly what Parsoid was made for :-)

Physikerwelt lowered the priority of this task from High to Lowest. Dec 20 2020, 9:49 PM

This can also be done later. I made good progress in implementing this for direct mathoid access.

ssastry added a subscriber: cscott.

I think you all meant to tag me instead of the other person. :-) Adding @cscott as well. I haven't taken a look at the ticket yet and will do so later, but for now I wanted to fix up the subscribers.

We would have to introduce an extension config / API option that enables a batching mode in which Parsoid batches the requests appropriately, calls the extension to process each batch, and takes care of integrating the results into the document. So, the extension would still be responsible for processing the batch, but all the boilerplate handling / admin would be removed from the extension. Presumably this would be useful for any extension that shells out or issues API requests, not just Math.

So, sourceToDom is the one-at-a-time processing case, and we would probably introduce a batchProcessToDom which would return DomDocumentFragment[].
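
As a rough illustration of how that split of responsibilities might look, here is a Python sketch. It is not Parsoid's API: only the two method names come from this discussion (rendered here as `source_to_dom` and `batch_process_to_dom`), while the class names, the `process_tags` driver, and the use of plain strings in place of DomDocumentFragment are assumptions for the example.

```python
from abc import ABC, abstractmethod
from typing import List

Fragment = str  # stand-in for a DomDocumentFragment


class ExtensionTagHandler(ABC):
    """Hypothetical handler interface offering both processing modes."""

    @abstractmethod
    def source_to_dom(self, source: str) -> Fragment:
        """One-at-a-time case: turn a single tag body into a fragment."""

    def batch_process_to_dom(self, sources: List[str]) -> List[Fragment]:
        """Batch case: defaults to the one-at-a-time path for simple extensions."""
        return [self.source_to_dom(s) for s in sources]


class MathHandler(ExtensionTagHandler):
    """Toy handler that counts how often its (simulated) backend is called."""

    def __init__(self) -> None:
        self.backend_calls = 0

    def source_to_dom(self, source: str) -> Fragment:
        return self.batch_process_to_dom([source])[0]

    def batch_process_to_dom(self, sources: List[str]) -> List[Fragment]:
        self.backend_calls += 1  # one backend round trip per batch
        return [f"<math-rendered>{s}</math-rendered>" for s in sources]


def process_tags(handler: ExtensionTagHandler, sources: List[str],
                 batch_size: int = 150) -> List[Fragment]:
    """Host-side boilerplate: slice the tags into batches and collect the results."""
    fragments: List[Fragment] = []
    for start in range(0, len(sources), batch_size):
        fragments.extend(handler.batch_process_to_dom(sources[start:start + batch_size]))
    return fragments
```

In this sketch the host owns the batching loop and the reintegration of results, and the extension only implements the batch call, which mirrors the division of labor described above. Extensions that do not override the batch method keep working through the one-at-a-time path.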