Parsoid uses [[ https://github.com/tstarling/pegjs | a fork ]] of [[ https://github.com/pegjs/pegjs | PEG.js ]] that @tstarling worked on. This fork adds some features to PEG.js to remove some JS / async-related hacks and to improve the tokenizer-generation performance of PEG.js.
To port Parsoid to PHP, we need a replacement for this PEG tokenizer.
Here are some options available to us.
* There is [[ https://github.com/nylen/phpegjs | phpegjs ]], a plugin for PEG.js that generates a PHP tokenizer instead of a JS tokenizer. It also enables co-location of PHP and JS action code in the PEG tokenizer. However, using it requires us to do one of the following:
## Abandon Tim's fork and adapt Parsoid-PHP to use this tokenizer. This is not a workable solution out of the box.
## Upstream some of Tim's changes to PEG.js, and then use the phpegjs plugin. This requires us to separate out the necessary features and upstream them, and it requires the maintainer to be interested in these changes.
## Implement the phpegjs plugin on top of Tim's fork.
* Evaluate [[ https://github.com/smuuf/php-peg | PHP-PEG ]] and see if our PEG grammar works with that.
* If performance of the tokenizer is a potential concern, evaluate [[ https://github.com/gpakosz/peg | C-PEG ]] and see if our PEG grammar works with that.
This task is to evaluate our options and propose a suitable solution that meets our functional and performance requirements.