The parser tests file has grown to more than 26,000 lines and is close to 700 KB. It has more than 1,600 test entries.
wt2html was the only test mode before Parsoid came along and there were fewer than 500 tests.
The PHP parser still runs only wt2html tests, and it runs 1316 of them as of today.
For Parsoid, the test counts today are: wt2html: 1033, wt2wt: 1150, html2html: 976, html2wt: 594, selser: 15953.
And we are constantly tweaking existing tests and adding new ones.
In the parserTests file, you can specify PHP-parser-only tests, Parsoid-only tests, tests that run with both parsers, and tests that invoke Tidy on the PHP parser output.
For Parsoid, you can also specify that the test run in all or a subset of the 4 test modes (wt2html, wt2wt, html2wt, html2html). Randomized edit-generated selser tests are enabled on all tests that have wt2wt enabled.
For Parsoid, you can also specify manual edits that should be performed on the wikitext and specify what the edited wikitext should look like (the edits are specified on the Parsoid DOM with CSS selectors). You can also specify whether these manual edits should be run with the production-enabled selective serializer or the full-normalizing serializer, and you can enable / disable the scrubWikitext mode on these tests.
These various test modes are used to test all kinds of Parsoid functionality:
* edge cases for the wt2html path
* parsoid-specific edge cases for the html2wt serialization
* expectations for how the html2wt nowiki algorithm should behave
* expectations for how the scrubWikitext mode should behave
* expectations for how serialization should behave on certain kinds of edits (that we don't want to leave to the whims of randomized edit generation)
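To make the mode rules above concrete, here is a minimal sketch of how the set of Parsoid modes for a single test could be derived. The function and variable names are hypothetical and this is not the actual test runner code; it only illustrates the rule that a test runs in all or a subset of the 4 modes, and that selser is enabled wherever wt2wt is enabled.

```python
# The 4 Parsoid test modes named in this post, in a fixed order.
ALL_MODES = ("wt2html", "wt2wt", "html2wt", "html2html")


def modes_for_test(requested=None):
    """Return the list of modes a test runs in (hypothetical helper).

    requested: iterable of mode names from the test's options,
    or None when the test does not restrict its modes.
    """
    if requested is None:
        modes = list(ALL_MODES)
    else:
        requested = set(requested)
        unknown = requested - set(ALL_MODES)
        if unknown:
            raise ValueError("unknown test modes: " + ", ".join(sorted(unknown)))
        # Keep the canonical order regardless of how the options were written.
        modes = [m for m in ALL_MODES if m in requested]
    # Randomized selser tests piggyback on wt2wt.
    if "wt2wt" in modes:
        modes.append("selser")
    return modes
```

With this sketch, a test restricted to wt2html and html2wt would not get any selser runs, while an unrestricted test runs in all 4 modes plus selser.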
I think a single parser tests file that covers all this has gotten a bit out of control. Over the last 2 years, I have periodically proposed that we split up this monolithic test file into multiple separate test files, but I have received pushback each time. But after yet another fixup of the parser tests file (https://gerrit.wikimedia.org/r/#/c/236223), where, this time, the nowiki expectation tests had been edited incorrectly over time, thus defeating the purpose of those tests, it is time to start moving towards breaking up this test setup into something more sane and manageable. This is not the first time that these or other tests had to be fixed because they had been incorrectly updated, simply because it was not clear what kind of functionality they had been set up to test.
1. For starters, we could simply split up the test file into multiple files according to what broad functionality they are testing, and implement maybe an 'include' directive that just stitches them together into a monolithic file so the test runner is none the wiser. But it makes editing these tests more manageable and keeps the tests better organized.
2. Longer term, we could enable a feature where you could separate out the template/article definitions into their own files, have them be included in these different individual test files, and also have those individual test files be runnable separately. We could potentially support global test options for an entire file and have individual tests override them.
3. I would be happy with just 1. for now, but features as in 2. and other additional features to better manage this are welcome.
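As a rough illustration of 1., here is a minimal sketch of what stitching could look like. The `!! include` syntax is hypothetical (nothing like it exists in the parserTests format today), and so are the function names; the last helper sketches the file-level options with per-test overrides from 2.

```python
import posixpath

# Hypothetical directive; not part of the current parserTests format.
INCLUDE_PREFIX = "!! include "


def stitch(path, read_text, _stack=()):
    """Expand include directives and return the stitched lines.

    read_text is any callable mapping a path to that file's text, so
    the sketch works with real files or with an in-memory dict.
    """
    if path in _stack:
        raise ValueError("include cycle: " + " -> ".join(_stack + (path,)))
    lines = []
    for line in read_text(path).splitlines():
        if line.startswith(INCLUDE_PREFIX):
            target = line[len(INCLUDE_PREFIX):].strip()
            # Resolve the target relative to the including file.
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(path), target))
            lines.extend(stitch(target, read_text, _stack + (path,)))
        else:
            lines.append(line)
    return lines


def effective_options(file_options, test_options):
    """File-level options apply unless an individual test overrides them."""
    merged = dict(file_options)
    merged.update(test_options)
    return merged
```

The test runner would then be handed the output of `stitch("parserTests.txt", ...)` and would see one monolithic file, exactly as it does now.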