As of December 31, 2011, the parser tests file was ~9000 lines and ~190 KB, and had about 670 test entries. The PHP parser was the only test runner, and it ran only in wt2html mode.
Today, the file has grown to 26000+ lines and close to 700 KB, with 1600+ test entries.
The PHP parser still runs only wt2html tests, and it runs 1316 of them as of today.
For Parsoid, the test counts today are: wt2html: 1033, wt2wt: 1150, html2html: 976, html2wt: 594, selser: 15953.
We are also constantly tweaking existing tests and adding new ones.
In the parserTests file, you can specify PHP-parser-only tests, Parsoid-only tests, or tests that run with both parsers. For wt2html output, you can specify output common to all parsers (html/*), output for the PHP parser without tidy enabled (html/php), for the PHP parser with tidy enabled (html/php+tidy), or for Parsoid (html/parsoid).
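As a rough illustration of how these output sections relate, the sketch below picks the expected HTML for a given parser from one test entry. The `!! section` layout mirrors the parserTests format, but the helper names (`parse_entry`, `expected_html`) and the fallback order (parser-specific section first, then html/*) are assumptions made for illustration, not the documented behaviour of either test runner.

```python
# A sample entry; the test content itself is made up for this sketch.
SAMPLE_ENTRY = """\
!! test
Italics
!! wikitext
''foo''
!! html/*
<p><i>foo</i>
</p>
!! html/parsoid
<p><i>foo</i></p>
!! end
"""


def parse_entry(text):
    """Split one test entry into a dict of {section name: section body}."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("!!"):
            name = line[2:].strip()
            if name == "end":
                break
            current = name
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


def expected_html(sections, target):
    """Return the expected HTML for target ('php', 'php+tidy', 'parsoid').

    Assumed lookup order: the parser-specific section, then the common one.
    Returns None when neither section is present.
    """
    for key in ("html/" + target, "html/*"):
        if key in sections:
            return sections[key]
    return None


if __name__ == "__main__":
    entry = parse_entry(SAMPLE_ENTRY)
    for target in ("php", "php+tidy", "parsoid"):
        print(target, "->", repr(expected_html(entry, target)))
```

In this sketch, the PHP targets fall back to the common html/* section, while Parsoid uses its own section.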
For Parsoid, you can also specify that a test run in all or a subset of the 4 test modes (wt2html, wt2wt, html2wt, html2html). For every test that has wt2wt enabled, the Parsoid test runner also generates new selective serialization (selser) tests by applying randomized edits to the parsed HTML.
In addition, for Parsoid you can specify manual edits to apply and what the edited wikitext should look like (the edits are specified on the Parsoid DOM with CSS selectors). You can also specify whether these manual edits should be run with the production-enabled selective serializer or the full normalizing serializer, and you can enable or disable the scrubWikitext mode on these tests.
These various test modes are used to test all kinds of Parsoid functionality:
* edge cases for the wt2html path
* parsoid-specific edge cases for the html2wt serialization
* expectations for how the html2wt nowiki algorithm should behave
* expectations for how the scrubWikitext mode should behave
* expectations for how serialization should behave on certain kinds of edits (that we don't want to leave to the whims of randomized edit generation)
I think a single parser tests file that covers all this functionality has gotten a bit out of control. Over the last 2 years, I have periodically proposed that we split this monolithic test file into multiple separate test files, but I have received pushback each time. However, after yet another fixup of the parser tests file (https://gerrit.wikimedia.org/r/#/c/236223 ), where this time the nowiki expectation tests had been edited incorrectly over time, thus defeating the purpose of those tests, it is time to start breaking up this test setup into something more sane and manageable. This is not the first time that these or other tests have had to be fixed because they had been incorrectly updated, simply because it was not clear what kind of functionality they had been set up to test.
1. For starters, we could simply split up the test file into multiple files according to the broad functionality they test, and implement an 'include' directive that just stitches them together into a monolithic file so the test runner is none the wiser. This alone makes editing these tests more manageable and keeps the tests better organized.
2. Longer term, we could enable a feature where the template/article definitions are separated out into their own files and included in the different individual test files, and where those individual test files are also runnable separately. We could potentially support global test options for an entire file and have individual tests override them.
3. I would be happy with just 1. for now, but the features in 2., and any other features that make this easier to manage, are welcome.
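Proposal 1 could be sketched roughly as follows. The `!! include` directive name, the file names, and the `stitch` helper are all hypothetical; nothing like this is claimed to exist in the current test runners. The sketch only shows the idea of expanding includes into one monolithic text before the existing runner sees it.

```python
import os
import tempfile

INCLUDE_PREFIX = "!! include "  # hypothetical directive, not an existing one


def stitch(path, _active=None):
    """Return the contents of path with include lines expanded recursively.

    Include targets are resolved relative to the including file.
    A file that (directly or indirectly) includes itself raises ValueError.
    """
    active = _active if _active is not None else set()
    real = os.path.realpath(path)
    if real in active:
        raise ValueError("circular include: " + path)
    active.add(real)
    base = os.path.dirname(real)
    out = []
    with open(real, encoding="utf-8") as f:
        for line in f:
            if line.startswith(INCLUDE_PREFIX):
                target = line[len(INCLUDE_PREFIX):].strip()
                chunk = stitch(os.path.join(base, target), active)
                # Keep included files from running into the next line.
                if chunk and not chunk.endswith("\n"):
                    chunk += "\n"
                out.append(chunk)
            else:
                out.append(line)
    active.discard(real)  # the same file may still be included elsewhere
    return "".join(out)


def demo():
    """Build two small files in a temp directory and stitch them together."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "nowiki.txt"), "w", encoding="utf-8") as f:
            f.write("!! test\nnowiki expectations\n!! end\n")
        top = os.path.join(tmp, "parserTests.txt")
        with open(top, "w", encoding="utf-8") as f:
            f.write("# top-level file\n!! include nowiki.txt\n")
        return stitch(top)


if __name__ == "__main__":
    print(demo())
```

Because the expansion happens before parsing, the runner would still receive a single text, which is what keeps it "none the wiser" in this approach.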