Page MenuHomePhabricator

Split parser tests into multiple files
Open, MediumPublic

Description

The mediawiki/core parser tests cases have grow tremendously since the start of the Parsoid project:

Date# lines Size# of tests
December 31, 2011~9 000~190kB670
Sept 2015~26 000700+kB1600+
Delta+ 15 000+ 510 kB+ 930+

In December 31, 2011, PHP parser was the only test runner and in wt2html mode.

In September 2015 the PHP parser still runs only wt2html tests and it runs 1316 of those. For Parsoid, the test counts today are:

Run # of tests
wt2html1305
wt2wt1279
html2html1232
html2wt1329
selser17042

And, we are constantly tweaking and adding new tests.

In the parserTests file, you can specify PHP parser only tests, Parsoid-only parser tests, or tests that run with both parsers. For wt2html output, you can specific output common to all parsers (html/*), for php parser without tidy enabled (html/php), for php parser with tidy enabled (html/php+tidy), or for parsoid (html/parsoid).

For Parsoid, you can also specify that the test run in all or a subset of the 4 test modes (wt2html, wt2wt, html2wt, html2html). On all test that have wt2wt enabled, the parsoid test runner also generates new selective serialization tests by generating randomized edits on the parsed HTML.

For Parsoid, in addition, you can also specify manual edits that should be performed on the wikitext and specify what the edited wikitext should look like (the edits are specified on the Parsoid DOM with CSS selectors). In addition, you can also specific whether these manual edits should be run with the production-enabled selective serializer or the full normalizing serializer. You can also enable / disable the scrubWikitext mode on these tests.

These various test modes are used to test all kinds of Parsoid functionality.

  • edge cases for the wt2html path
  • parsoid-specific edge cases for the html2wt serialization
  • expectations for how the html2wt nowiki algorithm should behave
  • expectations for how the scrubWikitext mode should behave
  • expectations for how serialization should behave on certain kinds of edits (that we don't want to leave to the whims of randomized edit generation)

I think a single parser tests file that covers all this functionality has gotten a bit out of control. Over the last 2 years, I have periodically proposed that we split up this monolithic test file into multiple separate test files, but I have received pushback each time I proposed it. But, after yet another fixup of the parser tests file (https://gerrit.wikimedia.org/r/#/c/236223 ) where, this time, the nowiki expectation tests had been edited incorrectly over time thus defeating the purpose of those tests, it is time to start moving towards breaking up this test setup to something more sane and manageable. This is not the first time where these or other tests had to be edited because they had been incorrectly updated simply because it was not clear what kind of functionality they had been set up to test.

  1. For starters, we could simply split up the test file into multiple files according to what broad functionality they are testing and implement an include directive maybe that just stitches them together into a monolithic file so the test runner is none the wiser. But, it makes editing of these tests more manageable and keeps the tests better organized.
  1. Longer term, we could enable a feature where you could separate out the template/article definitions into its own files and have them be included in these different individual test files and also have those individual test files be runnable separately. We could potentially provide options where we could provide global test options for the entire file and have individual tests override them.
  1. I would be happy with just 1. for now, but features as in 2. and other additional features to better manage this are welcome.

Event Timeline

ssastry raised the priority of this task from to Medium.
ssastry updated the task description. (Show Details)
ssastry added subscribers: ssastry, GWicke.
ssastry set Security to None.

I have slightly tweaked the task details to show some bits as tables.

In MediaWiki core the list of .txt files are defined in $wgParserTestFiles and the PHPUnit test suite just iterate over it to load the tests.

For core:

  • tests/phpunit/includes/parser/NewParserTest.php

For extensions there is some wrapping going out to prevent running the mediawiki/core tests:

  • tests/phpunit/suites/ExtensionsParserTestSuite.php
  • tests/phpunit/includes/parser/MediaWikiParserTest.php

I am not sure what kind of push back you received, but it seems to me it is pretty straightforward:

  • split the .txt file
  • adjust $wgParserTestFiles in includes/DefaultSettings.php

Another daunting task would be to refactor the messy test runner:

  • tests/phpunit/includes/parser/NewParserTest.php
  • tests/testHelpers.inc

Some of the bad code can be blamed to me such as the delayed parser tests (which was to prevent parsing articles for tests that we have filtered out).

This is not related to Continuous-Integration-Infrastructure but part of the broader MediaWiki-Core-Tests. The CI job will just run PHPUnit test suite and does not really care how tests are organized as long as the suite runs them.

Note that Parsoid really ought to be running parser tests from the Cite extension as well: T114256: Parsoid should run tests against the Cite parser tests, not just MediaWiki core's.

One proposed directory layout in Parsoid might be:

  • tests/parserTests/core/*.txt -- tests imported from mediawiki-core
  • tests/parserTests/cite/*.txt -- tests imported from Extension:Cite
  • etc

Of course, this makes synchronizing parsertest changes between parsoid and the various extensions even more... intricate.

Note that Parsoid really ought to be running parser tests from the Cite extension as well: T114256: Parsoid should run tests against the Cite parser tests, not just MediaWiki core's.

One proposed directory layout in Parsoid might be:

  • tests/parserTests/core/*.txt -- tests imported from mediawiki-core
  • tests/parserTests/cite/*.txt -- tests imported from Extension:Cite
  • etc

Of course, this makes synchronizing parsertest changes between parsoid and the various extensions even more... intricate.

For CI, there is no need to manually sync the files. I believe we already have all the logic/material to have Parsoid and Cite to be joined together. I commented on T114256.

Possible structure for splitting up the parserTests file:

quote
list
table
heading
indent_pre
paragraph
links/
- wikilink
- extlink
- urllink
- magicwords
- interwiki
- langlinks
cite
template
sanitizer
html
edge_cases (?)
parsoid/
- wt2html/
  - tsp
  - dom/
    - tplwrap
    - .. and others .. 
- html2wt/
  - links
  - nowikis
  - selser
  - normalization
  - templatedata
  - edge_cases (?)
Krinkle renamed this task from [RFC]: Split parser tests info multiple files to RFC: Split parser tests info multiple files.Jan 4 2018, 12:18 AM

@ssastry Are you still interested in pushing this forward? What should the next steps be? Are you interested in having an IRC discussion about this?

@ssastry Are you still interested in pushing this forward? What should the next steps be? Are you interested in having an IRC discussion about this?

Yes, I am interested definitely. But, it just keeps get deprioritized on my end among all the other things that needs doing. I don't need an IRC discussion for this but am happy to engage in one if you feel this needs wider discussion / input.

Does TechCom have any input on this proposal?

But, other than that, next step is mostly starting the work starting on the php test framework end to introduce some "include/import/use/require" directive so that definitions of articles/templates/etc. can be sourced properly. If someone else wants to do this, I don't want to block this -- I am happy to have them do it.

@ssastry Thanks for the reply! We currently don't have any comments, we just had a very brief look during our last meeting. I'm butting this into the backlog for now, since it seems stalled at the moment.

ssastry changed the task status from Open to Stalled.Feb 8 2018, 4:23 PM
Krinkle added a subscriber: Krinkle.

Closing old RFC that is not yet on to our 2020 process and does not appear to have an active owner. Feel free to re-open with our template or file a new one when that changes.

ssastry renamed this task from RFC: Split parser tests info multiple files to Split parser tests info multiple files.Sep 16 2020, 7:17 PM
ssastry reopened this task as Open.
ssastry removed a project: TechCom-RFC.
Dinoguy1000 renamed this task from Split parser tests info multiple files to Split parser tests into multiple files.Oct 18 2020, 7:15 AM

Change 655920 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/core@master] Split out media parser tests

https://gerrit.wikimedia.org/r/655920

Change 655920 merged by jenkins-bot:
[mediawiki/core@master] Split out media parser tests

https://gerrit.wikimedia.org/r/655920

Change 656211 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Sync parserTests with core

https://gerrit.wikimedia.org/r/656211

Change 656211 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Sync parserTests with core

https://gerrit.wikimedia.org/r/656211

Change 658462 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a23

https://gerrit.wikimedia.org/r/658462