Page MenuHomePhabricator

RFC: Split parser tests info multiple files
Open, Stalled, MediumPublic

Description

The mediawiki/core parser tests cases have grow tremendously since the start of the Parsoid project:

Date# lines Size# of tests
December 31, 2011~9 000~190kB670
Sept 2015~26 000700+kB1600+
Delta+ 15 000+ 510 kB+ 930+

In December 31, 2011, PHP parser was the only test runner and in wt2html mode.

In September 2015 the PHP parser still runs only wt2html tests and it runs 1316 of those. For Parsoid, the test counts today are:

Run # of tests
wt2html1305
wt2wt1279
html2html1232
html2wt1329
selser17042

And, we are constantly tweaking and adding new tests.

In the parserTests file, you can specify PHP parser only tests, Parsoid-only parser tests, or tests that run with both parsers. For wt2html output, you can specific output common to all parsers (html/*), for php parser without tidy enabled (html/php), for php parser with tidy enabled (html/php+tidy), or for parsoid (html/parsoid).

For Parsoid, you can also specify that the test run in all or a subset of the 4 test modes (wt2html, wt2wt, html2wt, html2html). On all test that have wt2wt enabled, the parsoid test runner also generates new selective serialization tests by generating randomized edits on the parsed HTML.

For Parsoid, in addition, you can also specify manual edits that should be performed on the wikitext and specify what the edited wikitext should look like (the edits are specified on the Parsoid DOM with CSS selectors). In addition, you can also specific whether these manual edits should be run with the production-enabled selective serializer or the full normalizing serializer. You can also enable / disable the scrubWikitext mode on these tests.

These various test modes are used to test all kinds of Parsoid functionality.

  • edge cases for the wt2html path
  • parsoid-specific edge cases for the html2wt serialization
  • expectations for how the html2wt nowiki algorithm should behave
  • expectations for how the scrubWikitext mode should behave
  • expectations for how serialization should behave on certain kinds of edits (that we don't want to leave to the whims of randomized edit generation)

I think a single parser tests file that covers all this functionality has gotten a bit out of control. Over the last 2 years, I have periodically proposed that we split up this monolithic test file into multiple separate test files, but I have received pushback each time I proposed it. But, after yet another fixup of the parser tests file (https://gerrit.wikimedia.org/r/#/c/236223 ) where, this time, the nowiki expectation tests had been edited incorrectly over time thus defeating the purpose of those tests, it is time to start moving towards breaking up this test setup to something more sane and manageable. This is not the first time where these or other tests had to be edited because they had been incorrectly updated simply because it was not clear what kind of functionality they had been set up to test.

  1. For starters, we could simply split up the test file into multiple files according to what broad functionality they are testing and implement an include directive maybe that just stitches them together into a monolithic file so the test runner is none the wiser. But, it makes editing of these tests more manageable and keeps the tests better organized.
  1. Longer term, we could enable a feature where you could separate out the template/article definitions into its own files and have them be included in these different individual test files and also have those individual test files be runnable separately. We could potentially provide options where we could provide global test options for the entire file and have individual tests override them.
  1. I would be happy with just 1. for now, but features as in 2. and other additional features to better manage this are welcome.

Event Timeline

ssastry created this task.Sep 5 2015, 2:49 PM
ssastry raised the priority of this task from to Medium.
ssastry updated the task description. (Show Details)
ssastry added subscribers: ssastry, GWicke.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 5 2015, 2:49 PM
ssastry updated the task description. (Show Details)Sep 5 2015, 2:52 PM
ssastry set Security to None.
ssastry updated the task description. (Show Details)Sep 5 2015, 2:58 PM
ssastry updated the task description. (Show Details)Sep 5 2015, 3:01 PM
ssastry added subscribers: cscott, Arlolra, tstarling.
ssastry updated the task description. (Show Details)Sep 5 2015, 3:11 PM
ssastry updated the task description. (Show Details)Sep 5 2015, 6:54 PM
hashar updated the task description. (Show Details)Sep 7 2015, 3:14 PM

I have slightly tweaked the task details to show some bits as tables.

In MediaWiki core the list of .txt files are defined in $wgParserTestFiles and the PHPUnit test suite just iterate over it to load the tests.

For core:

  • tests/phpunit/includes/parser/NewParserTest.php

For extensions there is some wrapping going out to prevent running the mediawiki/core tests:

  • tests/phpunit/suites/ExtensionsParserTestSuite.php
  • tests/phpunit/includes/parser/MediaWikiParserTest.php

I am not sure what kind of push back you received, but it seems to me it is pretty straightforward:

  • split the .txt file
  • adjust $wgParserTestFiles in includes/DefaultSettings.php

Another daunting task would be to refactor the messy test runner:

  • tests/phpunit/includes/parser/NewParserTest.php
  • tests/testHelpers.inc

Some of the bad code can be blamed to me such as the delayed parser tests (which was to prevent parsing articles for tests that we have filtered out).

This is not related to Continuous-Integration-Infrastructure but part of the broader MediaWiki-Core-Testing. The CI job will just run PHPUnit test suite and does not really care how tests are organized as long as the suite runs them.

Note that Parsoid really ought to be running parser tests from the Cite extension as well: T114256: Parsoid should run tests against the Cite parser tests, not just MediaWiki core's.

One proposed directory layout in Parsoid might be:

  • tests/parserTests/core/*.txt -- tests imported from mediawiki-core
  • tests/parserTests/cite/*.txt -- tests imported from Extension:Cite
  • etc

Of course, this makes synchronizing parsertest changes between parsoid and the various extensions even more... intricate.

Note that Parsoid really ought to be running parser tests from the Cite extension as well: T114256: Parsoid should run tests against the Cite parser tests, not just MediaWiki core's.
One proposed directory layout in Parsoid might be:

  • tests/parserTests/core/*.txt -- tests imported from mediawiki-core
  • tests/parserTests/cite/*.txt -- tests imported from Extension:Cite
  • etc

Of course, this makes synchronizing parsertest changes between parsoid and the various extensions even more... intricate.

For CI, there is no need to manually sync the files. I believe we already have all the logic/material to have Parsoid and Cite to be joined together. I commented on T114256.

ssastry moved this task from Backlog to Next Up on the MediaWiki-Parser board.Dec 17 2015, 5:50 PM
ssastry moved this task from Needs Triage to Next Up on the Parsoid board.Dec 17 2015, 6:05 PM
ssastry moved this task from Next Up to Needs Triage on the Parsoid board.May 29 2016, 3:49 AM

Possible structure for splitting up the parserTests file:

quote
list
table
heading
indent_pre
paragraph
links/
- wikilink
- extlink
- urllink
- magicwords
- interwiki
- langlinks
cite
template
sanitizer
html
edge_cases (?)
parsoid/
- wt2html/
  - tsp
  - dom/
    - tplwrap
    - .. and others .. 
- html2wt/
  - links
  - nowikis
  - selser
  - normalization
  - templatedata
  - edge_cases (?)
greg added a subscriber: greg.Aug 24 2016, 5:43 PM
Krinkle moved this task from Inbox to Backlog on the Architecture board.Mar 29 2017, 8:52 PM
Krinkle moved this task from Inbox to PHPUnit on the MediaWiki-Core-Testing board.Jul 7 2017, 5:13 AM
Restricted Application added a subscriber: jeblad. · View Herald TranscriptJan 4 2018, 12:18 AM
Krinkle renamed this task from [RFC]: Split parser tests info multiple files to RFC: Split parser tests info multiple files.Jan 4 2018, 12:18 AM
daniel added a subscriber: daniel.Feb 7 2018, 9:46 PM

@ssastry Are you still interested in pushing this forward? What should the next steps be? Are you interested in having an IRC discussion about this?

@ssastry Are you still interested in pushing this forward? What should the next steps be? Are you interested in having an IRC discussion about this?

Yes, I am interested definitely. But, it just keeps get deprioritized on my end among all the other things that needs doing. I don't need an IRC discussion for this but am happy to engage in one if you feel this needs wider discussion / input.

Does TechCom have any input on this proposal?

But, other than that, next step is mostly starting the work starting on the php test framework end to introduce some "include/import/use/require" directive so that definitions of articles/templates/etc. can be sourced properly. If someone else wants to do this, I don't want to block this -- I am happy to have them do it.

daniel moved this task from Inbox to Backlog on the TechCom-RFC board.Feb 8 2018, 4:17 PM

@ssastry Thanks for the reply! We currently don't have any comments, we just had a very brief look during our last meeting. I'm butting this into the backlog for now, since it seems stalled at the moment.

ssastry changed the task status from Open to Stalled.Feb 8 2018, 4:23 PM
ssastry moved this task from Next Up to Backlog on the MediaWiki-Parser board.Apr 6 2018, 8:41 PM
LGoto moved this task from Needs Triage to Backlog on the Parsoid board.Sat, Feb 15, 9:42 PM
ssastry moved this task from Backlog to Testing on the Parsoid board.Wed, Feb 19, 11:30 PM