[REL1_30] Some parserTests fail on debian stretch using Tidy, because of a new version of libtidy
Open, Needs TriagePublic

Description

$ composer update && php tests/phpunit/phpunit.php --group Database --filter 'Block tag pre'
1) ParserIntegrationTest::testParse with data set "parserTests.txt: Block tag pre" ('[details omitted]')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'<pre>
-foo
-</pre>'
+'<pre>foo</pre>'

/home/hashar/projects/mediawiki/core/tests/phpunit/includes/parser/ParserIntegrationTest.php:52
/home/hashar/projects/mediawiki/core/maintenance/doMaintenance.php:94

That fails for me in REL1_30 but is fixed in master thanks to:

[bd912292041cfb92e45eeffb7ca0f06cca267b98] Use RemexHtml as the tidy implementation for parser tests

hashar created this task.Apr 9 2018, 9:10 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 9 2018, 9:10 AM

Hmm. Those parserTests only got adjusted to expect Tidy changes with that change though?

Legoktm added a subscriber: Legoktm.Apr 9 2018, 5:28 PM

Are you running the tests on Debian stretch? There's a new version of libtidy that causes some tidy parser tests to fail: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=828097#22

I think you'll need to recompile tidy and php-tidy to link against the older libtidy. Or decide that for parser tests to pass you need a version >= libtidy 5, even for old release branches...backporting the Remex patch is too large IMO, and affects extensions as well.

hashar added a comment.Apr 9 2018, 8:09 PM

Indeed that is in a Debian Stretch container with:

PackageDebian version
libtidy51:5.2.0-2
php-tidy1:7.0+49
php7.0-tidy7.0.27-0+deb9u1
tidy1:5.2.0-2

From php -i:

EnvlibTidy Release
php5.5 Jessie25 March 2009
php7.0 Stretch2016/04/07

As I get it master (and thus MediaWiki 1.31) get tests running with Remex. That is great.

For the older branches, I thought about adding php7.0 testing for MediaWiki 1.27, 1.29, 1.30 using Stretch (instead of Jessie). I guess I will stick to php5.5 / hhvm on Jessie for now.

Thanks! At least the root cause is known now :]

Jdforrester-WMF renamed this task from [REL1_30] some parserTests fail (due to lack/presence of tidy?) to [REL1_30] Some parserTests fail on debian stretch using Tidy, because of a new version of libtidy.Apr 9 2018, 10:28 PM

Eventually I imported the source for tidy 20091223cvs-1.4+deb8u1, build it for Stretch, did some hack and:

$ LD_LIBRARY_PATH=/home/hashar/projects/tidy/hacked php -i|grep Tidy
Tidy support => enabled
libTidy Release => 25 March 2009   # <------------ MAGIC!!

$ LD_LIBRARY_PATH=/home/hashar/projects/tidy/hacked php tests/phpunit/phpunit.php --group Database --filter 'ParserIntegrationTest' --debug-tests
Using PHP 7.0.27-0+deb9u1
PHPUnit 4.8.36 by Sebastian Bergmann and contributors.

OK (1440 tests, 1440 assertions)

With the Stretch tidy version I had: Tests: 1440, Assertions: 1440, Failures: 7.


The process:

mkdir workarea && cd workarea
dget 'http://security.debian.org/pool/updates/main/t/tidy/tidy_20091223cvs-1.4+deb8u1.dsc'
rm -fR tidy-20091223cvs
gbp import-dsc tidy_20091223cvs-1.4+deb8u1.dsc
cd tidy
DIST=stretch gbp buildpackage

Then install and prepare the library path:

sudo dpkg -i /packaging/result/stretch-amd64/libtidy-0.99-0_20091223cvs-1.4+deb8u1_amd64.deb
mkdir hacked && (cd hacked && ln -s /usr/lib/libtidy-0.99.so.0 libtidy.so.5)
LD_LIBRARY_PATH=hacked php -i|grep Tidy
Tidy support => enabled
libTidy Release => 25 March 2009

Which really mean we JUST have to ship the old libtidy and php7.0-tidy can still happily uses it.

Mentioned in SAL (#wikimedia-releng) [2018-04-10T12:38:15Z] <hashar> gerrit: created repo operations/debs/tidy-0.99 , a for of tidy Jessie package | T191771

Change 425256 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Add debian-glue to operations/debs/tidy-0.99

https://gerrit.wikimedia.org/r/425256

Change 425256 merged by jenkins-bot:
[integration/config@master] Add debian-glue to operations/debs/tidy-0.99

https://gerrit.wikimedia.org/r/425256

Change 425257 had a related patch set uploaded (by Hashar; owner: Antoine Musso):
[operations/debs/tidy-0.99@master] Rebuild for Stretch as tidy-0.99

https://gerrit.wikimedia.org/r/425257

So I got a package that port the Jessie tidy package to Stretch. It is a bit rough, but at least for libtidy that seems to work. One can then use it with:

 $ mkdir oldtidy
$ ln -s /usr/lib/libtidy-0.99.so.0 oldtidy/libtidy.so.5
$ LD_LIBRARY_PATH=oldtidy php -i|grep Tidy
Tidy support => enabled
libTidy Release => 25 March 2009

Will look at getting that as a package and upload libtidy on apt.wikimedia.org so we can get it installed in the containers.

Change 425489 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Make debian-glue voting on operations/debs/tidy-0.99

https://gerrit.wikimedia.org/r/425489

On REL1_30 with Stretch:

Tests: 15774, Assertions: 85294, Failures: 7, Skipped: 14, Risky: 5.
                                         ^^^^^

With tidy from Jessie:

Tests: 15774, Assertions: 85294, Skipped: 14, Risky: 5.

\O/

Summary

Context:

  • mediawiki/core parser tests expect output generated by tidy from Jessie ( 20091223cvs-1.4 ).
  • The php7 jobs running in Docker container are based on Stretch which ships tidy 5.2.0.
  • MediaWiki master branch uses Remex instead of tidy and thus the tests pass just fine.
  • The MediaWiki REL* branches are still relying on tidy. Parser tests fail on Stretch accordingly.
  • Installing php-tidy also installs libtidy (5.2.0)

Fix:

  • https://gerrit.wikimedia.org/r/#/c/425257/ creates a libtidy-0.99 package that is coinstallable with libtidy (which is a dependency of php7.0-tidy).
  • Create a directory oldtidy having a symlink libtidy.so.5 -> /usr/lib/libtidy-0.99.so.0
  • Run MediaWiki tests with LD_LOAD_PATH=oldtidy. The tidy php extension loads libtidy-0.99 just fine and tests pass

I could use a review of https://gerrit.wikimedia.org/r/#/c/425257/ then the libtidy-0.99 package to be uploaded to apt.wikimedia.org for Stretch. Most probably under components ci like we did for the php5.5 port to Jessie.

I guess @MoritzMuehlenhoff you will be able to assist since you helped previously for php5.5 and introduced the ci component in apt.wikimedia.org :]

Are you running the tests on Debian stretch? There's a new version of libtidy that causes some tidy parser tests to fail: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=828097#22

But realistically the tidy 5 release is what everyone will be using? Debian jessie goes EOL in about two months for the regular support time frame and Fedora and Ubuntu also have only the newer version. Shouldn't we rather adapt the tests to the current status quo of tidy deployments?

Change 425489 merged by jenkins-bot:
[integration/config@master] Make debian-glue voting on operations/debs/tidy-0.99

https://gerrit.wikimedia.org/r/425489

Are you running the tests on Debian stretch? There's a new version of libtidy that causes some tidy parser tests to fail: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=828097#22

But realistically the tidy 5 release is what everyone will be using? Debian jessie goes EOL in about two months for the regular support time frame and Fedora and Ubuntu also have only the newer version. Shouldn't we rather adapt the tests to the current status quo of tidy deployments?

In MediaWiki we're dumping tidy entirely in favor of RemexHtml, so this is just about old releases (1.27, 1.29, 1.30) - the last of which is supported until June 2019. I'm not sure anyone has actually tested how different tidy-html5 is, but back in August 2015 Tim wrote:

As I said in my original post, my number one problem with Tidy is that
it changes. So I am very happy that it is not in active development.
Switching to a fork that is actively maintained would be much worse.
It would be like the switch from Tidy to the proposed HTML
reserializer web service, except that the pain would be repeated every
time we upgrade our Linux distribution.

The other problem with Tidy is that it is poorly specified and has
only one implementation. Switching to a fork of it doesn't improve the
situation.

So if it's not too much effort to do so, I'd rather keep supporting old tidy in CI for old branches instead of trying to upgrade tidy in older branches of MW.

@MoritzMuehlenhoff as Kunal said, MediaWiki has dropped Tidy entirely and uses RemexHtml.

The only reason we need the old Tidy is for CI on the release branches. Their tests are expecting output generated by the old Tidy that was available on Jessie and has been since roughly 2009. In theory we could write support to detect the Tidy version and use different expected output based on the version, then Tidy is dropped from master so there is no need for it anymore.

It sounds easier to just port/keep the old tidy library.

Change 427623 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] docker: add tidy from jessie in quibble-stretch

https://gerrit.wikimedia.org/r/427623

Change 427625 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Bump quibble-stretch to 0.0.8-2

https://gerrit.wikimedia.org/r/427625

Change 427623 merged by jenkins-bot:
[integration/config@master] docker: add tidy from jessie in quibble-stretch

https://gerrit.wikimedia.org/r/427623

Change 427625 merged by jenkins-bot:
[integration/config@master] Bump quibble-stretch to 0.0.8-2

https://gerrit.wikimedia.org/r/427625

Change 425257 merged by Muehlenhoff:
[operations/debs/tidy-0.99@master] Rebuild for Stretch as tidy-0.99

https://gerrit.wikimedia.org/r/425257

Change 427683 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Add component/ci for stretch-wikimedia

https://gerrit.wikimedia.org/r/427683

Change 427683 merged by Muehlenhoff:
[operations/puppet@production] Add component/ci for stretch-wikimedia

https://gerrit.wikimedia.org/r/427683

Mentioned in SAL (#wikimedia-operations) [2018-04-19T16:50:14Z] <moritzm> uploaded tidy-0.99 to component/ci for apt.wikimedia.org/stretch-wikimedia (T191771)

hashar added a comment.May 8 2018, 7:51 PM

I have kept this task open since the CI Docker images are downloading the .deb package from a temporary place instead of apt.wikimedia.org. Some refactoring of our Dockerfiles have to be made to use component/ci from apt.wikimedia.org/stretch-wikimedia

Krinkle moved this task from Inbox to PHPUnit on the MediaWiki-Core-Tests board.Jun 15 2018, 12:53 PM
dduvall added a subscriber: dduvall.

@hashar, any update? Moving to backlog until we have one.

cscott added a subscriber: cscott.Oct 22 2018, 5:04 PM

There's a lot of backlog to read through but -- yeah, Wikimedia has *always* used their own patched version of tidy. The stock debian libtidy has never exactly matched the WMF version, although it's possible the precise differences weren't previously well-covered by parserTests. All use of libtidy is deprecated and is being removed ( https://gerrit.wikimedia.org/r/467972 ) so it's just a matter of keeping our tests of long-term-supported releases sane by maintaining access to the WMF-patched-version-of-old-tidy.

@hashar, any update? Moving to backlog until we have one.

T191771#4191972 is still the status quo. The issue is fixed, but there's CI tech debt that needs to be cleaned up.

We're using the old libtidy that hashar compiled from https://integration.wikimedia.org/oldtidy/ and it needs to be switched to use the packaged version instead. It should be about 10 minutes of work to rewrite the dockerfile/test it and then like an hour to rebuild allll the docker images and jjb jobs.