Page MenuHomePhabricator

Servers using tidy-html5 are rendering pages differently, especially with <bdi>
Closed, DeclinedPublic

Description

See https://meta.wikimedia.org/wiki/User:SSastry_(WMF)/Sandbox for the full explanation and notes I've gathered.

Via https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#bdi_tags_in_wikilinks

Wikimedia's servers were recently upgraded from Debian Jessie to Debian Stretch, which means that tidy was also upgraded to the html5 version, potentially changing rendering.

Event Timeline

When I send requests to mwdebug1002 via the firefox plugin, I cannot reproduce it, the output always renders correctly. I am now going to dump all instances of good and bad html from that page and compare the html to see if I can narrow this down to specific servers.

From IRC:

<legoktm> subbu:  bar on the same line is hosts running Debian Jessie, bar on a different line is hosts running Debian Stretch. I only checked 4 hosts, 2 of each
<subbu> legoktm, do jessie & stretch come with different versions of tidy? or something else off with the config?
<legoktm> they do have different versions of libtidy, yes

Tidy is the cuase of the problem -- it breaks up the link at the <bdi> tag position

[subbu@earth:~/work/wmf/mediawiki]  cat /tmp/vpt.wt
[http://google.com Foo <bdi lang="fr">bar</bdi>]
[subbu@earth:~/work/wmf/mediawiki] php maintenance/parse.php --tidy < /tmp/vpt.wt 
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<div class="mw-parser-output"><p><a rel="nofollow" class="external text" href="http://google.com">Foo</a></p>
<bdi lang="fr">bar</bdi>

</div>

So, the mystery at this point is down to different Tidy versions on different servers -- @Legoktm will confirm that later on.

ssastry renamed this task from Strange non-deterministic parse+Tidy output to Different versions of Tidy installed on different production servers?.Apr 30 2018, 7:24 PM

Not sure if this is related, but we deployed <mapframe> to enwiki today, and we've noticed that when we edit pages to add that tag, sometimes it'll render as plaint text, then when we purge it, it does render correctly.

I'm running a script now to print the value of $wgKartographerEnableMapFrame on every server, but I've already noticed that it's false on mw1262 (and true on the other boxes I've checked so far).

Jdforrester-WMF subscribed.

Tagging in Release Engineering in case this is something they need to know about.

The following hosts have the wrong config:

mw1262.eqiad.wmnet: bool(false)
mw1273.eqiad.wmnet: bool(false)
mw1272.eqiad.wmnet: bool(false)
mw1322.eqiad.wmnet: bool(false)
mw1285.eqiad.wmnet: bool(false)
mw1284.eqiad.wmnet: bool(false)
mw1233.eqiad.wmnet: bool(false)
mw1316.eqiad.wmnet: bool(false)
mw2137.codfw.wmnet: bool(false)
mw2147.codfw.wmnet: bool(false)

Let's see if this changes after I sync InitialiseSettings.php as part of the SWAT.

My sync appears to have fixed that, the config var is now true everywhere.

The original problem in the description persists. For example, mw1250, mw1255, mw1328, mw1329, mw1330, mw1333 (among possibly others) generate "broken Tidy output".

ssastry triaged this task as High priority.May 1 2018, 2:09 PM

Not sure who can look into this .. Release Engineering or MediaWiki Platform, but, could someone investigate this?

ssastry renamed this task from Different versions of Tidy installed on different production servers? to Different production servers tidy output differently.May 1 2018, 2:16 PM
ssastry@mw1269:~$ tidy -v
HTML Tidy for Linux version 5.2.0
...
...
ssastry@mw1330:~$ tidy -v
HTML Tidy for Linux released on 25 March 2009

The original problem in the description persists. For example, mw1250, mw1255, mw1328, mw1329, mw1330, mw1333 (among possibly others) generate "broken Tidy output".

Are you sure about those? All of those seem to be giving the "non-broken" output, while e.g. mw1269 is giving the "broken" output.

Anyway, I can think of four possible fixes for this:

  • Figure out some change to includes/tidy/tidy.conf that makes the tidy version 5.2.0 from stretch not treat <bdi> as a block tag, or whatever it is that it's doing. You might do this yourself.
  • Add a hack as is done for meta, link, and style tags. rMW83b798bbabbe: Hide <style> tags from Tidy could serve as an example.
  • Recompile libtidy 20091223cvs-1.4+deb8u1 for stretch, and recompile the HHVM tidy module against that version. Hope the API didn't change. That'd probably be something for SRE.
  • Wait for everything to be converted to Remex.

The original problem in the description persists. For example, mw1250, mw1255, mw1328, mw1329, mw1330, mw1333 (among possibly others) generate "broken Tidy output".

Are you sure about those? All of those seem to be giving the "non-broken" output, while e.g. mw1269 is giving the "broken" output.

When I looked at view source and looked at the server string, I got these server names ... and on these servers <bdi> tag isn't split up. So, I guess depends on what pre-mw-1.31 versions did with these tags.

But, my concerns is not about what Tidy should or shouldn't do with <bdi> tags -- after all, this is only for 2 months and we get Remex and this will go away.

My concern is about different servers doing different tidying and what the ramifications are, i.e. will editors now not be able to fix linter errors because the tidying output varies depending on which server they hit?

So, I am happy with #4 below if we can determine that this effect is limited to <bdi> tags (or limited to whatever small set of scenarios) .. since that is not as much of a big deal.

Anyway, I can think of four possible fixes for this:

  • Figure out some change to includes/tidy/tidy.conf that makes the tidy version 5.2.0 from stretch not treat <bdi> as a block tag, or whatever it is that it's doing. You might do this yourself.
  • Add a hack as is done for meta, link, and style tags. rMW83b798bbabbe: Hide <style> tags from Tidy could serve as an example.
  • Recompile libtidy 20091223cvs-1.4+deb8u1 for stretch, and recompile the HHVM tidy module against that version. Hope the API didn't change. That'd probably be something for SRE.
  • Wait for everything to be converted to Remex.

When I looked at view source and looked at the server string, I got these server names ... and on these servers <bdi> tag isn't split up. So, I guess depends on what pre-mw-1.31 versions did with these tags.

There's nothing in MW 1.31 that changed behavior here. The change is that appservers are being updated to stretch, which comes with the new version of Tidy. The "not splitting up" is the historical behavior.

My concern is about different servers doing different tidying and what the ramifications are, i.e. will editors now not be able to fix linter errors because the tidying output varies depending on which server they hit?

Possibly. Until we switched the default to Remex, I had 8 parser test failures when running tests locally that I knew to ignore because they were due to the new Tidy version. If I recall correctly they were mostly something with <pre> tags, something like

<pre>foo</pre>

versus

<pre>
foo
</pre>

Yes, we had a task a couple of weeks ago from @hashar about this – MediaWiki does not support the version of libtidy that by default ships with Stretch (as you say, several of the parser tests fail), and given that we're EOLing Tidy support we weren't going to work around that. Production definitely shouldn't be using the new version, but instead the forward-port of the old version that Antoine created so the REL1_31 tests can pass.

Jdforrester-WMF renamed this task from Different production servers tidy output differently to Different production servers have different versions of tidy installed, resulting in varying output.May 1 2018, 3:22 PM
Jdforrester-WMF added a project: SRE.

Until we switched the default to Remex, I had 8 parser test failures when running tests locally that I knew to ignore because they were due to the new Tidy version. If I recall correctly they were mostly something with <pre> tags, something like

Could you maybe generate that list of failing tests, if simple to do? If they are all <pre> related (and edge cases like this <bdi> tag), it doesn't affect the linter right now since pre tags aren't involved there, and so the urgency is less severe.

Production definitely shouldn't be using the new version, but instead the forward-port of the old version that Antoine created so the REL1_31 tests can pass.

This would be the ideal resolution of this task.

I've created some test packages at https://people.wikimedia.org/~jmm/tidy/ if anyone wants to confirm/test, but that would need quite some followup work to bring it properly into shape for production deployment (we can't reuse the approach Antoine used for CI for our setup in production) and if keeping the status quo in favour of remex trickling in is a viable option, I'd be in favour of that.

Also, I'm wondering what the current user-visible impact is; when the task were filed when the stretch upgrade of the app servers was in progress (it's around 300 servers so it gets spread out over several weeks), so editors could have been running into sitations were the page preview might have been rendered on a jessie system while the actual edit was routed to a stretch-based app server. At least that scenario is now moot since all application servers are migrated (API servers and job runners in our active DC are being completed by tomorrow).

Sorry for the late followup, I only became aware of this task when legoktm pinged me on IRC today.

I've created some test packages at https://people.wikimedia.org/~jmm/tidy/ if anyone wants to confirm/test, but that would need quite some followup work to bring it properly into shape for production deployment (we can't reuse the approach Antoine used for CI for our setup in production) and if keeping the status quo in favour of remex trickling in is a viable option, I'd be in favour of that.

Also, I'm wondering what the current user-visible impact is; when the task were filed when the stretch upgrade of the app servers was in progress (it's around 300 servers so it gets spread out over several weeks), so editors could have been running into sitations were the page preview might have been rendered on a jessie system while the actual edit was routed to a stretch-based app server. At least that scenario is now moot since all application servers are migrated (API servers and job runners in our active DC are being completed by tomorrow).

Sorry for the late followup, I only became aware of this task when legoktm pinged me on IRC today.

If all servers are using the same version of Tidy at this point, I can live with the newer version of Tidy till we let Remex replace it by end of June.

@Legoktm, the platform team should weigh in on this as well.

Ack, at this point only four job runners in our active data centre are using jessie (and they'll be updated by tomorrow).

If all servers are using the same version of Tidy at this point, I can live with the newer version of Tidy till we let Remex replace it by end of June.

@Legoktm, the platform team should weigh in on this as well.

I think that's fine as well.

T191771 is MediaWiki parser tests failing under CI when running under Stretch. The expected output matches tidy 25 March 2009. That is for REL1_30 and earlier, master should no more rely on Tidy but on RemexHtml. But maybe on production the transition to RemexHtml hasn't been made yet?

I did a very rough forward port of tidy from Jessie to Stretch which is coinstallable ( operations/debs/tidy-0.99.git ), but that is probably not suitable for production usage. I only cared about having mediawiki parser tests passing. The package at least has been kindly uploaded on apt.wikimedia.org under stretch-wikimedia component/ci .

Legoktm renamed this task from Different production servers have different versions of tidy installed, resulting in varying output to Servers using tidy-html5 are rendering pages differently, especially with <bdi>.May 9 2018, 8:18 PM
Legoktm changed the task status from Duplicate to Declined.
Legoktm updated the task description. (Show Details)
Legoktm added subscribers: De728631, zhuyifei1999, Josve05a.

Sorry, I duped the wrong way.

So what makes this declined? The merged task clearly displays some broken 'tiding'. If this task is about pages get rendered differently and it is now consistent, then this task is resolved. The merged task is about the 'consistent' rendering being simply 'incorrect'.

All servers are running stretch, so if there are still inconsistencies I suspect it's due to the parser cache having used older tidy. This task was primarily focused on the rendering difference between old tidy and tidy-html5.

The decline was mostly about switching back to oldtidy, which won't be happening. I think patches would still be accepted for the first two bullets of T193414#4171065 (and we could re-open the task as well).

This has been fixed upstream https://github.com/htacg/tidy-html5/issues/616 in Oct 2017 ... but not sure if it has made it into the deb package available on stretch. @MoritzMuehlenhoff any idea how easy / difficult it is to get this newer version?

@zhuyifei1999 Tidy is being removed completely in 7 weeks time (see T175706: Progressively switch Wikimedia wikis from Tidy to RemexHTML) which doesn't have this bug. So, ideally, we would simply use RemexHtml instead of messing around with Tidy versions. See https://commons.wikimedia.org/wiki/Commons:Village_pump#HTML_errors_here as well.

Vvjjkkii renamed this task from Servers using tidy-html5 are rendering pages differently, especially with <bdi> to pydaaaaaaa.Jul 1 2018, 1:13 AM
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
Yann renamed this task from pydaaaaaaa to Servers using tidy-html5 are rendering pages differently, especially with <bdi>.Jul 1 2018, 12:22 PM
Yann updated the task description. (Show Details)
Yann added a subscriber: Aklapper.

Now that we've stopped using Tidy in production, this no longer matters (though we should drop the library from production, but that's for a different task). Declaring this Declined.