Page MenuHomePhabricator

Replace HTMLFormatter by Remex
Closed, ResolvedPublic

Description

wikimedia/html-formatter uses loadHTML. It shouldn't. It should use Remex.

Maybe we should just remove the HTMLFormatter framework into Remex and remove uses of the library entirely from core.

Review of current usage: https://codesearch.wmcloud.org/search/?q=\bHtmlFormatter\\HtmlFormatter\b&files=php

Event Timeline

We probably want to use the dodo library here, too.

Change 873766 had a related patch set uploaded (by Umherirrender; author: Bartosz Dziewoński):

[mediawiki/core@master] ApiHelp: Replace use of HtmlFormatter in fixHelpLinks()

https://gerrit.wikimedia.org/r/873766

Change 873766 merged by jenkins-bot:

[mediawiki/core@master] ApiHelp: Replace use of HtmlFormatter in fixHelpLinks()

https://gerrit.wikimedia.org/r/873766

Change #1030548 had a related patch set uploaded (by Func; author: Func):

[mediawiki/extensions/MobileFrontend@master] Convert the RemovableClasses feature into a Transform

https://gerrit.wikimedia.org/r/1030548

Change #1030551 had a related patch set uploaded (by Func; author: Func):

[mediawiki/extensions/MobileFrontend@master] MobileFormatter: Use RemexHtml

https://gerrit.wikimedia.org/r/1030551

Change #1030548 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] Convert the RemovableClasses feature into a Transform

https://gerrit.wikimedia.org/r/1030548

Change #1030551 abandoned by Bartosz Dziewoński:

[mediawiki/extensions/MobileFrontend@master] MobileFormatter: Use RemexHtml

Reason:

Superseded by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MobileFrontend/+/1156787

https://gerrit.wikimedia.org/r/1030551

matmarex subscribed.

Review of current usage: https://codesearch.wmcloud.org/search/?q=\bHtmlFormatter\\HtmlFormatter\b&files=php

Can we get an explanation as to why HtmlFormatter should not be used? Is HtmlFormatter deprecated?

The task description is not clear on why this is needed. Knowing that will help Trust and Safety Product Team prioritise this.

HtmlFormatter relies on the HTML parsing and serialization provided by PHP's built-in DOM library, which is from the HTML 4 era, and known to cause a variety of bugs, e.g. T348928, T251434, T396695.

I believe that only reason it hasn't been deprecated is because there is no maintainer (and no one wants to touch if out of fear of becoming the maintainer), see T258964: HtmlFormatter library: Code stewardship review.

It doesn't do much anyway, and what it does is better done using Remex-based helpers provided by Parsoid, so this shouldn't be too difficult, if it's not as deeply integrated into your code as it was in MobileFrontend. I'll do one or two of these as an example.

Change #1169784 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Use Remex-based helpers provided by Parsoid for tests that parse HTML

https://gerrit.wikimedia.org/r/1169784

I picked the LinkTarget for my second patch, and funnily enough, it also had a recent bug caused by interactions with HtmlFormatter: T397603: TOC not being shown when using LinkTarget

Change #1169787 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/LinkTarget@master] Replace HtmlFormatter with Remex-based helpers provided by Parsoid

https://gerrit.wikimedia.org/r/1169787

Change #1169787 merged by jenkins-bot:

[mediawiki/extensions/LinkTarget@master] Replace HtmlFormatter with Remex-based helpers provided by Parsoid

https://gerrit.wikimedia.org/r/1169787

Change #1169784 merged by jenkins-bot:

[mediawiki/core@master] Use Remex-based helpers provided by Parsoid for tests that parse HTML

https://gerrit.wikimedia.org/r/1169784

Change #1170539 had a related patch set uploaded (by Esanders; author: Esanders):

[mediawiki/extensions/IPInfo@master] SpecialIPInfoTest: Replace HtmlFormatter with Parsoid's Remex-based helpers

https://gerrit.wikimedia.org/r/1170539

Change #1170539 merged by jenkins-bot:

[mediawiki/extensions/IPInfo@master] SpecialIPInfoTest: Replace HtmlFormatter with Parsoid's Remex-based helpers

https://gerrit.wikimedia.org/r/1170539

Change #1170561 had a related patch set uploaded (by Esanders; author: Esanders):

[mediawiki/extensions/TextExtracts@master] ExtractFormatter: Replace HtmlFormatter with Parsoid DOMUtils

https://gerrit.wikimedia.org/r/1170561

Change #1170561 merged by jenkins-bot:

[mediawiki/extensions/TextExtracts@master] ExtractFormatter: Replace HtmlFormatter with Parsoid DOMUtils

https://gerrit.wikimedia.org/r/1170561

Change #1174812 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] WikiTextStructure: Replace HtmlFormatter with Parsoid DOMUtils

https://gerrit.wikimedia.org/r/1174812

There's now no uses left in code search, beyond vendor/core's dependency. Is there a practical way in which we can emit deprecation warnings on access to HtmlFormatter, other than doing a "don't use me" release of the library? Otherwise, we could just declare it deprecated now and drop it in 1.46?

Change #1174812 merged by jenkins-bot:

[mediawiki/core@master] WikiTextStructure: Replace HtmlFormatter with Parsoid DOMUtils

https://gerrit.wikimedia.org/r/1174812

Is there a practical way in which we can emit deprecation warnings on access to HtmlFormatter, other than doing a "don't use me" release of the library?

We can use trigger_error( ..., E_USER_DEPRECATED ); in the constructor: https://codesearch.wmcloud.org/libraries/?q=E_USER_DEPRECATED&files=php

And we can mark the package as abandoned: https://getcomposer.org/doc/04-schema.md#abandoned (wikimedia/remex-html could be the suggested replacement)

Otherwise, we could just declare it deprecated now and drop it in 1.46?

Certainly. In fact, I don't see why we couldn't drop it in 1.45, but we can go slower if you prefer. We should just add a release note in MediaWiki when dropping it.

matmarex claimed this task.

I think we can resolve this, since we've replaced all of its uses. Thanks for contributing, everyone.

I filed a separate task for actually deprecating/removing the library itself: T401028: Deprecate the HtmlFormatter library and drop it from MediaWiki core