Page MenuHomePhabricator

Unclosed tags in parser content causes Vector 2022 skin cause display issues (strikethrough and italic tags) and should be flagged by linter
Closed, ResolvedPublic3 Estimated Story PointsBUG REPORT

Description

NOTE: Blocked on T334527

In a previous edit of English Wiktionary's Wiktionary:Requests for deletion/Non-English, a header alır almaz had an unclosed <s> tag. The closing </s> was in the section contents.

In the Vector 2022 skin, the table of contents has this header with the tag closed as one would hope, but some extra <s> tags are also added after the header's list item, including one around the entire content area (<s><div class="mw-content-container">...</div></s>). So all the text of the page was stricken through and headers with <s> tags were doubly stricken through.

This doesn't happen in Monobook or in the mobile site.

See also this test edit in the sandbox.

It's debatable whether this is a bug as opposed to an expected result of bad HTML in the header. The naive HTML for the alır almaz header and section contents would be invalid (<h2...>...<s>...</h2>...<dl>...<dt>...</s>...</dt>...</dl>), but it is cleaned up by adding more <s> tags. But could Vector 2022 do the HTML cleanup for the table of contents in such a way that an unclosed tag in a header doesn't bleed through across the whole content area? Not sure exactly what that would look like. Not adding more instances of the unclosed tag outside the TOC maybe?

Proposed solution

Developer notes

Lints are essentially calls to recordLint inside the src/Wt2Html/PP/Processors/Linter.php class.

Once you implement code to recognize (and produce) a new lint record, you can test it via the "--linting" command line in Parsoid (bin/parse.php).

Recent commit introducing a linting rule; https://github.com/wikimedia/mediawiki-services-parsoid/commit/13e75de2723460c82067fe7036b3b49a07340535

QA steps

Sign off step

  • Make sure this is a task in the next sprint to make sure these are enabled and displayed in production.

https://phabricator.wikimedia.org/T308398#8848558

QA Results - Beta

ACStatusDetails
1T308398#8848558
2T308398#8848558
3T308398#8848558

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Jdlrobson edited projects, added Parsoid; removed MediaWiki-Parser.
Jdlrobson updated the task description. (Show Details)
ovasileva set the point value for this task to 3.Apr 20 2023, 5:46 PM

What is the reasoning behind it being hidden by default? Seems like something that ought to be a high priority to fix (more important than removing obsolete html tags IMO).

@jhsoby this is just so that we can give more time to our community liaisons to communicate the change and make sure projects are happy to fix them.

More context in https://phabricator.wikimedia.org/T334527 hope that helps!

Unclosed tags are already flagged as Linter errors. Undesirable rendering should continue to be displayed, not worked around, to motivate editors to fix it. The whole point of the Linter error list is to clean up invalid syntax so that clunky parsing hacks and workarounds can be removed from the MediaWiki software, isn't it? The double-colon error workaround was removed from MediaWiki even before all of the errors had been cleaned up on wiki sites; why is this special case worthy of a new workaround?

@Jonesey95 I totally agree, we already have 'multiple-unclosed-formatting-tags' if we are missing some cases to be flagged we should extend it in that space.

auto fixing will not be applied here, just checking if we cover all cases especially inside headings, as that effects the TOC.

Change 913968 had a related patch set uploaded (by Kimberly Sarabia; author: Kimberly Sarabia):

[mediawiki/services/parsoid@master] Fix unclosed tags logic

https://gerrit.wikimedia.org/r/913968

Change 913968 abandoned by Mabualruz:

[mediawiki/services/parsoid@master] Fix unclosed tags logic

Reason:

No changes are needed

https://gerrit.wikimedia.org/r/913968

After testing and checking with a patch and thanks to @ssastry and @Sbailey, linter already flags this under 'missing-end-tag' Missing end tag category https://en.wikipedia.beta.wmflabs.org/wiki/Special:LintErrors/missing-end-tag . all cases are captured and no further changes are needed.
Worth looking in 'multiple-unclosed-formatting-tags' for other tags but all are captured by linter and logged properly.

This filter was envisioned as a subset of https://en.wikipedia.beta.wmflabs.org/wiki/Special:LintErrors/missing-end-tag - it's more important to flag given it can break article display so the idea would be to make this a higher priority for editors. A missing end tag outside a section heading does not have the same catastrophic consequences as one inside it ! The former is "Low priority" but I'd consider the latter "Unbreak now" priority. Let's sync tomorrow to make sure we have the same goals.

Change 914806 had a related patch set uploaded (by Mabualruz; author: Mabualruz):

[mediawiki/extensions/Linter@master] Unclosed tags in parser

https://gerrit.wikimedia.org/r/914806

Change 914807 had a related patch set uploaded (by Mabualruz; author: Mabualruz):

[mediawiki/services/parsoid@master] Unclosed tags in parser

https://gerrit.wikimedia.org/r/914807

Change 914808 had a related patch set uploaded (by Mabualruz; author: Mabualruz):

[mediawiki/core@master] Unclosed tags in parser

https://gerrit.wikimedia.org/r/914808

The patches will have final touches on Monday and will be ready for QA, Thanks for @Sbailey, and @ssastry. I will heave a pairing time with @ssastry Monday early to finalise the patches.

as of now the new 'missing-end-tag-in-heading' is a super set of 'unclosed-quotes-in-heading', and since it is an hidden category it will be fine for now, but once it is a full criticality defined category we should remove the subset 'unclosed-quotes-in-heading'. and categories should be combined

Change 914806 merged by jenkins-bot:

[mediawiki/extensions/Linter@master] Unclosed tags in article headings

https://gerrit.wikimedia.org/r/914806

Jdlrobson updated the task description. (Show Details)

Note: We are waiting on the content transform team to merge https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/914807, once that's done we can move this into QA. Moved this from previous sprint so we don't lose track of it. Remaining work is much less than 3 points.a

Change 914807 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Linter: Detect unclosed tags in article headings

https://gerrit.wikimedia.org/r/914807

Jdlrobson added a subscriber: KSarabia-WMF.

FYI QA may be blocked on a deployment. Will let you know when I know more.

Change 914808 abandoned by Mabualruz:

[mediawiki/core@master] Unclosed tags in article headings

Reason:

https://gerrit.wikimedia.org/r/914808

Change 914808 restored by Mabualruz:

[mediawiki/core@master] Unclosed tags in article headings

https://gerrit.wikimedia.org/r/914808

I found out why we have lint error messages in core:
Thanks @Batorsz
https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-personal
This page lints the signature added by editors regardless of having the Linting extension it only requires parsiod.
https://gerrit.wikimedia.org/g/mediawiki/core/+/56aeec047a8037a1577cb2fc07bcc36da9e6f824/includes/preferences/SignatureValidator.php#140
which makes having linting messages in core required

The patches are still not on Beta, @Sbailey informed me that content transformation team will be looking into ways to have beta with master branch changes without new composer version/tag for parsoid. And since @Sbailey is out till end of week I guess we should check with @ssastry @cscott and @Arlolra until then.

Change 919160 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a9

https://gerrit.wikimedia.org/r/919160

nray added a subscriber: Edtadros.

Change 919160 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a9

https://gerrit.wikimedia.org/r/919160

Test Result - Beta

Status: ✅ PASS
Environment: beta
OS: macOS Ventura
Browser: Chrome
Device: MBP
Emulated Device:NA

Test Artifact(s):

QA Steps

✅ AC1: Confirm the category doesn't show up on https://en.wikipedia.beta.wmflabs.org/wiki/Special:LintErrors

Screenshot 2023-05-12 at 2.18.22 PM.png (866×1 px, 153 KB)

✅ AC2: Confirm https://en.wikipedia.beta.wmflabs.org/wiki/Special:LintErrors/missing-end-tag-in-heading exists and lists pages with violations.
Screenshot 2023-05-12 at 2.18.39 PM.png (1×1 px, 158 KB)

✅ AC3: Create a new page which has a heading with an unclosed HTML tag e.g. == <em>Foo == and check it shows up on the page.
See "Hooded Skunk" in AC2 above.

This comment was removed by Jdlrobson.

@Sbailey is this code in production yet? I am not seeing any flagged on https://en.wikipedia.org/wiki/Special:LintErrors/missing-end-tag-in-heading so either it's working correctly (since most of these should get fixed right away) or still waiting on the train.

I believe it is riding this weeks train, it was in beta as of last thursday
to allow QA to test. Once group 2 deploys we should start seeing entries.
Shannon

This comment was removed by Sbailey.

LGTM. Follow up work in T336316 will ensure this is verified in production and shipped if ready.

Change 914808 abandoned by Jdlrobson:

[mediawiki/core@master] Unclosed tags in article headings

Reason:

Hi Mo, since https://phabricator.wikimedia.org/T308398 is resolved I'm assuming this is no longer needed. If it is please feel free to restore and open a new Pharicator ticket! Thanks!

https://gerrit.wikimedia.org/r/914808