Page MenuHomePhabricator

Section titles duplicated (and no section edit links) on pages with quotes " in the title when DiscussionTools is enabled
Closed, ResolvedPublic

Description

Section titles are duplicated (and no section edit links appear) on pages with quotes " in the title when DiscussionTools is enabled.

Reported on-wiki:

Example: https://he.wikipedia.org/wiki/שיחה:פת"ח
(I found it a bit difficult to notice if you don't speak the language, but compare the headings to the names in the table of contents)

All reports so far are from the Hebrew Wikipedia, but I don't think this is specific to that wiki. It looks like pages in Hebrew are affected more often than in other languages, because the quote mark is commonly used for the gershayim character.

This seems to be a bug introduced in T267404, and only affects wikis running 1.36.0-wmf.30 (not wmf.27).

Event Timeline

matmarex created this task.Sat, Feb 13, 6:43 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSat, Feb 13, 6:43 PM

HTML markup from the affected page:

<h2><span id=".D7.9B.D7.95.D7.97_17"></span><span class="mw-headline" id="כוח_17" data-mw-comment='{"type":"heading","level":0,"id":"h|\u05db\u05d5\u05d7_17","replies":[],"headingLevel":2,"placeholderHeading":false}'><span data-mw-comment-start="h|כוח_17"></span>כוח 17<span data-mw-comment-end="h|כוח_17"></span></span><mw:editsection page='שיחה:פת"ח' section="2">כוח 17</mw:editsection></h2>

The mw:editsection element, which contains the duplicate title, is supposed to be replaced by a section edit link. This does not work because the code doing this replacement (search for EDITSECTION_REGEX in ParserOutput.php) assumes that the attributes will be written with the " characters, but in this case the page attribute uses '.

The different attribute quoting is a side-effect of the process used by DiscussionTools to add our reply links (where we parse the whole page as HTML, and serialize it back). The library we use chooses ' for quoting if the attribute's value contains ".

Change 663996 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/DiscussionTools@master] CommentFormatter: Fix problems with editsection and quotes

https://gerrit.wikimedia.org/r/663996

The affected pages will need to be purged after the patch is deployed :(

Change 663997 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/core@master] ParserOutput: Match single quotes in EDITSECTION_REGEX

https://gerrit.wikimedia.org/r/663997

This is an alternative patch (only one of them is needed). This has the advantage of not requiring pages to be purged, and is probably a better solution in the long term, but I'd be less comfortable backporting it, because the parser is scary.

Change 663996 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@master] CommentFormatter: Fix problems with editsection and quotes

https://gerrit.wikimedia.org/r/663996

The affected pages will need to be purged after the patch is deployed :(

If there is a way to determine the list of affected pages (a SQL query, perhaps?), it's easy to do so via https://www.mediawiki.org/wiki/Manual:PurgePage.php or touch.py from Pywikibot.

Change 664254 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/DiscussionTools@wmf/1.36.0-wmf.30] CommentFormatter: Fix problems with editsection and quotes

https://gerrit.wikimedia.org/r/664254

If there is a way to determine the list of affected pages (a SQL query, perhaps?), it's easy to do so via https://www.mediawiki.org/wiki/Manual:PurgePage.php or touch.py from Pywikibot.

Yes, roughly all non-main-namespace pages with the character " in the title. 8358 pages on hewiki: select count(*) from page where page_namespace != 0 and page_title like '%"%' (I haven't tried other wikis, but AFAIK most group2 wikis should be smaller).

However, I'm not sure if it's worth the effort. The bug only affects users who enabled the beta feature (we split the parser cache… for exactly this reason). And if we decide that it *is* worth more effort, then I'd prefer to fix it by merging the second patch, which doesn't require purging.

[...]
However, I'm not sure if it's worth the effort. The bug only affects users who enabled the beta feature (we split the parser cache… for exactly this reason). And if we decide that it *is* worth more effort, then I'd prefer to fix it by merging the second patch, which doesn't require purging.

That's entriely up to you. I only wanted to note that it is absolutely possible, and can be done relatively easily. If you think that's not necessary, that's also fine.

Urbanecm triaged this task as Unbreak Now! priority.Mon, Feb 15, 3:56 PM

This is currently marked as blocker => setting priority to UBN.

Change 664254 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@wmf/1.36.0-wmf.30] CommentFormatter: Fix problems with editsection and quotes

https://gerrit.wikimedia.org/r/664254

Mentioned in SAL (#wikimedia-operations) [2021-02-16T11:54:14Z] <urbanecm@deploy1001> Synchronized php-1.36.0-wmf.30/extensions/DiscussionTools/includes/CommentFormatter.php: 5f4f516177a355b42b896ee142d66c0c969e20f1: CommentFormatter: Fix problems with editsection and quotes (T274709) (duration: 01m 12s)

Urbanecm closed this task as Resolved.Tue, Feb 16, 11:54 AM

Done

Thanks for backporting.

I purged https://he.wikipedia.org/wiki/שיחה:פת"ח (…?action=purge) and the issue on that page is fixed for me.

Change 663997 abandoned by Bartosz Dziewoński:
[mediawiki/core@master] ParserOutput: Match single quotes in EDITSECTION_REGEX

Reason:
Per C. Scott: "The convention in core AFAIK is to *only* use double-quoted attributes in the HTML generated by the core parser."

https://gerrit.wikimedia.org/r/663997