
Multiple comments on a single line are interpreted as a blank line
Closed, Resolved, Public

Description

When a line contains a single comment and nothing else, the line is correctly ignored.
When a line contains two comments, it is not ignored but is instead interpreted as a blank line.

See this page for an example that illustrates the issue:
http://en.wikipedia.org/wiki/User:Patrick87/comments

This is not a big problem, and there should be only a few cases where someone actually writes two separate comments on a single line. However, formatting should not change depending on whether a line holds one comment or two.
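The difference can be pictured as two line-matching rules. The following is a minimal Python sketch under simplifying assumptions (comments do not span lines, and only spaces surround them). The names `OLD_COMMENT_ONLY`, `NEW_COMMENT_ONLY` and `is_comment_only` are hypothetical; this is not the MediaWiki preprocessor's actual code.

```python
import re

# Simplified model (assumption): a comment is <!-- ... --> on one line,
# and its body never contains "-->".
COMMENT = r"<!--(?:(?!-->).)*-->"

# Behaviour reported here: only a line holding exactly one comment,
# optionally padded with spaces, is dropped entirely.
OLD_COMMENT_ONLY = re.compile(rf"^ *{COMMENT} *$")

# Behaviour wanted by this report: one or more comments separated by spaces.
NEW_COMMENT_ONLY = re.compile(rf"^ *(?:{COMMENT} *)+$")


def is_comment_only(line: str, fixed: bool) -> bool:
    """Return True if the simplified preprocessor would drop this whole line."""
    pattern = NEW_COMMENT_ONLY if fixed else OLD_COMMENT_ONLY
    return pattern.match(line) is not None
```

Under the old rule, `is_comment_only("<!-- x --> <!-- y -->", False)` is False, so the line survives as whitespace; under the new rule it is True and the line disappears, just like a single-comment line.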


Version: 1.21.x
Severity: minor

Details

Reference
bz41756

Event Timeline

bzimport raised the priority of this task to Low.
bzimport set Reference to bz41756.

Another test case:

*a
<!-- x -->
*b
<!-- x --> <!-- y --> <!-- z -->
*c

The PHP parser treats 'a' and 'b' as items of the same list, but treats 'c' as the start of a completely separate list.
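To see why the leftover blank line matters, here is a simplified Python model of the test case above. It assumes that comment-only lines are dropped whole, other comments are stripped, and any line that is not a `*` item (including a line left holding only spaces) ends the current list. `group_list_items` is a hypothetical helper, not part of MediaWiki or Parsoid.

```python
import re

COMMENT = r"<!--(?:(?!-->).)*-->"
OLD_DROP = re.compile(rf"^ *{COMMENT} *$")       # exactly one comment
NEW_DROP = re.compile(rf"^ *(?:{COMMENT} *)+$")  # one or more comments


def group_list_items(wikitext: str, fixed: bool) -> list[list[str]]:
    """Group '*' items into lists, as a simplified model of list parsing.

    Comment-only lines (per the old or fixed rule) vanish entirely; other
    comments are stripped, and any remaining non-item line, including one
    left containing only spaces, closes the current list.
    """
    drop = NEW_DROP if fixed else OLD_DROP
    lists: list[list[str]] = []
    current: list[str] = []
    for line in wikitext.splitlines():
        if drop.match(line):
            continue  # the line disappears, so the list carries on
        line = re.sub(COMMENT, "", line)
        if line.startswith("*"):
            current.append(line[1:])
        else:
            if current:
                lists.append(current)
            current = []
    if current:
        lists.append(current)
    return lists


TEST = "*a\n<!-- x -->\n*b\n<!-- x --> <!-- y --> <!-- z -->\n*c"
```

With `fixed=False` this returns `[['a', 'b'], ['c']]`, matching the PHP behaviour described above; with `fixed=True` all three items stay in one list, `[['a', 'b', 'c']]`.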

There are other examples of this kind in parserTests, and this behavior is becoming a source of diffs between the PHP parser and Parsoid.

Change 77988 had a related patch set uploaded by Cscott:
Preprocessor: Don't treat a line containing multiple comments as a blank line.

https://gerrit.wikimedia.org/r/77988

Change 78248 had a related patch set uploaded by Cscott:
Add '-m' option to dumpGrepper; add patterns for bug 41756.

https://gerrit.wikimedia.org/r/78248

Change 78248 merged by jenkins-bot:
Add '-m' option to dumpGrepper; add patterns for bug 41756.

https://gerrit.wikimedia.org/r/78248

cscott added a comment. Aug 8 2013, 6:11 PM

subbu notes that Parsoid accepts both tabs and spaces around the comments, whereas PHP accepts only spaces. Is it worth tweaking my patch so that PHP accepts tabs as well? I don't think it will make much difference to content, but it would be nice to converge the parsers.
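The whitespace question comes down to which characters may pad the comments. A minimal Python sketch, assuming the same simplified single-line comment pattern; the `allow_tabs` flag is a hypothetical stand-in for the Parsoid-style rule and is not an option of either parser.

```python
import re

COMMENT = r"<!--(?:(?!-->).)*-->"

# Spaces only, as the PHP preprocessor is described as accepting.
SPACES_ONLY = re.compile(rf"^ *(?:{COMMENT} *)+$")
# Spaces or tabs, as Parsoid is described as accepting.
SPACES_AND_TABS = re.compile(rf"^[ \t]*(?:{COMMENT}[ \t]*)+$")


def comment_only(line: str, allow_tabs: bool = False) -> bool:
    """Return True if `line` holds only comments padded by allowed whitespace."""
    pattern = SPACES_AND_TABS if allow_tabs else SPACES_ONLY
    return pattern.match(line) is not None
```

A tab-separated line such as `"\t<!-- x -->\t<!-- y -->"` is comment-only under the tab-tolerant rule but not under the spaces-only rule, which is exactly where the two parsers would still disagree.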

I've grepped through the 20130708 enwiki dump to see how many pages this change would affect. I found only 414 affected pages in the article namespace; the full list is at http://en.wikipedia.org/wiki/User:Cscott/bug41756

An additional 1,913 pages in the File:, Wikimedia:, or Portal: namespaces have lines with more than one space-separated comment. These appear to be mostly bot-generated and mostly harmless. I've put this list on the above page as well.
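A scan like the one described above can be approximated with a multiline regular expression over each page's wikitext. This is a hypothetical sketch: it does not reproduce the actual dumpGrepper patterns from change 78248, and it assumes the dump has already been read into a mapping from page title to wikitext (parsing the XML dump itself is left out).

```python
import re

COMMENT = r"<!--(?:(?!-->).)*-->"

# A line made up of two or more comments and nothing but spaces: the
# lines whose treatment changes with the fix.
MULTI_COMMENT_LINE = re.compile(
    rf"^ *{COMMENT}(?: *{COMMENT})+ *$", re.MULTILINE
)


def affected_titles(pages: dict[str, str]) -> list[str]:
    """Return the sorted titles of pages containing a multi-comment line."""
    return sorted(
        title for title, text in pages.items()
        if MULTI_COMMENT_LINE.search(text)
    )
```

A page containing a line such as `<!-- x --> <!-- y -->` is reported, while a page with only single-comment lines, or with comments following other text on the same line, is not.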

Change 77988 merged by jenkins-bot:
Preprocessor: Don't treat a line containing multiple comments as a blank line.

https://gerrit.wikimedia.org/r/77988

Verified fixed in beta and test.