ProofreadPage: double newline after header can be a problem
Open, Needs Triage, Public

Description

In the ProofreadPage page namespace, the "header" and "body" are joined with a double newline \n\n before being fed to the parser:

		$wikitextContent = new WikitextContent(
			$this->header->getText() . "\n\n" . $this->body->getText() .
				$this->footer->getText()
		);
		$parserOutput = $wikitextContent->getParserOutput( $title, $revId, $options, $generateHtml );

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/ProofreadPage/+/refs/heads/master/includes/Page/PageContent.php#280

This is presumably intended to allow this:

Header: Header text
Body: Body text

to enter the parser like this:

Header text

Body text

and come out as two paragraphs.

However, when the header contains a list item, this is wrong:

Header: * Item Foo
Body: ** Sub item bar

which enters the parser like this:

* Item Foo

** Sub item bar

rather than:

* Item Foo
** Sub item bar

This causes mis-rendering on pages like this, which then have no way to keep the header and body as parts of the same list.
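The effect can be reproduced outside MediaWiki with a small string-level model. This is only a sketch in Python, not the extension's PHP code: `join_page_sections` is a hypothetical helper that mirrors the concatenation quoted above, and it does no parsing at all.

```python
def join_page_sections(header: str, body: str, footer: str = "") -> str:
    """Hypothetical stand-in for the concatenation quoted from PageContent.php.

    It only models the string join: header, a double newline, body, footer.
    """
    return header + "\n\n" + body + footer


# Inline text in the header: the blank line yields two separate paragraphs.
paragraphs = join_page_sections("Header text", "Body text")

# List item in the header: the blank line separates the item from its sub-item.
broken_list = join_page_sections("* Item Foo", "** Sub item bar")

print(repr(paragraphs))
print(repr(broken_list))
```

The second result contains a blank line between the two list lines, which is the input the parser receives in the failing case.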

I am unsure whether a straight change from \n\n to \n would cause major breakage. It is very rare, at least at enWS, for a header to end with inline content. Headers normally contain block content, but this is not guaranteed.
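To make the trade-off concrete, here is a rough Python sketch comparing the two separators. The helper names are hypothetical, and `count_blank_line_blocks` is a deliberately crude model that only counts blocks separated by blank lines. It is not the MediaWiki parser.

```python
import re


def join_sections(header: str, body: str, separator: str) -> str:
    """Hypothetical model of the join, with the separator made explicit."""
    return header + separator + body


def count_blank_line_blocks(wikitext: str) -> int:
    """Crude approximation: blocks of text separated by one or more blank lines."""
    return len([block for block in re.split(r"\n{2,}", wikitext) if block.strip()])


# Header ending in inline text: the single newline merges the two paragraphs.
current = join_sections("Header text", "Body text", "\n\n")
proposed = join_sections("Header text", "Body text", "\n")

# Header ending in a list item: the single newline keeps the list contiguous.
list_current = join_sections("* Item Foo", "** Sub item bar", "\n\n")
list_proposed = join_sections("* Item Foo", "** Sub item bar", "\n")

print(count_blank_line_blocks(current), count_blank_line_blocks(proposed))
print(count_blank_line_blocks(list_current), count_blank_line_blocks(list_proposed))
```

Under this model, the single newline fixes the list case but turns the inline-text case into one block, which is the kind of breakage the paragraph above worries about.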

An alternative (if ugly) solution is a magic word or similar to suppress one of the \ns in the cases where it causes an issue.

Event Timeline

This was discussed on a Wikisource Discord.

One of the considerations I raised was the issue of the body content starting with {{template}}, as I wasn't clear if these were expanded before ProofreadPage saw them.

Or a behavior switch? (https://www.mediawiki.org/wiki/Help:Magic_words#Behavior_switches)

__NOBR__

to remove a 'soft-newline' where it would otherwise be generated?

That's what I mean by a magic word, but it would have to affect only that exact newline in the ProofreadPage extension. Otherwise it would also interfere with other line-break handling, which would be really confusing.
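A minimal sketch of how such a narrowly scoped switch might behave, written in Python as a string-level model rather than as ProofreadPage code. Everything here is an assumption: `__NOBR__` is only the name floated in this discussion and does not exist as a behavior switch, and `join_with_switch` is a hypothetical helper. The marker is honoured only at the very end of the header, so it cannot affect any other line break.

```python
SUPPRESS_MARKER = "__NOBR__"  # hypothetical name from this discussion, not a real switch


def join_with_switch(header: str, body: str, footer: str = "") -> str:
    """Join header and body, dropping one newline if the header ends with the marker.

    Only a marker at the very end of the header (ignoring trailing whitespace)
    is recognised; occurrences elsewhere are left untouched.
    """
    stripped = header.rstrip()
    if stripped.endswith(SUPPRESS_MARKER):
        # Remove the marker and use a single newline instead of two.
        header = stripped[: -len(SUPPRESS_MARKER)].rstrip("\n")
        separator = "\n"
    else:
        separator = "\n\n"
    return header + separator + body + footer


default_join = join_with_switch("Header text", "Body text")
list_join = join_with_switch("* Item Foo" + SUPPRESS_MARKER, "** Sub item bar")

print(repr(default_join))
print(repr(list_join))
```

Headers without the marker keep the current double-newline behaviour, so existing pages would be unaffected under this assumption.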

This also interferes with table headers in the Page NS headers:

https://en.wikisource.org/wiki/Page:Showell's_Dictionary_of_Birmingham.djvu/69

The raw wikitext looks like

<noinclude>
{| {{ts|mc|w40}} 
|+
|
! Miles.</noinclude>{{nopt}}
|-
|Tamworth
|18

What the parser actually receives in the Page NS is

<noinclude>
{| {{ts|mc|w40}} 
|+
|
! Miles.

{{nopt}}
|-
|Tamworth
|18

so you can't avoid the gap (see the attached screenshot, 2021-05-04_092727_219x101_screenshot.png).