
TextExtracts renders single and double line breaks identically in plaintext
Open, Needs Triage, Public

Description

In its plaintext output, TextExtracts renders both single and double line breaks in wikitext as "\n".

This is a problem because a single line break in wikitext renders in the browser as a space, while a double line break renders as a new paragraph. To fix this, I would like the API to treat a single line break in the wikitext as a space rather than a newline.

API call for the sandbox: https://en.wikipedia.org/w/api.php?action=query&prop=extracts&explaintext&redirects&format=json&titles=User:Sam_at_Megaputer/sandbox
Rendered sandbox page: https://en.wikipedia.org/wiki/User:Sam_at_Megaputer/sandbox
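For anyone reproducing this, here is a small sketch that builds the same API request and pulls the extract out of the JSON response. It assumes the default JSON layout (formatversion 1, pages keyed by page ID under query.pages); the helper names extracts_url and first_extract are my own, not part of any library. It stays offline: fetch the URL with any HTTP client and pass the response body to first_extract.

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def extracts_url(title: str) -> str:
    """Hypothetical helper: build the prop=extracts plaintext request used above."""
    params = {
        "action": "query",
        "prop": "extracts",
        # Boolean API flags are switched on by being present; an empty value is enough.
        "explaintext": "",
        "redirects": "",
        "format": "json",
        "titles": title,
    }
    return API + "?" + urlencode(params)


def first_extract(response_body: str) -> str:
    """Hypothetical helper: return the extract of the first page in a JSON response."""
    # Assumes formatversion 1: query.pages is an object keyed by page ID.
    pages = json.loads(response_body)["query"]["pages"]
    return next(iter(pages.values()))["extract"]


print(extracts_url("User:Sam_at_Megaputer/sandbox"))
```

Printing repr(first_extract(body)) makes the "\n" characters visible, which is the easiest way to compare against the outputs quoted below.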

Wikitext:

False line break
False line break
False line break
False line break
False line break

True line break

True line break

True line break

True line break

True line break

Returned plaintext:

"False line break\nFalse line break\nFalse line break\nFalse line break\nFalse line break\nTrue line break\nTrue line break\nTrue line break\nTrue line break\nTrue line break"

Expected plaintext:

"False line break False line break False line break False line break False line break\nTrue line break\nTrue line break\nTrue line break\nTrue line break\nTrue line break"
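To pin down exactly what "expected" means here, this is a rough model of the desired behaviour working directly on the wikitext: a blank line starts a new paragraph, single line breaks inside a paragraph become spaces, and paragraphs are separated by one "\n" as in the current output. It is only an illustration of the target string, not how TextExtracts works internally (it converts parsed HTML, not raw wikitext); wikitext_to_expected_plaintext is a hypothetical name.

```python
import re


def wikitext_to_expected_plaintext(wikitext: str) -> str:
    """Hypothetical helper: model the plaintext the reporter expects."""
    # Two or more consecutive line breaks (possibly with whitespace between) end a paragraph.
    paragraphs = re.split(r"\n\s*\n", wikitext.strip())
    # Inside a paragraph, each single line break renders as a space.
    joined = [
        " ".join(line.strip() for line in p.splitlines() if line.strip())
        for p in paragraphs
    ]
    # Paragraphs are separated by a single "\n", matching the current extract format.
    return "\n".join(joined)


# The sandbox wikitext from the description.
sample = "\n".join(["False line break"] * 5) + "\n\n" + "\n\n".join(["True line break"] * 5)
print(repr(wikitext_to_expected_plaintext(sample)))
```

Run on the sandbox wikitext, this produces the expected plaintext string above.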

Event Timeline

TheDJ renamed this task from "TextExtracts renders single and double line breaks identically" to "TextExtracts renders single and double line breaks identically in plaintext". Apr 12 2021, 9:27 AM
TheDJ updated the task description.

I do think that this is something that should be fixed, as the generated output does not reflect the intent of the author. Shouldn't be too hard either.

You'd probably have to add a final transformation to getText() (https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/TextExtracts/+/refs/heads/master/includes/ExtractFormatter.php#47) that replaces single line endings with a space, for example by replacing the pattern /\n(?!\n)/ with " ".
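One detail worth checking in that pattern: with only a lookahead, the second "\n" of a "\n\n" pair is not followed by another "\n", so it still matches and leaves a stray leading space on the next paragraph. Adding a negative lookbehind avoids that. A quick Python check of both variants (PHP's preg_replace uses PCRE, which should accept the same lookaround syntax, but I have not tested it there):

```python
import re

# A "\n" not followed by another "\n" (lookahead only).
LOOKAHEAD_ONLY = re.compile(r"\n(?!\n)")
# A "\n" with no "\n" on either side, i.e. a genuinely isolated line break.
SINGLE_NEWLINE = re.compile(r"(?<!\n)\n(?!\n)")


def join_single_line_breaks(text: str) -> str:
    """Replace isolated line breaks with a space, leaving runs of two or more intact."""
    return SINGLE_NEWLINE.sub(" ", text)


text = "a\nb\n\nc"
print(repr(LOOKAHEAD_ONLY.sub(" ", text)))  # 'a b\n c'
print(repr(join_single_line_breaks(text)))  # 'a b\n\nc'
```

Judging by the returned plaintext in the description, paragraph breaks already come out as a single "\n" by the end of processing, so the replacement would have to run at a point where single and double breaks are still distinguishable. I haven't verified where that is in ExtractFormatter.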