
Wikisource Export: Underscores show instead of spaces in some titles
Closed, Resolved | Public | 3 Estimated Story Points | BUG REPORT

Description

As a Wikisource user, I want to see spaces rather than underscores between words in the title, so that a reader's first impression of the ebook export is not an apparent error.

Background: On the title page of an exported ebook, underscores are shown instead of spaces (see screenshot below).

Books to reproduce problem:

Acceptance Criteria:

  • Restore previous behavior to display spaces between the words in the title of ebook exports rather than underscores

Screenshots:
wsexport-test:

underscores.png (235×1 px, 29 KB)

wsexport production:

spaces.png (250×1 px, 32 KB)

Event Timeline

ifried updated the task description.
ARamirez_WMF set the point value for this task to 3. (Jan 12 2021, 11:56 PM)
ARamirez_WMF moved this task from To Be Estimated/Discussed to Estimated on the Community-Tech board.
ARamirez_WMF moved this task from Estimated to Kanban-2020-21-Q3 on the Community-Tech board.
Samwilson added a subscriber: Samwilson.

This seems to be a difference in how Parsoid handles links of the form [[../]]. With the MediaWiki parser (source) we get:

<a href="/wiki/Book_of_Jasher" title="Book of Jasher">Book of Jasher</a>

but with Parsoid (source), it's:

<a rel="mw:WikiLink" href="./Book_of_Jasher" title="Book of Jasher">Book_of_Jasher</a>

Note the underscores in the text part.

We could fix this in our tool, but I do wonder if it's intentional in Parsoid.
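For reference, a tool-side workaround could look roughly like the sketch below. It is Python for illustration only, not the export tool's actual code; the function name `fix_wikilink_text` and the regex are assumptions, and the regex relies on `rel` appearing before `title` in the tag, as in the Parsoid output quoted above. It only rewrites link text that matches the `title` attribute once underscores are read as spaces, so custom link labels that legitimately contain underscores are left alone.

```python
import re
from html import unescape

# Matches <a ... rel="mw:WikiLink" ... title="...">text</a>.
# Assumes rel comes before title, as in the Parsoid sample output.
WIKILINK_RE = re.compile(
    r'(<a\b[^>]*\brel="mw:WikiLink"[^>]*\btitle="([^"]*)"[^>]*>)([^<]*)(</a>)'
)

def fix_wikilink_text(html: str) -> str:
    """Replace underscores with spaces in link text that mirrors the title."""
    def repl(match: re.Match) -> str:
        open_tag, title, text, close_tag = match.groups()
        # Only touch text that equals the title apart from underscores.
        if "_" in text and unescape(text).replace("_", " ") == unescape(title):
            text = text.replace("_", " ")
        return open_tag + text + close_tag
    return WIKILINK_RE.sub(repl, html)

parsoid_html = (
    '<a rel="mw:WikiLink" href="./Book_of_Jasher" '
    'title="Book of Jasher">Book_of_Jasher</a>'
)
fixed_html = fix_wikilink_text(parsoid_html)
```

This would only paper over the symptom in the export tool; if the Parsoid output is unintentional, fixing it there is the cleaner option.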

@ssastry do you have any insight into the above?

I'll raise this in our next weekly meeting and one of us will follow up.

What's the wikitext source? For [[Chapter 30]] Parsoid should definitely be outputting <a rel="mw:WikiLink" href="./Book_of_Jasher" title="Book of Jasher">Book of Jasher</a> and be totally compatible with the core parser output:

$ echo "[[Chapter 30]]" | php bin/parse.php --normalize

<p><a href="Chapter_30" title="Chapter 30">Chapter 30</a></p>

The title output should be consistent between the two parsers as well. The problem seems to be in the export tool, not anything in Parsoid?

What's the wikitext source?

It's listed above. This is reproducible with:

echo "[[../]]" | php bin/parse.php --pageName "Book of Jasher/Chapter 30" --domain "en.wikisource.org"
<p data-parsoid='{"dsr":[0,7,0,0]}'><a rel="mw:WikiLink" href="./Book_of_Jasher" title="Book of Jasher" data-parsoid='{"stx":"simple","a":{"href":"./Book_of_Jasher"},"sa":{"href":"../"},"dsr":[0,7,2,2]}'>Book_of_Jasher</a></p>
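To make the expected result concrete, here is a simplified model of how a `[[../]]` link resolves from a subpage. It is an illustrative Python sketch, not Parsoid's implementation; `parent_link` is a hypothetical helper and it handles only a single `../` step. The point is that the href uses the underscored form, while the title and the link text should both use the spaced form.

```python
def parent_link(page_name: str) -> dict:
    """Model the link produced by [[../]] on the given subpage (simplified)."""
    # Accept either spaces or underscores in the input page name.
    spaced = page_name.replace("_", " ")
    parent, sep, _leaf = spaced.rpartition("/")
    if not sep or not parent:
        raise ValueError(f"{page_name!r} has no parent page")
    return {
        "href": "./" + parent.replace(" ", "_"),  # underscored form
        "title": parent,                          # spaced form
        "text": parent,                           # spaced form, same as title
    }

link = parent_link("Book of Jasher/Chapter 30")
```

In the output above, the `text` part carries the underscored form instead, which is the discrepancy being reported.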

Change 657201 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Use prefixed text for content of links up the path

https://gerrit.wikimedia.org/r/657201

Change 657201 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Use prefixed text for content of links up the path

https://gerrit.wikimedia.org/r/657201

Arlolra claimed this task.

Change 658462 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a23

https://gerrit.wikimedia.org/r/658462

Change 658462 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a23

https://gerrit.wikimedia.org/r/658462