
Wikisource Export: Underscores show instead of spaces in some titles
Closed, Resolved · Public · 3 Estimated Story Points · BUG REPORT

Description

As a Wikisource user, I want titles to show spaces between words rather than underscores, so that a reader's first impression of the ebook export is not an error.

Background: On the title page of an exported ebook, underscores are shown instead of spaces (see screenshot below).

Books to reproduce problem:

Acceptance Criteria:

  • Restore previous behavior to display spaces between the words in the title of ebook exports rather than underscores

Screenshots:
wsexport-test:

wsexport production:

Event Timeline

Restricted Application added a subscriber: Aklapper. · Dec 17 2020, 10:33 AM
ifried updated the task description. · Tue, Jan 12, 10:58 PM
ifried updated the task description.
ARamirez_WMF set the point value for this task to 3. · Tue, Jan 12, 11:56 PM
ARamirez_WMF moved this task from To Be Estimated/Discussed to Estimated on the Community-Tech board.
ARamirez_WMF moved this task from Estimated to Kanban-2020-21-Q3 on the Community-Tech board.
Samwilson added a subscriber: Samwilson.

This seems to be a difference in how Parsoid handles links of the form [[../]]. With the MediaWiki parser (source) we get:

<a href="/wiki/Book_of_Jasher" title="Book of Jasher">Book of Jasher</a>

but with Parsoid (source), it's:

<a rel="mw:WikiLink" href="./Book_of_Jasher" title="Book of Jasher">Book_of_Jasher</a>

Note the underscores in the text part.

We could fix this in our tool, but I do wonder if it's intentional in Parsoid.
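For illustration only, a tool-side workaround could look roughly like the Python sketch below. It is not the export tool's actual code: `fix_link_text` is a hypothetical helper, and it assumes the simple attribute layout shown in the Parsoid output above (double-quoted attributes, plain text inside the link). It only rewrites the link text when swapping underscores for spaces yields exactly the `title` attribute, so link text that legitimately contains underscores is left alone.

```python
import re

# Matches an <a> tag that carries rel="mw:WikiLink" and a title attribute,
# followed by plain (tag-free) link text. Assumes double-quoted attributes.
_WIKILINK = re.compile(
    r'(?P<open><a\b(?=[^>]*\brel="mw:WikiLink")[^>]*\btitle="(?P<title>[^"]*)"[^>]*>)'
    r'(?P<text>[^<]*)</a>'
)


def fix_link_text(html: str) -> str:
    """Replace underscore link text with the title when the two differ
    only by underscores versus spaces. Hypothetical helper, sketch only."""

    def repl(match: re.Match) -> str:
        title = match.group("title")
        text = match.group("text")
        # Only touch the text if it is the underscore form of the title.
        if "_" in text and text.replace("_", " ") == title:
            text = title
        return f'{match.group("open")}{text}</a>'

    return _WIKILINK.sub(repl, html)


if __name__ == "__main__":
    parsoid = (
        '<a rel="mw:WikiLink" href="./Book_of_Jasher" '
        'title="Book of Jasher">Book_of_Jasher</a>'
    )
    print(fix_link_text(parsoid))
```

A regex keeps the sketch short; a real fix in the tool would more likely walk the parsed DOM than pattern-match the HTML string.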

@ssastry do you have any insight into the above?

I'll raise this in our next weekly meeting and one of us will follow up.

cscott added a subscriber: cscott. · Tue, Jan 19, 5:53 PM

What's the wikitext source? For [[Chapter 30]] Parsoid should definitely be outputting <a rel="mw:WikiLink" href="./Book_of_Jasher" title="Book of Jasher">Book of Jasher</a> and be totally compatible with the core parser output:

$ echo "[[Chapter 30]]" | php bin/parse.php --normalize

<p><a href="Chapter_30" title="Chapter 30">Chapter 30</a></p>

The title output should be consistent between the two parsers as well. The problem seems to be in the export tool, not anything in Parsoid?

> What's the wikitext source?

It's listed above. This is reproducible with:

echo "[[../]]" | php bin/parse.php --pageName "Book of Jasher/Chapter 30" --domain "en.wikisource.org"
<p data-parsoid='{"dsr":[0,7,0,0]}'><a rel="mw:WikiLink" href="./Book_of_Jasher" title="Book of Jasher" data-parsoid='{"stx":"simple","a":{"href":"./Book_of_Jasher"},"sa":{"href":"../"},"dsr":[0,7,2,2]}'>Book_of_Jasher</a></p>
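The `"sa":{"href":"../"}` entry in the output above shows that the link was written as a relative subpage link. As a rough illustration, the Python sketch below resolves such a target against the page name and shows the two forms of the resulting title: the text form with spaces, which is what the core parser displays, and the underscore form used in the `href`. The function names are hypothetical and the resolution rules are simplified; this is not Parsoid's implementation.

```python
def resolve_parent_link(page_name: str, target: str) -> str:
    """Resolve a '../'-style subpage link against page_name.

    Simplified sketch: each leading '../' drops one path segment,
    and any remaining text is appended as a new segment.
    """
    parts = page_name.split("/")
    rest = target
    while rest.startswith("../"):
        rest = rest[3:]
        if len(parts) > 1:
            parts.pop()
    if rest:
        parts.append(rest.rstrip("/"))
    return "/".join(parts)


def to_underscore_form(title_text: str) -> str:
    """Underscore form of a title, as seen in the href."""
    return title_text.replace(" ", "_")


if __name__ == "__main__":
    text_form = resolve_parent_link("Book of Jasher/Chapter 30", "../")
    print("link text:", text_form)
    print("href:     ", "./" + to_underscore_form(text_form))
```

In the buggy output the underscore form appears in both places; the expected result is the underscore form in the `href` only and the text form as the visible link content.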

Change 657201 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Use prefixed text for content of links up the path

https://gerrit.wikimedia.org/r/657201

Change 657201 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Use prefixed text for content of links up the path

https://gerrit.wikimedia.org/r/657201

Arlolra closed this task as Resolved. · Wed, Jan 20, 7:38 PM
Arlolra claimed this task.