
ws-export is adding unwanted line from the header template module in the exported Epub
Closed, Resolved · Public · BUG REPORT

Description

When the Module:Header template is used on Bangla Wikisource (or a similar template on French Wikisource), the exported EPUBs have the following issue.

For ভানুসিংহের পত্রাবলী from Bangla Wikisource, the title page in the EPUB file has this line added at the top:

<link itemprop='mainEntityOfPage' href='https://bn.wikisource.org/wiki/%E0%A6%AD%E0%A6%BE%E0%A6%A8%E0%A7%81%E0%A6%B8%E0%A6%BF%E0%A6%82%E0%A6%B9%E0%A7%87%E0%A6%B0_%E0%A6%AA%E0%A6%A4%E0%A7%8D%E0%A6%B0%E0%A6%BE%E0%A6%AC%E0%A6%B2%E0%A7%80' /><meta itemprop='inLanguage' content='bn' />

This line comes from the following code in the module:

-- Microdata: canonical URL of the page being described
container:tag('link')
    :attr('itemprop', 'mainEntityOfPage')
    :attr('href', page:fullUrl(nil, 'canonical'))
-- Microdata: language of the work
container:tag('meta')
    :attr('itemprop', 'inLanguage')
    :attr('content', 'bn')
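One plausible reading of the symptom, stated as an assumption rather than a diagnosis: the calls above build ordinary `<link>`/`<meta>` elements that a reader does not display, but if that markup gets HTML-escaped somewhere in the export pipeline, the reader shows it as literal text. The plain-Lua sketch below illustrates this; `escape_html`, `unescape_html` and the placeholder URL are hypothetical, and the code uses neither Scribunto's `mw.html` nor anything from ws-export.

```lua
-- Illustration only (assumption): escaped microdata markup renders as visible text.
-- escape_html/unescape_html are hypothetical helpers, not part of mw.html or ws-export.

-- Escape the characters that make text parse as markup; '&' goes first
-- so the entities produced for '<' and '>' are not escaped again.
local function escape_html(s)
  return (s:gsub("&", "&amp;"):gsub("<", "&lt;"):gsub(">", "&gt;"))
end

-- What a reader effectively displays for escaped text: the entities decoded back.
local function unescape_html(s)
  return (s:gsub("&lt;", "<"):gsub("&gt;", ">"):gsub("&amp;", "&"))
end

-- Markup of the kind the builder calls emit (placeholder URL, illustrative quoting).
local markup = "<link itemprop='mainEntityOfPage' href='https://bn.wikisource.org/wiki/Example' />"
  .. "<meta itemprop='inLanguage' content='bn' />"

-- Unescaped, a reader treats these as non-rendered elements.
-- Escaped, the same characters become ordinary text content.
local escaped = escape_html(markup)
print(escaped)

-- The visible text equals the original markup, which matches the symptom in the EPUB.
print(unescape_html(escaped) == markup)
```

Where the escaping would actually happen (Parsoid, ProofreadPage or ws-export) is not established by this sketch; it only shows that escaped markup reproduces what readers see.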

This issue was noticed very recently; it was most probably not present a day earlier.

Screenshot from 2025-11-29 20-49-23.png

Event Timeline


Same problem with books exported from French Wikisource. Some tags are URL-encoded and therefore not recognised.
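If the tags really do reach the export in encoded form, a simple check on the exported XHTML could flag them, for example when verifying whether the issue has come back. This is a minimal sketch under the assumption that the leaked tags appear as entity-escaped text such as `&lt;link itemprop='...'`; `find_leaked_microdata` is a hypothetical helper, not part of ws-export, and the sample input is synthetic (modelled on the French Wikisource output reported here).

```lua
-- Hypothetical regression check (assumption: leaked tags appear in the exported
-- XHTML as entity-escaped text such as "&lt;link itemprop='...'").
-- Returns a list of "tag:itemprop" strings, one per escaped <link>/<meta> found.
local function find_leaked_microdata(xhtml)
  local hits = {}
  -- "&lt;" + tag name + whitespace + itemprop attribute (single or double quotes).
  for tag, prop in xhtml:gmatch("&lt;(%a+)%s+itemprop=[\"']([^\"']*)[\"']") do
    if tag == "link" or tag == "meta" then
      hits[#hits + 1] = tag .. ":" .. prop
    end
  end
  return hits
end

-- Synthetic sample: two escaped tags inside a paragraph, plus one genuine
-- (unescaped) <meta> element that must not be reported.
local sample = "<p>&lt;link itemprop='mainEntityOfPage' "
  .. "href='https://fr.wikisource.org/wiki/Le_Ruban_(Feydeau)' /&gt;"
  .. "&lt;meta itemprop='inLanguage' content='fr' /&gt;</p>"
  .. "<meta itemprop=\"inLanguage\" content=\"fr\" />"

for _, hit in ipairs(find_leaked_microdata(sample)) do
  print("leaked: " .. hit)
end
```

The check only matches tags whose first attribute is `itemprop`, as in the reported output; a real test would likely need a proper XHTML parser.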

@cscott I know that y'all on the Parsoid team did a bunch of work on ProofreadPage and the <pages /> tag recently; could this be a regression that made it through?


This looks similar to T408915: https://phabricator.wikimedia.org/T408915
I have already added some comments there while investigating it: https://phabricator.wikimedia.org/T408915#11331410

Hello, there is the same problem when exporting to PDF from French Wikisource. For example, after exporting the book https://fr.wikisource.org/wiki/Le_Ruban_(Feydeau), I got this on the first page this morning:
<link itemprop='mainEntityOfPage' href='https://fr.wikisource.org/wiki/Le_Ruban_(Feydeau)' /><meta itemprop='inLanguage' content='fr' /><meta itemprop='http://purl.org/library/placeOfPublication' content='Paris' /><link itemprop='mainEntityOfPage' href='https://fr.wikisource.org/wiki/Fichier:Feydeau_-_Th%C3%A9%C3%A2tre_complet,_volume_8,_1948.djvu' /><meta itemprop='width' content='546' /><meta itemprop='height' content='857' /><meta itemprop='fileFormat' content='image/vnd.djvu' />

As this is affecting the reading experience in the Wikisource reader app, tagging it to track updates there.

After being down for many days, WS Export is running again, but we still have the same problem on some pages at the beginning of the exported files.

For example, when exporting the book https://fr.wikisource.org/wiki/Le_Ruban_(Feydeau) to PDF, the third page of the export still contains the same <link>/<meta> markup quoted in the previous comment.

Bodhisattwa claimed this task.

The issue appears to be gone because of this revert: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1225613
If the issue reappears in the future, I will reopen this task.