
Bidirectional text sometimes in wrong order in PDF
Open, High, Public

Description

Use [[de:Dad (Arabischer Buchstabe)]] as a test case. The PDF contains several passages whose text is in the wrong order:

Swapped words:

Should be: ‏نظام التشابه‎ / niẓām at-tašābuh / ‚Regel der Ähnlichkeit‘
Is: نظام التشابه / at-tašābuh niẓām / ‚Regel der Ähnlichkeit‘

Swapped letters:

Should be: ‏روادف‎ / rawādif / ‚Nachkömmlinge‘
Is: رواد ف / rawādfi / ‚Nachkömmlinge‘


Version: unspecified
Severity: normal

Details

Reference
bz71869

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:59 AM
bzimport added projects: OCG-PDF-renderer, I18n.
bzimport set Reference to bz71869.
Schnark created this task. Oct 9 2014, 8:16 AM
Man77 added a comment. Oct 9 2014, 6:01 PM

In case it's not obvious: these errors occur when one of the templates {{ar}}, {{arF}}, {{arS}} (or similar) of :w:de: is used. These templates take the Arabic text, a transcription (parameter "w"), a transliteration (parameter "d") and a translation (parameter "b"). The wrongly ordered passages in the PDFs come from parameters "w" and "d", whose contents, like those of "b", are always entirely left-to-right (LTR).

I tracked this down to lang="ar-Latn", which is used to tag the Latin transcription of the Arabic text:

<span lang="ar-Latn">niẓām at-tašābuh</span>, <span lang="ar-Latn">rawādif</span>

is rendered as

at-tašābuh niẓām, rawādfi

Michael: so you're saying that 'ar-Latn' text should be rendered LTR?

Man77 added a comment. Oct 28 2014, 6:00 PM

Latn stands for Latin script. Latin script is usually written from left to right, even if it is used for transliteration.

Actually, I don't think the lang attribute should *ever* influence the direction of the text. It can be used to select an appropriate font, but only the dir attribute, the <bdi> and <bdo> elements, and the CSS unicode-bidi property should affect the writing direction.

So I even expect

<span lang="ar">niẓām at-tašābuh</span>, <span lang="ar">rawādif</span>

to render as

niẓām at-tašābuh, rawādif

(and even if you treated lang="ar" as an override on the direction, the result should be

hubāšat-ta māẓin, fidāwar

as it would be for <bdo dir="rtl">, not the strange, partially reversed display that is currently produced.)
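The fully reversed strings in the parenthetical follow from how an RTL override works: inside <bdo dir="rtl"> every character is treated as strongly right-to-left, so a single-line run sits at one odd embedding level and the Unicode Bidirectional Algorithm's reordering step simply reverses it. A minimal Python sketch of that expected display, assuming a single line with no brackets (the name bdo_rtl_visual is hypothetical; mirroring and line breaking are ignored, and NFC normalization keeps letters like "ẓ" as single code points so they are not split from their diacritics):

```python
import unicodedata


def bdo_rtl_visual(text: str) -> str:
    """Approximate the visual order of one line inside <bdo dir="rtl">.

    Under an RTL override every character counts as strong R, so the whole
    run is reordered by reversing it. Simplifications: no bracket mirroring,
    no line breaking; combining marks are only handled via NFC normalization.
    """
    nfc = unicodedata.normalize("NFC", text)
    return nfc[::-1]


for phrase in ("niẓām at-tašābuh", "rawādif"):
    print(phrase, "->", bdo_rtl_visual(phrase))
# niẓām at-tašābuh -> hubāšat-ta māẓin
# rawādif -> fidāwar
```

Neither the correct LTR output nor this fully reversed one matches the PDF's "at-tašābuh niẓām, rawādfi", which is why that output looks like a partial, inconsistent reordering.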

Hm. You're probably right. The lang->dir implication is left over from before we had proper Unicode bidi algorithm support. It can probably be removed now (but I should verify that the Parsoid output has top-level dir attributes where needed; some wikis were adding these via various hacky methods on the outer <html> element, IIRC).
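To make the difference concrete, here is a minimal Python sketch contrasting a lang-based direction rule with one that takes direction only from markup. It is not OCG's actual code: RTL_LANGS, dir_from_lang, dir_from_markup and first_strong_dir are hypothetical names, and the assumption that the legacy rule matched on the primary language subtag is only one plausible way "ar-Latn" could end up RTL. dir="auto" is simplified to scanning plain text for the first strong character (the HTML algorithm also skips certain descendants), with a fallback to the inherited direction.

```python
import unicodedata

# Hypothetical set of RTL primary language subtags, for illustration only.
RTL_LANGS = {"ar", "he", "fa", "ur", "yi"}


def dir_from_lang(attrs: dict, inherited: str) -> str:
    """Hypothetical legacy rule: infer direction from the primary lang subtag."""
    if attrs.get("dir") in ("ltr", "rtl"):
        return attrs["dir"]
    primary = attrs.get("lang", "").split("-")[0].lower()
    if primary in RTL_LANGS:
        return "rtl"  # "ar-Latn" -> "ar" -> rtl, although the script is Latin
    return inherited


def first_strong_dir(text: str):
    """Direction of the first strong character (L, R or AL); None if there is none."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None


def dir_from_markup(attrs: dict, text: str, inherited: str) -> str:
    """Direction from the dir attribute only; lang is ignored (font selection only)."""
    d = attrs.get("dir")
    if d in ("ltr", "rtl"):
        return d
    if d == "auto":
        return first_strong_dir(text) or inherited  # simplified fallback
    return inherited


span = {"lang": "ar-Latn"}
print(dir_from_lang(span, "ltr"), dir_from_markup(span, "niẓām at-tašābuh", "ltr"))
# rtl ltr
```

With the markup-only rule, the transcription spans inherit the surrounding LTR direction, and genuinely Arabic text still gets RTL treatment from the bidi algorithm's character classes (or from dir="auto" where authors ask for it).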

cscott triaged this task as High priority. Jul 19 2015, 4:06 PM
cscott set Security to None.
Restricted Application added a subscriber: Aklapper. Jul 19 2015, 4:06 PM
Man77 awarded a token. Nov 6 2016, 8:43 PM

Is this bug about the MediaWiki PDF export feature ("Download as PDF" link in sidebar), the browser functionality (e.g. Ctrl+P, "Save as PDF"), or something else?

Man77 added a comment (edited). Nov 22 2016, 9:45 PM

It is a bug in the MediaWiki export feature.

As already announced in Tech News, OfflineContentGenerator (OCG) will no longer be used on Wikimedia sites after October 1, 2017. OCG will be replaced by Electron. More information is available on mediawiki.org.