
Bidirectional text sometimes in wrong order in PDF
Closed, Declined · Public

Description

Use [[de:Dad (Arabischer Buchstabe)]] as a test case. The generated PDF has several problems with wrongly ordered text:

Swapped words:

Should be: ‏نظام التشابه‎ / niẓām at-tašābuh / ‚Regel der Ähnlichkeit‘
Is: نظام التشابه / at-tašābuh niẓām / ‚Regel der Ähnlichkeit‘

Swapped letters:

Should be: ‏روادف‎ / rawādif / ‚Nachkömmlinge‘
Is: رواد ف / rawādfi / ‚Nachkömmlinge‘


Version: unspecified
Severity: normal

Details

Reference
bz71869

Event Timeline

bzimport raised the priority of this task to Needs Triage (Nov 22 2014, 3:59 AM).
bzimport set Reference to bz71869.

In case it's not obvious: these errors occur when one of the templates {{ar}}, {{arF}}, {{arS}} (or similar) of :w:de: is in use. These templates are used for Arabic text, transcription (parameter "w"), transliteration (parameter "d") and translation (parameter "b"). The wrongly ordered passages in the PDFs come from the parameters "w" and "d", whose values, like those of "b", are always entirely left-to-right (LTR).

I tracked this down to lang="ar-Latn", which is used to tag the Latin transcription of the Arabic text:

<span lang="ar-Latn">niẓām at-tašābuh</span>, <span lang="ar-Latn">rawādif</span>

is rendered as

at-tašābuh niẓām, rawādfi

Michael: so you're saying that 'ar-Latn' text should be rendered LTR?

Latn stands for Latin script. Latin script is usually written from left to right, even if it is used for transliteration.
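The point can be checked against the Unicode character database itself. The following is a minimal sketch using Python's standard `unicodedata` module (the helper names `bidi_classes` and `is_strong_ltr_only` are hypothetical, chosen for illustration). It lists the bidi class of every character: the transliterations contain only strong left-to-right letters (class `L`) plus neutral spaces and separators, while the Arabic original uses `AL` (Arabic letter). Under the Unicode Bidirectional Algorithm, a run made up only of `L` characters and the neutrals between them is laid out left to right, whatever language tag surrounds it.

```python
import unicodedata


def bidi_classes(text: str) -> list:
    """Return (character, Unicode bidi class) pairs after NFC normalization."""
    text = unicodedata.normalize("NFC", text)
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def is_strong_ltr_only(text: str) -> bool:
    """True if text has strong L characters and no strong R/AL characters."""
    classes = {cls for _, cls in bidi_classes(text)}
    return "L" in classes and not (classes & {"R", "AL"})


# Transliterations from the test case, and the Arabic original for contrast.
for sample in ("niẓām at-tašābuh", "rawādif", "نظام التشابه"):
    print(sample, sorted({cls for _, cls in bidi_classes(sample)}))
```

In this sketch the Latin samples report only `L`, `WS` (space) and `ES` (the hyphen), and the Arabic sample reports `AL` and `WS`.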

Actually, I don't think the lang attribute should *ever* have any influence on the direction of the text. It can be used to select an appropriate font, but only the dir attribute, <bdi> and <bdo> tags, and the unicode-bidi CSS property should have influence on the writing direction.

So I even expect

<span lang="ar">niẓām at-tašābuh</span>, <span lang="ar">rawādif</span>

to render as

niẓām at-tašābuh, rawādif

(and even if you treated lang="ar" as an override on the direction, the result should be

hubāšat-ta māẓin, fidāwar

as it would be for <bdo dir="rtl">, but not the strange partially reversed display that currently is produced.)
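For comparison, here is what a real direction override would produce. This is a simplified sketch, not an implementation of the full bidi algorithm: under `<bdo dir="rtl">` (equivalently CSS `unicode-bidi: bidi-override; direction: rtl`) every character is treated as right-to-left, so the visual order is the logical order reversed. The function name `display_with_rtl_override` is hypothetical. It works on NFC code points rather than grapheme clusters and ignores glyph mirroring (e.g. of brackets), neither of which matters for these strings. The output matches the expected override result for each span, and neither plain LTR nor a full override yields the observed "at-tašābuh niẓām, rawādfi".

```python
import unicodedata


def display_with_rtl_override(text: str) -> str:
    """Visual order of text under <bdo dir="rtl"> (bidi override).

    Every character is forced to R, so the display order is simply the
    logical order reversed. Simplifications: operates on NFC code points,
    not grapheme clusters, and ignores mirrored glyphs such as brackets.
    """
    return unicodedata.normalize("NFC", text)[::-1]


# Each span is overridden separately, as in the markup above.
for span in ("niẓām at-tašābuh", "rawādif"):
    print(display_with_rtl_override(span))
# hubāšat-ta māẓin
# fidāwar
```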

Hm. You're probably right. The lang->dir implication is left over from before we had proper Unicode bidi algorithm support. That can probably be removed now (but I should verify that we have top-level dir attributes in the Parsoid output where needed -- some wikis were adding these via various hacky methods on the outer <html> element IIRC).
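A minimal sketch of what dropping the lang->dir implication could look like, assuming an element is given as a plain attribute mapping plus its text content. This is not OCG's or Parsoid's actual code; `resolve_direction` and `first_strong_direction` are hypothetical helpers. The base direction comes only from `dir`: explicit `ltr`/`rtl` wins, `auto` uses the first strong character (as HTML's `dir="auto"` does), and otherwise the parent's direction is inherited. `lang` is never consulted. Simplifications: `auto` falls back to `ltr` when no strong character is found, and nested elements are not skipped as the HTML spec requires.

```python
import unicodedata
from collections.abc import Mapping
from typing import Optional

# Strong bidi classes and the direction each one implies.
STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}


def first_strong_direction(text: str) -> Optional[str]:
    """Direction of the first strong (L, R or AL) character, or None."""
    for ch in text:
        direction = STRONG.get(unicodedata.bidirectional(ch))
        if direction:
            return direction
    return None


def resolve_direction(attrs: Mapping, text: str, parent_dir: str = "ltr") -> str:
    """Resolve an element's base direction from its dir attribute only.

    The lang attribute is deliberately ignored: it may select a font,
    but it never changes the writing direction.
    """
    dir_attr = attrs.get("dir", "").strip().lower()
    if dir_attr in ("ltr", "rtl"):
        return dir_attr
    if dir_attr == "auto":
        return first_strong_direction(text) or "ltr"  # simplified fallback
    return parent_dir  # no valid dir attribute: inherit from the parent


print(resolve_direction({"lang": "ar-Latn"}, "niẓām at-tašābuh"))  # ltr
print(resolve_direction({"lang": "ar"}, "rawādif"))  # ltr
```

Even when an element inherits `rtl` from its parent, the bidi algorithm still lays out a run of Latin letters left to right; the base direction only affects where that run sits relative to its neighbours.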

Is this bug about the MediaWiki PDF export feature ("Download as PDF" link in sidebar), the browser functionality (e.g. Ctrl+P, "Save as PDF"), or something else?

It is a bug in the MediaWiki export feature.

As already announced in Tech News, OfflineContentGenerator (OCG) will no longer be used on Wikimedia sites after October 1st, 2017. OCG will be replaced by Electron. You can read more on mediawiki.org.

Aklapper removed cscott as the assignee of this task.

I can no longer reproduce the problem with the current Proton and Electron-PDFs:
on https://de.wikipedia.org/wiki/Dād , using "Download as PDF", the Arabic text is rendered in the correct order.

Declining this task: OCG has been dead for years and was superseded by Proton and Electron-PDFs on Wikimedia servers.