"Download as PDF" on Chinese wiki projects does not convert text to Simplified or Traditional Chinese
Open, MediumPublic

Assigned To
None
Authored By
Yuriy_kosygin
Jun 11 2017, 5:25 PM
Referenced Files
F11137742: 广州南站.pdf
Dec 2 2017, 9:20 AM
F11131358: image.png
Dec 2 2017, 12:04 AM
F11131124: image.png
Dec 2 2017, 12:04 AM
F11131453: image.png
Dec 2 2017, 12:04 AM
F11131317: image.png
Dec 2 2017, 12:04 AM
F11131077: image.png
Dec 2 2017, 12:04 AM
F11131284: image.png
Dec 2 2017, 12:04 AM
F11131432: image.png
Dec 2 2017, 12:04 AM

Description

I found that "Download as PDF" on every Chinese wiki project does not convert the text to either Simplified or Traditional Chinese.

Example:
Exporting 時區 (from Wikivoyage) to a PDF file through "Download as PDF" (under 打印/导出) produces a PDF that mixes Simplified and Traditional Chinese text, instead of converting everything to a single variant.

By the way, the sidebar on every Chinese wiki project has the same problem.

Screenshot.png (362×837 px, 71 KB)

Can this be fixed?

Reproduction criteria

  • for a number of browsers and operating systems, test whether the article appears in Simplified or Traditional Chinese in:
    • the browser print
    • download as PDF (Electron)
  • provide screenshots of the difference between the two PDFs

Event Timeline

Reportedly, the Electron service used to generate PDFs renders the page in printable mode (like this), and if the wiki has LanguageConverter enabled, the printable-mode URL also carries the 'variant' parameter that selects the language conversion (like this). Can the service accept this parameter and use it?
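
As a rough illustration of the question above, here is a minimal sketch of a printable-mode URL with and without the 'variant' parameter, plus a helper that reads the parameter back the way a rendering service might. The host, the `/w/index.php` path, and both function names are assumptions made for illustration; this is not the Electron service's actual code.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def printable_url(host: str, title: str, variant: Optional[str] = None) -> str:
    """Build a printable-mode URL, optionally carrying a language variant."""
    params = {"title": title, "printable": "yes"}
    if variant:
        # Only added when a variant such as "zh-tw" or "zh-cn" is requested.
        params["variant"] = variant
    return f"https://{host}/w/index.php?{urlencode(params)}"


def variant_of(url: str) -> Optional[str]:
    """Return the 'variant' query parameter of a URL, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get("variant")
    return values[0] if values else None


if __name__ == "__main__":
    url = printable_url("zh.wikivoyage.org", "時區", "zh-tw")
    print(url)
    print(variant_of(url))
```

If the service dropped or ignored the parameter, every request would fall back to the unconverted page, which would match the mixed output described in this task.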

@Yuriy_kosygin Judging by F8515577, it seems that

By the way, The sidebar in any Chinese Wiki's projects is the same problem...

is no longer reproducible? (That screenshot shows "列印/匯出".)

@Liuxinyu970226 The automatic switching of the sidebar "Download as PDF" section between Simplified Chinese (打印/导出) and Traditional Chinese (列印/匯出) has been fixed by @LNDDYL.

At present, the only remaining problem is that the PDF itself is not converted to Simplified or Traditional Chinese.

@D3r1ck01 Wondering if this is suitable for Google-Code-in-2017 or not?

Or @jayvdb ?

@Liuxinyu970226: Would you mentor this task, and edit the task summary to provide far more information for a completely new developer? Without more analysis, I don't see how this is currently a good first task at all.

@Aklapper Err, I was just asking the current mentors whether any of them was interested in this. Since no one said yes and I lack time for GCI, I have decided to use the Community Wishlist instead.

Probably blocked on T159985 (which is beyond GCI/wishlist scope).

ovasileva triaged this task as Medium priority.Nov 28 2017, 5:58 PM

Could not reproduce this myself. I tried with a few articles and it appears the PDF has a matching character set to the article. I just did a visual check. I can't actually read either variant.

[Eight screenshots attached (image.png)]

@ABorbaWMF

image.png (768×1 px, 257 KB)

"旅行话题" (hans), "主要旅行話題", "全部話題列表" (both are hant)... "准备" (hans), "護照·签证·行李" (mixing hant and hans)
Anyway, zh-classicalwiki doesn't have this problem: it uses only hant, not hans, and hasn't set up the same converter.


Here's my playground, using the "Guangzhou South Railway Station" article (I changed /wiki/ to /zh-tw/ in the URL):


Actual results: 广州南站,又称新广州站或新客站,广州市民常直呼其为南站...广州南站是部分在建的廣深港高速鐵路...广州南站也是番禺区乃至广州市内其中一个綜合交通樞紐...tl;dr
Expected results: 廣州南站,又稱新廣州站或新客站,廣州市民常直呼其為南站...廣州南站是部分在建的廣深港高速鐵路...廣州南站也是番禺區乃至廣州市内其中一個綜合交通樞紐...tl;dr
Perhaps the filename should also be "廣州南站.pdf", but that is another topic.


However, markup such as -{zh-hans:foo;zh-hant:bar}- or -{zh-cn:foo;zh-tw:bar;zh-hk:foobar}- does work. (Why? Is T43716 working now?)
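
For readers unfamiliar with the manual conversion markup quoted above, the following is a simplified sketch of how a rule such as -{zh-cn:foo;zh-tw:bar;zh-hk:foobar}- could be resolved for one variant. It is not MediaWiki's LanguageConverter code: the function name is hypothetical, and the fallback table is a deliberately reduced assumption (the real fallback chains are longer).

```python
import re

# Assumed, simplified fallback chains; the real converter uses longer ones.
FALLBACKS = {
    "zh-cn": ["zh-cn", "zh-hans"],
    "zh-tw": ["zh-tw", "zh-hant"],
    "zh-hk": ["zh-hk", "zh-hant"],
    "zh-hans": ["zh-hans"],
    "zh-hant": ["zh-hant"],
}

# Matches one manual conversion rule: -{ ... }-
RULE = re.compile(r"-\{(.*?)\}-", re.S)


def apply_manual_rules(text: str, variant: str) -> str:
    """Replace each -{code:value;...}- rule with the value for `variant`."""

    def pick(match: "re.Match[str]") -> str:
        body = match.group(1)
        options = {}
        for part in body.split(";"):
            code, sep, value = part.partition(":")
            if sep:
                options[code.strip().lower()] = value.strip()
        if not options:
            # -{text}- without language codes: keep the text unconverted.
            return body
        for code in FALLBACKS.get(variant, [variant]):
            if code in options:
                return options[code]
        # No rule applies to this variant: leave the markup untouched.
        return match.group(0)

    return RULE.sub(pick, text)


if __name__ == "__main__":
    sample = "-{zh-cn:foo;zh-tw:bar;zh-hk:foobar}-"
    for v in ("zh-cn", "zh-tw", "zh-hk"):
        print(v, apply_manual_rules(sample, v))
```

The sketch only resolves explicit rules. Character-by-character conversion of ordinary text is a separate step in the real converter and is not modelled here.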

I got asked to take a look at this ticket, and while the code is far from what I normally work on, I always like looking at issues dealing with language and all its interesting complexity.

I think @Cwek definitely identified the source of the problem and it looks like @Liuxinyu970226 has added the solution to the 2017 Community Wishlist; the language variant for the language converter is not getting used for whatever reason.

The random-seeming variation comes from the generated PDF using the "raw" text from the article, which for Chinese-language wikis can contain a mix of simplified and traditional characters. The problem extends to any wiki using the language converter.

The TV show Lost is on the main page of Chinese Wikipedia today and it happens to provide an example that's a little easier to see for people who don't read Chinese.

  • Lost, "raw" text. (If you are logged in, your language preferences can override this first example, so you may need to look at it in an incognito/private browser window.) Note that the title is 2 characters, and the character after the second em dash (—) is "個" (noting that it is boxy is detailed enough if you don't read Chinese at all).
  • Lost, zh-cn version. Title is still two characters, but the character after the second em dash looks like an up-arrow (个).
  • Lost, zh-hk version. Title is only one character, and the character after the second em dash is boxy again.

All three versions differ from each other, but if you download the PDF of the page from any of these three pages, you get the same result: the raw text with a two-character title and the boxy character after the second em dash. (I also checked and the PDFs are identical at the byte level.) Similarly, if you try to edit the page from any of the three versions, you get the raw text to edit.
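
The byte-level comparison mentioned above can be repeated with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the byte strings in the usage example are placeholders standing in for PDFs downloaded from the raw, zh-cn, and zh-hk pages.

```python
import hashlib
from typing import Mapping


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def all_identical(files: Mapping[str, bytes]) -> bool:
    """True if every file has the same digest (or there are no files)."""
    return len({digest(content) for content in files.values()}) <= 1


if __name__ == "__main__":
    # Placeholder contents; in practice, read each downloaded PDF in "rb" mode.
    downloads = {
        "raw": b"%PDF-placeholder",
        "zh-cn": b"%PDF-placeholder",
        "zh-hk": b"%PDF-placeholder",
    }
    for name, content in downloads.items():
        print(name, digest(content))
    print("identical:", all_identical(downloads))
```

Identical digests across variants would indicate that the variant was not applied when the PDF was rendered.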

For English speakers, two examples that are easier to parse come from Serbian and Inuktitut.

Serbian has both Latin and Cyrillic alphabets, and most articles are written in Cyrillic on Serbian Wikipedia. However our friend Lost is written in Latin. A random soccer article I found is written in Cyrillic. You can read both in either Latin or Cyrillic, but the PDF reverts to the original alphabet of the article—Latin for Lost, Cyrillic for soccer.

Here's a list of communities in Nunavut, Canada from the Inuktitut Wikipedia. The first column has the name in both syllabics and Latin writing systems. Converting the page to either syllabics or Latin gives you the same name twice in the first column. Whatever page you print the PDF from, though, you get one of each, because that's the "raw" version of the page.

Jdlrobson changed the task status from Open to Stalled.Jan 31 2018, 5:44 PM
Jdlrobson subscribed.

Stalled/Blocked on T159985

TheDJ changed the task status from Stalled to Open.Jun 4 2019, 12:38 PM
TheDJ added a project: PDF-Rendering.
TheDJ subscribed.

The stalling/blocking ticket was closed, so I guess this can be 'unstalled'. Needs implementation via T213368: Support language variants in Proton ???

ABorbaWMF subscribed.

Change 896035 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/extensions/ElectronPdfService@master] ElectronPdfService: Fix language conversion for SpecialDownloadAsPdf

https://gerrit.wikimedia.org/r/896035

The current RESTBase API URL structure does not allow passing the page-view language code to the service.

(See T213368: Support language variants in Proton .)
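
To make the limitation concrete, here is a small sketch of the PDF route as I understand it, `/api/rest_v1/page/pdf/{title}`. Treat the route shape as an assumption rather than a quotation from the RESTBase source. The path has slots only for the domain and the title, so the same URL is produced whichever variant the reader was viewing. The header helper is purely hypothetical and shows one way a variant could travel outside the path; it does not describe what the linked patches do.

```python
from typing import Dict
from urllib.parse import quote


def restbase_pdf_url(domain: str, title: str) -> str:
    """Build the PDF URL; note that there is no place for a variant code."""
    # Assumed route shape: /api/rest_v1/page/pdf/{title}
    return f"https://{domain}/api/rest_v1/page/pdf/{quote(title, safe='')}"


def hypothetical_variant_headers(variant: str) -> Dict[str, str]:
    """Hypothetical: carry the variant in a request header instead of the path."""
    return {"Accept-Language": variant}


if __name__ == "__main__":
    # The URL is the same for zh-cn, zh-tw, and zh-hk readers.
    print(restbase_pdf_url("zh.wikipedia.org", "广州南站"))
    print(hypothetical_variant_headers("zh-tw"))
```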

Change 897192 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/services/chromium-render@master] Services: Pass variant parameter to chromium-render

https://gerrit.wikimedia.org/r/897192