
"Download as PDF" on Chinese-language wiki projects does not convert the output to Simplified Chinese or Traditional Chinese
Open, NormalPublic

Description

I found that "Download as PDF" on any Chinese-language wiki project does not automatically convert the output to Simplified Chinese or Traditional Chinese.

Example:
Exporting 時區 (from Wikivoyage) with "Download as PDF" (under 打印/导出) produces a PDF whose text mixes Simplified and Traditional Chinese. It is not converted to only one of the two.

By the way, The sidebar in any Chinese Wiki's projects is the same problem...

Could this be fixed?

Reproduction criteria

  • for a number of browsers and operating systems, test whether the article appears in Simplified or Traditional Chinese in:
    • the browser print
    • download as PDF (Electron)
  • provide screenshots of the difference between the two PDFs


Event Timeline

Restricted Application added a subscriber: Aklapper. · Jun 11 2017, 5:25 PM
Cwek added a subscriber: Cwek. · Jun 16 2017, 2:20 AM

From what I have heard, the Electron service used to generate PDFs renders the page in printable mode (like this), and if the wiki has LanguageConverter enabled, the printable-mode URL can also carry the 'variant' parameter to select the language conversion (like this). Can the service accept this parameter and use it?
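
To make the suggestion concrete, here is a minimal sketch of adding both parameters to a page URL. It assumes the standard MediaWiki query parameters `printable=yes` and `variant=<code>`; the page title `Example` and the helper name `with_variant` are placeholders for illustration, and nothing here describes how the Electron service actually builds its request.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_variant(url: str, variant: str) -> str:
    """Return `url` with printable=yes and the requested variant set."""
    parts = urlsplit(url)
    # Keep any existing query parameters, then add or overwrite ours.
    query = dict(parse_qsl(parts.query))
    query["printable"] = "yes"
    query["variant"] = variant
    return urlunsplit(parts._replace(query=urlencode(query)))


# "Example" is a placeholder title, not a page from this task.
print(with_variant("https://zh.wikivoyage.org/w/index.php?title=Example", "zh-tw"))
```

If the rendering service forwarded the reader's variant this way, the PDF would be produced from the converted page instead of the raw text.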

@Cwek The 'variant' parameter is a great idea!

LNDDYL added a subscriber: LNDDYL. · Jun 18 2017, 2:04 PM
ovasileva moved this task from Incoming to 2014-15 Q4 on the Readers-Web-Backlog board.
ovasileva added a subscriber: ovasileva.

@Yuriy_kosygin It seems that per F8515577

By the way, The sidebar in any Chinese Wiki's projects is the same problem...

is no longer reproducible? (That screenshot shows "列印/匯出".)

@Liuxinyu970226 The automatic switching of the "Download as PDF" sidebar label between Simplified Chinese (打印/导出) and Traditional Chinese (列印/匯出) has been fixed by @LNDDYL.

At present, the only remaining problem is that the PDF itself is not converted to Simplified Chinese or Traditional Chinese.

@D3r1ck01 Wondering if this is suitable for Google-Code-in-2017 or not?

Or @jayvdb ?

@Liuxinyu970226: Would you mentor this task, and edit the task summary to provide far more information for a completely new developer? I don't see how this task is a good first bug at all currently without more analysis.

@Aklapper Err, I just asked the current mentors whether one of them is interested in this. As no one has said yes and I lack the time for GCI, I have decided to use the Community Wishlist instead.

Tgr added a subscriber: Tgr. · Nov 20 2017, 1:15 AM

Probably blocked on T159985 (which is beyond GCI/wishlist scope).

ovasileva triaged this task as Normal priority. · Nov 28 2017, 5:58 PM
ovasileva updated the task description.

Could not reproduce this myself. I tried with a few articles and it appears the PDF has a matching character set to the article. I just did a visual check. I can't actually read either variant.

Liuxinyu970226 added a comment. · Edited · Dec 2 2017, 9:20 AM

@ABorbaWMF

"旅行话题" (hans), "主要旅行話題", "全部話題列表" (both are hant)... "准备" (hans), "護照·签证·行李" (mixing hant and hans)
Anyway, zh-classicalwiki doesn't have this problem, as it uses only hant, not hans, and hasn't set up the same converter.


Here is my test case, from the "Guangzhou South Railway Station" article (I changed /wiki/ to /zh-tw/ in the URL):


Actual results: 广州南站,又称新广州站或新客站,广州市民常直呼其为南站...广州南站是部分在建的廣深港高速鐵路...广州南站也是番禺区乃至广州市内其中一个綜合交通樞紐...tl;dr
Expected results: 廣州南站,又稱新廣州站或新客站,廣州市民常直呼其為南站...廣州南站是部分在建的廣深港高速鐵路...廣州南站也是番禺區乃至廣州市内其中一個綜合交通樞紐...tl;dr
Perhaps the filename should also be expected to "廣州南站.pdf", but that is another topic
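
A rough way to check extracted PDF text for this kind of mixing is sketched below. It is only an illustration: `PAIRS` is a tiny hand-picked table built from characters in the results above, not a real conversion table, and `classify` is a hypothetical helper name.

```python
# Tiny illustrative table (simplified -> traditional), taken from the
# characters in the example above. A real check needs a full table.
PAIRS = {"广": "廣", "称": "稱", "为": "為", "区": "區", "个": "個"}
SIMPLIFIED = set(PAIRS)
TRADITIONAL = set(PAIRS.values())


def classify(text: str) -> str:
    """Report whether text uses hans-only, hant-only, or both kinds of characters."""
    has_hans = any(ch in SIMPLIFIED for ch in text)
    has_hant = any(ch in TRADITIONAL for ch in text)
    if has_hans and has_hant:
        return "mixed"
    if has_hans:
        return "hans"
    if has_hant:
        return "hant"
    return "unknown"


print(classify("广州南站是部分在建的廣深港高速鐵路"))  # actual result excerpt
print(classify("廣州南站是部分在建的廣深港高速鐵路"))  # expected result excerpt
```

A correctly converted zh-tw PDF should never classify as "mixed" or "hans" under a complete table.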


However, constructs like -{zh-hans:foo;zh-hant:bar}- or -{zh-cn:foo;zh-tw:bar;zh-hk:foobar}- do work (Why? Is T43716 working now?)
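
For readers unfamiliar with this markup, the following is a simplified sketch of how such a manual rule selects text for a given variant. It is not the LanguageConverter implementation: the `FALLBACKS` table is a hypothetical, reduced stand-in for the real fallback chains, and `apply_manual_rules` is an illustrative name.

```python
import re

# Hypothetical, reduced fallback chains (assumption; the real converter's are richer).
FALLBACKS = {
    "zh-cn": ["zh-cn", "zh-hans"],
    "zh-tw": ["zh-tw", "zh-hant"],
    "zh-hk": ["zh-hk", "zh-hant"],
    "zh-hans": ["zh-hans"],
    "zh-hant": ["zh-hant"],
}

RULE = re.compile(r"-\{(.*?)\}-", re.S)


def apply_manual_rules(text: str, variant: str) -> str:
    """Replace each -{code:value;...}- block with the value for `variant`."""

    def pick(match: re.Match) -> str:
        body = match.group(1)
        options = {}
        for part in body.split(";"):
            if ":" in part:
                code, value = part.split(":", 1)
                options[code.strip()] = value.strip()
        if not options:
            return body  # no variant codes: emit the inner text as-is
        for code in FALLBACKS.get(variant, [variant]):
            if code in options:
                return options[code]
        return match.group(0)  # no matching option: leave the markup untouched

    return RULE.sub(pick, text)


print(apply_manual_rules("-{zh-hans:foo;zh-hant:bar}-", "zh-tw"))
print(apply_manual_rules("-{zh-cn:foo;zh-tw:bar;zh-hk:foobar}-", "zh-hk"))
```

Under these assumptions the first call prints `bar` (zh-tw falls back to zh-hant) and the second prints `foobar`.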

TJones added a subscriber: TJones. · Dec 5 2017, 8:09 PM

I got asked to take a look at this ticket, and while the code is far from what I normally work on, I always like looking at issues dealing with language and all its interesting complexity.

I think @Cwek definitely identified the source of the problem and it looks like @Liuxinyu970226 has added the solution to the 2017 Community Wishlist; the language variant for the language converter is not getting used for whatever reason.

The random-seeming variation comes from the generated PDF using the "raw" text from the article, which for Chinese-language wikis can contain a mix of simplified and traditional characters. The problem extends to any wiki using the language converter.

The TV show Lost is on the main page of Chinese Wikipedia today and it happens to provide an example that's a little easier to see for people who don't read Chinese.

  • Lost, "raw" text. (If you are logged in, your language preferences can override this first example, so you may need to look at it in an incognito/private browser window.) Note that the title is 2 characters, and the character after the second em dash (—) is "個" (noting that it is boxy is detailed enough if you don't read Chinese at all).
  • Lost, zh-cn version. Title is still two characters, but the character after the second em dash looks like an up-arrow (个).
  • Lost, zh-hk version. Title is only one character, and the character after the second em dash is boxy again.

All three versions differ from each other, but if you download the PDF of the page from any of these three pages, you get the same result: the raw text with a two-character title and the boxy character after the second em dash. (I also checked and the PDFs are identical at the byte level.) Similarly, if you try to edit the page from any of the three versions, you get the raw text to edit.
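
The byte-level comparison can be repeated with a few lines of Python. This is a sketch, not the exact procedure used above: the labels and sample bytes are made up, and in practice each value would be the contents of a downloaded PDF (for example via `pathlib.Path(...).read_bytes()`).

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a blob."""
    return hashlib.sha256(data).hexdigest()


def all_identical(blobs: dict[str, bytes]) -> bool:
    """True if every blob has the same digest, i.e. the files are byte-identical."""
    return len({digest(data) for data in blobs.values()}) <= 1


# Placeholder bytes standing in for the three downloaded PDFs.
samples = {
    "raw": b"%PDF-1.4 sample",
    "zh-cn": b"%PDF-1.4 sample",
    "zh-hk": b"%PDF-1.4 sample",
}
print(all_identical(samples))
```

If the variant were being honoured, the three downloads would be expected to differ, so a `True` result here is the symptom described in this task.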

For English speakers, two examples that are easier to parse come from Serbian and Inuktitut.

Serbian has both Latin and Cyrillic alphabets, and most articles are written in Cyrillic on Serbian Wikipedia. However our friend Lost is written in Latin. A random soccer article I found is written in Cyrillic. You can read both in either Latin or Cyrillic, but the PDF reverts to the original alphabet of the article—Latin for Lost, Cyrillic for soccer.

Here's a list of communities in Nunavut, Canada from the Inuktitut Wikipedia. The first column has the name in both syllabics and Latin writing systems. Converting the page to either syllabics or Latin gives you the same name twice in the first column. Whatever page you print the PDF from, though, you get one of each, because that's the "raw" version of the page.

Jdlrobson changed the task status from Open to Stalled. · Jan 31 2018, 5:44 PM
Jdlrobson added a subscriber: Jdlrobson.

Stalled/Blocked on T159985

lisong added a subscriber: lisong. · Mar 17 2018, 8:39 AM
ovasileva moved this task from Triage to Backlog on the Proton board. · Feb 22 2019, 3:27 PM
TheDJ changed the task status from Stalled to Open. · Jun 4 2019, 12:38 PM
TheDJ added a project: PDF-Rendering.
TheDJ added a subscriber: TheDJ.

The stalling/blocking ticket was closed, so I guess this can be 'unstalled'. Needs implementation via T213368: Support language variants in Proton ???