Collection and ProofreadPage in Wikisource sometimes create nearly empty PDF files
Open, Medium, Public

Description

Author: p.selitskas

Description:
It seems to me that you can compile a book from a ProofreadPage-driven page (i.e. one created by means of the <pages /> tag) only on English Wikisource.

I don't know why exactly, but it's not a font problem :) (I checked against the Belarusian, German and Russian Wikisources; none of them works.)

I didn't dig deep into the Collection code, but the only fundamental difference between the English setup and the Belarusian/German/Russian setups is the naming of the ProofreadPage namespaces (Author, Page & Index). However, all of the tested Wikisource projects (be, de, ru) have $wgNamespaceAliases set up to fall back to the English names.

So either Collection doesn't follow the $wgNamespaceAliases rules, or something far more complicated is going on (which is why I am posting this bug here and not in the Wikimedia section).
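
To make the namespace-alias hypothesis concrete, here is a minimal sketch of the difference between a check that only knows the English prefix and one that also consults local names and aliases. It is not taken from Collection or MediaWiki: the namespace IDs, the two tables and both function names are hypothetical and chosen only for illustration.

```python
# Hypothetical tables; real wikis define their own names and IDs.
LOCAL_NAMES = {"Seite": 102, "Index": 104}  # assumed local namespace names
ALIASES = {"Page": 102, "Index": 104}       # assumed alias fallback, in the spirit of $wgNamespaceAliases


def resolve_namespace(title):
    """Return (namespace_id, page_name); 0 stands for the main namespace."""
    prefix, sep, rest = title.partition(":")
    if not sep:
        return 0, title
    key = prefix.strip().replace("_", " ").lower()
    # Try local names first, then aliases; namespace names are case-insensitive.
    for table in (LOCAL_NAMES, ALIASES):
        for name, ns_id in table.items():
            if name.lower() == key:
                return ns_id, rest
    return 0, title


def naive_is_page_namespace(title):
    """A check that only knows the English prefix and ignores aliases."""
    return title.startswith("Page:")
```

If a renderer behaved like `naive_is_page_namespace`, a title such as `Seite:Foo.jpg` would not be recognised as a ProofreadPage page, whereas `resolve_namespace` maps both `Seite:Foo.jpg` and `Page:Foo.jpg` to the same namespace. Whether Collection actually does anything like this is not established in this report.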

Way to reproduce (in dewikisource)

  1. Get a page with text from ProofreadPage extension, but not one where all icons are green - you could start e.g. from https://de.wikisource.org/wiki/Special:Random/Index
  2. Click 'Drucken/exportieren -> Als PDF herunterladen'
  3. Wait until the PDF is done.
  4. Download it: Dokument herunterladen
  5. ...
  6. PROFIT! You can see nothing but a title and some rubbish, but not the actual text.

Version: unspecified
Severity: major

Details

Reference
bz41324

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 1:10 AM
bzimport set Reference to bz41324.
bzimport added a subscriber: Unknown Object (MLST).

It sounds very configuration orientated

Pavel: Can you still reproduce? I cannot:

(In reply to comment #0)

  1. Get a page with text from ProofreadPage extension -

https://de.wikisource.org/wiki/Ah_%E2%80%93_Bah!

That page is empty and doesn't provide the "Print/export" option.

I tried https://de.wikisource.org/wiki/Seite:Topographia_Circuli_Burgundici_%28Merian%29_215.jpg (not yet proof-read) and didn't face any issues.

I tried https://de.wikisource.org/wiki/Seite:Literarischer_Verein_Stuttgart_IX_122.png (proof-read once) and it also worked correctly.

Looking at https://noc.wikimedia.org/conf/InitialiseSettings.php.txt, this problem would have to happen on every Wikisource project:

'wmgUseProofreadPage' =>
    'wikisource' => true,

'wmgUseCollection' =>
    'wikisource' => true, # 2009-02-24

so I'd expect more bug reports if this was still a problem?

p.selitskas wrote:

Hi, Andre.

You followed the wrong steps to reproduce it. First of all, this concerns <pages/>, not the Page: namespace itself. I was trying to get a PDF of a _compiled_ book (built by means of <pages/>, in the main namespace), and you tested it against non-embedded pages from the Page: namespace.

Secondly, please be tolerant of Bugzilla's link parser :( The link for Ah - Bah! includes the exclamation mark, so please copy it with the exclamation mark.
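
As a quick way to see what the full link points to, the percent-encoded title can be decoded with Python's standard library. This is only a sketch for checking a copied link; the helper name `title_from_url` is made up for this example.

```python
from urllib.parse import unquote, urlsplit


def title_from_url(url):
    """Extract and decode the page title from a /wiki/ URL."""
    path = urlsplit(url).path
    # Everything after "/wiki/" is the (percent-encoded) page title.
    encoded_title = path.split("/wiki/", 1)[1]
    return unquote(encoded_title)


title = title_from_url("https://de.wikisource.org/wiki/Ah_%E2%80%93_Bah!")
print(title)
```

The decoded title keeps the trailing exclamation mark and contains an en dash (`%E2%80%93`), so a link that was cut off before the `!` refers to a different, empty page.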

Thirdly, I noticed that on Belarusian Wikisource I can get a proper PDF with the real contents if _every_ included page is marked Validated (green). If any page is only proofread (yellow) or at a lower level, it fails and gives me a page with nothing but license data.

To conclude, it's not that it doesn't work at all; it's either a bug or some undocumented(?) behaviour (showing nothing in the PDF if at least one page is not validated), which I personally find wrong. In any case, I can't reproduce the same on English Wikisource, which renders a proper PDF regardless of page status.
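
The behaviour described above can be written down as a small predicate. This only models the reporter's observation and is not how Collection or ProofreadPage is known to work; the function name and the example page titles are hypothetical. The numeric levels follow ProofreadPage's quality scale (0 without text, 1 not proofread, 2 problematic, 3 proofread, 4 validated).

```python
QUALITY_PROOFREAD = 3   # shown as yellow
QUALITY_VALIDATED = 4   # shown as green


def expect_full_pdf(page_quality):
    """Model of the reported symptom: the PDF has real content only if
    every transcluded page is validated.

    page_quality maps a page title to its ProofreadPage quality level.
    """
    if not page_quality:
        return False
    return all(level == QUALITY_VALIDATED for level in page_quality.values())


# Hypothetical example: one page is only proofread, so the model predicts
# the nearly empty PDF described in this report.
example = {"Page:Book.djvu/1": QUALITY_VALIDATED, "Page:Book.djvu/2": QUALITY_PROOFREAD}
print(expect_full_pdf(example))
```

Comparing this prediction against actual exports on be, de and en Wikisource would be one way to confirm or rule out the page-status theory.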

Created attachment 11591
Generated PDF

I generated the above PDF from

https://pl.wikisource.org/w/index.php?title=Wniosek_w_sprawie_ACTA

This page contains <pages/> and is rendered from the pages in the index. I can see that pages at different approval levels (green, yellow, red) are also included.

p.selitskas wrote:

(In reply to comment #7)

de Testcase in comment 0 works for me, the poem is included in the PDF file.

I can still reproduce the problem in
https://be.wikisource.org/wiki/%D0%93%D0%B5%D0%BE%D0%B3%D1%80%D0%B0%D1%84%D1%96%D1%8F_%D0%AD%D1%9E%D1%80%D0%BE%D0%BF%D1%8B/%D0%9F%D0%B0%D1%9E%D0%B4%D0%BD%D1%91%D0%B2%D0%B0%D1%8F_%D0%AD%D1%9E%D1%80%D0%BE%D0%BF%D0%B0

I wonder if this might be an I18N issue instead.

The German test case has changed since then: all pages are verified now (status: green). When every included page is 'green', you get a proper PDF. Otherwise you get rubbish.

If bug 47596 is deployed, then perhaps this bug has nothing in common with i18n.

Why is this in site requests?

(In reply to comment #8)

Deutsch testcase has changed since then: all pages are verified now (status:
green). When every included page is 'green', you get a proper PDF. Otherwise
you get rubbish.

What was the previous colour? If it's red, the page doesn't actually exist; it's just magically preloaded and displayed by ProofreadPage *as if* it existed.

p.selitskas wrote:

(In reply to comment #9)

Why is this in site requests?

I guess it is because it seemed like a configuration issue.

(In reply to comment #8)

Deutsch testcase has changed since then: all pages are verified now (status:
green). When every included page is 'green', you get a proper PDF. Otherwise
you get rubbish.

What was the previous colour? If it's red, the page doesn't actually exist;
it's just magically preloaded and displayed by ProofreadPage *as if* is
existed.

No, the status is greenish. Try this: https://be.wikisource.org/wiki/%D0%96%D1%8B%D0%B4%D1%8B_%D0%BD%D0%B0_%D0%91%D0%B5%D0%BB%D0%B0%D1%80%D1%83%D1%81%D1%96

Aklapper lowered the priority of this task from High to Medium. Dec 29 2014, 12:19 AM

Is this still a problem nowadays?

I tried creating a PDF out of https://be.wikisource.org/wiki/%D0%96%D1%8B%D0%B4%D1%8B_%D0%BD%D0%B0_%D0%91%D0%B5%D0%BB%D0%B0%D1%80%D1%83%D1%81%D1%96 and it has six pages.

Aklapper changed the task status from Open to Stalled. Dec 29 2014, 12:19 AM
Nemo_bis changed the task status from Stalled to Open. Dec 29 2014, 8:21 AM
Nemo_bis updated the task description. (Show Details)
Nemo_bis set Security to None.

Thanks for explaining. Confirming:

  1. Went to https://de.wikisource.org/wiki/Index:Handbuch_der_Politik_Band_1.pdf
  2. Clicked 'Drucken/exportieren -> Als PDF herunterladen'
  3. Waited until the PDF is done: "Fertig erstellt"
  4. Downloaded it: "Dokument herunterladen"
  5. Confirming that you see nothing but a title and some rubbish.

Creating PDFs never works from Index pages, on any Wikisource. You have to go to the top-level page in the main namespace for a work, and then export the PDF. For example, for Index:Handbuch_der_Politik_Band_1.pdf go to Politik als Staatskunst, and you get a 7-page PDF with all the right contents.

Of course, if you want to get a PDF of the whole work (from multiple wiki pages), it's better to use the wsexport tool.

Creating PDFs never works from Index pages, on any Wikisource.

Collection recognises outgoing links in a certain format as a list of pages to include in the PDF. It should just do the same with Index pages.
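
A minimal sketch of what treating a page's links as a page list could look like is given here. The exact link format Collection recognises is not stated in this thread, so the pattern (a `[[Title]]` link at the start of a line, optionally preceded by list markers such as `:` or `*`) is an assumption, and `pages_from_link_list` is a made-up name rather than a Collection function.

```python
import re

# Assumed format: a wikilink at the start of a line, optionally preceded by
# list markers (":", "*", "#"). Section anchors and link labels are ignored.
LINK_RE = re.compile(r"^\s*[:*#]*\s*\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def pages_from_link_list(wikitext):
    """Return the linked titles, in order, without duplicates."""
    titles = []
    for line in wikitext.splitlines():
        match = LINK_RE.match(line)
        if match:
            title = match.group(1).strip().replace("_", " ")
            if title not in titles:
                titles.append(title)
    return titles


sample = """== Book ==
:[[Politik als Staatskunst]]
:[[Seite:Foo_1.jpg|1]]
Some text with an [[Inline link]] in it.
* [[Politik als Staatskunst]]
"""
print(pages_from_link_list(sample))
```

Links inside running text are skipped on purpose, so only list-style entries end up in the page list. An Index page would still need its page links exposed in whatever form Collection expects, which is the open question in this task.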